Next

While datamixer is fully functional (apart from recursion, see below), some enhancements would make it more full-featured:

  • Iterators as parameters. Functions and other elements often take a constant value as an argument. In many cases, the constant could be replaced with the next value from an iterator. For example, the foreach function takes an integer as its end value. Instead of a constant, it could take the next value from a random iterator, so that each call generates a different number of values.


  • Recursion: there is not yet a design for recursion. Currently an element with nested elements must be built in memory, and this dataset can get large. For example, in the university example, a semester includes many classes, each of which includes many students. The entire semester must be built in memory before it is output. If recursion were supported, a streaming model could be used, in which even nested elements are output as they are generated, with no need to hold them in memory. Note that the lack of recursion is a problem for XML output but not for CSV, since CSV does not support nested elements.


  • Configuration would benefit from an expression language such as Jakarta Velocity. As well as providing a standard expression syntax, it would add support for constant values declared in one place instead of throughout the code; for arithmetic, boolean, and other expressions; and for method calls on Java objects.


  • Presentation: currently I/O is supported to the console and to files. A useful addition would be support for the JDBC API, so that values could be read from or written directly to a relational database. It would also be useful to read values from and write values to Java objects directly via reflection.


  • Design: It takes time to write and maintain a datamixer program in Java. The same is true of XML configuration. A GUI might help. One possibility is to generate datamixer code from a UML class diagram. Classes and data members in the diagram could be associated with the datamixer classes that generate their data. Multiplicities in the diagram could be defined as the range of a synthetic collection, or as a function.


  • Languages: Datamixer could easily be ported to other languages such as Perl or C++. In addition, scripting languages that run on the JVM, such as Rhino or Jython, can run directly against the datamixer JAR file.


  • Performance: The current implementation is interpreted and so relatively slow. Small datasets (megabytes) can be generated in a reasonable time, but generating gigabytes of data, for example to load into a database for performance tests, would be impractically slow. Code generation would result in a faster implementation.
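
The "Iterators as parameters" idea above can be illustrated with a minimal sketch. It does not use the actual datamixer API: the class IteratorParameterSketch and the methods constant, randomInts and foreachValues are hypothetical names invented for this illustration, and only java.util classes are used. The assumption is that a function such as foreach would accept an Iterator for its end value, with a constant treated as an iterator that always returns the same value.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical sketch only; none of these names come from datamixer itself.
class IteratorParameterSketch {

    /** An iterator that always yields the same value, so constants remain usable. */
    static Iterator<Integer> constant(final int value) {
        return new Iterator<Integer>() {
            @Override public boolean hasNext() { return true; }
            @Override public Integer next() { return value; }
        };
    }

    /** An endless iterator of random integers in the inclusive range [min, max]. */
    static Iterator<Integer> randomInts(final int min, final int max, long seed) {
        final Random random = new Random(seed);
        return new Iterator<Integer>() {
            @Override public boolean hasNext() { return true; }
            @Override public Integer next() { return min + random.nextInt(max - min + 1); }
        };
    }

    /**
     * Stand-in for a foreach-style function: the end value is not a constant
     * but the next value drawn from the supplied iterator on each call.
     */
    static List<Integer> foreachValues(Iterator<Integer> end) {
        int limit = end.next();
        List<Integer> values = new ArrayList<>();
        for (int i = 1; i <= limit; i++) {
            values.add(i);
        }
        return values;
    }

    public static void main(String[] args) {
        // With a constant, every call generates the same number of values.
        System.out.println(foreachValues(constant(3)));

        // With a random iterator, each call may generate a different number.
        Iterator<Integer> end = randomInts(2, 5, 42L);
        for (int call = 0; call < 3; call++) {
            System.out.println(foreachValues(end));
        }
    }
}
```

Wrapping constants in an iterator keeps a single parameter type, so existing constant-based configurations would not need a separate code path. Whether datamixer should adopt this shape is left open here.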