Next
While datamixer is fully functional (apart from recursion, see below), some
enhancements would make it more complete:
-
Iterators as parameters. Functions and other elements often take a
constant value as argument. In many cases, these could be replaced with
the next value from an iterator. For example, the foreach function takes
an integer as its end value. Rather than use a
constant for this value, it could use the next value from a random
iterator, to generate a different number of values with each call.
-
Recursion: there is not yet a design for recursion. Currently an
element with nested elements must be built in memory, and the resulting
dataset can get large. For example, in the university example, a semester
includes many classes, each of which includes many students. The entire
semester must be built in memory before it is output. If recursion were
supported, a streaming model could be used, where even nested elements
are output as they are generated, with no need to hold them in memory.
Note that lack of recursion is a problem with XML output but not CSV,
since CSV does not support nested elements.
-
Configuration would benefit from an expression language such as
Jakarta Velocity. As well as providing a standard expression syntax, it
would add support for constant values declared in one place instead of
throughout the code, arithmetic and boolean expressions, and method calls
on Java objects.
-
Presentation: currently I/O is supported for the console and for
files. A useful addition would be to support the JDBC API, so that values
could be read from or written directly to a relational database. It would
also be useful to read values from, or write them to, Java objects via
reflection.
-
Design: It takes time to write and maintain a datamixer program in
Java. The same is true of XML configuration. A GUI might help. One
possibility is to generate datamixer code from a UML class
diagram. Classes and data members in the diagram could be associated with
datamixer classes to generate their data. Multiplicities in the diagram
could be defined as the range of a synthetic collection, or a function.
-
Languages: Datamixer could easily be ported to other languages such
as Perl or C++. In addition, scripting languages such as Rhino or Jython
can run directly against the datamixer JAR file.
-
Performance: The current implementation is interpreted and so
relatively slow. While small datasets (megabytes) can be generated in a
reasonable time, generating gigabytes of data, for example to load into a
database for performance tests, would take impractically long. Code
generation would result in a faster implementation.
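
The first enhancement above, iterators as parameters, might look roughly
like the following Java sketch. It is only an illustration under stated
assumptions: the class IteratorParameterSketch and the methods randomRange,
constant and foreach are hypothetical names invented here, not datamixer's
actual API. The point is that a function which accepts an iterator instead
of an int can still be given a constant, while also allowing a different
end value on each call.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch; none of these names come from datamixer itself. */
final class IteratorParameterSketch {

    /** Endless iterator of random integers in [min, max]; seeded so runs repeat. */
    static Iterator<Integer> randomRange(final int min, final int max, long seed) {
        final Random random = new Random(seed);
        return new Iterator<Integer>() {
            @Override public boolean hasNext() { return true; }
            @Override public Integer next() { return min + random.nextInt(max - min + 1); }
        };
    }

    /** Endless iterator that always yields the same value, so constants still work. */
    static Iterator<Integer> constant(final int value) {
        return new Iterator<Integer>() {
            @Override public boolean hasNext() { return true; }
            @Override public Integer next() { return value; }
        };
    }

    /**
     * Stand-in for a foreach-style function: each call pulls its end value
     * from the iterator and returns the values 1..end.
     */
    static List<Integer> foreach(Iterator<Integer> end) {
        int limit = end.next();
        List<Integer> values = new ArrayList<>();
        for (int i = 1; i <= limit; i++) {
            values.add(i);
        }
        return values;
    }

    public static void main(String[] args) {
        // Each call may produce a different number of values.
        Iterator<Integer> end = randomRange(1, 5, 42L);
        for (int call = 0; call < 3; call++) {
            System.out.println(foreach(end));
        }
    }
}
```

Passing constant(3) reproduces the current fixed-value behaviour, so in a
design like this existing configurations would not need to change.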
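
The streaming model mentioned under recursion can be illustrated with a
small Java sketch. Since there is not yet a design for recursion in
datamixer, this is not a proposal for how it would be configured; it only
shows the output side, assuming the nesting from the university example
(semester, class, student) is hard-coded as nested loops. The class
StreamingOutputSketch, the method writeSemester and the element and
attribute names are all hypothetical. Each element is written as soon as it
is generated, so memory use does not grow with the size of the semester.

```java
import java.io.PrintWriter;

/** Hypothetical sketch of streaming nested XML output; not datamixer code. */
final class StreamingOutputSketch {

    /**
     * Writes a semester with the given number of classes and students per
     * class. Nothing is collected in memory: every opening tag, child
     * element and closing tag goes straight to the writer.
     */
    static void writeSemester(PrintWriter out, int classes, int studentsPerClass) {
        out.print("<semester>\n");
        for (int c = 1; c <= classes; c++) {
            out.print("  <class id=\"" + c + "\">\n");
            for (int s = 1; s <= studentsPerClass; s++) {
                // A real generator would pull student values from iterators here.
                out.print("    <student id=\"" + s + "\"/>\n");
            }
            out.print("  </class>\n");
        }
        out.print("</semester>\n");
    }

    public static void main(String[] args) {
        PrintWriter out = new PrintWriter(System.out);
        writeSemester(out, 2, 3);
        out.flush();
    }
}
```

The same loop structure would serve CSV output unchanged in spirit, which
is consistent with the note that the memory problem only arises for nested
XML output.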