Technical Requirements

This section lists high-level technical requirements.

1. Mock data generation.

1.1. Datatypes: Support for basic datatypes such as integer, double, date, and string.

1.2. Collections: Support for populating collections with real or mock values. There should be a simple way to configure the range of values in a mock collection, and it should be easy to read the values of a collection more than once.

1.3. Iterators: Values should be accessible from a collection through a simple, well-known interface (e.g. java.util.Iterator). It should be possible to iterate over a collection repeatedly without creating a new iterator each time.
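
As an illustration of the collection and iterator requirements above, the following is a minimal sketch rather than a proposed design. The names MockIntCollection, ResettableIterator, and reset() are hypothetical, and the use of a fixed seed to replay values is an assumption made only for this example.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Random;

/** Hypothetical mock collection: `size` pseudo-random integers in [min, max]. */
final class MockIntCollection implements Iterable<Integer> {
    private final int min;
    private final int max;
    private final int size;
    private final long seed;

    MockIntCollection(int min, int max, int size, long seed) {
        if (min > max || size < 0) {
            throw new IllegalArgumentException("need min <= max and size >= 0");
        }
        if ((long) max - min + 1 > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("range too wide for this sketch");
        }
        this.min = min;
        this.max = max;
        this.size = size;
        this.seed = seed;
    }

    @Override
    public ResettableIterator iterator() {
        return new ResettableIterator();
    }

    /** A java.util.Iterator that can be rewound instead of re-created. */
    final class ResettableIterator implements Iterator<Integer> {
        private Random random = new Random(seed);
        private int produced = 0;

        @Override
        public boolean hasNext() {
            return produced < size;
        }

        @Override
        public Integer next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            produced++;
            return min + random.nextInt(max - min + 1);
        }

        /** Rewinds to the first value; the same seed replays the same values. */
        public void reset() {
            random = new Random(seed);
            produced = 0;
        }
    }
}

final class MockIntCollectionDemo {
    public static void main(String[] args) {
        MockIntCollection loads = new MockIntCollection(0, 100, 5, 42L);
        MockIntCollection.ResettableIterator it = loads.iterator();
        for (int pass = 1; pass <= 2; pass++) {
            while (it.hasNext()) {
                System.out.println("pass " + pass + ": " + it.next());
            }
            it.reset(); // reuse the same iterator for the next pass
        }
    }
}
```

In this sketch the range is configured through the constructor, and both passes print the same five values because reset() restarts the generator from the same seed.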

1.4. Containers: A variety of containers should be available for grouping collections. The container defines the group's behavior. For example, a simple container generates one value from each collection in turn. A sequence container generates all values from the first collection before moving on to the second collection, and so on. A parallel container generates values from each collection concurrently.

1.5. Functions: Support for applying operators to the values generated by one or more iterators. For example, an addition operator might add the values generated by three iterators. A function should itself look like an iterator, returning the result of its operator as its next value.
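
The container and function behavior described above could be sketched as follows. This is only one possible reading: the class names RoundRobinContainer, SequenceContainer, and SumFunction are hypothetical, the parallel container is left out, and ending the function when any input runs out is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical simple container: takes one value from each source in turn. */
final class RoundRobinContainer<T> implements Iterator<T> {
    private final List<Iterator<T>> sources;
    private int position = 0; // index of the source to try next

    RoundRobinContainer(List<Iterator<T>> sources) {
        this.sources = new ArrayList<>(sources);
    }

    @Override
    public boolean hasNext() {
        for (Iterator<T> source : sources) {
            if (source.hasNext()) return true;
        }
        return false;
    }

    @Override
    public T next() {
        // Try each source at most once, skipping any that are exhausted.
        for (int tried = 0; tried < sources.size(); tried++) {
            Iterator<T> source = sources.get(position);
            position = (position + 1) % sources.size();
            if (source.hasNext()) return source.next();
        }
        throw new NoSuchElementException();
    }
}

/** Hypothetical sequence container: drains each source before the next one. */
final class SequenceContainer<T> implements Iterator<T> {
    private final Iterator<Iterator<T>> remaining;
    private Iterator<T> current = Collections.emptyIterator();

    SequenceContainer(List<Iterator<T>> sources) {
        this.remaining = new ArrayList<>(sources).iterator();
    }

    @Override
    public boolean hasNext() {
        while (!current.hasNext() && remaining.hasNext()) {
            current = remaining.next();
        }
        return current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return current.next();
    }
}

/** Hypothetical addition function: an iterator over the sums of its inputs. */
final class SumFunction implements Iterator<Integer> {
    private final List<Iterator<Integer>> inputs;

    SumFunction(List<Iterator<Integer>> inputs) {
        this.inputs = new ArrayList<>(inputs);
    }

    @Override
    public boolean hasNext() {
        // Assumption: the function ends as soon as any input is exhausted.
        if (inputs.isEmpty()) return false;
        for (Iterator<Integer> input : inputs) {
            if (!input.hasNext()) return false;
        }
        return true;
    }

    @Override
    public Integer next() {
        if (!hasNext()) throw new NoSuchElementException();
        int sum = 0;
        for (Iterator<Integer> input : inputs) sum += input.next();
        return sum;
    }
}

final class ContainerDemo {
    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 2, 3);
        List<Integer> b = Arrays.asList(10, 20);
        Iterator<Integer> sums = new SumFunction(Arrays.asList(a.iterator(), b.iterator()));
        while (sums.hasNext()) System.out.println(sums.next()); // prints 11, then 22
    }
}
```

Because each class implements java.util.Iterator, containers and functions can be nested: a function can take a container as one of its inputs, and a container can group the outputs of several functions.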

2. Data presentation.

2.1. Source/Sink: Support for a variety of data sources and sinks, for example: standard input/output/error, files, JDBC, and Java objects through reflection.

2.2. Streaming: Support for input and output streams. For example, a stream of stock quotes could be used as input, and a stream of mock CPU load data could be generated as output.

2.3. Structure: Support for a variety of data structures, for example tabular and tree structures.

2.4. Format: Support for a variety of formats, for example tab-delimited, CSV, and XML.

2.5. Interaction with mock data: It should be easy to associate data collections with I/O elements. It should be easy to read (real) data values from a data source and mix them with mock data values.
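
To make the mixing of real and mock data more concrete, here is a rough sketch that reads real values from a source, pairs each with a mock value, and writes delimited rows to a sink. The class RealMockMixer and its mix method are hypothetical names. The sketch assumes one real value per input line and does no quoting, so values must not contain the separator.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Random;

/** Hypothetical mixer: pairs each real input value with a mock CPU load. */
final class RealMockMixer {
    /**
     * Reads one real value per line from the source, appends a mock load
     * in the range 0..100, and writes "real<separator>mock" rows to the sink.
     * Blank input lines are skipped. Returns the number of rows written.
     */
    static int mix(Reader source, Writer sink, char separator, Random mock) {
        BufferedReader in = new BufferedReader(source);
        PrintWriter out = new PrintWriter(sink);
        int rows = 0;
        try {
            String real;
            while ((real = in.readLine()) != null) {
                String value = real.trim();
                if (value.isEmpty()) {
                    continue;
                }
                out.print(value);
                out.print(separator);
                out.print(mock.nextInt(101)); // mock value, 0 to 100 inclusive
                out.print('\n');
                rows++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        out.flush();
        return rows;
    }
}

final class RealMockMixerDemo {
    public static void main(String[] args) {
        // The host names stand in for real data; any Reader would work here.
        Reader source = new StringReader("host-a\nhost-b\n");
        StringWriter sink = new StringWriter();
        RealMockMixer.mix(source, sink, '\t', new Random());
        System.out.print(sink);
    }
}
```

Taking a Reader and a Writer keeps the sketch independent of where the data comes from or goes to: standard input and output, files, or in-memory strings can all be wrapped in those types, and the separator argument switches between tab-delimited and comma-separated rows.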

3. Configuration. It should be easy to configure datamixer elements for access in a given language. For example, elements accessed in Java are configured in XML. A configuration script allows element attributes to be defined, and elements to be placed in relationship with one another. For example, a script could allow an iterator to be created from a particular collection. The iterator could be named, and accessed at runtime by a Java program.

3.1. Variables. It should be possible to assign a name to a value defined in a configuration, so that the value can be accessed later by name at runtime.

3.2. Namespaces. It should be possible to define lexically scoped regions in a configuration script, so that names defined in one region can be the same as names in another region.
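
One way to picture variables and namespaces at runtime is a chain of scoped name tables, sketched below. The class Namespace and its define and lookup methods are hypothetical, and falling back to the enclosing region when a name is not found locally is an assumption about how lexical scoping would behave here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical scoped name table for values defined in a configuration. */
final class Namespace {
    private final Namespace parent; // enclosing region, or null for the root
    private final Map<String, Object> values = new HashMap<>();

    Namespace(Namespace parent) {
        this.parent = parent;
    }

    /** Binds a name in this region only; sibling regions may reuse the name. */
    void define(String name, Object value) {
        if (values.containsKey(name)) {
            throw new IllegalStateException("already defined in this region: " + name);
        }
        values.put(name, value);
    }

    /** Looks a name up here first, then in each enclosing region in turn. */
    Object lookup(String name) {
        for (Namespace scope = this; scope != null; scope = scope.parent) {
            if (scope.values.containsKey(name)) {
                return scope.values.get(name);
            }
        }
        throw new NoSuchElementException("undefined name: " + name);
    }
}

final class NamespaceDemo {
    public static void main(String[] args) {
        Namespace root = new Namespace(null);
        root.define("rows", 1000);

        // Two sibling regions can both define "source" without conflict.
        Namespace quotes = new Namespace(root);
        quotes.define("source", "quotes.csv");
        Namespace loads = new Namespace(root);
        loads.define("source", "cpu.csv");

        System.out.println(quotes.lookup("source")); // prints quotes.csv
        System.out.println(loads.lookup("source"));  // prints cpu.csv
        System.out.println(loads.lookup("rows"));    // prints 1000
    }
}
```

The file names and the "rows" variable are placeholders for illustration only.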

4. Deployment.

4.1. Language support. It should be possible to write a datamixer application in a variety of languages, for example Java, Perl, or JavaScript.

4.2. Context. It should be easy to run a datamixer application standalone (for example as a Perl or Java application) or embedded in another application (for example, a JSP page).

5. Performance. It should be possible to generate large, complex datasets in a reasonably short time.