ZigZag Tutorial


A simple flow example

To introduce some of the ZigZag concepts we will use the following flow

Example flow that concatenates two strings and converts the result to upper case

This flow is available on any Meandre engine as part of /public/services/demo_repository.rdf. If we run the flow from the Meandre Workbench with no modifications, we obtain the following output

Flow output produced by the example flow

This flow contains six instances of five different components

Instance name        | Component name     | Language | Description
---------------------|--------------------|----------|------------------------------------
Push String 0        | Push String        | Java     | Push hello world
Push String 1        | Push String        | Java     | Push hello world
Concatenate String 0 | Concatenate String | Java     | Concatenates two strings
To Uppercase 0       | To Uppercase       | Python   | Converts a string to upper case
Pass Through 0       | Pass Through       | Lisp     | Just passes the input to the output
Print Object 0       | Print Object       | Java     | Prints an object to the console

This flow is expressed by the following RDF descriptor

Flow RDF descriptor expressed using Turtle notation

RDF, despite all its benefits, is not human friendly. Writing an RDF descriptor by hand would be tedious and error prone. Thus, Meandre provides the ability to express flows in a human-readable form using the ZigZag scripting language. This scripting language is intended for manipulating flows without using the Workbench and for creating self-contained flows that can be run in standalone mode outside the Meandre server.

Our first look at a ZigZag flow

Meandre provides a simple tool to generate ZigZag scripts out of RDF descriptors. We can get the ZigZag script describing the flow by using the rdf2zz facility. For instance, we can generate the ZigZag script running the following command

where http://demo.seasr.org:1714/public/services/demo_repository.ttl is the location containing the flow RDF description, and meandre://test.org/flow/test-hello-world-with-python-and-lisp/ is the URI of the flow we want to convert to a ZigZag script. The script contained in upper.zz is listed below.

This script can be compiled using the following command

This produces a .mau file. A Meandre archive unit (MAU) packages all the resources needed to run the flow in standalone mode. The resulting MAU file can be run as follows

ZigZag scripts can also be loaded into the ZigZag console for interactive manipulation. For instance, imagine we want to replace the text of one of the messages being concatenated. We can use the ZigZag console to load the ZigZag script, see which instances form the flow, add an entry that changes the message, save the new ZigZag script, run the new flow to check that everything works as expected, and save a new MAU file for the modified script, all without leaving the console.

Anatomy of a ZigZag script

Meandre's success relies on its ability to easily assemble data-intensive flows. Flows may be assembled using the Meandre Workbench (MW), an icon-based programming environment for assembling data-intensive flows. ZigZag targets rapid application development by speeding up the flow construction cycle.

ZigZag can accelerate the development cycle of data-intensive flows. It allows you to easily describe a data-intensive flow using the ZigZag language, which can then be compiled into a self-contained flow task for later execution.

ZigZag is a simple language for describing data-intensive flows; it is modeled after Python's simplicity. ZigZag is a declarative language for expressing the directed graphs (DG) that describe flows. A compiler is provided to transform a ZigZag program (.zz) into a Meandre self-contained task, also called a Meandre archive unit (.mau). MAUs can then be executed by a Meandre engine. Command-line tools are provided to compile and execute ZigZag files.

The language provides four basic constructs:

  1. Component discovery and aliasing (CDA): Retrieve components from a repository location and create an alias for them.
  2. Component instantiation (CI): Instantiate a component that will be part of the data-intensive flow.
  3. Instance modification (CM): Change the behavior of an instance by setting its properties.
  4. Instance invocation (II): Describe how a component instance is connected to the other instances in the same flow.

Component discovery and aliasing: The example below fetches all the components in the demo repository at demo.seasr.org and makes five of them available to build the flow.

Component instantiation: A component instance is created by adding () to the alias name of the component (instantiation supports multiple assignments in a single line)

Instance modification: Instance properties can be modified by adding . and the name of the property that we want to modify to the instance name. The modification takes the form of a single assignment, and the assigned value needs to be wrapped in quotes, as shown below.

Instance invocation: Describes the flow. The outputs of a component instance can be assigned a logical name, which is then used to pass them to the input ports of another component instance. This is achieved by using a function call analogy in which the names of the output and input ports are linked.
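The four constructs can be pictured with a small Python analogy. This is not ZigZag syntax and not Meandre's API; it is only a sketch of the roles the constructs play. The component implementations, the alias names (push, concat, upper), the port names (first, second, text), and the two messages are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Instance:
    """One node of the flow graph: a component plus its own properties."""
    run: Callable[..., str]
    properties: Dict[str, str] = field(default_factory=dict)

    def __call__(self, **inputs: str) -> str:
        # Invocation: keyword arguments stand in for named input ports.
        return self.run(self.properties, **inputs)


# 1. Discovery and aliasing: a dict plays the role of the repository,
#    mapping an alias to a component (hypothetical implementations).
components = {
    "push": lambda props: props["message"],
    "concat": lambda props, first, second: first + second,
    "upper": lambda props, text: text.upper(),
}


def build_and_run(first_message: str, second_message: str) -> str:
    # 2. Instantiation: two instances of the same aliased component.
    push_a, push_b = Instance(components["push"]), Instance(components["push"])
    concat = Instance(components["concat"])
    to_upper = Instance(components["upper"])

    # 3. Modification: set a property on each instance.
    push_a.properties["message"] = first_message
    push_b.properties["message"] = second_message

    # 4. Invocation: name each output, then wire it to an input port.
    a = push_a()
    b = push_b()
    joined = concat(first=a, second=b)
    return to_upper(text=joined)


result = build_and_run("hello ", "world")
print(result)
```

The point of the analogy is that aliasing and instantiation are separate steps: one alias (push) yields two independent instances, each with its own property values.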

Automatic Parallelization

Before digging into the details, imagine the following situation. You have a flow that does a good job, but at a certain point when you keep pushing more and more data through it you realize that you could use multiple instances of the same component in parallel to boost the flow performance. This will also help max out all those cores you have sitting idle. Wouldn't it be great if you could just say, for this component instance give me 4 copies that process data in parallel? Wouldn't it be also great if you didn't need to worry about connecting anything?

Let's assume that in the previous flow example, the conversion to upper case takes a really long time and dominates the overall execution time. That would be a perfect example to illustrate the parallelization capabilities that ZigZag provides

Unordered parallelization

Imagine now that you want a parallelized version of the component instance in the middle (the one that does most of the work). We can modify our ZigZag code to force the parallelization of that component instance. This modification will look like

The [+AUTO] annotation tells the ZigZag compiler to parallelize the to_upper instance based on the underlying architecture. You can also specify the number of parallel instances you want; for instance, [+4] will create 4 parallel instances. The resulting flow generated by the compiler looks as follows:

Notice that ZigZag has created 4 parallel instances of the component. It has also introduced a mapper instance that is in charge of distributing the incoming data to each of the parallel instances. Each of the parallel instances then pushes its data straight to the print instance. That's it. The ZigZag compiler has parallelized and connected a new flow, with almost no effort on your part. This is called unordered parallelization, since data may arrive at the print instance out of the original order in which it was generated by the push component instance.
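The behavior of unordered parallelization can be mimicked in plain Python. The sketch below is an analogy only, not what the ZigZag compiler generates: a thread pool plays the role of the mapper plus the parallel instances, and slow_upper is a hypothetical stand-in for an expensive component. Falling back to the CPU count when no number is given is an assumption about what "based on the underlying architecture" could mean.

```python
import os
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_upper(text: str) -> str:
    """Stand-in for an expensive component: waits a random time, then uppercases."""
    time.sleep(random.uniform(0.0, 0.01))
    return text.upper()


def run_unordered(items, copies=None):
    """Distribute items over `copies` workers and collect results as they finish."""
    # Assumption: with no explicit count, use one worker per CPU core.
    workers = copies or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(slow_upper, item) for item in items]
        # as_completed yields in completion order, not submission order.
        return [future.result() for future in as_completed(futures)]


outputs = run_unordered(["a", "b", "c", "d"], copies=4)
# Every item is processed exactly once, but the order is not guaranteed.
print(sorted(outputs))
```

Because results are collected in completion order, two runs over the same input may return the items in different orders; only the set of results is stable.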

Ordered parallelization

Sometimes applications need to maintain the order of the data being pushed through the flow. ZigZag can also parallelize instances that preserve the order of the data going through the flow (at the cost of a little overhead). The same example presented above can be turned into an order-preserving one as follows

The ! after AUTO (or after the number of parallel instances you want) tells the compiler to generate a parallelized flow that maintains the data order. The picture below shows the resulting flow that guarantees the order.

ZigZag introduces a reducer after the parallel instances to guarantee the order.
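One way to picture what such a reducer has to do is the Python sketch below. It is an assumption about the general technique (tag each item with a sequence number, then release results strictly in sequence), not a description of how Meandre's reducer is implemented. The names slow_upper, tagged_upper, and run_ordered are hypothetical.

```python
import heapq
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_upper(text: str) -> str:
    """Stand-in for an expensive component: waits a random time, then uppercases."""
    time.sleep(random.uniform(0.0, 0.01))
    return text.upper()


def tagged_upper(index: int, text: str):
    """Carry the item's sequence number alongside its result."""
    return index, slow_upper(text)


def run_ordered(items, copies=4):
    """Process items in parallel but return results in the input order."""
    with ThreadPoolExecutor(max_workers=copies) as pool:
        futures = [pool.submit(tagged_upper, i, item)
                   for i, item in enumerate(items)]
        pending = []      # min-heap of (index, result) that arrived early
        next_index = 0    # sequence number the reducer must emit next
        ordered = []
        for future in as_completed(futures):
            heapq.heappush(pending, future.result())
            # Release every buffered result that is now in sequence.
            while pending and pending[0][0] == next_index:
                ordered.append(heapq.heappop(pending)[1])
                next_index += 1
        return ordered


ordered_outputs = run_ordered(["a", "b", "c", "d", "e"], copies=4)
print(ordered_outputs)
```

The buffering step is where the extra overhead comes from: a result that finishes early has to wait in the heap until all earlier items have been emitted.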
