What it can do

Goal

Provide a very easy way to run one program with different settings on a bunch of computers in parallel and collect the results. The aims are simple configuration, wide applicability, and fault tolerance with respect to network and administration errors. Why not use something like MPI? Well, not all programs are written in C ;-). Adapting a new program to MPI takes a lot of effort, and MPI has to be set up on every slave computer.

Terminology

To make this page easier to understand, let's clarify some terms.

Master	is the computer where the server runs
Slave	is one of many computers that do the work
Server	program that coordinates the process
Client	program that runs on slaves
Worker	program that does the computation (nearly any program)
Task	a set of parameters/settings for the Worker
Result	a file with the results of the computation
SessionID	unique number for client-server communication (should be unique over multiple runs)
Ticket	identifier of one computation; unique within one run

Features

one master computer running the server program (acts as an HTTP server); SSH and SCP are needed to copy the client program to the slaves and start it.
list of slave computers (host names or IPs).
command specification: command-line pattern with placeholders for variables, plus input file generation
result specification: standard output and/or files (NEW: large file support)
validation of the results on the master
- non_empty test
- one_line test
- custom code possible
list of tasks characterised by their parameters.
timeouts and multiple task assignments if necessary
simple online (via Web) and offline statistics: which slave did what and which parameter sets failed.
test mode
NFS aware
platform-dependent workers possible (scheduled for 0.7)
collecting rules (for normal results):
- plain concat
- concat, but with task ID
- blockwise with parameters and task ID
- custom code possible
collecting rules (for large results):
- plain concat
- size-dependent split
- every result in its own file
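
To make three of the features above more concrete (command pattern with placeholders, validation on the master, and the "concat, but with task ID" collecting rule), here is a minimal Python sketch. It is not the tool's actual implementation or configuration syntax: the `$name` placeholder style, the worker command line, and all function names are assumptions made up for illustration.

```python
import string

# Hypothetical command-line pattern. The $alpha and $steps placeholders are
# invented for this sketch and are not the tool's real placeholder syntax.
PATTERN = string.Template("./worker --alpha $alpha --steps $steps")


def build_command(task):
    """Fill the placeholders of the pattern with one task's parameters."""
    return PATTERN.substitute(task)


def non_empty(result):
    """Accept a result that contains at least one non-whitespace character."""
    return bool(result.strip())


def one_line(result):
    """Accept a result that consists of exactly one line of text."""
    return len(result.strip().splitlines()) == 1


def collect_with_task_id(results):
    """Concatenate results in task order, prefixing each line with its task ID."""
    lines = []
    for task_id, text in sorted(results.items()):
        for line in text.splitlines():
            lines.append(f"{task_id}\t{line}")
    return "\n".join(lines)


# Two tasks, each a set of parameters for the worker.
tasks = [{"alpha": 0.1, "steps": 100}, {"alpha": 0.5, "steps": 200}]
commands = [build_command(t) for t in tasks]

# Made-up worker outputs keyed by task ID; the second is empty and fails validation.
raw_results = {0: "3.14\n", 1: ""}
valid = {tid: out for tid, out in raw_results.items()
         if non_empty(out) and one_line(out)}
failed = sorted(set(raw_results) - set(valid))
collected = collect_with_task_id(valid)
```

In this sketch the failed task IDs are kept separately from the collected output, which mirrors the idea of reporting which parameter sets failed; a real run would presumably reassign such tasks rather than just list them.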