What it can do

Goal

Provide a very easy way to run one program with different settings on a group of computers in parallel and collect the results. The aims are simple configuration, wide applicability, and fault tolerance with respect to network and administration errors. Why not use something like MPI? Well, not all programs are written in C ;-). Adapting MPI to a new program takes a lot of effort, and MPI has to be set up on the slave computers.

Terminology

To make this page easier to understand, let's clarify some terms.

Master: the computer where the server runs
Slave: one of many computers that do the work
Server: the program that coordinates the process
Client: the program that runs on the slaves
Worker: the program that does the computation (nearly any program)
Task: a set of parameters/settings for the Worker
Result: a file with the results of the computation
SessionID: a unique number for client-server communication (should be unique across multiple runs)
Ticket: identifies one computation; unique within one run
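
As a rough illustration of how these terms relate, the following Python sketch models a task and a ticket as plain data records. All class and field names are hypothetical and chosen for this example only; they are not taken from the program itself. The sketch also assumes that a task which is assigned more than once receives a fresh ticket each time, which the text above does not state explicitly.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Task:
    """A set of parameters/settings for the worker (hypothetical layout)."""
    task_id: int
    params: dict


@dataclass
class Ticket:
    """Identifies one computation; unique within one run."""
    ticket_id: int
    task_id: int
    slave: str


class TicketOffice:
    """Hands out tickets with increasing numbers (assumed scheme)."""

    def __init__(self):
        self._next_id = count(1)

    def issue(self, task, slave):
        # Each call yields a new ticket, even for the same task.
        return Ticket(next(self._next_id), task.task_id, slave)


office = TicketOffice()
task = Task(task_id=7, params={"alpha": 0.5})
first = office.issue(task, "node01")
second = office.issue(task, "node02")  # e.g. a reassignment after a timeout
```

Here both tickets point at the same task, but each one identifies a separate computation on a separate slave.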

Features

  • one master computer with the server program (acts as an HTTP server). SSH and SCP are needed to copy the client program to the slaves and start it.
  • list of slave computers (host names or IPs).
  • command specification: command-line pattern with placeholders for variables and input file generation
  • result specification: standard output and/or files (NEW: large file support)
  • validation of the results on the master
    • non_empty test
    • one_line test
    • custom code possible
  • list of tasks characterised by parameters.
  • timeouts and multiple task assignments if necessary
  • simple online (via Web) and offline statistics: which slave did what and which parameter sets failed.
  • test mode
  • NFS aware
  • platform-dependent workers possible (scheduled for 0.7)
  • collecting rules (for normal results):
    • plain concat
    • concat, but with task ID
    • blockwise with parameters and task ID
    • custom code possible
  • collecting rules (for large results):
    • plain concat
    • size dependent split
    • every result in its own file
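
To make the command specification, the validation tests, and the collecting rules more concrete, here is a minimal Python sketch. It assumes `$name` style placeholders (handled by Python's `string.Template`) and a hypothetical worker called `./worker`; the actual pattern syntax, test semantics, and function names used by the program may differ.

```python
from string import Template

# Hypothetical command-line pattern; "$alpha" and "$n" are placeholders.
COMMAND_PATTERN = Template("./worker --alpha $alpha --iterations $n")


def build_command(params):
    """Fill the placeholders of the pattern with one task's parameters."""
    return COMMAND_PATTERN.substitute(params)


def non_empty(output):
    """Validation: the result must contain something besides whitespace."""
    return bool(output.strip())


def one_line(output):
    """Validation: the result must consist of exactly one line of text."""
    return len(output.strip().splitlines()) == 1


def collect_with_task_id(results):
    """Collecting rule 'concat, but with task ID': prefix every line."""
    collected = []
    for task_id, output in sorted(results.items()):
        for line in output.splitlines():
            collected.append(f"{task_id}\t{line}")
    return "\n".join(collected)


tasks = {1: {"alpha": 0.1, "n": 100}, 2: {"alpha": 0.2, "n": 100}}
commands = {tid: build_command(p) for tid, p in tasks.items()}

# Made-up outputs, standing in for what the workers would print.
results = {1: "0.93\n", 2: "0.87\n", 3: ""}
valid = {tid: out for tid, out in results.items() if one_line(out)}
summary = collect_with_task_id(valid)
```

In this sketch the empty output of task 3 fails the one_line test and is left out of the collected summary, which corresponds to a parameter set being reported as failed.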