User Guide

Overview

The server (on the master) starts a client on every slave via SSH. The client becomes a daemon, opens a TCP connection to the server, and fetches the configuration, the worker, and tickets. It creates the command line and the environment (input files) for the worker and starts it. When the worker has finished, the client sends the result to the server and fetches a new task. The communication is done via HTTP. The server is therefore a web server that can also be accessed with a web browser to get online information about the process.

Quick Start

This quick start is very brief. If you run into any trouble, check the detailed description below.

Configuration

The first time you start simparex it will create a configuration directory at /etc/simparex or ~/.simparex.

Go there and edit cluster.conf and server.conf. Usually you only have to change the default port in server.conf. The comments in the files will guide you. Further detail can be found below.

Worker/Task Setup

Pick a directory from which you want to start simparex. Let's assume it is ~/runs. Copy the sample worker directory from the tarball or from the installation directory to ~/runs/. If you have not changed the default installation setup, the following will do:

$> cp -r /usr/local/lib/simparex-0.x/SampleWorkerDir ~/runs/

Edit worker.conf and read the comments in that file. See below for more information.

Starting

Test mode checks for some possible faults in the configuration.

~/runs $> simparex -t SampleWorkerDir

You will see whether your slaves are accessible and whether the task file is correct. This test procedure is not complete and cannot ensure that everything works.

Start the server with the following command and watch its output.

~/runs $> simparex -v 3 SampleWorkerDir

Results

The results of all tasks will be in SampleWorkerDir/Results/Factors1 and SampleWorkerDir/Results/Factors2.

Configuration

Simparex needs a server.conf and a cluster.conf. The search path for these files is as follows:

specified on command line
per user in ~/.simparex/
system-wide in /etc/simparex/
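
The search order above can be sketched in Python as follows. This is only an illustration, not simparex's own code: the function name find_config and its parameters are hypothetical, and it assumes that a path given on the command line takes precedence over the per-user and system-wide directories.

```python
from pathlib import Path


def find_config(name, explicit=None, user_dir="~/.simparex", system_dir="/etc/simparex"):
    """Return the first existing candidate for `name`, or None.

    Hypothetical helper that mirrors the documented order
    (command line, per user, system-wide); not part of simparex.
    """
    candidates = []
    if explicit is not None:
        # A file named explicitly on the command line is tried first.
        candidates.append(Path(explicit))
    candidates.append(Path(user_dir).expanduser() / name)
    candidates.append(Path(system_dir) / name)
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Calling find_config("server.conf") would return the per-user file if it exists and fall back to the system-wide one otherwise.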

`server.conf`

Usually you only have to change the default port in server.conf. If your hostname cannot be detected you need to specify this as well.

`cluster.conf`

Please adapt SlaveBaseDir, which is the directory on each slave to which the client and the worker are copied. (It is created if it does not exist.) Subdirectories for each task are created there. The Slaves variable specifies all machines you want to work on. Hosts can be given by either hostname or IP address. A host can be listed more than once; the client is then started that many times on it (reasonable for multiprocessor machines). Example:

Slaves=<<ENDOFLIST
Alfons
Hans
Hans
Fred
ENDOFLIST
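
To make the effect of listing a host twice concrete, here is a minimal Python sketch that counts how many clients the example list would start per host. The parser is an assumption for illustration (it simply splits on whitespace between the two markers) and is not how simparex itself reads cluster.conf; the name parse_slaves is hypothetical.

```python
from collections import Counter


def parse_slaves(text):
    """Extract host names from a 'Slaves=<<MARKER ... MARKER' block.

    Assumption: hosts are separated by whitespace (spaces or newlines).
    """
    tokens = text.split()
    prefix = "Slaves=<<"
    if not tokens or not tokens[0].startswith(prefix):
        raise ValueError("text does not start with 'Slaves=<<MARKER'")
    marker = tokens[0][len(prefix):]
    end = tokens.index(marker, 1)  # raises ValueError if the end marker is missing
    return tokens[1:end]


example = "Slaves=<<ENDOFLIST Alfons Hans Hans Fred ENDOFLIST"
for host, count in Counter(parse_slaves(example)).items():
    print(f"{host}: {count} client(s)")
```

In this example Hans appears twice, so two clients would be started on that machine.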

You should be able to log in to each of the machines via ssh with public-key authentication. Please check the ssh documentation for a detailed description. In short, generate a key pair with ssh-keygen -t rsa on your master machine. You can press Enter on all questions (i.e. do not give a passphrase for your key). Then copy the public key file to each slave machine and add it to the list of authorized keys. Altogether:

ssh-keygen -t rsa # press enter on all questions
scp ~/.ssh/id_rsa.pub SLAVE:.ssh/master.pub
ssh SLAVE
cat .ssh/master.pub >> .ssh/authorized_keys
logout
# try whether it works
ssh SLAVE # now you should not be asked for a passwd.
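
If you have many slaves, the final check can be automated. The following Python sketch is not part of simparex; the function names are hypothetical. It relies on the OpenSSH options BatchMode=yes (fail instead of prompting for a password) and ConnectTimeout, and assumes an OpenSSH client is installed on the master.

```python
import subprocess


def ssh_check_command(host, timeout_s=5):
    """Build an ssh command that fails instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",
        "-o", f"ConnectTimeout={timeout_s}",
        host,
        "true",  # trivial remote command; only the exit status matters
    ]


def can_login_without_password(host):
    """Return True if non-interactive, key-based login to `host` succeeds."""
    try:
        result = subprocess.run(
            ssh_check_command(host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

Calling can_login_without_password("SLAVE") for each host in your Slaves list shows which machines still need the public key installed.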

Multiple cluster configurations

If you need different cluster configurations, you can create a copy of cluster.conf with a different name, say allmachines.conf. You can then start simparex with:

~/runs $> simparex -c allmachines.conf SampleWorkerDir

The search path for this file is as specified above (current directory, user configuration, system configuration).

Worker/Task Setup

We are now going to set up a working directory. This directory holds all information needed for a run, e.g. the worker configuration, the task description(s) and later the results.

Pick a directory in which to place the working directories and from which you will start simparex. Let's assume it is ~/runs. Copy the sample worker directory from the tarball or from the installation directory to ~/runs/. If you have not changed the default installation setup, the following will do:

$> cp -r /usr/local/lib/simparex-0.x/SampleWorkerDir ~/runs/

This directory contains a small program (the worker) that calculates prime factors. We will use this example to illustrate the configuration. Rename the directory to anything that fits your purpose. For now we will name it Primefactors:

$> cd ~/runs
$> mv SampleWorkerDir Primefactors

`worker.conf`

This file must reside in the working directory. It specifies the worker program and some options for the run. Let's look at it in more detail:

[Tasks]
# name of the task file
TaskFile=tasks.csv

[Platform Linux]
# name of the executable
Worker=primefactors
# this will be passed to the worker command as cmd line argument
# Use: $p{NAME} to refer to an input parameter NAME (see tasks.csv)
Args="$p{Number1} < input.file > result.file"
...

First you specify the file containing the task descriptions. Just leave it as it is for now. For every platform (currently just Linux) you can specify the name of the worker and the command line (Args). Here we use the program primefactors. To illustrate the input and output processing of simparex, this program reads one number from the command line and another one from standard input. The factors of the first number are printed to stdout and the factors of the second number are written to the file factors.txt. Let's see how to set this up. The Args variable is particularly important to understand: with it you construct the rest of the command line for the worker. You can use variables to refer to values inside the task file (see tasks.csv below).
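
The $p{NAME} substitution can be illustrated with a short Python sketch. It is not simparex's implementation: the function name expand is hypothetical, and details such as escaping or other kinds of variables are not covered by this guide and are ignored here.

```python
import re

# Matches $p{NAME}, where NAME consists of letters, digits and underscores.
_PLACEHOLDER = re.compile(r"\$p\{(\w+)\}")


def expand(template, params):
    """Replace each $p{NAME} in `template` with params[NAME]."""
    def lookup(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"task has no parameter named {name!r}")
        return params[name]

    return _PLACEHOLDER.sub(lookup, template)


# One task from tasks.csv, given as a mapping from column name to value.
task = {"Number1": "3456789012", "Number2": "4567890123"}
args = expand("$p{Number1} < input.file > result.file", task)
print("primefactors " + args)
```

For this task the worker would be started with the first number on the command line, while the second number reaches it through input.file.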

Additionally, you have to specify whether input files have to be generated and which output files contain results you would like to collect. For large result files, use the LargeResults specification instead.

...
# Input files. Every key names a file which is generated.
# The value is the contents of the file. You can refer to variables as above.
[Inputs]
input.file ="$p{Number2}"

# Result specification. Key names the file (created in WorkingDirectory/Results)
# which will contain the contents of the given file (from each slave).
# These contents are processed using the Collect function below.
# You can use variables as well.
[Results]
Factors1="result.file"
Factors2="factors.txt"

# Result specification for large results (greater than say 10kB).
# These are streamed directly and not validated
# The key names the file or filestem (created in WorkingDirectory/Results)
# see also Collect->LargeFunction which might split the results into several files.
# You can use variables as well.
#[LargeResults]
#File="large.file"
...

`tasks.csv`

This file contains the values for the tasks. It is a simple ASCII file in which fields are separated by | (bar). Unfortunately, comments are not allowed in this file. The first line is the header, which names the columns. These names can be used to refer to the values in worker.conf. Every subsequent line specifies one task and must assign a value to every field (column). There is no limit on the number of tasks. The following example defines 6 tasks:

Number1|Number2
3456789012|4567890123
9876543210|123456789
32767|131071
24242424|9340987259
23232323232323|32323232323232
99999999999999|88888888888888
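
Before a long run it can be useful to check a task file yourself. The following Python sketch reads the bar-separated format described above and verifies that every task has a value for every column. It is an illustration only; parse_tasks is a hypothetical helper, and skipping blank lines is an assumption, not documented simparex behaviour.

```python
def parse_tasks(text):
    """Parse bar-separated task text into a list of dicts keyed by column name."""
    header = None
    tasks = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        fields = line.split("|")
        if header is None:
            header = fields  # the first non-blank line names the columns
            continue
        if len(fields) != len(header):
            raise ValueError(
                f"line {lineno}: expected {len(header)} fields, got {len(fields)}"
            )
        tasks.append(dict(zip(header, fields)))
    if header is None:
        raise ValueError("task file has no header line")
    return tasks


sample = """Number1|Number2
3456789012|4567890123
9876543210|123456789
32767|131071
"""
tasks = parse_tasks(sample)
print(len(tasks), "tasks; first Number2 =", tasks[0]["Number2"])
```

Each resulting dictionary corresponds to one task, and its keys are the names that worker.conf refers to as $p{NAME}.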

Starting

Test Mode

Test mode checks for some possible faults in the configuration.

~/runs $> simparex -t Primefactors

You will see which of your slaves are accessible and whether the configuration files are correct. This test procedure is not complete and cannot ensure that everything works. Please take a look at the command line that is printed for the first task.

Options and Observing

Usually you start the server with

~/runs $> simparex Primefactors

Type simparex -h for the available options. The server tells you which configuration files it uses and at which address you can access it with a web browser. For example

lynx http://Fred:8080

could look like:

---------------------------------------------------
------------ Statistic ------------------------
------------ SLAVES ------------------------
Status: Started clients: 3/3, Dead: 0
Dead Slaves:
Crashed Slaves:
Successful Slaves: (sorted by number of tickets):
Hans: 3
Fred: 3
Alphons: 1
------------ TIMING ------------------------
Runtime: 7s = 0 Minutes and 7 Seconds (Start: 29.00.2006 14:00:23)
Minimal Time per Task: 0s = 0 Minutes and 0 Seconds
Average Time per Task: 2.5s = 0 Minutes and 2 Seconds
Maximal Time per Task: 5s = 0 Minutes and 5 Seconds
------------ REPORT ------------------------
Outstanding: 2
Finished: 4
Failed: 0

The web interface offers different levels of detail for observing the progress. Frequently used options are -p PORT and -c FILE, where PORT is the port to listen on and FILE specifies a different cluster file.

Results

The results are collected using the Collect function from worker.conf. In this example the results of all successful tasks will go to Primefactors/Results/Factors1 and Primefactors/Results/Factors2. Old results are backed up.