|
program:
autm is a shell script that submits input files for the doppler-imaging
code tempmap
(by john rice, as used by the stellar activity group in potsdam)
to other linux computers at the aip.
autm is very convenient if you have a large number of tempmap
jobs and/or want to minimize the calculation time.
autm estimates the time it would currently take to run your
tempmap job on the various hosts, selects a host accordingly and records
the actual computation speed for later estimations.
it accounts for the number of cpu's, the current load average, the speed
record and the speed average (from previous calculations) of a particular
host. it checks the hosts as required (and not each time you submit a job)
and, in doing so, distinguishes between day time, night time and weekend
use.
autm is nice to other aip computer users by
submitting jobs at nice level 19 and only if the host's load average is
below a configurable threshold.
autm currently has 90 computers in the host list. if
they are idle, they can all be used simultaneously by typing just one
command. furthermore,
autm is adapted for invocation by other programs:
it can easily be started in the background and buffer thousands of
tempmap jobs for consecutive execution, spread over all available hosts.
autm has a nice little supplement named tempargo which
lets you vary a specific parameter of a given tempmap input file,
calculate all appropriate derivatives simultaneously using autm
and list/plot the resulting normalized chi-squared values and doppler
maps. tempargo in turn has a little supplement named
xtempargo for convenient viewing of tempargo results. see
the bottom of this page.
|
|
|
requirements:
zsh !
- the script is run by the zsh shell. it shouldn't take too much
effort to port it to bash, but the advantage of zsh is that it offers
floating point operations. bash and other shells need a
somewhat inconvenient workaround using the calculator language
bc. in any case, zsh is easy to install, especially if you use debian
linux: just type apt-get install zsh as root and you're done.
(and of course, zsh does not need to be installed on the various hosts
you use, just on your own computer.)
the hosts you want to use need access to the directory on your computer
where your tempmap files are located. thus, the hosts have to be added to
your /etc/exports [/home soave(rw) marsala(rw) reblaus(rw) etc.]
and /etc/hosts.allow files [ALL: 141.33.56.2 etc.].
attention: if your computer is a member of the aip netgroup, you
only need to add /home @linux(rw) to your /etc/exports file.
in this case, set the netgroup parameter to 1.
the required tempmap version should sit in your remote ~/bin directory.
|
|
|
quick start:
just type autm TempMapFile(s) and you will probably
be doing fine. the detailed description that follows is intended to
give you insight into what autm is actually doing and how to
influence it, but for now you can skip it or read it later and
continue with the download section.
the most important parameter for now is netgroup.
if your computer is a member of the aip netgroup (i.e. you can access
your /home directory from other hosts), set netgroup in
the parameter file autm.param (which is created automatically
upon first invocation) to 1. otherwise, you access your homedir through
/net/yourcomputer/home with netgroup=0.
make sure to set the abundance and atmosphere files in the parameter file.
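the effect of the netgroup parameter on the path under which a remote host sees your working directory can be illustrated with a small python sketch (autm itself is a zsh script). the function name remote_workdir and the simple path prefixing are assumptions made for illustration, not taken from the autm source.

```python
def remote_workdir(local_dir: str, my_hostname: str, netgroup: int) -> str:
    """Return the path under which a remote host would reach local_dir.

    Hypothetical helper: only illustrates the two access methods.
    """
    if netgroup == 1:
        # netgroup member: /home is visible under the same path on all hosts
        return local_dir
    # otherwise the directory is reached through /net/<yourcomputer>/...
    return "/net/" + my_hostname + local_dir


print(remote_workdir("/home/me/tempmap", "mycomputer", 1))
print(remote_workdir("/home/me/tempmap", "mycomputer", 0))
```

with netgroup=1 the path is returned unchanged; with netgroup=0 it is prefixed with /net/mycomputer.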
|
|
|
description:
the file autm.param defines important parameters and the file
autm.hostlist contains a list of hosts that can be used. just add
the hostname, the number of cpu's and a speed value (one host entry per line;
e.g. "soave 1 0.9"). the fixed speed value is the reference_speed defined in the
parameter file divided by your guess of how long a typical tempmap job will
take on that host. typically, you would set the reference speed value to the
fastest tempmap computation you currently achieve (in seconds, e.g. "500").
assign a speed value of "1" to your fastest host and lower values for slower
machines. when autm is run, it appends further values to each entry: the
fastest computation so far (best speed) and an average computation speed; then,
after the bracket ">", the load average (the maximum of the
1min [and 5min [and 15min]] load averages, depending on the maxloadavg-mode
parameter), the host status ("bad" means that the host could not be reached or
could not access your home directory), a flag saying whether the
host can be used for tempmapping ("n" if the load average exceeds the
loadavg_threshold parameter or if the status is "bad") and, finally, the
decision-maker. the decision-maker is determined by averaging the fixed, best and
average speed values and multiplying the result by the load average (corrected
for the number of cpus). the host with the highest decision-maker value is
chosen for the first tempmap job, the host with the second-highest
decision-maker value for the second job and so forth.
the fixed speed value (which is not changed by the program) can actually
be used as a weighting factor for the decision-maker. if you prefer a
specific host to be chosen more/less frequently, then increase/decrease
the fixed speed value.
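to make the host selection concrete, here is a minimal python sketch (autm itself is a zsh script). it parses host list entries such as "soave 1 0.9" and ranks the hosts by a decision-maker value. the load correction used here, the idle fraction of the cpus max(0, 1 - load/cpus), is an assumption for illustration; the description only says that the load average is corrected for the number of cpus. the function names and the speed and load values of the example hosts are hypothetical.

```python
def parse_host(line):
    """Parse 'name cpus fixed_speed [best_speed average_speed]'."""
    fields = line.split()
    name, cpus = fields[0], int(fields[1])
    # fixed speed, followed by best and average speed if already recorded
    speeds = [float(x) for x in fields[2:5]]
    return name, cpus, speeds


def decision_maker(cpus, speeds, load):
    mean_speed = sum(speeds) / len(speeds)
    # ASSUMPTION: load factor = fraction of cpu capacity that is still idle
    load_factor = max(0.0, 1.0 - load / cpus)
    return mean_speed * load_factor


def rank_hosts(lines, loads):
    """Return host names, highest decision-maker value first."""
    scored = []
    for line in lines:
        name, cpus, speeds = parse_host(line)
        scored.append((decision_maker(cpus, speeds, loads[name]), name))
    return [name for _, name in sorted(scored, reverse=True)]


hosts = ["soave 1 0.9", "marsala 2 0.7 0.8 0.6"]
print(rank_hosts(hosts, {"soave": 0.5, "marsala": 0.2}))
```

in this example the half-loaded single-cpu host ends up behind the nearly idle dual-cpu host, although its fixed speed value is higher. raising a host's fixed speed value raises its mean speed and therefore its rank, which is the weighting effect described above.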
the parameter file autm.param is created automatically with default
values if it does not exist. it defines which version of
tempmap to use, the names of the abundance and atmosphere files (which have to
be in the same directory as the tempmap input file), the reference speed, the
load-average threshold and the check-again time (which determines how often the
hosts are checked, instead of using the recorded values from the previous
check). it also contains parameters for the day time, night time and weekend
distinction, defines the mode of determining the maximum load-average
level and some more.
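how the maximum load-average mode and the load-average threshold might interact can be sketched in python as follows. the mapping of the modes 1-3 to "1min", "1min and 5min" and "1min, 5min and 15min" is an assumption based on the description above, and the "y" flag and the "ok" status are hypothetical counterparts to the documented "n" and "bad".

```python
def max_load_average(loads, mode):
    """loads = (1min, 5min, 15min); mode 1..3 sets how many are considered."""
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2 or 3")
    return max(loads[:mode])


def host_flag(loads, mode, loadavg_threshold, status="ok"):
    """Return 'n' if the host must not be used, 'y' otherwise (assumed flag)."""
    if status == "bad" or max_load_average(loads, mode) > loadavg_threshold:
        return "n"
    return "y"


loads = (0.2, 0.9, 0.5)
print(host_flag(loads, 1, 0.5))  # only the 1min value is considered
print(host_flag(loads, 2, 0.5))  # the 5min value now exceeds the threshold
```

a higher mode is the more cautious choice: a host that was busy a few minutes ago is still skipped even if its 1min load has already dropped.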
for heavy tempmapping, you might want to add all available linux hosts to the host
list. unfortunately, checking a large number of hosts takes an
annoyingly long
time (about one minute per 50 hosts). there are several ways to work
around this: the check frequency can be automatically reduced for night time and
weekend use and increased for peak times. if you are about to
submit just a small number of jobs, you can restrict the number of hosts
checked with the --hosts option; for this reason, it's a good idea to keep
the fastest machines at the top of your host list. alternatively, you can use
a quick and dirty synchronous checking mode, which sometimes produces
errors (depending on your computer) but usually works fine with a
synch_delay value of 0.2-0.4 (seconds; using sleep). if your
sleep command does not understand floating point numbers, then use a
synch_delay value larger than 1. a short delay is then looped
synch_delay^2 times. (try 15 for a start.)
in the slow regular mode, bad computers are checked only one in six times
(normally, bad computers stay bad anyway). in synchronous mode,
the bad hosts are checked every time.
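the different treatment of bad hosts in the two checking modes can be sketched like this in python. how autm actually counts its checks is not documented here; the running check counter and the "ok" status are assumptions made for illustration.

```python
def hosts_to_check(statuses, check_number, synchronous):
    """Select the hosts to contact in this check.

    statuses: dict host -> 'ok' or 'bad' (from the previous check)
    check_number: hypothetical running counter of checks, starting at 0
    """
    selected = []
    for host, status in statuses.items():
        # bad hosts: every time in synchronous mode, else one in six checks
        if status != "bad" or synchronous or check_number % 6 == 0:
            selected.append(host)
    return selected


statuses = {"soave": "ok", "reblaus": "bad"}
print(hosts_to_check(statuses, 1, synchronous=False))
print(hosts_to_check(statuses, 6, synchronous=False))
print(hosts_to_check(statuses, 1, synchronous=True))
```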
if all checked or available (below the specified threshold) computers are
already in use, autm exits. if a specific tempmap input file
is currently being tempmapped, it will not be submitted again.
(so feel free to enter autm *.in repeatedly :-)
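the reason why repeated submission is harmless can be shown with a few lines of python. how autm keeps track of the running jobs is not described here, so the list of running input files is simply passed in; the function name and the file names are hypothetical.

```python
def jobs_to_submit(input_files, running):
    """Drop input files that are already being tempmapped.

    Keeps the original order; a file named twice is submitted only once.
    """
    seen = set(running)
    todo = []
    for name in input_files:
        if name not in seen:
            todo.append(name)
            seen.add(name)
    return todo


print(jobs_to_submit(["a.in", "b.in", "a.in", "c.in"], running=["b.in"]))
```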
with the --watch option (preferably started in a different
xterm), you can monitor the currently running tempmap jobs.
autm is always started from the directory where your current tempmap
input files are located. upon submission to a host, autm
prints four numbers and a start message. the numbers are:
[ current time / decision-maker value / expected run time / expected end time ].
when the job has finished, another message is printed including the effective run
time, so it's better not to use this xterm for anything else, except for
submitting further tempmap jobs. (set quiet in the parameter
file to 1 if you want to avoid the end message.)
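the four numbers printed upon submission could be derived roughly as in the following python sketch. the formula for the expected run time (reference_speed divided by the decision-maker value) is an assumption that is merely consistent with the description, and the output format and the function name are hypothetical.

```python
from datetime import datetime, timedelta


def submission_numbers(now, decision_maker, reference_speed):
    """Return (current time, decision-maker, expected run time, expected end time)."""
    if decision_maker <= 0:
        raise ValueError("host is not usable")
    # ASSUMPTION: a host with decision-maker 1 needs reference_speed seconds
    runtime = timedelta(seconds=round(reference_speed / decision_maker))
    end = now + runtime
    return now.strftime("%H:%M"), decision_maker, runtime, end.strftime("%H:%M")


start = datetime(2024, 1, 15, 14, 0)  # arbitrary example time
print(submission_numbers(start, 0.5, 500))
```

with reference_speed=500 and a decision-maker value of 0.5, the expected run time in this sketch is 1000 seconds.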
nice level for the computations is 19, the lowest. it's strongly
recommended to leave it at that value!
|
|
|
new features:
version 1.1: updates for version 1.1 are mostly related to
invocation of autm by other programs.
it is now possible to keep still-to-run jobs in a buffer if all hosts
are currently occupied:
with the buffering parameter turned on or the
--buffer option set, autm does not exit if there is
no usable host left but waits buffering_wait minutes (or
--buffer N minutes; default is 2) before it makes another
attempt. hosts are then checked again if required.
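the buffering behaviour amounts to a simple wait-and-retry loop, sketched here in python. the callables find_free_host and submit are hypothetical stand-ins for autm's host check and job submission, and the sleep function is passed in so that the example runs without actually waiting.

```python
import time


def run_with_buffer(jobs, find_free_host, submit, buffering_wait=2, sleep=time.sleep):
    """Submit all jobs; if no host is usable, wait buffering_wait minutes and retry."""
    pending = list(jobs)
    while pending:
        host = find_free_host()  # returns None if all hosts are occupied
        if host is None:
            sleep(buffering_wait * 60)  # minutes -> seconds
            continue
        submit(pending.pop(0), host)


# demo with stubs: the second host query finds no usable host
free = iter(["soave", None, "marsala"])
started, naps = [], []
run_with_buffer(["a.in", "b.in"], lambda: next(free),
                lambda job, host: started.append((job, host)),
                sleep=naps.append)
print(started, naps)
```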
further new options:
|
-x, --no-exit: don't exit after job submission but wait until all jobs
  that were submitted by this invocation of autm have finished; does not
  work if the max. host number is exceeded and the --buffer option is not
  given; useful for invocation from other scripts.
-X: the same as -x (--no-exit), but in combination with --quiet it gives a
  short one-line status summary: number of jobs started / maximum expected
  run-time / number of jobs left in buffer / time of next attempt.
-q, --quiet: suppresses all messages except errors and warnings.
-d, --dry-run: simulation mode; hosts are not checked and tempmap is not
  started; tempmap pretends to run for 30 seconds.
-W, --watch-forever [N]: monitor all currently running tempmap jobs at a
  refresh interval of N seconds (default is 5) and don't exit when all jobs
  have finished; try the hidden feature -WW.
-K, --kill-tempmap: kill all remote tempmap jobs; nifty feature!
-m, --maxloadavg-mode: invoked without any other argument, it shows the
  current mode.
|
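since these options are aimed at invocation from other programs, a calling python script might look roughly like the following sketch. it uses only options listed above (-X, --quiet, --buffer N); the order in which they are combined, and the assumption that autm is found in your PATH, are not confirmed by the autm documentation. the helper names are hypothetical.

```python
import subprocess


def autm_command(input_files, buffer_minutes=2):
    """Build an autm command line from the documented options."""
    return ["autm", "-X", "--quiet", "--buffer", str(buffer_minutes), *input_files]


def run_autm(input_files, buffer_minutes=2):
    """Start autm and block until it returns (requires autm to be installed)."""
    return subprocess.run(autm_command(input_files, buffer_minutes), check=True)


# only the command line is built and shown here; nothing is started
print(" ".join(autm_command(["star1.in", "star2.in"], buffer_minutes=5)))
```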
version 1.2: the most important changes are a number of internal
improvements and bug fixes. new options: -k or --kill-synch
kills all currently running synchronous checking processes; useful if a host
is hung. killing the running tempmap jobs is now done with a capital -K
or --kill-tempmap. there is an additional, hidden -KK
option that sends a kill signal to all hosts in the hostlist file, no matter
whether tempmap is running there or not. most useful:
-r N or --maxtime N restricts the host list to hosts with
a maximum expected runtime of N minutes. alternatively, you can set this in the
parameter file with the maximum_runtime parameter.
a dont_believe parameter was added that sets the threshold for
runtime-record and -average determination. if a tempmap job finishes faster than this
value (in seconds), it will not affect the record and average entries.
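the role of the dont_believe threshold can be illustrated with a short python sketch. it assumes that the record and the average are kept as run times in seconds and that the average is a plain running mean over a job count; autm's actual bookkeeping may differ, and the function name is hypothetical.

```python
def update_speed_record(best, average, count, runtime, dont_believe):
    """Return the updated (best, average, count), all run times in seconds."""
    if runtime < dont_believe:
        # implausibly fast (e.g. tempmap aborted early): ignore this run
        return best, average, count
    new_best = runtime if best is None else min(best, runtime)
    # ASSUMPTION: the average is a simple running mean
    new_average = (average * count + runtime) / (count + 1)
    return new_best, new_average, count + 1


print(update_speed_record(500.0, 600.0, 2, 30.0, dont_believe=60))
print(update_speed_record(500.0, 600.0, 2, 450.0, dont_believe=60))
```

the 30-second run leaves the entries untouched, while the 450-second run becomes the new record and lowers the average.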
if your computer is a member of the linux netgroup and you want to
access your tempmap files from the various hosts via your /home
directory, set the newly introduced netgroup parameter in the
parameter file to 1. if the hosts access your working directory via
/net/yourcomputer, leave it at 0. it's recommended to use the first
method if possible, as it is a smaller burden on the local network.
for synchronous checking it is necessary to add a little synch_delay
between requests (otherwise your computer's job table might
overflow). this is typically achieved with a synch_delay value
of 0.2 - 0.4 seconds. however, your computer's sleep command might not
understand fractional numbers. in this case, any value larger than
1 will cause a value^2 loop delay. try 15.
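the two interpretations of synch_delay can be sketched in python as follows. what autm uses as the "short delay" inside the loop is not documented here, so it is represented by a hypothetical placeholder callable that does nothing by default.

```python
import time


def synch_pause(synch_delay, short_delay=lambda: None, sleep=time.sleep):
    """Pause between two synchronous host checks.

    Returns the number of loop passes (0 if a plain sleep was used).
    """
    if synch_delay > 1:
        # no fractional sleep available: loop a short delay value^2 times
        passes = int(synch_delay ** 2)
        for _ in range(passes):
            short_delay()
        return passes
    sleep(synch_delay)  # e.g. 0.2-0.4 seconds
    return 0


print(synch_pause(15))   # loop mode
print(synch_pause(0.3))  # plain fractional sleep
```

a value of 15 therefore means 225 passes through the short delay, not a pause of 15 seconds.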
|
|
|
syntax:
Usage: autm [OPTION(s)] [InputFile(s)]
-c, --check force checking of hosts (printing long output)
-n, --no-check suppress checking of hosts
-y, --synchron synchronous checking mode (quick+dirty)
+y, -Y, --no-synchron regular checking mode (slow+safe; default)
-h, --hosts N restrict regular check to the first N hosts
-a, --all-hosts don't restrict number of hosts (default)
-r, --maxtime N restrict hosts to max. expected runtime of N min.
-m, --maxloadavg-mode [N] show or set max loadavg determination mode (1-3)
-t, --time show time since/until check and exit
-l, --list [N] list computers from last check and exit
-s, --sort sort hosts by their anticipated speed (default)
+s, -S, --dont-sort don't sort hosts (default for -l)
-b, --buffer [N] if all hosts used, wait (N min.) and try again
-w, --watch [N] monitor files currently being tempmapped (N in sec)
-W, --watch-forever [N] don't exit if no job is running
-k, --kill-synch kill all running synchronous checking processes
-K, --kill-tempmap kill all remote tempmap jobs
-x, --no-exit don't exit until all jobs finished
-q, --quiet decrease verbosity
-d, --dry-run tempmap simulation for 30 seconds
-v, --version prompt version and exit
-h, --help print this help page and exit
|
|
|
limitations:
the abundance and atmosphere files are defined in the parameter file,
which makes it inconvenient to submit a larger number of jobs with
varying abundance/atmosphere values. this will be addressed in the next
version.
|
|
|
download:
autm.tar contains four scripts: autm,
autm_start, autm_start_remote and autm_synch. put
them in your ~/bin directory and copy autm_start_remote to the ~/bin
directories of the hosts you are going to use (if they don't share the one on your
computer). that's it!
this list
of linux computers might be helpful. and here is my current
autm.hostlist.
my tempmap jobs typically run 8 minutes (reference_speed=500 seconds) on the
fastest machines (tempmap version 3.2 with
typically 10 spectra and 5-10 line blends). if yours differ, adjust the
reference_speed parameter, and then you should probably do fine with
this list. most likely, the best thing is to copy this
cleaned autm.hostlist to your ~/bin
and let your computer determine your best speed values.
you can add the appropriate
hostkeys
to your ~/.ssh/known_hosts file (otherwise you have to type yes
for each host when accessing it for the first time).
|
|
|
bugs:
bugs and insects will show up once the temperatures rise. now it's still too
cold.
but there's a small hidden shakespeare feature included ;-)
|
|
tempargo and xtempargo:
as mentioned above, tempargo lets you vary a specific parameter
of a given tempmap input file, calculate all appropriate derivatives
simultaneously using autm and list/plot the resulting
normalized chi-squared values.
you can download tempargo here. it's a
pre-release version written in python, as i'm still working on it.
put it in ~/bin.
you need to have python installed on your computer and preferably the
gnuplot.py module. if gnuplot.py is not found, grace is used for
plotting instead. you also need
tempar
by michi weber which is part of the stellar activity tempmap package.
after using tempargo on different parameters, try tempargo ""
(with two single or double quotes). this will give you a list of
all previous tempargo results located in the current directory.
for xtempargo, you definitely need the python
gnuplot.py module. to start the program, just type
xtempargo after a tempargo run.
|
|
|