Running advanced jobs on the HPC
The following sections highlight some of the more useful aspects of the Sun Grid Engine (SGE) commands. For an exhaustive list of all the features offered, please see the relevant online manual pages by logging on to the HPC and typing:
man command
This will display the man page for the specified command.
Large memory jobs
Quick Guide:
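For jobs that need more than 500 megabytes of memory, calculate the amount of memory your job requires as closely as possible (in 'M'egabytes or 'G'igabytes), then submit the job with a memory resource request for that size x, using mem_free and, when launching several such jobs at once, mem_token, as described below.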
Large memory jobs submitted without accurate memory resource requests will be deleted.
Free memory:
The amount of physical memory available in each execution node varies from 8 to 16 gigabytes, with 8 gigabyte machines being very much in the majority (1). As each execution node runs up to four jobs simultaneously, the job scheduler needs a fairly accurate estimate of each job's size in order not to oversubscribe physical memory on a given node. When memory is oversubscribed, all jobs on that node start to use virtual memory, which is much slower. To prevent this type of disruption, jobs exceeding half a gigabyte in size must be submitted with a valid memory resource request. Users are required to understand the memory and other resource requirements of the jobs they run on the HPC. If in doubt, start up a single test job and use the job monitoring tools.
If you're launching just one large memory job, then you can simply follow the instructions in this section. If you wish to launch multiple large memory jobs, see the "memory tokens" section below.
For a single large memory job, you can use the resource requirement mem_free to ensure that the job is launched on a node with sufficient free memory:
qsub -l mem_free=xG myjob.com
This example ensures that myjob.com will only run on a machine with x gigabytes of physical memory free. The unit 'M' for megabytes can be used in place of 'G', for example:
qsub -l mem_free=700M myjob.com
This ensures that a job launches only on a node with 700 megabytes of physical memory currently free. The memory resource request also accepts real values, e.g. 2.7G for 2.7 gigabytes.
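One way to apply the 500 megabyte rule consistently is a small wrapper that only adds the request when it is needed. The sketch below is illustrative rather than part of SGE: the function name mem_request and the sample figures are hypothetical, and it assumes you already know the job's peak memory in whole megabytes.

```shell
#!/bin/bash
# Hypothetical helper (not part of SGE): print a mem_free request for jobs
# whose peak memory, given in whole megabytes, exceeds the 500 MB threshold.
mem_request() {
    peak_mb=$1
    if [ "$peak_mb" -gt 500 ]; then
        printf '%s\n' "-l mem_free=${peak_mb}M"
    fi
}

small_job=$(mem_request 300)    # below the threshold: no request is printed
large_job=$(mem_request 700)    # above the threshold: request 700 megabytes

# Print the resulting commands; run them on the HPC to actually submit.
echo "qsub ${small_job:+${small_job} }myjob.com"
echo "qsub ${large_job:+${large_job} }myjob.com"
```

The commands are only printed, so the sketch can be tried anywhere before it is used on the cluster.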
Memory tokens:
When submitting a number of large memory jobs at once, it is possible for jobs to oversubscribe the memory on a node due to the delay between the job starting and it finally claiming its full amount of memory - which is what mem_free reports. In the few seconds a job takes to claim its full amount of memory, a second job may launch on that node, unaware that the free memory it sees reported is soon to be claimed by the first job. To avoid such cases, it is necessary to add a request for memory 'tokens' - these are subtracted immediately from a node's available memory token count, avoiding the delayed oversubscription problem. An example command for an x gigabyte job:
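A sketch of what such a submission might look like, assuming that mem_token accepts the same size syntax as mem_free and that both can be given in one comma-separated -l list. The mem_token value format is an assumption, and the 4G figure and variable names are purely illustrative.

```shell
#!/bin/bash
# Assumed job size; replace with your own estimate.
mem="4G"

# Assumption: mem_token takes the same value syntax as mem_free.
# Both requests are given together, since mem_free is still required.
submit_cmd="qsub -l mem_free=${mem},mem_token=${mem} myjob.com"

# Print the command for inspection; run it on the HPC to submit the job.
echo "$submit_cmd"
```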
Note: It is still necessary to use mem_free even when using mem_token.
Job arrays: Submitting multiple jobs
[Migrating users: Please note that $COD_TASK_ID has been replaced by $SGE_TASK_ID]
For tasks such as Monte Carlo simulations and parameter studies, it is often necessary to run the same program multiple times, often with slightly different input parameters. Rather than create a unique job file for each run and submit each of them separately, SGE offers a job array option. When combined with a tailored job script, this allows you to submit multiple similar jobs with a single command.
Job arrays can be submitted by adding the -t option to the qsub command. For example, the following command:
qsub -t 4-10:2 myjob.com
submits the job script myjob.com a number of times, setting its task id to a different value each time. This task id can be used by the job script to perform slightly different tasks each time, e.g. reading from a different input file, or passing a different set of parameters to your program.
The number of submissions, and the values of the task ids, are controlled by the extra argument after the -t. The format is x-y:z, where x is the first task id, y the last, and the optional :z gives the step increment. The above example submits the job script 4 times, with task id values of 4, 6, 8 and 10 (i.e. first = 4, last = 10, step = 2).
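The x-y:z expansion can be checked before submitting. The sketch below is illustrative: it assumes the seq utility is available and that SGE steps from the first id in increments of z without passing the last id, which matches the 4, 6, 8, 10 example above. The variable names are hypothetical.

```shell
#!/bin/bash
first=4
last=10
step=2

# List the task ids that a range of first-last:step would produce.
task_ids=$(seq "$first" "$step" "$last" | paste -s -d ' ' -)
echo "task ids: ${task_ids}"

# Build the matching submission command; run it on the HPC to submit.
submit_cmd="qsub -t ${first}-${last}:${step} myjob.com"
echo "$submit_cmd"
```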
The task id is available to the job script via the environment variable $SGE_TASK_ID and the pseudo environment variable $TASK_ID. The latter should be used in any SGE directive (lines beginning #$, e.g. the -o and -e options for setting the output filenames - see the basic example). The former variable - $SGE_TASK_ID - should be used in lines which are not SGE directives.
As an example, submitting the following job script with the -t argument 10-100:10 will run my_program 10 times, with task id values 10, 20, ..., 100. Each task will read input from its own data file, and will write to its own output files.
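#$ -S /bin/bash
# Give each task its own stdout and stderr file (use $TASK_ID in SGE directives)
#$ -o $HOME/myoutput.$TASK_ID.stdout
#$ -e $HOME/myoutput.$TASK_ID.stderr
# Set up the shell environment and modules
. /etc/profile
module add dot
# Run the program on this task's input file (use $SGE_TASK_ID outside directives)
cd my_program_directory
my_program < my_program.$SGE_TASK_ID.input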
Hint: Be careful not to mix up where to use $TASK_ID and $SGE_TASK_ID.
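Before submitting, it can help to check which input file each task would read. The sketch below is a local dry run, not an SGE feature: it sets SGE_TASK_ID by hand for the values 10, 20, ..., 100 described above and only prints the file names, assuming the seq utility is available.

```shell
#!/bin/bash
# Emulate the task ids 10, 20, ..., 100 by setting SGE_TASK_ID in a loop.
count=0
for SGE_TASK_ID in $(seq 10 10 100); do
    input_file="my_program.${SGE_TASK_ID}.input"
    echo "task ${SGE_TASK_ID} would read ${input_file}"
    count=$((count + 1))
done
echo "${count} tasks in total"
```

If any of the printed file names does not exist in my_program_directory, the corresponding task will fail when the array is submitted.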
If you don't explicitly set the names of output files, SGE will create files in your home directory with the format job_name.job_id.task_id. If you are explicitly setting the output file names, make sure that each task id gets its own output file. The result of directing output from multiple tasks to the same file is described in the manual as 'undefined'. In practice, this means some task output might simply disappear.
Managing job arrays
Once you get the hang of writing flexible job scripts, job arrays make job submission much easier. They make job management easier too. All tasks within the same job array share the same job id, but each has its own task id. An example of qstat output for a job array is given below. The extra number at the end of each line, under the ja-task-ID column, is the task id for that task.
job-ID  prior    name        user   state  submit/start at      queue            slots  ja-task-ID
----------------------------------------------------------------------------------------------
881     0.55500  tasktest.c  pacey  r      08/15/2006 13:35:58  serial.q@sun012  1      3
881     0.55500  tasktest.c  pacey  r      08/15/2006 13:35:58  serial.q@sun013  1      4
881     0.55500  tasktest.c  pacey  t      08/15/2006 13:35:58  serial.q@sun014  1      1
881     0.55500  tasktest.c  pacey  r      08/15/2006 13:35:58  serial.q@sun016  1      2
881     0.55500  tasktest.c  pacey  t      08/15/2006 13:35:58  serial.q@sun022  1      5
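The listing above can also be filtered by script. The sketch below is illustrative and works on a saved copy of the sample rows (the variable names are hypothetical). It assumes the column order shown, with the state in the fifth whitespace-separated field and the task id in the last.

```shell
#!/bin/bash
# Sample rows copied from the qstat listing above; on the HPC the output of
# qstat could be piped into the same awk filters instead.
qstat_output='881 0.55500 tasktest.c pacey r 08/15/2006 13:35:58 serial.q@sun012 1 3
881 0.55500 tasktest.c pacey r 08/15/2006 13:35:58 serial.q@sun013 1 4
881 0.55500 tasktest.c pacey t 08/15/2006 13:35:58 serial.q@sun014 1 1
881 0.55500 tasktest.c pacey r 08/15/2006 13:35:58 serial.q@sun016 1 2
881 0.55500 tasktest.c pacey t 08/15/2006 13:35:58 serial.q@sun022 1 5'

# Field 1 is the job id, field 5 the state, and the last field the task id.
state_r=$(printf '%s\n' "$qstat_output" |
    awk '$1 == 881 && $5 == "r" { print $NF }' | sort -n | paste -s -d ' ' -)
state_t=$(printf '%s\n' "$qstat_output" |
    awk '$1 == 881 && $5 == "t" { print $NF }' | sort -n | paste -s -d ' ' -)

echo "tasks in state r: ${state_r}"
echo "tasks in state t: ${state_t}"
```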
If you want to stop all the tasks at once, you can use the normal qdel command with the job id:
qdel 881
If you want to stop individual tasks, you can suffix the job id with the individual task id. To stop just task 4, we do:
qdel 881.4
To stop the first three tasks, do:
qdel 881.1-3
Footnotes
(1) The cluster currently has 15 nodes with 16 gigabytes of memory.
©Lancaster University Computer User Agreement Privacy Statement