Running advanced jobs on the HPC 


The following sections highlight some of the more useful aspects of the Sun Grid Engine commands. For an exhaustive list of all the features offered, please see the relevant online manual pages by logging on to the HPC and typing:

man command

This will display the manual page for the named command; for example, man qsub displays the page for qsub.



Large memory jobs

Quick Guide:

For jobs that need more than 500 megabytes of memory, calculate the amount of memory your job requires as closely as possible (in 'M'egabytes or 'G'igabytes), then for memory size x submit the job with:

qsub -l mem_free=xG -l mem_token=xG myjob.com

Large memory jobs submitted without accurate memory resource requests will be deleted.

Free memory:

The amount of physical memory available in each execution node can vary from 8 to 16 gigabytes, with 8 gigabyte machines being very much in the majority(1). As each execution node runs up to four jobs simultaneously, the job scheduler needs to be given a fairly accurate estimate of the job's size in order not to oversubscribe physical memory on a given node. When memory is oversubscribed, all jobs on that node will start to use virtual memory, which is much slower. To prevent this type of disruption, jobs exceeding 1/2 a gigabyte in size must be submitted with a valid memory resource request. Users are required to understand the memory and other resource requirements of the jobs they run on the HPC. If in doubt, start up a single test job and use the job monitoring tools.

If you're launching just one large memory job, then you can simply follow the instructions in this section. If you wish to launch multiple large memory jobs, see the "memory tokens" section below.

For a single large memory job, you can use the single resource requirement mem_free to ensure that the job is launched on a node with sufficient free memory:

qsub -l mem_free=xG myjob.com

This example will ensure myjob.com will only be run on a machine with x gigabytes of physical memory free. The unit 'M' for megabytes can be used in place of 'G', for example:

qsub -l mem_free=700M myjob.com

This will ensure the job launches only on a node with 700 megabytes of physical memory currently free. The memory resource requests also accept real values, e.g. 2.7G for 2.7 gigabytes.

Memory tokens:

When submitting a number of large memory jobs at once, it is possible for jobs to oversubscribe the memory on a node due to the delay between the job starting and it finally claiming its full amount of memory - which is what mem_free reports. In the few seconds a job takes to claim its full amount of memory, a second job may launch on that node, unaware that the free memory it sees reported is soon to be claimed by the first job. To avoid such cases, it is necessary to add a request for memory 'tokens' - these are subtracted immediately from a node's available memory token count, avoiding the delayed oversubscription problem. An example command for an x gigabyte job:

qsub -l mem_free=xG -l mem_token=xG myjob.com

Note: It is still necessary to use mem_free even when using mem_token.
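Because both requests should carry the same size, it can help to build them from a single value. The following is a minimal sketch, not part of SGE: the function name mem_request is hypothetical, and the command is printed rather than run so that it can be checked before submission.

```shell
#!/bin/sh
# Hypothetical helper (not an SGE command): print matching mem_free and
# mem_token requests for one memory size such as 2G, 700M or 2.7G.
mem_request() {
    size="$1"
    # Accept a whole or real number followed by M or G, as described above.
    if ! printf '%s\n' "$size" | grep -Eq '^[0-9]+(\.[0-9]+)?[MG]$'; then
        echo "mem_request: size must look like 700M or 2.7G" >&2
        return 1
    fi
    echo "-l mem_free=$size -l mem_token=$size"
}

# Print the full command for a 2 gigabyte job instead of running it.
echo "qsub $(mem_request 2G) myjob.com"
```

Once the printed command looks right, it can be pasted at the prompt as it stands.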


Job arrays: Submitting multiple jobs

[Migrating users: Please note that $COD_TASK_ID has been replaced by $SGE_TASK_ID]

For tasks such as Monte Carlo simulations and parameter studies, it is often necessary to run the same program multiple times, often with slightly different input parameters. Rather than create a unique job file for each run and submit each of them separately, SGE offers a job array option. When combined with a tailored job script, this allows you to submit multiple similar jobs with a single command.

Job arrays can be submitted by adding the -t option to the qsub command. For example, the following command:

qsub -t 4-10:2 myjob.com

Submits the job script myjob.com a number of times, setting its task id to a different value each time. This task id can be used by the job script to perform slightly different tasks each time, e.g. reading from a different input file, or passing a different set of parameters to your program.

The number of submissions, and the values of the task ids, are controlled by the extra argument after the -t. The format is x-y:z, where x is the first task id, y the last, and the optional :z gives the step increment. The above example submits the job script 4 times, with task id values of 4, 6, 8 and 10 (i.e. first = 4, last = 10, step = 2).
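To check which task ids a given range will produce before submitting, the x-y:z rule can be reproduced with a short loop. This is only an illustrative sketch: task_ids is a hypothetical helper, not an SGE command, and it assumes a step of 1 when :z is omitted.

```shell
#!/bin/sh
# Hypothetical helper: list the task ids that "qsub -t first-last:step"
# would create. Arguments: first id, last id, optional step.
task_ids() {
    first="$1"; last="$2"; step="${3:-1}"   # assumed default step of 1
    [ "$step" -ge 1 ] || return 1           # refuse a zero or negative step
    id="$first"
    while [ "$id" -le "$last" ]; do
        echo "$id"
        id=$((id + step))
    done
}

# The range from the example above, 4-10:2.
task_ids 4 10 2    # prints 4, 6, 8 and 10, one per line
```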

The task id is available to the job script via the environment variable $SGE_TASK_ID and the pseudo environment variable $TASK_ID. The latter should be used in any SGE directive (lines beginning #$, e.g. the -o and -e options for setting the output filenames - see the basic example). The former variable - $SGE_TASK_ID - should be used in lines which are not SGE directives.

As an example, the following qsub command and job script will run my_program 10 times, with task id values 10, 20, ..., 100. Each task will read input from its own data file, and will write to its own output files.

qsub -t 10-100:10 myjob.com

#!/bin/bash

#$ -S /bin/bash
#$ -o $HOME/myoutput.$TASK_ID.stdout
#$ -e $HOME/myoutput.$TASK_ID.stderr

. /etc/profile
module add dot

cd my_program_directory
my_program < my_program.$SGE_TASK_ID.input

Hint: Be careful not to mix up where to use $TASK_ID and $SGE_TASK_ID.

If you don't explicitly set the names of output files, SGE will create files in your home directory with the format job_name.job_id.task_id. If you are explicitly setting the output file names, make sure that each task id gets its own output file. The result of directing output from multiple tasks to the same file is described in the manual as 'undefined'. In practice, this means some task output might simply disappear.
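Since each task in the example above reads its own my_program.$SGE_TASK_ID.input file, it may be worth confirming that every input file exists before submitting the array. The sketch below is one possible check; missing_inputs is a hypothetical helper, and the file naming simply follows the example job script.

```shell
#!/bin/sh
# Hypothetical pre-submission check: report task ids whose input file is
# missing. Arguments: directory, first id, last id, step (the same numbers
# that would be given to qsub -t).
missing_inputs() {
    dir="$1"; id="$2"; last="$3"; step="$4"
    [ "$step" -ge 1 ] || return 2           # refuse a zero or negative step
    status=0
    while [ "$id" -le "$last" ]; do
        if [ ! -f "$dir/my_program.$id.input" ]; then
            echo "missing: my_program.$id.input"
            status=1
        fi
        id=$((id + step))
    done
    return "$status"
}

# Check the inputs for "qsub -t 10-100:10" in the current directory.
missing_inputs . 10 100 10 || echo "fix the inputs listed above before submitting"
```

The function returns a non-zero status when any file is missing, so it can guard the qsub command in a wrapper script.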

Managing job arrays

Once you get the hang of writing flexible job scripts, job arrays make job submission much easier. They make job management easier too. All tasks within the same job array share one job id, but each has a different task id. An example output from qstat for a job array is given below. The extra number at the end of each line, under the ja-task-ID column, is the task id for that task.

job-ID  prior   name       user     state submit/start at     queue           slots ja-task-ID 
----------------------------------------------------------------------------------------------
    881 0.55500 tasktest.c pacey    r     08/15/2006 13:35:58 serial.q@sun012     1 3
    881 0.55500 tasktest.c pacey    r     08/15/2006 13:35:58 serial.q@sun013     1 4
    881 0.55500 tasktest.c pacey    t     08/15/2006 13:35:58 serial.q@sun014     1 1
    881 0.55500 tasktest.c pacey    r     08/15/2006 13:35:58 serial.q@sun016     1 2
    881 0.55500 tasktest.c pacey    t     08/15/2006 13:35:58 serial.q@sun022     1 5

If you want to stop all the tasks at once, you can use the normal qdel command:

qdel 881

If you want to stop individual tasks, you can suffix the job id with the individual task id. To stop just task 4, we do:

qdel 881.4

To stop the first three tasks, do:

qdel 881.1-3
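The qstat listing and the job_id.task_id form accepted by qdel can be combined to pick out tasks by state. The following is a rough sketch only: tasks_in_state is a hypothetical helper, and it assumes the column layout shown in the listing above, where the state is the fifth field and the task id is the last field on each line.

```shell
#!/bin/sh
# Hypothetical helper: read qstat output on standard input and print
# "job_id.task_id" for every array task in the given state (e.g. r or t).
# Assumes the layout shown above: state in field 5, task id in the last field.
tasks_in_state() {
    awk -v state="$1" '$1 ~ /^[0-9]+$/ && $5 == state { print $1 "." $NF }'
}

# Example using two lines copied from the listing above.
tasks_in_state t <<'EOF'
    881 0.55500 tasktest.c pacey    r     08/15/2006 13:35:58 serial.q@sun013     1 4
    881 0.55500 tasktest.c pacey    t     08/15/2006 13:35:58 serial.q@sun014     1 1
EOF
```

On the cluster the input would come from qstat itself, e.g. qstat | tasks_in_state r, and each value printed can then be given to qdel as in the examples above.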


Footnotes

(1) The cluster currently has 15 nodes with 16 gigabytes of memory.

