Crane, Sandhills and Tusker are managed by the SLURM resource manager. In order to run processing on Crane, Sandhills or Tusker, you must create a SLURM script that will run your processing. After submitting the job, SLURM will schedule your processing on an available worker node.
Before writing a submit file, you may need to compile your application.
Ensure proper working directory for job output
All SLURM job output should be directed to your /work path.
The environment variable $WORK can also be used.
Review how /work differs from /home here.
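As a minimal sketch of the advice above, the following snippet changes into the /work path before any job files are created. The subdirectory name `example_job` is a hypothetical choice for illustration, and the snippet assumes `$WORK` is only set when you are logged in to the cluster.

```shell
# $WORK is expected to be set by the cluster environment.
if [ -n "${WORK:-}" ] && [ -d "$WORK" ]; then
    # Hypothetical subdirectory for this job's files.
    job_dir="$WORK/example_job"
    mkdir -p "$job_dir" && cd "$job_dir" || exit 1
    echo "Job output will be written under $(pwd)"
else
    echo "\$WORK is not set or does not exist; run this on the cluster" >&2
fi
```

Creating the submit file inside this directory keeps the job's files together on /work.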
Creating a SLURM Submit File
The example below is for a serial job. For submitting MPI jobs, please look at the MPI Submission Guide.
A SLURM submit file is broken into two sections: the job description and the processing. SLURM job description lines are prefixed with
#SBATCH in the submit file.
Maximum walltime the job can run. After this time has expired, the job will be stopped.
Memory that is allocated per core for the job. If you exceed this memory limit, your job will be stopped.
Specify the real memory required per node in megabytes. If you exceed this limit, your job will be stopped. Note that you should ask for less memory than each node actually has. For instance, Tusker has nodes with 1TB, 512GB and 256GB of RAM. You may only request 1000GB of RAM for the 1TB nodes, 500GB of RAM for the 512GB nodes, and 250GB of RAM for the 256GB nodes. For Crane, the max is 500GB.
The name of the job. Will be reported in the job listing.
The partition the job should run in. Partitions determine the job's priority and which nodes the job can run on. See Available Partitions on Sandhills, and Available Partitions on Crane and Tusker for a list of possible partitions.
Location where stderr will be written for the job.
[username] should be replaced with your group name and username. Your username can be retrieved with the command
id -un and your group with
Location where stdout will be written for the job.
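To make the two sections concrete, here is a hedged sketch of a serial submit file. The option names (`--time`, `--mem-per-cpu`, `--job-name`, `--error`, `--output`) are standard `sbatch` options assumed to correspond to the descriptions above; the values, the job name, and the `[groupname]` and `[username]` path placeholders are illustrative and must be replaced before submitting. A partition line is left out because valid partition names depend on the cluster.

```shell
#!/bin/bash
# --- Job description: every option line starts with #SBATCH ---
# Maximum walltime (hours:minutes:seconds).
#SBATCH --time=00:10:00
# Memory per core, in megabytes.
#SBATCH --mem-per-cpu=1024
# Name shown in the job listing.
#SBATCH --job-name=example-serial
# stderr and stdout locations; %j expands to the job id.
#SBATCH --error=/work/[groupname]/[username]/job.%j.err
#SBATCH --output=/work/[groupname]/[username]/job.%j.out

# --- Processing: the commands the job runs on the worker node ---
echo "Job started on $(uname -n)"
echo "Job finished"
```

Per-core and per-node memory requests are alternatives, so this sketch uses only the per-core form.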
Submitting the job
A SLURM job is submitted with the command
sbatch. SLURM will read the submit file, and schedule the job according to the description in the submit file.
The job described above is submitted with:
The job was successfully submitted.
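A submission might look like the sketch below. The file name `example.submit` is a hypothetical name for the submit file; the guard around `sbatch` is only there so the snippet fails gracefully on machines without SLURM installed.

```shell
# Hypothetical file name; replace with your own submit file.
submit_file="example.submit"

if command -v sbatch >/dev/null 2>&1; then
    # On success, sbatch prints a line containing the new job id.
    sbatch "$submit_file"
else
    echo "sbatch not found; run this on a cluster login node" >&2
fi
```

Note the job id that `sbatch` reports, since the status and cancel commands refer to it.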
Checking Job Status
Job status is found with the command
squeue. It will provide information such as:
- The State of the job:
- R - Running
- PD - Pending - Job is awaiting resource allocation.
- Additional codes are available on the squeue page.
- Job Name
- Run Time
- Nodes running the job
The easiest way to check the status of your jobs is to filter by your username, using the
-u option to squeue.
Additionally, to see the status of a specific partition, for example one that you are part of, you can use the
-p option.
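The two filters described above can be sketched as follows. The partition name `my_partition` is a hypothetical placeholder; substitute a partition that exists on your cluster.

```shell
# Current username, as reported by id.
me="$(id -un)"
# Hypothetical partition name; replace with a real one.
partition="my_partition"

if command -v squeue >/dev/null 2>&1; then
    # Show only jobs belonging to the current user.
    squeue -u "$me"
    # Show only jobs in the given partition.
    squeue -p "$partition"
else
    echo "squeue not found; run this on a cluster login node" >&2
fi
```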
Checking Job Start
You may view the start time of your job with the command
squeue --start. The output of the command will show the expected start time of the jobs.
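For instance, a sketch that limits the start-time listing to your own jobs by combining `--start` with the `-u` filter:

```shell
# Current username, as reported by id.
me="$(id -un)"

if command -v squeue >/dev/null 2>&1; then
    # List the expected start times of this user's pending jobs.
    squeue --start -u "$me"
else
    echo "squeue not found; run this on a cluster login node" >&2
fi
```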
The output shows the expected start time of the jobs, as well as the reason that the jobs are currently idle (in this case, low priority of the user due to running numerous jobs already).
Removing the Job
Removing the job is done with the
scancel command. The only argument to the
scancel command is the job id. For the job above, the command is:
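As a sketch, with a made-up job id standing in for the real one (use the id reported by `sbatch` or shown in the `squeue` listing):

```shell
# Hypothetical job id; replace with the id of your own job.
job_id="12345"

if command -v scancel >/dev/null 2>&1; then
    # Remove the job from the queue, or stop it if it is running.
    scancel "$job_id"
else
    echo "scancel not found; run this on a cluster login node" >&2
fi
```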