This quick start demonstrates how to run multiple copies of a Fortran/C program using Condor on HCC supercomputers. The sample code and submit scripts can be downloaded from <condor_dir.zip>.
Log in to Sandhills
Log in to Sandhills through PuTTY (for Windows users) or Terminal (for Mac/Linux users) and make a subdirectory called condor_dir under the $WORK directory. In the subdirectory condor_dir, create job subdirectories that host the input data files. Here we create two job subdirectories, job_0 and job_1, and put a data file (data.dat) in each subdirectory. The data file in job_0 has a column of data listing the integers from 1 to 5. The data file in job_1 has a column listing the integers from 6 to 10.
In the subdirectory condor_dir, save all the relevant code. Here we include two demo programs, a Fortran version and demo_c_condor.c, that compute the sum of the data stored in each job subdirectory (job_0 and job_1). The parallelization scheme is as follows. First, the master compute node sends out copies of the executable from the condor_dir subdirectory, together with a copy of the data file from each job subdirectory. The number of executable copies is specified in the submit script (queue), and it usually matches the number of job subdirectories. Next, the workload is distributed among a pool of worker compute nodes. At any given time, the number of available worker nodes may vary. Each worker node executes its jobs independently of the other worker nodes. The output files are stored separately in each job subdirectory. No additional coding is needed to make the serial code "parallel"; parallelization is achieved entirely through the submit script.
Compiling the Code
The compiled executable needs to match the "standard" environment of the worker nodes. The easiest way is to use the compilers installed on the HCC supercomputer directly, without loading extra modules. The standard compiler on the HCC supercomputer is the GNU Compiler Collection. The version can be looked up with the command
gcc -v or
Creating a Submit Script
Create a submit script that requests 2 jobs (queue). The script sets the name of the job subdirectories: the $(process) macro appends an integer to the subdirectory name prefix job_. The numbers run from 0 to queue-1. The script also specifies the name of the input data file.
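As a rough sketch of the pieces described in this section, a submit description file might look like the following. The commands are standard HTCondor submit commands, but the executable name and the output, error, and log file names are assumptions for illustration; the actual script in the downloadable archive may differ.

```
# Hypothetical submit description file (file names are illustrative)
universe                = vanilla
executable              = demo_c_condor
# $(process) expands to 0, 1, ..., queue-1, giving job_0 and job_1
initialdir              = job_$(process)
# Input file, looked up relative to initialdir
transfer_input_files    = data.dat
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
# Standard output, standard error, and the Condor log, written in initialdir
output                  = demo_c_condor.out
error                   = demo_c_condor.err
log                     = demo_c_condor.log
queue 2
```

With this layout the executable path is resolved relative to the directory where the job is submitted, while the input and output files are resolved relative to each job's initialdir.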
Submit the Job
The job can be submitted with the command condor_submit. The job status can be monitored by entering condor_q followed by the username.
In the job subdirectory job_0, the sum from 1 to 5 (15) is computed and printed to the .out file. In the job subdirectory job_1, the sum from 6 to 10 (40) is computed and printed to the .out file.