This quick start demonstrates how to run multiple copies of a Fortran/C program using Condor on HCC supercomputers. The sample code and submit scripts can be downloaded from <condor_dir.zip>. 

Log in to Sandhills

Log in to Sandhills through PuTTY (for Windows users) or Terminal (for Mac/Linux users) and make a subdirectory called condor_dir under the $WORK directory. In condor_dir, create job subdirectories to hold the input data files. Here we create two job subdirectories, job_0 and job_1, and put a data file (data.dat) in each. The data file in job_0 contains a single column listing the integers from 1 to 5. The data file in job_1 lists the integers from 6 to 10 in the same way. 

$ cd $WORK
$ mkdir condor_dir
$ cd condor_dir
$ mkdir job_0
$ mkdir job_1
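The two input files described above can be typed by hand, but as a minimal sketch they can also be generated with the standard seq utility. This assumes the current directory is condor_dir. The use of seq is an illustration and not part of the original demo files.

```shell
# Assumption: the current directory is condor_dir.
# mkdir -p is harmless if the job subdirectories already exist.
mkdir -p job_0 job_1

# One integer per line: 1..5 for job_0 and 6..10 for job_1.
seq 1 5 > job_0/data.dat
seq 6 10 > job_1/data.dat

# Show the result so the contents can be checked by eye.
cat job_0/data.dat job_1/data.dat
```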

In the subdirectory condor_dir, save all the relevant code. Here we include two demo programs, demo_f_condor.f90 and demo_c_condor.c, that compute the sum of the data stored in each job subdirectory (job_0 and job_1). The parallelization scheme is as follows. First, the master compute node sends out copies of the executable from the condor_dir subdirectory, each paired with a copy of the data file from one job subdirectory. The number of executable copies is specified in the submit script (queue), and it usually matches the number of job subdirectories. Next, the workload is distributed among a pool of worker compute nodes. The number of available worker nodes may vary over time. Each worker node executes its jobs independently of the other worker nodes. The output files are stored separately in each job subdirectory. No additional coding is needed to make the serial code "parallel"; parallelization is achieved entirely through the submit script. 

demo_f_condor.f90
Program demo_f_condor
	implicit none
	integer, parameter :: N = 5
	real*8 w
	integer i
	common/sol/ x
	real*8 x
	real*8, dimension(N) :: y_local
	real*8, dimension(N) :: input_data
	
	open(10, file='data.dat')
	
	do i = 1,N
		read(10,*) input_data(i)
	enddo
	
	do i = 1,N
		w = input_data(i)*1d0
		call proc(w)
		y_local(i) = x		
		write(6,*) 'i,x = ', i, y_local(i)
	enddo
	write(6,*) 'sum(y) =',sum(y_local)
Stop
End Program
Subroutine proc(w)
	real*8, intent(in) :: w
	common/sol/ x
	real*8 x
	
	x = w
	
Return
End Subroutine

demo_c_condor.c
//demo_c_condor
#include <stdio.h>

double proc(double w){
		double x;		
		x = w;	
		return x;
}

int main(int argc, char* argv[]){
	int N=5;
	double w;
	int i;
	double x;
	double y_local[N];
	double sum;	
	double input_data[N];
	FILE *fp;
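	fp = fopen("data.dat","r");
	/* Stop early if the input file was not transferred to the working directory */
	if (fp == NULL){
		perror("data.dat");
		return 1;
	}
	for (i = 1; i<= N; i++){
		fscanf(fp, "%lf", &input_data[i-1]);
	}
	fclose(fp);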
	
	for (i = 1; i <= N; i++){        
		w = input_data[i-1]*1e0;
		x = proc(w);
		y_local[i-1] = x;
		printf("i,x= %d %lf\n", i, y_local[i-1]) ;
	}
	
	sum = 0e0;
	for (i = 1; i<= N; i++){
		sum = sum + y_local[i-1];	
	}
	
	printf("sum(y)= %lf\n", sum);    
return 0;
}

Compiling the Code

The compiled executable needs to match the "standard" environment of the worker nodes. The easiest way is to use the compilers installed on the HCC supercomputer directly, without loading extra modules. The standard compiler on the HCC supercomputer is the GNU Compiler Collection. The version can be looked up with the commands gcc -v or gfortran -v.

$ gfortran demo_f_condor.f90 -o demo_f_condor.x
$ gcc demo_c_condor.c -o demo_c_condor.x
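Before submitting, it can help to confirm that an executable reads data.dat correctly in each job subdirectory. The following is a minimal sketch rather than part of the original demo: the helper name run_locally and the output prefix Local are hypothetical, and it assumes that the current directory is condor_dir and that a short serial test on the login node is acceptable.

```shell
# Hypothetical helper (not part of the demo files): run one command inside
# each of job_0, job_1, ... in turn, mimicking serially what Condor will do.
# Usage: run_locally PREFIX COMMAND [ARGS...]
run_locally() {
    prefix=$1
    shift
    i=0
    while [ -d "job_$i" ]; do
        # Subshell so the cd does not leak; output goes to PREFIX_i.out/.err
        # inside the job subdirectory.
        ( cd "job_$i" && "$@" > "${prefix}_$i.out" 2> "${prefix}_$i.err" ) \
            || echo "job_$i failed" >&2
        i=$((i + 1))
    done
    echo "ran $i job(s)"
}

# Run the C demo locally if it has been compiled.
# The path is relative to the job subdirectory, hence the leading "../".
if [ -x demo_c_condor.x ]; then
    run_locally Local ../demo_c_condor.x
fi
```

After this runs, job_0/Local_0.out and job_1/Local_1.out should contain the same text that the Condor jobs are expected to produce.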

Creating a Submit Script

Create a submit script that requests 2 jobs (queue). The job subdirectory is specified in the initialdir line. The $(process) macro supplies the integer appended to the job subdirectory prefix job_. The numbers run from 0 to queue-1. The name of the input data file is specified in the transfer_input_files line.

submit_f.condor
universe = grid
grid_resource = pbs
batch_queue = guest
should_transfer_files = yes
when_to_transfer_output = on_exit
executable = demo_f_condor.x
output = Fortran_$(process).out
error = Fortran_$(process).err
initialdir = job_$(process)
transfer_input_files = data.dat
queue 2

submit_c.condor
universe = grid
grid_resource = pbs
batch_queue = guest
should_transfer_files = yes
when_to_transfer_output = on_exit
executable = demo_c_condor.x
output = C_$(process).out
error = C_$(process).err
initialdir = job_$(process)
transfer_input_files = data.dat
queue 2
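Because the queue count has to match the number of job subdirectories, the submit script can also be generated rather than edited by hand. The sketch below is one possible way to do that: the helper names count_jobs and write_submit and the file name submit_c_generated.condor are hypothetical, and the submit settings are copied unchanged from the scripts above. It assumes the current directory is condor_dir.

```shell
# Hypothetical helper: count consecutive job_0, job_1, ... subdirectories.
# Counting stops at the first gap, because $(process) runs from 0 to queue-1
# without skipping any number.
count_jobs() {
    n=0
    while [ -d "job_$n" ]; do
        n=$((n + 1))
    done
    echo "$n"
}

# Hypothetical helper: write a submit script with one queued job per
# job subdirectory.
# Usage: write_submit EXECUTABLE OUTPUT_PREFIX SUBMIT_FILE
write_submit() {
    # \$(process) is escaped so that Condor, not the shell, expands it.
    cat > "$3" <<EOF
universe = grid
grid_resource = pbs
batch_queue = guest
should_transfer_files = yes
when_to_transfer_output = on_exit
executable = $1
output = ${2}_\$(process).out
error = ${2}_\$(process).err
initialdir = job_\$(process)
transfer_input_files = data.dat
queue $(count_jobs)
EOF
}

write_submit demo_c_condor.x C submit_c_generated.condor
```

With only job_0 and job_1 present, the generated file should match submit_c.condor line for line.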

Submit the Job

The job can be submitted through the command condor_submit. The job status can be monitored by entering condor_q followed by the username. 

$ condor_submit submit_f.condor
$ condor_submit submit_c.condor
$ condor_q <username>

Sample Output

In the job subdirectory job_0, the sum from 1 to 5 is computed and printed to the .out file. In the job subdirectory job_1, the sum from 6 to 10 is computed and printed to the .out file. 

Fortran_0.out
 i,x =            1   1.0000000000000000     
 i,x =            2   2.0000000000000000     
 i,x =            3   3.0000000000000000     
 i,x =            4   4.0000000000000000     
 i,x =            5   5.0000000000000000     
 sum(y) =   15.000000000000000     

Fortran_1.out
 i,x =            1   6.0000000000000000     
 i,x =            2   7.0000000000000000     
 i,x =            3   8.0000000000000000     
 i,x =            4   9.0000000000000000     
 i,x =            5   10.000000000000000     
 sum(y) =   40.000000000000000     
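Once the jobs have finished, the reported sums can be checked against the input files. The following sketch is not part of the original demo: the helper name check_job is hypothetical, and it assumes that the current directory is condor_dir and that each .out file contains a sum(y) line in the format shown above.

```shell
# Hypothetical helper: compare the sum reported in a job's .out file with
# the sum of the first column of that job's data.dat.
# Usage: check_job JOB_DIR OUT_FILE_NAME
check_job() {
    # Sum of the input column, formatted with six decimal places.
    expected=$(awk '{ s += $1 } END { printf "%.6f", s }' "$1/data.dat")
    # Value after the "=" on the sum(y) line, formatted the same way.
    reported=$(awk -F= '/sum\(y\)/ { printf "%.6f", $2 }' "$1/$2")
    if [ "$expected" = "$reported" ]; then
        echo "$1/$2: OK ($reported)"
    else
        echo "$1/$2: MISMATCH (expected $expected, got ${reported:-nothing})"
        return 1
    fi
}

# Check every Fortran output file that exists so far.
i=0
while [ -d "job_$i" ]; do
    if [ -f "job_$i/Fortran_$i.out" ]; then
        check_job "job_$i" "Fortran_$i.out" || true
    fi
    i=$((i + 1))
done
```

The same helper can be pointed at the C output by passing C_0.out or C_1.out as the second argument.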