What is HTCondor?
The HTCondor project is designed to enable High Throughput Computing (HTC) for users. It provides software that lets users schedule individual applications and workflows, and lets sites manage their resources. HTCondor is used in three ways at HCC:
- HTCondor as a batch system: The Red cluster uses HTCondor as its primary batch system, in a manner similar to Torque on Tusker.
- HTCondor as a cycle scavenger: HTCondor runs alongside SLURM on Sandhills. User jobs run on nodes that are not otherwise in use. Once SLURM schedules jobs on those processors, HTCondor either checkpoints its jobs or moves them out of the way. This way, machines that are reserved but otherwise idle can still be used for computation. Currently, HTCondor can only handle jobs that run on a single node.
- HTCondor as an overlay: When user jobs are submitted to glidein.unl.edu, HCC allocates batch slots on other campuses' clusters. These slots show up as batch slots in a virtual HTCondor pool; normal HTCondor jobs can run on this pool, even though they may be running off-campus.
What is the Open Science Grid (OSG)?
The Open Science Grid (OSG) advances science through open distributed computing. The OSG is a multi-disciplinary partnership to federate local, regional, community, and national cyberinfrastructures to meet the needs of research and academic communities at all scales. HCC participates in the OSG as both a resource provider and a resource user. We host a node, glidein.unl.edu, which provides a gateway for running jobs on the OSG using HTCondor.
The map above shows the Open Science Grid sites located across the U.S.
This help document is divided into five sections:
- Setting up the environment for submitting HTCondor jobs that use the OSG
- Running HTCondor jobs on the OSG from HCC
- A simple example of submitting an HTCondor job
- Advanced HTCondor Commands
- Characteristics of an HTCondor friendly job