Management of Resources
The batch system manages resources such as CPUs and memory and allocates them to waiting jobs. It uses a method called fairshare scheduling to ensure that computing time is shared fairly among all users; a user's fairshare standing also contributes to the priority of that user's jobs. Because the memory on the nodes is shared among the jobs, it is very important to estimate memory requirements as accurately as possible. A further mechanism, called Backfill, allows short-running jobs to fill gaps in the scheduling plan and thus optimises resource usage.
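As an illustration of stating resource requirements explicitly, the following is a minimal sketch of a job script. It assumes the batch system is Slurm (which the sprio command used in this section suggests); the job name and all values are hypothetical examples, not site recommendations. A realistic memory request keeps shared node memory available for other jobs, and a tight time limit is generally what lets the scheduler consider a job for Backfill.

```shell
#!/bin/bash
#SBATCH --job-name=mem-example   # hypothetical job name
#SBATCH --ntasks=1               # a single task
#SBATCH --cpus-per-task=1        # one CPU for that task
#SBATCH --mem=4G                 # memory per node; example value only
#SBATCH --time=00:30:00          # wall-time limit; example value only

# SLURM_MEM_PER_NODE is set by Slurm inside a job when --mem is given.
# Outside a job it is unset, so a placeholder is printed instead.
mem_report="Memory granted per node: ${SLURM_MEM_PER_NODE:-unknown}"
echo "$mem_report"
```

After a job has finished, comparing requested and used memory, for example with `sacct -j <jobid> --format=JobID,ReqMem,MaxRSS,Elapsed`, helps to refine the estimate for the next submission.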
The priority of a job is mainly determined by the following factors, in order of importance:
- User Shares - The shares a user has in the sense of fairshare scheduling constitute the most important factor. The more shares a user has, the higher the priority of his or her jobs will be. Consuming CPU time, GPU time or RAM reduces the number of shares a user has and thus lowers the priority of any waiting jobs.
- Age of Job - How long the job has been waiting in the queue is another important factor. However, the value is capped and reaches its maximum after a certain time.
- Size of Job - Larger jobs, in terms of the number of nodes requested, have their priority increased slightly.
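How these factors combine can be sketched as a weighted sum, assuming the scheduler works like Slurm's multifactor priority plugin, where each factor is a value between 0.0 and 1.0 multiplied by a configured weight. The function name job_priority and the three weights below are hypothetical and chosen only to reflect the order of importance given above; the real weights are set by the site administrators.

```shell
# Sketch of a weighted-sum priority; the weights are hypothetical.
# Usage: job_priority FAIRSHARE AGE JOBSIZE   (each factor between 0.0 and 1.0)
job_priority() {
    awk -v fair="$1" -v age="$2" -v size="$3" 'BEGIN {
        w_fair = 100000; w_age = 1000; w_size = 100   # assumed weights
        printf "%d\n", w_fair * fair + w_age * age + w_size * size
    }'
}

job_priority 0.5 1.0 0.25    # prints 51025
job_priority 0.25 1.0 1.0    # prints 26100
```

With weights of this kind, the second job ranks lower even though it is larger and has reached the maximum age, because the user has fewer shares left.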
The factors affecting the priority of the currently waiting jobs can be viewed with the following command, which sorts the results according to total priority:
sprio -Sy
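The listing can also be restricted to one's own jobs and compared against the configured weights. The following sketch assumes the standard Slurm sprio options -w (weights), -l (long format) and -u (user); the function name show_priority_details is hypothetical, and the guard only serves to give a clear message on machines without the Slurm tools.

```shell
# Show the configured factor weights, then the priority factors of
# the calling user's pending jobs, sorted as in the command above.
show_priority_details() {
    if ! command -v sprio >/dev/null 2>&1; then
        echo "sprio not found; run this on a cluster login node" >&2
        return 1
    fi
    sprio -w                                # weight of each priority factor
    sprio -l -u "${USER:-$(id -un)}" -Sy    # own pending jobs, long format
}
```

Calling show_priority_details on a login node prints both tables; if no jobs are waiting, the second table contains only its header.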