PBS Jobs

The Basics

Every PBS script is just a shell script with special #PBS lines at the top. The scheduler reads those lines and decides where and how to run your job.

The Simplest Possible Script

#!/bin/bash
#PBS -N MyJob
#PBS -q default
#PBS -l nodes=1:ppn=1

cd $PBS_O_WORKDIR
echo "Hello from the cluster!"

Four Lines Every Script Should Have

Line	What it does	Can I skip it?
`#!/bin/bash`	Tells the system to use bash. Always the very first line.	❌ No
`#PBS -q default`	Which queue to run in. If skipped, scheduler picks one.	⚠️ Not recommended
`#PBS -l nodes=1:ppn=1`	How many nodes and CPU cores your job needs.	⚠️ Not recommended
`cd $PBS_O_WORKDIR`	Goes to the folder where you ran `qsub`. Jobs start in your home directory, so without this your script won't find files given by relative paths.	❌ Always include this

One Rule to Remember

All #PBS lines must come before any real command. Any directive placed after a command is silently ignored by the scheduler — no error, it just won't work.

#!/bin/bash
#PBS -N MyJob       ← ✅ this works
#PBS -q default     ← ✅ this works

cd $PBS_O_WORKDIR    ← first real command

#PBS -l nodes=1:ppn=4  ← ❌ too late, ignored!
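Because the scheduler gives no warning when a directive comes too late, it can help to check a script before submitting it. Below is a minimal sketch of a hypothetical helper, check_directives (not a PBS tool), that uses awk to report any #PBS line appearing after the first real command. It treats the shebang, comments and blank lines as harmless, matching the rule above.

```shell
# Hypothetical helper (not part of PBS): list #PBS directives that appear
# after the first real command of a job script. PBS ignores those lines.
# Usage: check_directives myscript.sh
check_directives() {
    awk '
        /^#PBS/ { if (seen_cmd) printf "line %d ignored: %s\n", NR, $0; next }
        /^[ \t]*#/ || /^[ \t]*$/ { next }   # shebang, comments and blank lines
        { seen_cmd = 1 }                    # first real command reached
    ' "$1"
}
```

Run against the example above, it prints line 7 ignored: #PBS -l nodes=1:ppn=4. A clean script produces no output.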

Want the Full Reference?

Every possible PBS option is documented in the built-in manual. Run this on the login node:

man qsub

It lists every valid #PBS directive, all options, and environment variables available to your job.

Queues

A queue determines how long your job can run and how many cores you can use. Pick the right one — using a longer queue than you need wastes shared resources.

Available Queues on IISERM HPC

Queue	Max Walltime	Max Cores per User	Use For
`default`	8 hours	200 cores	Quick jobs, testing, short calculations
`short`	72 hours (3 days)	100 cores	Medium length production jobs
`long`	1080 hours (45 days)	100 cores	Long running simulations
`infinity`	4380 hours (~6 months)	50 cores	Very long calculations — use sparingly

How to Set a Queue

#PBS -q default     # quick tests
#PBS -q short       # up to 3 days
#PBS -q long        # up to 45 days
#PBS -q infinity    # up to ~6 months

Tips

Always test with the default queue before submitting long jobs.
The default queue has the highest per-user core limit (200 cores), so it suits short parallel runs.
Jobs that exceed their queue's walltime limit are killed automatically.
To check queue status live: qstat -q
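To pick the shortest queue that still fits a job, you can compare your expected runtime against the limits in the queue table. The sketch below does that comparison. pick_queue is a hypothetical helper, and the limits are copied from the table, so confirm them with qstat -q if they change.

```shell
# Hypothetical helper (not a PBS command): print the shortest queue whose
# walltime limit fits a job of the given length in whole hours.
# Limits copied from the IISERM HPC queue table; confirm with: qstat -q
pick_queue() {
    if   [ "$1" -le 8 ];    then echo default
    elif [ "$1" -le 72 ];   then echo short
    elif [ "$1" -le 1080 ]; then echo long
    elif [ "$1" -le 4380 ]; then echo infinity
    else
        echo "no queue allows $1 hours" >&2
        return 1
    fi
}

pick_queue 30    # a 30-hour job: prints "short"
```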

Walltime

Walltime is the maximum real-world time your job is allowed to run. When it runs out, your job is killed — even if it's not finished.

How to Set It

#PBS -l walltime=HH:MM:SS

# Examples:
#PBS -l walltime=00:30:00    # 30 minutes
#PBS -l walltime=02:00:00    # 2 hours
#PBS -l walltime=24:00:00    # 1 day
#PBS -l walltime=72:00:00    # 3 days (short queue max)

Script with Walltime

#!/bin/bash
#PBS -N MyJob
#PBS -q default
#PBS -l nodes=1:ppn=1
#PBS -l walltime=02:00:00

cd $PBS_O_WORKDIR
echo "Start: $(date)"

# your work here

echo "End: $(date)"

Why Walltime Matters

Situation	What happens
No walltime set	Scheduler uses the queue's default limit — you have no control
Walltime too short	Job is killed before finishing; any results not yet saved to disk are lost
Walltime too long	Job waits longer in the queue — others are prioritised
Walltime just right	Job runs, finishes, output is saved ✅

💡 A good rule: estimate how long your job takes, then add 20–30% extra as buffer.
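The buffer arithmetic is easy to get wrong in HH:MM:SS. Here is a small sketch of a hypothetical helper, walltime_with_buffer, that adds a percentage to an estimate and rounds up to a whole minute so the result can go straight into #PBS -l walltime=.

```shell
# Hypothetical helper: add a safety buffer to an estimated runtime.
# $1 = estimate as HH:MM:SS, $2 = buffer in percent (25 if omitted).
# Prints a walltime value, rounded up to a whole minute.
walltime_with_buffer() {
    echo "$1" | awk -F: -v pct="${2:-25}" '{
        secs = ($1 * 3600 + $2 * 60 + $3) * (100 + pct) / 100
        mins = secs / 60
        if (mins != int(mins)) mins = int(mins) + 1   # round up
        printf "%02d:%02d:00\n", int(mins / 60), mins % 60
    }'
}

walltime_with_buffer 01:36:00 25   # 96 min + 25%: prints 02:00:00
walltime_with_buffer 08:00:00 30   # 8 h + 30%:    prints 10:24:00
```

Before relying on the result, check it against the queue limit. For example, a 10:24:00 request no longer fits the 8-hour default queue.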

Output & Logs — Explained Simply

When your job runs, it produces two types of output:

Type	What it is	Example
stdout (standard output)	Normal output from your program — results, progress messages, print statements	`print("Hello")`, `echo "Done"`
stderr (standard error)	Warning messages, errors, or diagnostic info — things that went wrong or need attention	`FileNotFoundError`, `Warning: low memory`

By default, PBS saves both to the folder where you ran qsub. You can control whether they go to one file or two.

Default Behaviour (no flags)

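If you don't specify anything, PBS creates two files: <jobname>.o<jobid> for stdout and <jobname>.e<jobid> for stderr, where <jobid> is the numeric part of the job ID.

# You submit (the script sets #PBS -N MyJob):
qsub myscript.sh
# → Job ID: 465118.iisermhpc1
# → Output file created: MyJob.o465118   (stdout)
# → Error file created:  MyJob.e465118   (stderr)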

Option 1 — One Merged File (recommended for beginners)

Use -j oe to merge stdout and stderr into a single file. Add -o to give it a clean, readable name.

#!/bin/bash
#PBS -N MyJob
#PBS -q default
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -j oe              # join stdout + stderr
#PBS -o MyJob.log        # name the output file

cd $PBS_O_WORKDIR
echo "everything goes into MyJob.log"

✅ One file, easy to read. Both normal output and any errors appear together in chronological order.

Option 2 — Separate stdout and stderr (advanced)

Leave out -j oe and specify both -o and -e separately. Useful if you want to check errors without digging through all the output.

#!/bin/bash
#PBS -N MyJob
#PBS -q default
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -o MyJob_out.log   # stdout only
#PBS -e MyJob_err.log   # stderr only

cd $PBS_O_WORKDIR
echo "this goes to MyJob_out.log"
echo "this is an error" >&2   # goes to MyJob_err.log

Quick Summary

What you write	Files created
Nothing	`MyJob.o465118` (stdout) and `MyJob.e465118` (stderr), auto-named
`-j oe -o job.log`	`job.log` — one clean file, your name
`-o out.log -e err.log`	Two files — stdout and stderr separated
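You can get a feel for the difference on the login node, without PBS, because ordinary shell redirection works on the same two streams. The demo below is only an analogy: PBS does the routing itself based on -o, -e and -j oe, and work is a hypothetical stand-in for your program.

```shell
# Local demo, no PBS needed: stdout and stderr are two separate streams.
# "work" is a hypothetical stand-in for your program.
work() {
    echo "result: 42"                # goes to stdout
    echo "Warning: low memory" >&2   # goes to stderr
}

# Like "#PBS -o out.log" + "#PBS -e err.log": one file per stream
work > out.log 2> err.log

# Like "#PBS -j oe" + "#PBS -o job.log": stderr joined into the stdout file
work > job.log 2>&1
```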

Watching Logs in Real Time

While a job is running you can follow its output live:

tail -f MyJob.log

Press Ctrl + C to stop following.

CPUs — Requesting Cores

Requesting CPUs tells the scheduler how many cores to give your job. There are two valid syntaxes on IISERM HPC; both work and mean the same thing.

Two Syntaxes, Same Meaning

# Old style — Torque syntax
#PBS -l nodes=1:ppn=4

# New style — PBS Pro syntax
#PBS -l select=1:ncpus=4

	Old style `ppn`	New style `ncpus`
Full form	Processors Per Node	Number of CPUs
Used with	`nodes=1:ppn=4`	`select=1:ncpus=4`
Works on IISERM?	✅ Yes	✅ Yes
Meaning	Exactly the same — CPUs per node

⚠️ Never mix the two styles in the same script. Pick one and stick with it.
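If you inherit a script in the old style and want to switch it to select, a one-line sed substitution handles the simple case. This is a sketch of a hypothetical helper, to_select. It converts only plain numeric nodes=N:ppn=M directives and leaves anything else, such as host-targeted requests, for you to check by hand. PBS Pro's own translation of nodes= may add further fields (for example mpiprocs), so treat this as a plain rewrite, not a reproduction of that translation.

```shell
# Hypothetical helper: rewrite simple "nodes=N:ppn=M" directives as
# "select=N:ncpus=M" so a script uses one style throughout.
# Other specs (e.g. nodes=gpc11:ppn=4) are left unchanged.
# Usage: to_select < old_script.sh > new_script.sh
to_select() {
    sed -E 's/^#PBS -l nodes=([0-9]+):ppn=([0-9]+)([[:space:]]|$)/#PBS -l select=\1:ncpus=\2\3/'
}
```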

Common CPU Requests

# Single core — default, simplest
#PBS -l nodes=1:ppn=1

# 4 cores on one node
#PBS -l nodes=1:ppn=4

# 16 cores on one node
#PBS -l nodes=1:ppn=16

# Same using new syntax
#PBS -l select=1:ncpus=16

How to Know How Many Cores to Request

Inside your running job you can check:

nproc                  # cores this job can use (matches your request only if the cluster binds jobs to cores)
cat $PBS_NODEFILE      # lists the node(s) assigned

VERY IMPORTANT: Only use ppn > 1 if your code actually uses parallel processing!

Most Python/R/Matlab scripts run on a single core by default. If you request ppn=16 but your code only uses 1 core, the other 15 cores sit idle — wasting shared resources and making your job wait longer in the queue.

Rule of thumb: If you're not sure whether your code uses parallel processing, always start with ppn=1. Only increase it if:

You're using libraries like mpi4py, multiprocessing, OpenMP, or numba (with parallel=True)
Your software documentation explicitly says it supports multi-core execution
You've tested and confirmed speedup with more cores

When in doubt, stick to ppn=1. You can always request more later once you know your code benefits from it.

Loading Software

Software on the cluster is not loaded by default. You use the module system to load what you need inside your job script.

See What's Available

On the login node, type module load, then press Tab twice. When asked whether to display all possibilities, press y (running module avail also lists every available module):

module load     # press Tab Tab, then y

Display all 131 possibilities? (y or n) y

anaconda3                 codes/gromacs/2023        tools/root/6.28
codes/anaconda3/23.3.1    codes/orca/5.0.4          codes/python/3.11.4
codes/cp2k/8.1.0          codes/R/4.3.1             compilers/gnu/8.3.0
codes/gaussian16/G16      tools/deeptools/3.5.1     compilers/openmpi/4.1.1
codes/geant4/11.1         tools/espresso/7.0        libs/fftw/3.3.10
                          ... and more

Then load whatever you need by typing its exact name:

module load anaconda3
module load codes/python/3.11.4
module load codes/R/4.3.1
module load codes/gromacs/2023
module load tools/root/6.28

Other Module Commands

module list                    # what is currently loaded
module show anaconda3          # preview what a module does
module unload anaconda3        # unload one module
module purge                   # unload everything

Script with Module Load

#!/bin/bash
#PBS -N PythonJob
#PBS -q default
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -j oe
#PBS -o PythonJob.log

cd $PBS_O_WORKDIR

module load anaconda3

echo "Start: $(date)"
python -u script.py
echo "End  : $(date)"

💡 Put module load before any command that needs that software. Placing it right after cd $PBS_O_WORKDIR, as in these scripts, keeps job scripts consistent.
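A mistyped module name does not stop the job, so the failure only shows up later, for example as "python: command not found" deep in the log. One way to fail fast is a small guard after the module lines. require_cmd below is a hypothetical helper built on the POSIX command -v check.

```shell
# Hypothetical helper: stop the job early if a command it needs is missing,
# for example because a module name was mistyped.
require_cmd() {
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "ERROR: '$cmd' not found; check your module load lines" >&2
            return 1
        fi
    done
}

# In a job script, right after the module load lines:
#   module load anaconda3
#   require_cmd python || exit 1
```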

Using a Conda Environment

If your packages are inside a conda environment, activate it like this:

#!/bin/bash
#PBS -N CondaJob
#PBS -q default
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -j oe
#PBS -o CondaJob.log

cd $PBS_O_WORKDIR

source ~/.bashrc
conda activate myenv

echo "Start: $(date)"
python -u script.py
echo "End  : $(date)"

💡 source ~/.bashrc makes the conda command available inside the PBS job environment. This relies on conda init bash having been run once on the login node, which adds conda's setup code to ~/.bashrc.
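If your ~/.bashrc does other things you don't want in a job, conda also provides conda shell.bash hook, which prints the shell code needed to enable conda activate. The sketch below uses it. It assumes the conda executable is already on PATH (here, assumed to come from module load anaconda3), and activate_env and myenv are placeholder names.

```shell
# Alternative to "source ~/.bashrc": load conda's shell functions directly.
# Assumes conda is already on PATH (e.g. after "module load anaconda3");
# "myenv" below is a placeholder environment name.
activate_env() {
    if ! command -v conda >/dev/null 2>&1; then
        echo "ERROR: conda not found; try 'module load anaconda3' first" >&2
        return 1
    fi
    eval "$(conda shell.bash hook)"
    conda activate "$1"
}

# In a job script:
#   module load anaconda3
#   activate_env myenv || exit 1
```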

Templates

Copy any of these, change the job name and your command, and submit.

1. Minimal — Just a Shell Command

#!/bin/bash
#PBS -N MyJob
#PBS -q default
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -j oe
#PBS -o MyJob.log

cd $PBS_O_WORKDIR
echo "Start: $(date)"

# your command here

echo "End: $(date)"

2. Python Script — with Module Load

#!/bin/bash
#PBS -N PythonJob
#PBS -q default
#PBS -l nodes=1:ppn=1
#PBS -l walltime=02:00:00
#PBS -j oe
#PBS -o PythonJob.log

cd $PBS_O_WORKDIR
module load anaconda3

echo "Start: $(date)"
python -u script.py
echo "End  : $(date)"

3. Python Inline — Code Inside the Script

#!/bin/bash
#PBS -N InlineJob
#PBS -q default
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -j oe
#PBS -o InlineJob.log

cd $PBS_O_WORKDIR
module load anaconda3

echo "Start: $(date)"

python -u << 'PYEOF'
for i in range(1, 6):
    print(f"Count: {i}")
print("Done!")
PYEOF

echo "End: $(date)"

💡 The -u flag makes Python output appear immediately in the log without buffering.

4. Multi-Core — Parallel Job

#!/bin/bash
#PBS -N ParallelJob
#PBS -q short
#PBS -l nodes=1:ppn=8
#PBS -l walltime=12:00:00
#PBS -j oe
#PBS -o ParallelJob.log

cd $PBS_O_WORKDIR
module load anaconda3

echo "Cores: $(nproc)"
echo "Start: $(date)"
python -u parallel.py --cores 8
echo "End  : $(date)"
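In the template above, the 8 appears twice (ppn=8 and --cores 8), and the two can drift apart after an edit. A sketch of reading the core count at run time instead follows. It assumes PBS Pro exports NCPUS to the job environment and falls back to nproc when that variable is unset, for example when testing on the login node. job_cores is a hypothetical name.

```shell
# Read the granted core count instead of hard-coding it twice.
# Assumption: PBS Pro exports NCPUS to the job; if it is unset,
# fall back to nproc.
job_cores() {
    echo "${NCPUS:-$(nproc)}"
}

# In the template above this would become:
#   python -u parallel.py --cores "$(job_cores)"
# and, for OpenMP programs:
#   export OMP_NUM_THREADS="$(job_cores)"
```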

5. Separate stdout and stderr

#!/bin/bash
#PBS -N MyJob
#PBS -q default
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -o MyJob_out.log
#PBS -e MyJob_err.log

cd $PBS_O_WORKDIR
module load anaconda3

python -u script.py

Submitting & Monitoring

How to submit your script, check its status, and cancel it if needed.

Submit a Job

qsub script.sh
# → 465118.iisermhpc1   (this is your job ID)

Check Job Status

qstat                       # all jobs in the queue
qstat -u ms21080            # only your jobs (use your own username, or $USER)
qstat -f 465118.iisermhpc1  # full details of one job

Example output from qstat -u ms21080:

Job ID                 Name        User       Time   S  Queue
---------------------  ----------  ---------  -----  -  -------
465118.iisermhpc1      count_job   ms21080    00:05  R  default

Status Codes

Code	Meaning	What to do
`Q`	Queued	Waiting for a free node — just wait
`R`	Running	Job is running — check log with `tail -f`
`H`	Held	Paused — use `qrls` to release
`E`	Exiting	Finishing up, output being written
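`C` / `F`	Completed / Finished	Done; check your log file. On PBS Pro, finished jobs drop out of plain `qstat`; `qstat -x` still lists them with status `F`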
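With many jobs in flight, a per-status count is easier to scan than the full list. The following is a sketch of a hypothetical awk helper, count_states. It assumes the status is the second-to-last column, as in the qstat -u example above, and it counts only lines that start with a numeric job ID. Other qstat layouts may need a different column.

```shell
# Hypothetical helper: count jobs per status code in qstat output.
# Assumes the status (S) is the second-to-last column, as in the
# "qstat -u" example; header and separator lines are skipped.
# Usage: qstat -u "$USER" | count_states | sort
count_states() {
    awk '$1 ~ /^[0-9]+\./ { n[$(NF - 1)]++ }
         END { for (s in n) printf "%s %d\n", s, n[s] }'
}
```

awk does not guarantee the order of a for (s in n) loop, which is why the usage line pipes through sort.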

Cancel / Delete a Job

qdel 465118.iisermhpc1     # cancel a specific job

⚠️ Use the full job ID including .iisermhpc1 — just the number alone may not work.

Other Useful Commands

qhold 465118.iisermhpc1    # pause a queued job
qrls 465118.iisermhpc1     # release a held job
qstat -q                   # see all queues and their limits
pbsnodes -a               # see all compute nodes and status
tail -f MyJob.log          # watch log output live
watch -n 5 qstat -u ms21080  # refresh job list every 5 seconds

Useful PBS Variables Inside Your Script

Variable	What it contains
`$PBS_JOBID`	Your job's full ID e.g. `465118.iisermhpc1`
`$PBS_JOBNAME`	Job name you set with `-N`
`$PBS_O_WORKDIR`	Directory where you ran `qsub`
`$PBS_NODEFILE`	Path to a file listing the node(s) assigned to your job
`$PBS_QUEUE`	Queue the job is running in
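These variables are handy for stamping each log with where and how the job ran. Below is a small sketch of a hypothetical job_header function. The ":-" fallbacks let the same script also run on the login node, where PBS sets none of them. The node file is de-duplicated because it may list a node more than once.

```shell
# Print a short header at the top of a job log. The fallbacks after ":-"
# let the same script run outside PBS, where these variables are unset.
job_header() {
    echo "Job ID  : ${PBS_JOBID:-not set}"
    echo "Job name: ${PBS_JOBNAME:-not set}"
    echo "Queue   : ${PBS_QUEUE:-not set}"
    echo "Workdir : ${PBS_O_WORKDIR:-$PWD}"
    if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        # the node file may repeat a node, so list each one once
        echo "Nodes   : $(sort -u "$PBS_NODEFILE" | paste -sd ' ' -)"
    fi
}

# In a job script, after cd $PBS_O_WORKDIR:
#   job_header
```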

Advanced Tips: Check Free Cores & Target Specific Nodes

On our cluster, some nodes may appear "up" but actually be down or overloaded. To save time and avoid jobs stuck in the queue, you can add a helper function to your ~/.bashrc that shows free cores per node in any queue.

Step 1: Add the `checkfree()` Function to ~/.bashrc

Open your bash configuration file:

nano ~/.bashrc

Scroll to the bottom and paste this function:

~/.bashrc

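# === HPC Helper: Check free cores in any queue ===
# Usage: checkfree [queue]    (defaults to the "long" queue)
checkfree() {
    local queue=${1:-long}
    echo "=== Free cores in '$queue' queue ==="
    pbsnodes -a | awk -v q="$queue" '
        # Print the result for the node whose attributes were just read.
        # Reporting at the next node header keeps this independent of the
        # order in which pbsnodes lists attributes.
        function report() {
            if (node == "" || nq != q) return
            free = total - used
            if (free > 0) {
                printf "%-8s : %3d free cores\n", node, free
                sum += free
            }
            if (total > 0 && free == total) fullfree++
        }
        /^[A-Za-z0-9]/                   { report(); node = $1; nq = ""; total = 0; used = 0 }
        $1 == "queue"                    { nq = $3 }
        $1 == "pcpus"                    { total = $3 }
        $1 == "resources_assigned.ncpus" { used = $3 }
        END {
            report()
            print "------------------------"
            print "Total free cores =", sum + 0
            print "Fully free nodes =", fullfree + 0
        }'
}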

Save with Ctrl+O → Enter, then exit with Ctrl+X.

Step 2: Reload Your ~/.bashrc

Apply the changes by sourcing the file:

source ~/.bashrc

Now the checkfree command is available in your terminal.

Step 3: Use checkfree to See Available Cores

Run checkfree followed by any queue name:

checkfree default
checkfree short
checkfree long
checkfree infinity
checkfree gpushort
checkfree gpulong

Example output:

(base) [ms21080@login1 ~]$ checkfree default
=== Free cores in 'default' queue ===
gpc2     :  12 free cores
gpc3     :  12 free cores
gpc4     :  12 free cores
gpc5     :  32 free cores
gpc6     :  32 free cores
gpc7     :  52 free cores
gpc8     :  52 free cores
gpc9     :  52 free cores
gpc10    : 104 free cores
gpc11    :  52 free cores
gpc12    :  52 free cores
gpc13    :   3 free cores
------------------------
Total free cores = 467
Fully free nodes = 6
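checkfree counts cores but does not look at each node's state, so a node that PBS has already flagged as down can still show free cores. A companion sketch, badnodes (a hypothetical helper), lists nodes whose state line mentions down, offline or unknown. It assumes the "state = ..." lines that PBS Pro's pbsnodes -a prints. It cannot catch nodes that PBS still reports as healthy but that are actually unresponsive.

```shell
# Hypothetical helper: list nodes whose PBS state mentions down, offline or
# unknown. Reads `pbsnodes -a` text on stdin, so it also works on saved output.
# Assumes the "state = ..." attribute lines printed by PBS Pro.
# Usage: pbsnodes -a | badnodes
badnodes() {
    awk '/^[A-Za-z0-9]/ { node = $1 }
         $1 == "state" && $3 ~ /down|offline|unknown/ { printf "%-8s : %s\n", node, $3 }'
}
```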

Bonus: Target a Specific Node

If you see a node with many free cores (e.g., gpc11 has 52 free), you can target it directly in your PBS script:

#PBS -l nodes=gpc11:ppn=4   # request 4 cores on node gpc11 only

This can help your job start faster if that node is lightly loaded.

⚠️ Use node targeting sparingly. Only do this if:

You've confirmed the node is healthy and has resources
Your job has specific hardware needs (e.g., GPU, large RAM)
You're debugging and need consistent node behavior

For most jobs, let the scheduler pick the node automatically.

Pro Workflow Summary

Before submitting, run checkfree default (or your target queue)
Look for nodes with high "free cores" count
Optionally target one: #PBS -l nodes=gpc11:ppn=4
Submit with qsub script.sh
Verify it's running: qstat -u $USER → look for status R
Watch output: tail -f MyJob.log

This simple habit can save hours otherwise spent waiting on jobs stuck on unresponsive nodes.
