IISER Mohali Unofficial HPC Guide

PBS Jobs

The Basics

Every PBS script is just a shell script with special #PBS lines at the top. The scheduler reads those lines and decides where and how to run your job.

The Simplest Possible Script

#!/bin/bash
#PBS -N MyJob
#PBS -q default
#PBS -l nodes=1:ppn=1

cd $PBS_O_WORKDIR
echo "Hello from the cluster!"

Four Things Every Script Needs

Line                   What it does                                                                            Can I skip it?
---------------------  --------------------------------------------------------------------------------------  ----------------------
#!/bin/bash            Tells the system to use bash. Always the very first line.                               ❌ No
#PBS -q default        Which queue to run in. If skipped, scheduler picks one.                                 ⚠️ Not recommended
#PBS -l nodes=1:ppn=1  How many nodes and CPU cores your job needs.                                            ⚠️ Not recommended
cd $PBS_O_WORKDIR      Goes to the folder where you ran qsub. Without this your script won't find your files.  ❌ Always include this

One Rule to Remember

All #PBS lines must come before any real command. Any directive placed after a command is silently ignored by the scheduler — no error, it just won't work.

#!/bin/bash
#PBS -N MyJob       ← ✅ this works
#PBS -q default     ← ✅ this works

cd $PBS_O_WORKDIR    ← first real command

#PBS -l nodes=1:ppn=4  ← ❌ too late, ignored!
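
One way to catch this mistake before submitting is to scan the script for #PBS lines that come after the first real command. The sketch below is a minimal checker, not a PBS tool: check_pbs_order is a hypothetical helper name, and it assumes directives start in the first column.

```shell
# check_pbs_order FILE  (hypothetical helper, not part of PBS)
# Prints every "#PBS" line that appears after the first real command.
# Returns 0 if all directives are in the header, 1 otherwise.
check_pbs_order() {
    awk '
        /^[[:space:]]*$/ { next }        # blank lines do not end the header
        /^#PBS/ {
            if (seen_cmd) {              # directive after a command
                printf "line %d ignored: %s\n", NR, $0
                late = 1
            }
            next
        }
        /^#/ { next }                    # shebang and ordinary comments
        { seen_cmd = 1 }                 # anything else is a real command
        END { exit late + 0 }
    ' "$1"
}
```

A possible use is to gate submission on the check: check_pbs_order script.sh && qsub script.sh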

Want the Full Reference?

Every possible PBS option is documented in the built-in manual. Run this on the login node:

man qsub

It lists every valid #PBS directive, all options, and environment variables available to your job.

Queues

A queue determines how long your job can run and how many cores you can use. Pick the right one — using a longer queue than you need wastes shared resources.

Available Queues on IISERM HPC

Queue     Max Walltime            Max Cores per User  Use For
--------  ----------------------  ------------------  ---------------------------------------
default   8 hours                 200 cores           Quick jobs, testing, short calculations
short     72 hours (3 days)       100 cores           Medium length production jobs
long      1080 hours (45 days)    100 cores           Long running simulations
infinity  4380 hours (~6 months)  50 cores            Very long calculations; use sparingly

How to Set a Queue

#PBS -q default     # quick tests
#PBS -q short       # up to 3 days
#PBS -q long        # up to 45 days
#PBS -q infinity    # up to ~6 months

Tips

  • Always test with default first before submitting long jobs.
  • The default queue has the highest per-user core limit (200 cores), which makes it a good fit for short parallel work.
  • Jobs that exceed their queue's walltime limit are killed automatically.
  • To check queue status live: qstat -q
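
The queue table can be turned into a small lookup that suggests the shortest queue able to hold a job. This is only a sketch: pick_queue is a hypothetical helper, and the hour limits are copied from the table above, so they would need updating if the cluster's limits change.

```shell
# pick_queue HOURS  (hypothetical helper)
# Prints the shortest queue whose walltime limit covers HOURS.
# Limits are taken from the queue table in this guide.
pick_queue() {
    local hours=$1
    if   [ "$hours" -le 8 ];    then echo default
    elif [ "$hours" -le 72 ];   then echo short
    elif [ "$hours" -le 1080 ]; then echo long
    elif [ "$hours" -le 4380 ]; then echo infinity
    else
        echo "no queue allows $hours hours" >&2
        return 1
    fi
}

pick_queue 20    # → short
```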

Walltime

Walltime is the maximum real-world time your job is allowed to run. When it runs out, your job is killed — even if it's not finished.

How to Set It

#PBS -l walltime=HH:MM:SS

# Examples:
#PBS -l walltime=00:30:00    # 30 minutes
#PBS -l walltime=02:00:00    # 2 hours
#PBS -l walltime=24:00:00    # 1 day
#PBS -l walltime=72:00:00    # 3 days (short queue max)
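
Because walltime is written as HH:MM:SS, two values are easier to compare once they are converted to seconds. The sketch below shows one way to do that; walltime_to_seconds is a hypothetical helper, not part of PBS. It uses awk so that fields such as 08 are read as decimal numbers.

```shell
# walltime_to_seconds HH:MM:SS  (hypothetical helper)
# Converts a PBS-style walltime string to a number of seconds.
walltime_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

walltime_to_seconds 02:00:00    # → 7200
```

For example, a request fits the short queue if its value is at most the result of walltime_to_seconds 72:00:00.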

Script with Walltime

#!/bin/bash
#PBS -N MyJob
#PBS -q default
#PBS -l nodes=1:ppn=1
#PBS -l walltime=02:00:00

cd $PBS_O_WORKDIR
echo "Start: $(date)"

# your work here

echo "End: $(date)"

Why Walltime Matters

Situation            What happens
-------------------  -------------------------------------------------------------
No walltime set      Scheduler uses the queue's default limit; you have no control
Walltime too short   Job gets killed before finishing; you lose all progress
Walltime too long    Job waits longer in the queue; others are prioritised
Walltime just right  Job runs, finishes, output is saved ✅
💡 A good rule: estimate how long your job takes, then add 20–30% extra as buffer.
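
The buffer rule is simple arithmetic, sketched here as a helper that takes an estimate in minutes and prints a walltime string. walltime_with_buffer is a hypothetical name, and the 25% default is just a midpoint of the 20–30% range suggested above.

```shell
# walltime_with_buffer MINUTES [PERCENT]  (hypothetical helper)
# Adds PERCENT extra time (default 25) and prints the result as HH:MM:SS.
walltime_with_buffer() {
    local minutes=$1
    local percent=${2:-25}
    # Integer maths, rounded up to the next whole minute.
    local total=$(( (minutes * (100 + percent) + 99) / 100 ))
    printf '%02d:%02d:00\n' $(( total / 60 )) $(( total % 60 ))
}

walltime_with_buffer 96    # 96 min + 25% = 120 min → 02:00:00
```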

Output & Logs — Explained Simply

When your job runs, it produces two types of output:

Type                      What it is                                                                              Example
------------------------  --------------------------------------------------------------------------------------  --------------------------------------
stdout (standard output)  Normal output from your program: results, progress messages, print statements           print("Hello"), echo "Done"
stderr (standard error)   Warning messages, errors, or diagnostic info: things that went wrong or need attention  FileNotFoundError, Warning: low memory

By default, PBS saves both to the folder where you ran qsub. You can control whether they go to one file or two.

Default Behaviour (no flags)

If you don't specify anything, PBS creates a file named <jobname>.o<jobid> containing both stdout and stderr merged together.

# You submit:
qsub myscript.sh
# → Job ID: 465118.iisermhpc1
# → Output file created: MyJob.o465118
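
The file name in the example is built from the job name and the numeric part of the job ID. A minimal sketch of that naming pattern, with default_log_name as a hypothetical helper that simply follows the example above:

```shell
# default_log_name JOBNAME JOBID  (hypothetical helper)
# Strips everything after the first dot in the job ID
# and builds the <jobname>.o<jobid> file name.
default_log_name() {
    local job_name=$1
    local job_id=$2
    echo "${job_name}.o${job_id%%.*}"
}

default_log_name MyJob 465118.iisermhpc1    # → MyJob.o465118
```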

Option 1 — One Merged File (recommended for beginners)

Use -j oe to merge stdout and stderr into a single file. Add -o to give it a clean, readable name.

#!/bin/bash
#PBS -N MyJob
#PBS -q default
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -j oe              # join stdout + stderr
#PBS -o MyJob.log        # name the output file

cd $PBS_O_WORKDIR
echo "everything goes into MyJob.log"
✅ One file, easy to read. Both normal output and any errors appear together in chronological order.

Option 2 — Separate stdout and stderr (advanced)

Leave out -j oe and specify both -o and -e separately. Useful if you want to check errors without digging through all the output.

#!/bin/bash
#PBS -N MyJob
#PBS -q default
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -o MyJob_out.log   # stdout only
#PBS -e MyJob_err.log   # stderr only

cd $PBS_O_WORKDIR
echo "this goes to MyJob_out.log"
echo "this is an error" >&2   # goes to MyJob_err.log

Quick Summary

What you write         Files created
---------------------  ------------------------------------------
Nothing                MyJob.o465118 (auto-named, merged)
-j oe -o job.log       job.log (one clean file, your chosen name)
-o out.log -e err.log  Two files: stdout and stderr separated
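
The difference between the merged and separate options can be tried on the login node with ordinary shell redirection, without submitting a job. This is only an analogy to the PBS flags, and demo_program is a made-up stand-in for a real program.

```shell
# A stand-in for a real program: one line to stdout, one to stderr.
demo_program() {
    echo "result: 42"
    echo "warning: something looks off" >&2
}

workdir=$(mktemp -d)

# Similar to "-o out.log -e err.log": each stream gets its own file.
demo_program > "$workdir/out.log" 2> "$workdir/err.log"

# Similar to "-j oe -o job.log": stderr is sent wherever stdout goes.
demo_program > "$workdir/job.log" 2>&1

ls "$workdir"
```

After running it, out.log and err.log each hold one line, while job.log holds both.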

Watching Logs in Real Time

While a job is running you can follow its output live:

tail -f MyJob.log

Press Ctrl + C to stop following.

CPUs — Requesting Cores

Requesting CPUs tells the scheduler how many cores to give your job. There are two valid syntaxes on IISERM HPC — both work, they mean the same thing.

Two Syntaxes, Same Meaning

# Old style — Torque syntax
#PBS -l nodes=1:ppn=4

# New style — PBS Pro syntax
#PBS -l select=1:ncpus=4

                  Old style ppn        New style ncpus
----------------  -------------------  ----------------
Full form         Processors Per Node  Number of CPUs
Used with         nodes=1:ppn=4        select=1:ncpus=4
Works on IISERM?  ✅ Yes               ✅ Yes
Meaning           Exactly the same: CPUs per node
⚠️ Never mix the two styles in the same script. Pick one and stick with it.

Common CPU Requests

# Single core — default, simplest
#PBS -l nodes=1:ppn=1

# 4 cores on one node
#PBS -l nodes=1:ppn=4

# 16 cores on one node
#PBS -l nodes=1:ppn=16

# Same using new syntax
#PBS -l select=1:ncpus=16

How to Know How Many Cores to Request

Inside your running job you can check:

nproc                  # cores assigned to this job
cat $PBS_NODEFILE      # lists the node(s) assigned
VERY IMPORTANT: Only use ppn > 1 if your code actually uses parallel processing!

Most Python/R/Matlab scripts run on a single core by default. If you request ppn=16 but your code only uses 1 core, the other 15 cores sit idle — wasting shared resources and making your job wait longer in the queue.

Rule of thumb: If you're not sure whether your code uses parallel processing, always start with ppn=1. Only increase it if:
  • You're using libraries like mpi4py, multiprocessing, OpenMP, or Numba with parallel=True
  • Your software documentation explicitly says it supports multi-core execution
  • You've tested and confirmed speedup with more cores
When in doubt, stick to ppn=1. You can always request more later once you know your code benefits from it.
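
To keep the thread count in step with the request, a script can derive it from the job environment instead of hard-coding a number. The sketch below makes two assumptions: job_cores is a hypothetical helper, and the file named by $PBS_NODEFILE lists the host once per requested core, which is the usual behaviour for nodes=1:ppn=N requests but may differ with select= syntax. Check it on the cluster before relying on it.

```shell
# job_cores  (hypothetical helper)
# Counts the entries in $PBS_NODEFILE; falls back to 1 outside a job.
job_cores() {
    if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        grep -c . "$PBS_NODEFILE"    # number of non-empty lines
    else
        echo 1
    fi
}

# Many OpenMP-based programs read this variable to size their thread pool.
export OMP_NUM_THREADS=$(job_cores)
echo "Using $OMP_NUM_THREADS thread(s)"
```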

Loading Software

Software on the cluster is not loaded by default. You use the module system to load what you need inside your job script.

See What's Available

On the login node, type module load then press Tab Tab. When prompted, press y:

module load     # press Tab Tab, then y

Display all 131 possibilities? (y or n) y

anaconda3                 codes/gromacs/2023        tools/root/6.28
codes/anaconda3/23.3.1    codes/orca/5.0.4          codes/python/3.11.4
codes/cp2k/8.1.0          codes/R/4.3.1             compilers/gnu/8.3.0
codes/gaussian16/G16      tools/deeptools/3.5.1     compilers/openmpi/4.1.1
codes/geant4/11.1         tools/espresso/7.0        libs/fftw/3.3.10
                          ... and more

Then load whatever you need by typing its exact name:

module load anaconda3
module load codes/python/3.11.4
module load codes/R/4.3.1
module load codes/gromacs/2023
module load tools/root/6.28

Other Module Commands

module list                    # what is currently loaded
module show anaconda3          # preview what a module does
module unload anaconda3        # unload one module
module purge                   # unload everything

Script with Module Load

#!/bin/bash
#PBS -N PythonJob
#PBS -q default
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -j oe
#PBS -o PythonJob.log

cd $PBS_O_WORKDIR

module load anaconda3

echo "Start: $(date)"
python -u script.py
echo "End  : $(date)"
💡 Always put module load after cd $PBS_O_WORKDIR and before your actual commands.
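
If a module name is mistyped, the job may only fail later with a confusing "command not found". A small guard placed after the module load lines can stop the job early instead. This is a sketch; require_cmd is a hypothetical helper built on the shell's standard command -v.

```shell
# require_cmd NAME  (hypothetical helper)
# Returns 0 if NAME is on the PATH, otherwise prints an error and returns 1.
require_cmd() {
    if ! command -v "$1" > /dev/null 2>&1; then
        echo "ERROR: '$1' not found; check your module load line" >&2
        return 1
    fi
}

# Typical use inside a job script, right after "module load anaconda3":
#   require_cmd python || exit 1
```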

Using a Conda Environment

If your packages are inside a conda environment, activate it like this:

#!/bin/bash
#PBS -N CondaJob
#PBS -q default
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -j oe
#PBS -o CondaJob.log

cd $PBS_O_WORKDIR

source ~/.bashrc
conda activate myenv

echo "Start: $(date)"
python -u script.py
echo "End  : $(date)"
💡 source ~/.bashrc is required first so the conda command is available inside the PBS job environment.

Templates

Copy any of these, change the job name and your command, and submit.

1. Minimal — Just a Shell Command

#!/bin/bash
#PBS -N MyJob
#PBS -q default
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -j oe
#PBS -o MyJob.log

cd $PBS_O_WORKDIR
echo "Start: $(date)"

# your command here

echo "End: $(date)"

2. Python Script — with Module Load

#!/bin/bash
#PBS -N PythonJob
#PBS -q default
#PBS -l nodes=1:ppn=1
#PBS -l walltime=02:00:00
#PBS -j oe
#PBS -o PythonJob.log

cd $PBS_O_WORKDIR
module load anaconda3

echo "Start: $(date)"
python -u script.py
echo "End  : $(date)"

3. Python Inline — Code Inside the Script

#!/bin/bash
#PBS -N InlineJob
#PBS -q default
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -j oe
#PBS -o InlineJob.log

cd $PBS_O_WORKDIR
module load anaconda3

echo "Start: $(date)"

python -u << 'PYEOF'
for i in range(1, 6):
    print(f"Count: {i}")
print("Done!")
PYEOF

echo "End: $(date)"
💡 The -u flag makes Python output appear immediately in the log without buffering.

4. Multi-Core — Parallel Job

#!/bin/bash
#PBS -N ParallelJob
#PBS -q short
#PBS -l nodes=1:ppn=8
#PBS -l walltime=12:00:00
#PBS -j oe
#PBS -o ParallelJob.log

cd $PBS_O_WORKDIR
module load anaconda3

echo "Cores: $(nproc)"
echo "Start: $(date)"
python -u parallel.py --cores 8
echo "End  : $(date)"

5. Separate stdout and stderr

#!/bin/bash
#PBS -N MyJob
#PBS -q default
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -o MyJob_out.log
#PBS -e MyJob_err.log

cd $PBS_O_WORKDIR
module load anaconda3

python -u script.py

Submitting & Monitoring

How to submit your script, check its status, and cancel it if needed.

Submit a Job

qsub script.sh
# → 465118.iisermhpc1   (this is your job ID)

Check Job Status

qstat                       # all jobs in the queue
qstat -u ms21080            # only your jobs
qstat -f 465118.iisermhpc1  # full details of one job

Example output from qstat -u ms21080:

Job ID                 Name        User       Time   S  Queue
---------------------  ----------  ---------  -----  -  -------
465118.iisermhpc1      count_job   ms21080    00:05  R  default

Status Codes

Code  Meaning    What to do
----  ---------  ---------------------------------------
Q     Queued     Waiting for a free node; just wait
R     Running    Job is running; check log with tail -f
H     Held       Paused; use qrls to release
E     Exiting    Finishing up, output being written
C     Completed  Done; check your log file
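
The status table maps directly onto a case statement, which can be handy in wrapper scripts. In this sketch describe_state is a hypothetical helper, and the column position used to pull out the state (field 5) matches only the simplified example output shown above; real qstat output may have more columns.

```shell
# describe_state CODE  (hypothetical helper)
# Translates a one-letter qstat state into the meaning from the table.
describe_state() {
    case "$1" in
        Q) echo "Queued" ;;
        R) echo "Running" ;;
        H) echo "Held" ;;
        E) echo "Exiting" ;;
        C) echo "Completed" ;;
        *) echo "Unknown state: $1" ;;
    esac
}

# The example line from this guide; the state is its fifth field.
line="465118.iisermhpc1      count_job   ms21080    00:05  R  default"
state=$(echo "$line" | awk '{ print $5 }')
describe_state "$state"    # → Running
```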

Cancel / Delete a Job

qdel 465118.iisermhpc1     # cancel a specific job
⚠️ Use the full job ID including .iisermhpc1 — just the number alone may not work.

Other Useful Commands

qhold 465118.iisermhpc1    # pause a queued job
qrls 465118.iisermhpc1     # release a held job
qstat -q                   # see all queues and their limits
pbsnodes -a               # see all compute nodes and status
tail -f MyJob.log          # watch log output live
watch -n 5 qstat -u ms21080  # refresh job list every 5 seconds

Useful PBS Variables Inside Your Script

Variable        What it contains
--------------  ------------------------------------------------------
$PBS_JOBID      Your job's full ID, e.g. 465118.iisermhpc1
$PBS_JOBNAME    Job name you set with -N
$PBS_O_WORKDIR  Directory where you ran qsub
$PBS_NODEFILE   Path to a file listing the nodes assigned to your job
$PBS_QUEUE      Queue the job is running in
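
Printing these variables at the top of the log makes it easier to match a log file to a job later. A minimal sketch, with job_banner as a hypothetical helper; the "not set" fallback lets the same script run outside PBS for testing.

```shell
# job_banner  (hypothetical helper)
# Prints the PBS variables from the table, or "not set" outside a job.
job_banner() {
    echo "Job ID   : ${PBS_JOBID:-not set}"
    echo "Job name : ${PBS_JOBNAME:-not set}"
    echo "Queue    : ${PBS_QUEUE:-not set}"
    echo "Workdir  : ${PBS_O_WORKDIR:-not set}"
    echo "Nodefile : ${PBS_NODEFILE:-not set}"
}

job_banner
```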

Advanced Tips: Check Free Cores & Target Specific Nodes

On our cluster, some nodes may appear "up" but are actually down or overloaded. To save time and avoid jobs stuck in queue, you can add a helper function to your ~/.bashrc that shows free cores per node in any queue.

Step 1: Add the checkfree() Function to ~/.bashrc

Open your bash configuration file:

nano ~/.bashrc

Scroll to the bottom and paste this function:

# === HPC Helper: Check free cores in any queue ===
checkfree() {
    local queue=${1:-long}
    echo "=== Free cores in '$queue' queue ==="
    pbsnodes -a | awk -v q="$queue" '
        /^[a-z]/ {node=$1}
        /queue =/ {if($3==q) inqueue=1}
        /pcpus/ {if(inqueue) total=$3}
        /resources_assigned.ncpus/ {
            if(inqueue){
                used=$3; free=total-used;
                if(free>0) printf "%-8s : %3d free cores\n", node, free;
                sum+=free;
                if(free==total) fullfree++;
                inqueue=0
            }
        }
        END {
            print "------------------------";
            print "Total free cores =", sum;
            print "Fully free nodes =", fullfree+0;
        }'
}

Save with Ctrl+O, then Enter, then exit with Ctrl+X.

Step 2: Reload Your ~/.bashrc

Apply the changes by sourcing the file:

source ~/.bashrc

Now the checkfree command is available in your terminal.

Step 3: Use checkfree to See Available Cores

Run checkfree followed by any queue name:

checkfree default
checkfree short
checkfree long
checkfree infinity
checkfree gpushort
checkfree gpulong

Example output:

(base) [ms21080@login1 ~]$ checkfree default
=== Free cores in 'default' queue ===
gpc2     :  12 free cores
gpc3     :  12 free cores
gpc4     :  12 free cores
gpc5     :  32 free cores
gpc6     :  32 free cores
gpc7     :  52 free cores
gpc8     :  52 free cores
gpc9     :  52 free cores
gpc10    : 104 free cores
gpc11    :  52 free cores
gpc12    :  52 free cores
gpc13    :   3 free cores
------------------------
Total free cores = 467
Fully free nodes = 6
(base) [ms21080@login1 ~]$

Bonus: Target a Specific Node

If you see a node with many free cores (e.g., gpc11 has 52 free), you can target it directly in your PBS script:

#PBS -l nodes=gpc11:ppn=4   # request 4 cores on node gpc11 only

This can help your job start faster if that node is lightly loaded.

⚠️ Use node targeting sparingly. Only do this if:
  • You've confirmed the node is healthy and has resources
  • Your job has specific hardware needs (e.g., GPU, large RAM)
  • You're debugging and need consistent node behavior
For most jobs, let the scheduler pick the node automatically.
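
If you do decide to target a node, the checkfree output can be filtered to find the node with the most free cores. The sketch below assumes the output format shown in the example above; best_node is a hypothetical helper, and on a tie it keeps the first node listed.

```shell
# best_node  (hypothetical helper)
# Reads checkfree-style output on stdin and prints the node
# with the largest "free cores" value.
best_node() {
    awk '
        /free cores$/ && $3 + 0 > max { max = $3 + 0; node = $1 }
        END { if (node != "") print node }
    '
}

# Typical use on the login node:
#   checkfree default | best_node
```

Only per-node lines end in "free cores", so the header and the totals at the bottom are skipped.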

Pro Workflow Summary

  1. Before submitting, run checkfree default (or your target queue)
  2. Look for nodes with high "free cores" count
  3. Optionally target one: #PBS -l nodes=gpc11:ppn=4
  4. Submit with qsub script.sh
  5. Verify it's running: qstat -u $USER → look for status R
  6. Watch output: tail -f MyJob.log

This simple habit saves hours of waiting for jobs stuck on "dead" nodes.