IISER Mohali High Performance Computing Cluster
High-Performance Computing (HPC) enables researchers to solve complex computational problems that would be impractical on regular computers. The IISER Mohali HPC cluster provides shared computing resources for the research community.
A node is an individual computer within the cluster. Think of each node as a separate, powerful workstation. The cluster consists of multiple nodes connected via a high-speed network.
┌─────────────────────────────────────────────────────────────────┐
│                           HPC CLUSTER                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌──────────┐    ┌──────────┐    ┌──────────┐                  │
│   │  Login   │    │  Login   │    │  Master  │                  │
│   │  Node 1  │    │  Node 2  │    │  Nodes   │                  │
│   └────┬─────┘    └────┬─────┘    └────┬─────┘                  │
│        │               │               │                        │
│        └───────────────┴───────────────┘                        │
│                        │                                        │
│            ┌───────────┴───────────┐                            │
│            │  High-Speed Network   │                            │
│            └───────────┬───────────┘                            │
│                        │                                        │
│       ┌────────────────┼────────────────┐                       │
│       │                │                │                       │
│       ▼                ▼                ▼                       │
│  ┌──────────┐     ┌──────────┐     ┌──────────┐                 │
│  │ Compute  │     │ Compute  │ ... │ Compute  │                 │
│  │  Node    │     │  Node    │     │  Node    │                 │
│  │  (gpc1)  │     │  (gpc2)  │     │ (gpc32)  │                 │
│  │          │     │          │     │          │                 │
│  │ 52 cores │     │ 52 cores │     │ 52 cores │                 │
│  └──────────┘     └──────────┘     └──────────┘                 │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
| Node Type | Names | Purpose |
|---|---|---|
| Login Nodes | login1, login2 | Where you log in, edit files, and submit jobs |
| CPU Compute Nodes | gpc1-gpc32, bmc1-bmc7 | Where your CPU jobs run |
| GPU Compute Nodes | gpu1-gpu3 | For GPU-accelerated computations |
A core (also called CPU core or processor) is an individual processing unit within a node. Each node contains multiple cores that can work independently or together.
Example: If a node has 52 cores and you request ppn=4,
your job will use 4 cores, leaving 48 cores available for other users.
If your program is NOT parallelized (single-threaded), you MUST use:
#PBS -l nodes=1:ppn=1
Requesting more cores than your program can use is RESOURCE THEFT from other researchers!
| Component | Specification |
|---|---|
| Total Compute Nodes | ~39 nodes (gpc1-gpc32, bmc1-bmc7, gpu1-gpu3) |
| Cores per Node (typical) | 52 cores (2 × Intel Xeon Gold 6230R) |
| Total CPU Cores | ~1872 cores |
| Memory per Node | ~384 GB (CPU nodes) |
| GPUs per GPU Node | 4 × NVIDIA Tesla T4 (16 GB, PCIe) |
The HPC cluster is a SHARED RESOURCE funded by the institute for the entire research community. Wasting resources directly impacts your colleagues' research and is considered a SERIOUS VIOLATION of usage policy.
Misusing resources may result in account suspension!
| Violation | Problem | Correct Approach |
|---|---|---|
| Requesting more cores than your program uses | Idle cores are blocked from other users | Request only what you need |
| Running single-threaded code with ppn=40 | 39 cores sit completely idle while blocked | Use ppn=1 for serial jobs |
| Requesting entire nodes unnecessarily | Blocks all 52 cores on that node | Request specific core count needed |
| Setting excessive walltime | Resources reserved but unused | Estimate realistic runtime + small buffer |
| Running jobs on login nodes | Slows down system for everyone | Always use job submission (qsub) |
| Not cleaning up scratch space | Fills shared storage | Delete temporary files after job completes (see the sketch below) |
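For the last row, a job script can create and clean up its own temporary directory. The sketch below is only illustrative, and the /gscratch path is an assumption; use whatever scratch location applies to your group:

# Illustrative scratch cleanup (the path is an assumed example, adapt it)
SCRATCH_DIR=/gscratch/$USER/$PBS_JOBID    # hypothetical per-job scratch directory
mkdir -p "$SCRATCH_DIR"
# ... run your program, writing temporary files into $SCRATCH_DIR ...
rm -rf "$SCRATCH_DIR"                     # remove temporaries once the job is done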
If your code is NOT parallelized → Use nodes=1:ppn=1
Most Python scripts, serial Fortran/C programs, and simple simulations are NOT parallel!
Before requesting multiple cores, TEST your program to see actual CPU usage:
# Step 1: Get an interactive session
qsub -I -l nodes=1:ppn=4 -q short
# Step 2: Once on the compute node, run your program in background
cd /path/to/your/project
./your_program &
# Step 3: Check CPU usage with 'top'
top -u $USER
# Step 4: Look at %CPU column:
# ~100% = using 1 core → use ppn=1
# ~200% = using 2 cores → use ppn=2
# ~400% = using 4 cores → use ppn=4
# ~5200% = using all 52 → use ppn=52
# Step 5: If %CPU is ~100%, YOUR CODE IS NOT PARALLEL!
# Use ppn=1 only!
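If you prefer a non-interactive cross-check of the top method, standard ps options give a rough picture; interpret the numbers the same way as in Step 4 above:

# Rough alternative to 'top': CPU% and thread count (NLWP) per process
ps -u $USER -o pid,pcpu,nlwp,comm
# pcpu is the average CPU% over the process lifetime; ~100 means roughly one core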
Requesting more cores does NOT make your program faster!
If your program is not written to use multiple cores (parallel programming), it will only use 1 core regardless of how many you request. The other cores will sit idle while being blocked from other users.
Jobs are submitted to queues based on their expected runtime and resource requirements. Each queue has different time limits, core limits, and priorities.
| Queue Name | Max Walltime | Max Cores/User | Typical Use Case |
|---|---|---|---|
| `default` | 8 hours | 200 cores | Quick jobs, testing, short calculations |
| `short` | 72 hours (3 days) | 100 cores | Medium-length production jobs |
| `long` | 1080 hours (45 days) | 100 cores | Long-running simulations |
| `infinity` | 4380 hours (~6 months) | 50 cores | Extended calculations (use sparingly!) |
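For example, reading the table above: a run you expect to take about five days (120 hours) no longer fits the short queue (72-hour limit), so it belongs in long:

#PBS -q long
#PBS -l walltime=120:00:00   # 5 days: above the 72 h 'short' limit, well below the 1080 h 'long' limit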
| Queue Name | Description |
|---|---|
| `gpushort` | Short GPU jobs |
| `gpulong` | Long GPU jobs |
# View detailed information about all queues
qstat -Qf
# Example output (partial):
Queue: long
queue_type = Execution
total_jobs = 48
state_count = Transit:0 Queued:1 Held:12 Waiting:0 Running:35 Exiting:0
resources_max.ncpus = 416
resources_max.walltime = 1080:00:00
max_user_run = 100
max_user_res.ncpus = 100
enabled = True
started = True
Use the `default` queue for testing, and use `infinity` only when absolutely necessary.
A PBS job script is a shell script with special #PBS directives that tell
the scheduler what resources you need.
#!/bin/bash
#===============================================
# PBS DIRECTIVES - Resource requests
#===============================================
#PBS -N my_job_name # Job name (appears in qstat)
#PBS -l nodes=1:ppn=1 # 1 node, 1 core (for serial jobs!)
#PBS -l walltime=08:00:00 # Maximum runtime: 8 hours
#PBS -q default # Queue name
#PBS -o output.log # Standard output file
#PBS -e error.log # Standard error file
#===============================================
# CHANGE TO WORKING DIRECTORY
#===============================================
cd $PBS_O_WORKDIR
#===============================================
# LOAD REQUIRED MODULES
#===============================================
module load anaconda3
#===============================================
# RUN YOUR PROGRAM
#===============================================
python my_script.py
Unless you KNOW your code is parallelized, always use:
#PBS -l nodes=1:ppn=1
| Directive | Description | Example |
|---|---|---|
| `#PBS -N` | Job name (max 15 characters recommended) | `#PBS -N simulation01` |
| `#PBS -l nodes=X:ppn=Y` | Request X nodes with Y processors per node | `#PBS -l nodes=1:ppn=1` |
| `#PBS -l walltime=` | Maximum job runtime (HH:MM:SS) | `#PBS -l walltime=08:00:00` |
| `#PBS -q` | Queue selection | `#PBS -q long` |
| `#PBS -o` | Output file path | `#PBS -o logs/output.log` |
| `#PBS -e` | Error file path | `#PBS -e logs/error.log` |
| `#PBS -j oe` | Join output and error into one file | `#PBS -j oe` |
| `#PBS -M` | Email address for notifications | `#PBS -M user@iisermohali.ac.in` |
| `#PBS -m` | When to send email (a=abort, b=begin, e=end) | `#PBS -m abe` |
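Putting several of these directives together, a job-script header might look like the sketch below (the job name, log path, and email address are placeholders, not site conventions):

#!/bin/bash
#PBS -N relaxation_run              # placeholder job name
#PBS -l nodes=1:ppn=1               # serial job: 1 core only
#PBS -l walltime=24:00:00           # 24 hours
#PBS -q short
#PBS -j oe                          # merge stderr into stdout
#PBS -o logs/relaxation_run.log     # combined log file (path is illustrative)
#PBS -M user@iisermohali.ac.in      # replace with your own address
#PBS -m abe                         # mail on abort, begin, and end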
Walltime is the maximum time your job is allowed to run. Format: HH:MM:SS
# Examples of walltime settings
#PBS -l walltime=01:00:00 # 1 hour
#PBS -l walltime=08:00:00 # 8 hours (max for 'default' queue)
#PBS -l walltime=72:00:00 # 72 hours / 3 days (max for 'short' queue)
#PBS -l walltime=168:00:00 # 168 hours (7 days)
#PBS -l walltime=720:00:00 # 720 hours (30 days)
#PBS -l walltime=1080:00:00 # 1080 hours / 45 days (max for 'long' queue)
If your job exceeds the walltime, it will be killed immediately without saving any progress. Always add a buffer to your estimated runtime, and implement checkpointing for long jobs.
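Checkpointing itself has to come from your program. As a hedged sketch, if your code can periodically save its state and restart from it (the state.chk filename and --resume flag below are purely hypothetical), the job script can resume automatically after a resubmission:

# Illustrative restart logic; 'state.chk' and '--resume' are hypothetical,
# adapt them to whatever checkpoint mechanism your program provides.
cd $PBS_O_WORKDIR
if [ -f state.chk ]; then
    echo "Checkpoint found, resuming previous run"
    ./my_program --resume state.chk
else
    echo "No checkpoint found, starting from scratch"
    ./my_program
fi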
# Method 1: Separate output and error files
#PBS -o output.log
#PBS -e error.log
# Method 2: Combined output and error
#PBS -o combined.log
#PBS -j oe
# Method 3: With full path (recommended)
#PBS -o /persistent/data1/username/project/logs/job_output.log
#PBS -e /persistent/data1/username/project/logs/job_error.log
# Method 4: Include job ID in filename (in your script)
exec > "$PBS_O_WORKDIR/output_$PBS_JOBID.log" 2>&1
Targeting specific nodes is generally NOT recommended: your job then has to wait for that particular node even when others are free, and you bypass the scheduler's load balancing. Let the scheduler choose a node automatically unless you have a specific reason (e.g., GPU nodes, or specific software installed on certain nodes).
# RECOMMENDED: Let scheduler choose automatically
#PBS -l nodes=1:ppn=1
# Target a specific node (use only if necessary)
#PBS -l nodes=gpc30:ppn=4
# Request multiple specific nodes
#PBS -l nodes=gpc30:ppn=4+gpc31:ppn=4
# Request GPU nodes
#PBS -l nodes=gpu1:ppn=4
Sometimes a node may show as "free" but still have problems. For example,
gpc25 in the long queue has been known to cause job failures
even when pbsnodes shows it as available.
If your job keeps failing on a specific node, delete it, resubmit without targeting that node (or target a different one), and follow the troubleshooting steps later in this guide.
# Submit a job script
qsub job.sh
# Output example:
430105.iisermhpc1
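Since qsub prints the job ID to standard output, you can capture it for later monitoring or deletion, for example:

# Capture the job ID at submission time and reuse it
JOBID=$(qsub job.sh)
echo "Submitted job: $JOBID"
qstat -f "$JOBID"      # inspect it later, or cancel with: qdel "$JOBID"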
Interactive jobs give you a shell on a compute node for testing and debugging.
# Basic interactive job (1 core)
qsub -I -l nodes=1:ppn=1 -q default
# Interactive job with specific walltime
qsub -I -l nodes=1:ppn=4 -l walltime=02:00:00 -q short
# Interactive job on specific node (if needed)
qsub -I -l nodes=gpc30:ppn=1 -q long
# Delete a specific job
qdel 430105.iisermhpc1
# Delete multiple jobs
qdel 430105.iisermhpc1 430106.iisermhpc1
# Delete all your jobs (be careful!)
qdel $(qstat -u $USER | grep $USER | awk '{print $1}')
# View all your jobs
qstat -u $USER
# Example:
qstat -u ms21080
# Sample output:
Job ID             Name        User     Time   S  Queue
-----------------  ----------  -------  -----  -  -------
430035.iisermhpc1  simulation  ms21080  03:52  R  long
430036.iisermhpc1  test_job    ms21080  02:45  R  default
430045.iisermhpc1  analysis    ms21080  00:00  Q  short
430103.iisermhpc1  e04         ms21080  00:00  H  long
# Status codes:
# R = Running
# Q = Queued (waiting to start)
# H = Held (job has been held, check why)
# E = Exiting
# C = Completed
# Get detailed information about a specific job
qstat -f 430103.iisermhpc1
# Example output:
Job Id: 430103.iisermhpc1
Job_Name = e04
Job_Owner = ms21080@login1
job_state = H
queue = long
server = iisermhpc1
...
Resource_List.nodes = gpc25:ppn=2
Resource_List.walltime = 1080:00:00
...
comment = job held, too many failed attempts to run
run_count = 21
Exit_status = -3
...
# Key fields to look for:
# - job_state: Current status (R, Q, H, etc.)
# - exec_host: Which node(s) the job is running on
# - resources_used: CPU time, memory, walltime used
# - Resource_List: What was requested
# - comment: Error messages or hold reasons
# - Exit_status: Exit code (0 = success, non-zero = error)
# - run_count: How many times job tried to run
# View all jobs in the cluster
qstat
# View all queues with their status
qstat -Q
# View detailed queue information
qstat -Qf
Before submitting jobs (especially to specific nodes), you should check what resources are available. This helps you choose the right queue and avoid problematic nodes.
# View status of a specific node
pbsnodes gpc25
# Example output:
gpc25
Mom = gpc25
ntype = PBS
state = free
pcpus = 52
resources_available.arch = linux
resources_available.host = gpc25
resources_available.mem = 394860832kb
resources_available.ncpus = 52
resources_available.vnode = gpc25
resources_assigned.accelerator_memory = 0kb
resources_assigned.hbmem = 0kb
resources_assigned.mem = 0kb
resources_assigned.naccelerators = 0
resources_assigned.ncpus = 0
resources_assigned.vmem = 0kb
queue = long
resv_enable = True
sharing = default_shared
last_state_change_time = Thu Jan 29 00:43:49 2026
last_used_time = Fri Jan 9 12:07:53 2026
# Key fields:
# state = free → Node is available
# state = job-exclusive → All cores in use
# state = offline → Node is not available
# state = down → Node has problems
# pcpus = 52 → Total cores on node
# resources_assigned.ncpus = 0 → Currently used cores
# queue = long → Which queue this node belongs to
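For a quick one-line-per-node overview built from the same fields, a small awk sketch:

# Print every node together with its current state
pbsnodes -a | awk '/^[a-z]/ {node=$1} $1 == "state" {print node, $3}'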
A node may show state = free but still have issues. For example,
network problems or filesystem mount issues can cause jobs to fail even on
"free" nodes. If your job keeps failing on a specific node, try a different one!
# View status of all nodes
pbsnodes -a
Use this script to find which nodes have free cores in a specific queue. This is very useful before submitting jobs!
pbsnodes -a | awk '
function report() {                             # summarize the node block just finished
    if (node != "" && queue == "long") {        # change "long" to check another queue
        free = total - used
        printf "%-8s : %3d free cores\n", node, free
        sum += free
        if (free == total) fullfree++
    }
}
/^[a-z]/ { report(); node=$1; total=0; used=0; queue="" }   # unindented line = new node
$1 == "pcpus"                    { total = $3 }             # total cores on the node
$1 == "resources_assigned.ncpus" { used = $3 }              # cores currently assigned
$1 == "queue"                    { queue = $3 }             # queue the node belongs to
END {
    report()                                                # report the last node too
    print "------------------------"
    print "Total free cores =", sum + 0
    print "Fully free nodes =", fullfree + 0
}'
gpc27 : 46 free cores
gpc28 : 20 free cores
gpc29 : 12 free cores
gpc30 : 49 free cores
gpc31 : 52 free cores
gpc32 : 20 free cores
gpc25 : 52 free cores
------------------------
Total free cores = 251
Fully free nodes = 2
Once you have this output, you can target a node that has plenty of free cores, for example:

#PBS -l nodes=gpc31:ppn=4

To check a different queue, change the queue name in the script above:

# For the 'short' queue    - use queue == "short"
# For the 'default' queue  - use queue == "default"
# For the 'infinity' queue - use queue == "infinity"
# Add this to your ~/.bashrc file for easy access
checkfree() {
    local queue=${1:-long}
    echo "=== Free cores in '$queue' queue ==="
    pbsnodes -a | awk -v q="$queue" '
    function report() {
        if (node != "" && nodeq == q) {
            free = total - used
            if (free > 0) printf "%-8s : %3d free cores\n", node, free
            sum += free
            if (free == total) fullfree++
        }
    }
    /^[a-z]/ { report(); node=$1; total=0; used=0; nodeq="" }
    $1 == "pcpus"                    { total = $3 }
    $1 == "resources_assigned.ncpus" { used = $3 }
    $1 == "queue"                    { nodeq = $3 }
    END {
        report()
        print "------------------------"
        print "Total free cores =", sum + 0
        print "Fully free nodes =", fullfree + 0
    }'
}
# After adding to ~/.bashrc, reload it:
source ~/.bashrc
# Now you can use:
checkfree long
checkfree short
checkfree default
checkfree infinity
checkfree gpushort
checkfree gpulong
Once you have identified the free nodes, and you know that a specific node is NOT working (like gpc25 in the example), target a different node or let the scheduler choose automatically:
# Option 1: Let scheduler choose (SAFEST)
#PBS -l nodes=1:ppn=1
# Option 2: Target a known working node
#PBS -l nodes=gpc31:ppn=1
Remember: Use ppn=1 unless your code is actually parallelized!
#!/bin/bash
#PBS -N python_analysis
#PBS -l nodes=1:ppn=1 # ONLY 1 CORE for serial job!
#PBS -l walltime=04:00:00
#PBS -q default
#PBS -o python_out.log
#PBS -e python_err.log
cd $PBS_O_WORKDIR
module load anaconda3
echo "Job started at: $(date)"
echo "Running on node: $(hostname)"
python analysis.py
echo "Job finished at: $(date)"
#!/bin/bash
#PBS -N openmp_sim
#PBS -l nodes=1:ppn=16 # 16 cores on 1 node
#PBS -l walltime=48:00:00
#PBS -q short
#PBS -o omp_output.log
#PBS -e omp_error.log
cd $PBS_O_WORKDIR
# Set OpenMP threads to match requested cores
export OMP_NUM_THREADS=16
echo "Using $OMP_NUM_THREADS threads"
./my_openmp_program
#!/bin/bash
#PBS -N mpi_simulation
#PBS -l nodes=4:ppn=20 # 4 nodes, 20 cores each = 80 total
#PBS -l walltime=72:00:00
#PBS -q short
#PBS -o mpi_output.log
#PBS -e mpi_error.log
cd $PBS_O_WORKDIR
module load openmpi-4.1.0
# Calculate total processes
NPROCS=$(wc -l < $PBS_NODEFILE)
echo "Running on $NPROCS processors"
mpirun -np $NPROCS ./my_mpi_program
#!/bin/bash
#PBS -N geant4_sim
#PBS -l nodes=1:ppn=2
#PBS -l walltime=168:00:00
#PBS -q long
#PBS -o geant4_out.log
#PBS -e geant4_err.log
cd $PBS_O_WORKDIR
# Load Geant4
module load codes/geant4/11.1
# Set Geant4 data paths
export G4ENSDFSTATEDATA=/gscratch/apps/root/geant4/install/share/Geant4/data/G4ENSDFSTATE2.3
export G4LEVELGAMMADATA=/gscratch/apps/root/geant4/install/share/Geant4/data/PhotonEvaporation5.7
export G4LEDATA=/gscratch/apps/root/geant4/install/share/Geant4/data/G4EMLOW8.2
export G4PARTICLEXSDATA=/gscratch/apps/root/geant4/install/share/Geant4/data/G4PARTICLEXS4.0
# Record timing
START=$(date +%s)
echo "Started at: $(date)"
./sim run.mac
END=$(date +%s)
echo "Finished at: $(date)"
echo "Duration: $((END-START)) seconds"
#!/bin/bash
# Use this when you've checked available nodes and want to avoid a problematic one
# First run: checkfree long (see Section 8)
# Then choose a working node from the output
#PBS -N my_simulation
#PBS -l nodes=gpc31:ppn=1 # Targeting gpc31 specifically
#PBS -l walltime=720:00:00
#PBS -q long
#PBS -o output.log
#PBS -e error.log
cd $PBS_O_WORKDIR
echo "Running on node: $(hostname)"
./my_program
| Possible Cause | Solution |
|---|---|
| Requested resources not available | Use the free cores checking script to find available resources, then reduce core/node request |
| Targeting a busy node | Remove specific node requirement, let scheduler choose |
| Queue is full | Wait, or try a different queue |
| Exceeded user limits | Check qstat -Qf for max_user_res.ncpus limits (see the example below) |
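For the last row above, one way to inspect the per-user limits on a queue (here long) is:

# Show the limits enforced on the 'long' queue
qstat -Qf long | grep -E "max_user_run|max_user_res|resources_max"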
# Check why job is held
qstat -f JOB_ID | grep -E "(comment|run_count|Exit_status)"
# Example output:
comment = job held, too many failed attempts to run
run_count = 21
Exit_status = -3
# This means the node has issues! Steps to fix:
# 1. Delete the held job
qdel JOB_ID
# 2. Check which nodes are free (see Section 8)
# 3. Either let scheduler choose or pick a different node
# 4. Resubmit
qsub job.sh
# Check exit status
qstat -f JOB_ID | grep Exit_status
# Common exit codes:
# 0 = Success
# 1 = General error in your program
# -3 = Job couldn't start (node/environment issue)
# 137 = Killed (memory limit exceeded or SIGKILL)
# 265 = Walltime exceeded
# Check error log
cat error.log
# Check the node status
pbsnodes gpc25
# Even if state = free, the node might have issues!
# Check last_used_time - if it's old, node might be problematic
# Solution: Use a different node or let scheduler choose
#PBS -l nodes=1:ppn=1 # Let scheduler choose
# OR
#PBS -l nodes=gpc30:ppn=1 # Target known working node
Before submitting a long job, always test with a short run first: run a small test case in the `default` queue with a short walltime, and only then submit the full job to `long` or `infinity`, as in the sketch below.
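A minimal sketch of that workflow, assuming your script is job.sh (resource options given on the qsub command line take precedence over the #PBS directives inside the script):

# 1. Quick test run: same script, small walltime, 'default' queue
qsub -l walltime=00:30:00 -q default job.sh
# 2. Once the test finishes cleanly, submit the full production run
qsub -l walltime=720:00:00 -q long job.sh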
| Command | Description |
|---|---|
| `qsub job.sh` | Submit a job script |
| `qsub -I -l nodes=1:ppn=1 -q default` | Start an interactive session |
| `qdel JOB_ID` | Delete/cancel a job |
| `qhold JOB_ID` | Hold a queued job |
| `qrls JOB_ID` | Release a held job |
| Command | Description |
|---|---|
| `qstat` | Show all jobs in the queue |
| `qstat -u $USER` | Show only your jobs |
| `qstat -f JOB_ID` | Detailed job information |
| `qstat -Q` | Show queue summary |
| `qstat -Qf` | Detailed queue information |
| Command | Description |
|---|---|
| `pbsnodes -a` | Show the status of all nodes |
| `pbsnodes NODE_NAME` | Show the status of a specific node |
| `checkfree long` | Check free cores in the long queue (after adding the function to ~/.bashrc) |
| Command | Description |
|---|---|
| `module avail` | List available modules |
| `module load NAME` | Load a module |
| `module unload NAME` | Unload a module |
| `module list` | Show loaded modules |
| `module purge` | Unload all modules |
| Variable | Description |
|---|---|
| `$PBS_O_WORKDIR` | Directory where qsub was executed |
| `$PBS_JOBID` | Unique job identifier |
| `$PBS_NODEFILE` | File containing the list of assigned nodes |
| `$PBS_JOBNAME` | Name of the job |
| `$PBS_QUEUE` | Queue the job is running in |
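These variables are also handy for logging; a small sketch that records them at the top of a job script:

# Record PBS job information at the start of the job for easier debugging
echo "Job     : $PBS_JOBNAME ($PBS_JOBID)"
echo "Queue   : $PBS_QUEUE"
echo "Workdir : $PBS_O_WORKDIR"
echo "Nodes   : $(sort -u $PBS_NODEFILE | tr '\n' ' ')"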
Remember to include cd $PBS_O_WORKDIR in your script, and to use ppn=1 for serial (non-parallel) jobs.

For technical issues, contact the HPC support team at: helpdesk-hpc@iisermohali.ac.in
For community discussions: hpc-community@iisermohali.ac.in