🖥️ HPC Quick Start Guide

IISER Mohali High Performance Computing Cluster

1. Introduction

High-Performance Computing (HPC) enables researchers to solve complex computational problems that would be impractical on regular computers. The IISER Mohali HPC cluster provides shared computing resources for the research community.

📌 Key Points

  • The cluster uses PBS (Portable Batch System) as the job scheduler
  • Jobs are submitted from login nodes and run on compute nodes
  • Resources are shared among all users — use them responsibly
  • Never run heavy computations directly on login nodes

2. HPC Basics

What is a Node?

A node is an individual computer within the cluster. Think of each node as a separate, powerful workstation. The cluster consists of multiple nodes connected via a high-speed network.

┌─────────────────────────────────────────────────────────────────┐
│                         HPC CLUSTER                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌──────────┐    ┌──────────┐         ┌──────────┐             │
│   │  Login   │    │  Login   │         │  Master  │             │
│   │  Node 1  │    │  Node 2  │         │  Nodes   │             │
│   └────┬─────┘    └────┬─────┘         └────┬─────┘             │
│        │               │                    │                   │
│        └───────────────┴────────────────────┘                   │
│                        │                                        │
│            ┌───────────┴───────────┐                            │
│            │    High-Speed Network │                            │
│            └───────────┬───────────┘                            │
│                        │                                        │
│   ┌────────────────────┼────────────────────┐                   │
│   │                    │                    │                   │
│   ▼                    ▼                    ▼                   │
│ ┌──────────┐      ┌──────────┐        ┌──────────┐              │
│ │ Compute  │      │ Compute  │  ...   │ Compute  │              │
│ │ Node     │      │ Node     │        │ Node     │              │
│ │ (gpc1)   │      │ (gpc2)   │        │ (gpc32)  │              │
│ │          │      │          │        │          │              │
│ │ 52 cores │      │ 52 cores │        │ 52 cores │              │
│ └──────────┘      └──────────┘        └──────────┘              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
                
Node Type           Names                   Purpose
-----------------   ---------------------   -----------------------------------------
Login Nodes         login1, login2          Where you log in, edit files, submit jobs
CPU Compute Nodes   gpc1-gpc32, bmc1-bmc7   Where your CPU jobs run
GPU Compute Nodes   gpu1-gpu3               For GPU-accelerated computations

What is a Core?

A core (also called CPU core or processor) is an individual processing unit within a node. Each node contains multiple cores that can work independently or together.

🔢 Understanding Cores

  • Single-threaded program: Uses only 1 core
  • Multi-threaded program: Can use multiple cores on the same node
  • MPI program: Can use cores across multiple nodes

Example: If a node has 52 cores and you request ppn=4, your job will use 4 cores, leaving 48 cores available for other users.
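You can verify a node's core count yourself from a shell on that node (for example, in an interactive session, Section 6), using standard Linux tools:

# Number of cores visible on the current node
nproc

# More detail: sockets, cores per socket, and CPU model
lscpu | grep -E "^CPU\(s\)|Socket|Core|Model name"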

💀 CRITICAL: Use Only What You Need!

If your program is NOT parallelized (single-threaded), you MUST use:

#PBS -l nodes=1:ppn=1

Requesting more cores than your program can use is RESOURCE THEFT from other researchers!

Cluster Architecture Summary

Component                  Specification
------------------------   ------------------------------------------------
CPU Compute Nodes          39 nodes (gpc1-gpc32, bmc1-bmc7)
GPU Compute Nodes          3 nodes (gpu1-gpu3)
Cores per Node (typical)   52 cores (2 × Intel Xeon Gold 6230R)
Total CPU Cores            ~1872 cores
Memory per Node            ~384 GB (CPU nodes)
GPUs per GPU Node          4 × Tesla T4 (16 GB, PCIe)

3. Resource Ethics & Responsibilities

☠️ STOP! READ THIS BEFORE USING HPC

The HPC cluster is a SHARED RESOURCE funded by the institute for the entire research community. Wasting resources directly impacts your colleagues' research and is considered a SERIOUS VIOLATION of usage policy.

Misusing resources may result in account suspension!

❌ What NOT to Do — RESOURCE CRIMES

Violation                                      Problem                                        Correct Approach
--------------------------------------------   --------------------------------------------   ----------------------------------------------
Requesting more cores than your program uses   Idle cores are blocked from other users        Request only what you need
Running single-threaded code with ppn=40       39 cores sit completely idle while blocked!    Use ppn=1 for serial jobs
Requesting entire nodes unnecessarily          Blocks all 52 cores on that node               Request only the core count you need
Setting excessive walltime                     Resources reserved but unused                  Estimate realistic runtime + a small buffer
Running jobs on login nodes                    Slows down the system for everyone             Always submit through the scheduler (qsub)
Not cleaning up scratch space                  Fills shared storage                           Delete temporary files after the job completes

🚨 THE GOLDEN RULE

If your code is NOT parallelized → Use nodes=1:ppn=1

Most Python scripts, serial Fortran/C programs, and simple simulations are NOT parallel!

✅ Best Practices

Responsible HPC Usage

  1. Know your code: Understand if it's serial, multithreaded, or MPI-parallel
  2. Test first: Run short tests to determine actual resource needs
  3. Request accurately: Only request cores your code can actually utilize
  4. Monitor jobs: Check if resources are being used efficiently
  5. Clean up: Remove temporary files and data you no longer need
  6. Be considerate: During high-demand periods, limit resource requests

How to Check if Your Code Uses Multiple Cores

Before requesting multiple cores, TEST your program to see actual CPU usage:

# Step 1: Get an interactive session
qsub -I -l nodes=1:ppn=4 -q short

# Step 2: Once on the compute node, run your program in background
cd /path/to/your/project
./your_program &

# Step 3: Check CPU usage with 'top'
top -u $USER

# Step 4: Look at %CPU column:
#   ~100%  = using 1 core   → use ppn=1
#   ~200%  = using 2 cores  → use ppn=2
#   ~400%  = using 4 cores  → use ppn=4
#   ~5200% = using all 52   → use ppn=52

# Step 5: If %CPU is ~100%, YOUR CODE IS NOT PARALLEL!
# Use ppn=1 only!

⚠️ Common Misconception

Requesting more cores does NOT make your program faster!

If your program is not written to use multiple cores (parallel programming), it will only use 1 core regardless of how many you request. The other cores will sit idle while being blocked from other users.

4. Job Queues

Jobs are submitted to queues based on their expected runtime and resource requirements. Each queue has different time limits, core limits, and priorities.

CPU Queues

Queue Name   Max Walltime             Max Cores/User   Typical Use Case
----------   ----------------------   --------------   ---------------------------------------
default      8 hours                  200 cores        Quick jobs, testing, short calculations
short        72 hours (3 days)        100 cores        Medium-length production jobs
long         1080 hours (45 days)     100 cores        Long-running simulations
infinity     4380 hours (~6 months)   50 cores         Extended calculations (use sparingly!)

GPU Queues

Queue Name   Description
----------   ---------------
gpushort     Short GPU jobs
gpulong      Long GPU jobs
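A minimal GPU job sketch, using the node-targeting syntax from Section 5. The program name is a placeholder, and whether an explicit GPU resource request is also needed on this cluster is an assumption to verify with the HPC admins:

#!/bin/bash
#PBS -N gpu_test
#PBS -l nodes=gpu1:ppn=4        # GPU nodes are gpu1-gpu3 (see Section 2)
#PBS -l walltime=02:00:00
#PBS -q gpushort
#PBS -j oe

cd $PBS_O_WORKDIR

nvidia-smi                      # confirm a GPU is visible before the real run
./my_gpu_program                # placeholder for your GPU executable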

Check Queue Details

# View detailed information about all queues
qstat -Qf

# Example output (partial):
Queue: long
    queue_type = Execution
    total_jobs = 48
    state_count = Transit:0 Queued:1 Held:12 Waiting:0 Running:35 Exiting:0
    resources_max.ncpus = 416
    resources_max.walltime = 1080:00:00
    max_user_run = 100
    max_user_res.ncpus = 100
    enabled = True
    started = True

💡 Queue Selection Tips

  • Always choose the shortest queue that fits your job
  • Shorter queues often have higher priority and start faster
  • If unsure, start with default queue for testing
  • Use infinity only when absolutely necessary
  • Each queue has different nodes assigned to it

5. Writing PBS Job Scripts

Basic Structure

A PBS job script is a shell script with special #PBS directives that tell the scheduler what resources you need.

#!/bin/bash

#===============================================
# PBS DIRECTIVES - Resource requests
#===============================================
#PBS -N my_job_name          # Job name (appears in qstat)
#PBS -l nodes=1:ppn=1        # 1 node, 1 core (for serial jobs!)
#PBS -l walltime=08:00:00    # Maximum runtime: 8 hours
#PBS -q default              # Queue name
#PBS -o output.log           # Standard output file
#PBS -e error.log            # Standard error file

#===============================================
# CHANGE TO WORKING DIRECTORY
#===============================================
cd $PBS_O_WORKDIR

#===============================================
# LOAD REQUIRED MODULES
#===============================================
module load anaconda3

#===============================================
# RUN YOUR PROGRAM
#===============================================
python my_script.py

⚠️ IMPORTANT: Default to ppn=1

Unless you KNOW your code is parallelized, always use:

#PBS -l nodes=1:ppn=1

PBS Directives Explained

Directive               Description                                    Example
---------------------   --------------------------------------------   ------------------------------
#PBS -N                 Job name (max 15 characters recommended)       #PBS -N simulation01
#PBS -l nodes=X:ppn=Y   Request X nodes with Y processors per node     #PBS -l nodes=1:ppn=1
#PBS -l walltime=       Maximum job runtime (HH:MM:SS)                 #PBS -l walltime=08:00:00
#PBS -q                 Queue selection                                #PBS -q long
#PBS -o                 Standard output file path                      #PBS -o logs/output.log
#PBS -e                 Standard error file path                       #PBS -e logs/error.log
#PBS -j oe              Join output and error into one file            #PBS -j oe
#PBS -M                 Email address for notifications                #PBS -M user@iisermohali.ac.in
#PBS -m                 When to send email (a=abort, b=begin, e=end)   #PBS -m abe

Setting Walltime

Walltime is the maximum time your job is allowed to run. Format: HH:MM:SS

# Examples of walltime settings

#PBS -l walltime=01:00:00      # 1 hour
#PBS -l walltime=08:00:00      # 8 hours (max for 'default' queue)
#PBS -l walltime=72:00:00      # 72 hours / 3 days (max for 'short' queue)
#PBS -l walltime=168:00:00     # 168 hours (7 days)
#PBS -l walltime=720:00:00     # 720 hours (30 days)
#PBS -l walltime=1080:00:00    # 1080 hours / 45 days (max for 'long' queue)

⚠️ Walltime Warning

If your job exceeds the walltime, it will be killed immediately without saving any progress. Always add a buffer to your estimated runtime, and implement checkpointing for long jobs.
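How to checkpoint depends on your program, but the job-script side can stay simple. A minimal sketch, assuming your program periodically writes a checkpoint file and accepts a resume flag (both names here are hypothetical):

cd $PBS_O_WORKDIR

# Resume from the checkpoint if one exists; otherwise start fresh
# (checkpoint.dat and --resume are hypothetical; use your program's mechanism)
if [ -f checkpoint.dat ]; then
    ./my_program --resume checkpoint.dat
else
    ./my_program
fi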

Log Files Configuration

# Method 1: Separate output and error files
#PBS -o output.log
#PBS -e error.log

# Method 2: Combined output and error
#PBS -o combined.log
#PBS -j oe

# Method 3: With full path (recommended)
#PBS -o /persistent/data1/username/project/logs/job_output.log
#PBS -e /persistent/data1/username/project/logs/job_error.log

# Method 4: Include job ID in filename (in your script)
exec > "$PBS_O_WORKDIR/output_$PBS_JOBID.log" 2>&1

Targeting Specific Nodes

⚠️ Important: Avoid Targeting Specific Nodes Unless Necessary

Targeting specific nodes is generally NOT recommended because:

  • If that node is down or has issues, your job will fail repeatedly
  • Your job may get stuck in queue waiting for that specific node
  • Some nodes may appear "free" but have underlying problems

Let the scheduler choose a node automatically unless you have a specific reason (e.g., GPU nodes, specific software installed on certain nodes).

# RECOMMENDED: Let scheduler choose automatically
#PBS -l nodes=1:ppn=1

# Target a specific node (use only if necessary)
#PBS -l nodes=gpc30:ppn=4

# Request multiple specific nodes
#PBS -l nodes=gpc30:ppn=4+gpc31:ppn=4

# Request GPU nodes
#PBS -l nodes=gpu1:ppn=4

🚨 Real Example: Node Issues

Sometimes a node may show as "free" but still have problems. For example, gpc25 in the long queue has been known to cause job failures even when pbsnodes shows it as available.

If your job keeps failing on a specific node:

  1. Check available resources to find working nodes
  2. Either let the scheduler choose, or target a different node
  3. Report problematic nodes to HPC admin

6. Submitting Jobs

Basic Job Submission

# Submit a job script
qsub job.sh

# Output example:
430105.iisermhpc1
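Since qsub prints the job ID on standard output, you can capture it when scripting submissions:

# Capture the job ID for later monitoring or cancellation
JOBID=$(qsub job.sh)
echo "Submitted job: $JOBID"
qstat "$JOBID"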

Interactive Jobs

Interactive jobs give you a shell on a compute node for testing and debugging.

# Basic interactive job (1 core)
qsub -I -l nodes=1:ppn=1 -q default

# Interactive job with specific walltime
qsub -I -l nodes=1:ppn=4 -l walltime=02:00:00 -q short

# Interactive job on specific node (if needed)
qsub -I -l nodes=gpc30:ppn=1 -q long

Deleting Jobs

# Delete a specific job
qdel 430105.iisermhpc1

# Delete multiple jobs
qdel 430105.iisermhpc1 430106.iisermhpc1

# Delete all your jobs (be careful!)
qdel $(qstat -u $USER | grep $USER | awk '{print $1}')

7. Monitoring Jobs

Check Your Jobs

# View all your jobs
qstat -u $USER

# Example:
qstat -u ms21080

# Sample output:
Job ID            Name       User      Time  S  Queue
----------------  ---------  --------  ----  -  ------
430035.iisermhpc1 simulation ms21080  03:52  R  long
430036.iisermhpc1 test_job   ms21080  02:45  R  default
430045.iisermhpc1 analysis   ms21080  00:00  Q  short
430103.iisermhpc1 e04        ms21080  00:00  H  long

# Status codes:
# R = Running
# Q = Queued (waiting to start)
# H = Held (job has been held, check why)
# E = Exiting
# C = Completed
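To monitor your jobs without retyping the command, watch re-runs it at a fixed interval:

# Refresh your job list every 30 seconds (Ctrl+C to exit)
watch -n 30 qstat -u $USER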

Detailed Job Information

# Get detailed information about a specific job
qstat -f 430104.iisermhpc1

# Example output:
Job Id: 430104.iisermhpc1
    Job_Name = e04
    Job_Owner = ms21080@login1
    job_state = H
    queue = long
    server = iisermhpc1
    ...
    Resource_List.nodes = gpc25:ppn=2
    Resource_List.walltime = 1080:00:00
    ...
    comment = job held, too many failed attempts to run
    run_count = 21
    Exit_status = -3
    ...

# Key fields to look for:
# - job_state: Current status (R, Q, H, etc.)
# - exec_host: Which node(s) the job is running on
# - resources_used: CPU time, memory, walltime used
# - Resource_List: What was requested
# - comment: Error messages or hold reasons
# - Exit_status: Exit code (0 = success, non-zero = error)
# - run_count: How many times job tried to run

View All Jobs in Queue

# View all jobs in the cluster
qstat

# View all queues with their status
qstat -Q

# View detailed queue information
qstat -Qf

8. Checking Available Resources

Before submitting jobs (especially to specific nodes), you should check what resources are available. This helps you choose the right queue and avoid problematic nodes.

Check Specific Node Status

# View status of a specific node
pbsnodes gpc25

# Example output:
gpc25
     Mom = gpc25
     ntype = PBS
     state = free
     pcpus = 52
     resources_available.arch = linux
     resources_available.host = gpc25
     resources_available.mem = 394860832kb
     resources_available.ncpus = 52
     resources_available.vnode = gpc25
     resources_assigned.accelerator_memory = 0kb
     resources_assigned.hbmem = 0kb
     resources_assigned.mem = 0kb
     resources_assigned.naccelerators = 0
     resources_assigned.ncpus = 0
     resources_assigned.vmem = 0kb
     queue = long
     resv_enable = True
     sharing = default_shared
     last_state_change_time = Thu Jan 29 00:43:49 2026
     last_used_time = Fri Jan  9 12:07:53 2026

# Key fields:
# state = free          → Node is available
# state = job-exclusive → All cores in use
# state = offline       → Node is not available
# state = down          → Node has problems
# pcpus = 52            → Total cores on node
# resources_assigned.ncpus = 0  → Currently used cores
# queue = long          → Which queue this node belongs to

⚠️ Warning: "free" Doesn't Always Mean Working!

A node may show state = free but still have issues. For example, network problems or filesystem mount issues can cause jobs to fail even on "free" nodes. If your job keeps failing on a specific node, try a different one!

Check All Nodes

# View status of all nodes
pbsnodes -a
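The full listing is long; a compact view that pairs each node name with its state line is often enough for a first look:

# One line per node name (flush-left) plus its state line
pbsnodes -a | grep -E "^[a-z]|state = "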

Check Free Cores in Each Queue

Use this script to find which nodes have free cores in a specific queue. This is very useful before submitting jobs!

Script for Checking Free Cores (Long Queue Example):

pbsnodes -a | awk '
/^[a-z]/ {node=$1; total=0; used=0}    # a new node block starts
/pcpus/ {total=$3}                     # total cores on this node
/resources_assigned.ncpus/ {used=$3}   # cores currently assigned
/queue = long/ {                       # queue line comes after the core counts
  free=total-used;
  printf "%-8s : %3d free cores\n", node, free;
  sum+=free;
  if(free==total) fullfree++;
}
END {
  print "------------------------";
  print "Total free cores  =", sum+0;
  print "Fully free nodes  =", fullfree+0;
}'

Example Output:

gpc27    :  46 free cores
gpc28    :  20 free cores
gpc29    :  12 free cores
gpc30    :  49 free cores
gpc31    :  52 free cores
gpc32    :  20 free cores
gpc25    :  52 free cores
------------------------
Total free cores  = 251
Fully free nodes  = 2

✅ How to Use This Information

Once you have this output, you can:

  1. See total available cores: 251 cores available in long queue
  2. Find fully free nodes: 2 nodes have all 52 cores free (gpc31, gpc25)
  3. Choose a working node: If gpc25 is not working, use gpc31 or others
  4. Target specific node: #PBS -l nodes=gpc31:ppn=4

For Other Queues (Change the queue name):

# For 'short' queue - change "long" to "short":
pbsnodes -a | awk '
/^[a-z]/ {node=$1; total=0; used=0}
/pcpus/ {total=$3}
/resources_assigned.ncpus/ {used=$3}
/queue = short/ {
  free=total-used;
  printf "%-8s : %3d free cores\n", node, free;
  sum+=free;
  if(free==total) fullfree++;
}
END {
  print "------------------------";
  print "Total free cores  =", sum+0;
  print "Fully free nodes  =", fullfree+0;
}'

# For 'default' queue - change to: /queue = default/
# For 'infinity' queue - change to: /queue = infinity/

Add This Function to Your ~/.bashrc (Recommended)

# Add this to your ~/.bashrc file for easy access
checkfree() {
    local queue=${1:-long}
    echo "=== Free cores in '$queue' queue ==="
    pbsnodes -a | awk -v q="$queue" '
    /^[a-z]/ {node=$1; total=0; used=0}
    /pcpus/ {total=$3}
    /resources_assigned.ncpus/ {used=$3}
    /queue = / {
      if($3==q){
        free=total-used;
        if(free>0) printf "%-8s : %3d free cores\n", node, free;
        sum+=free;
        if(free==total) fullfree++;
      }}
    END {
      print "------------------------";
      print "Total free cores  =", sum+0;
      print "Fully free nodes  =", fullfree+0;
    }'
}

# After adding to ~/.bashrc, reload it:
source ~/.bashrc

# Now you can use:
checkfree long
checkfree short
checkfree default
checkfree infinity
checkfree gpushort
checkfree gpulong


🚨 Important: After Checking Free Cores

Once you identify free nodes, if you know a specific node is NOT working (like gpc25 in the example), target a different node or let the scheduler choose automatically:

# Option 1: Let scheduler choose (SAFEST)
#PBS -l nodes=1:ppn=1

# Option 2: Target a known working node
#PBS -l nodes=gpc31:ppn=1

Remember: Use ppn=1 unless your code is actually parallelized!

9. Complete Examples

Example 1: Simple Python Job (Serial - Single Core)

#!/bin/bash
#PBS -N python_analysis
#PBS -l nodes=1:ppn=1           # ONLY 1 CORE for serial job!
#PBS -l walltime=04:00:00
#PBS -q default
#PBS -o python_out.log
#PBS -e python_err.log

cd $PBS_O_WORKDIR

module load anaconda3

echo "Job started at: $(date)"
echo "Running on node: $(hostname)"

python analysis.py

echo "Job finished at: $(date)"

Example 2: Multi-threaded Job (OpenMP)

#!/bin/bash
#PBS -N openmp_sim
#PBS -l nodes=1:ppn=16          # 16 cores on 1 node
#PBS -l walltime=48:00:00
#PBS -q short
#PBS -o omp_output.log
#PBS -e omp_error.log

cd $PBS_O_WORKDIR

# Set OpenMP threads to match requested cores
export OMP_NUM_THREADS=16

echo "Using $OMP_NUM_THREADS threads"
./my_openmp_program
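Rather than hard-coding the thread count, you can derive it from the allocation: for a single-node job, $PBS_NODEFILE contains one line per requested core, so its line count equals ppn. A sketch:

# Derive the thread count from the PBS allocation (single-node jobs)
export OMP_NUM_THREADS=$(wc -l < $PBS_NODEFILE)

This keeps the thread count and the ppn request from drifting apart when you edit the directive.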

Example 3: MPI Job (Multiple Nodes)

#!/bin/bash
#PBS -N mpi_simulation
#PBS -l nodes=4:ppn=20          # 4 nodes, 20 cores each = 80 total
#PBS -l walltime=72:00:00
#PBS -q short
#PBS -o mpi_output.log
#PBS -e mpi_error.log

cd $PBS_O_WORKDIR

module load openmpi-4.1.0

# Calculate total processes
NPROCS=$(wc -l < $PBS_NODEFILE)
echo "Running on $NPROCS processors"

mpirun -np $NPROCS ./my_mpi_program

Example 4: Geant4 Simulation

#!/bin/bash
#PBS -N geant4_sim
#PBS -l nodes=1:ppn=2
#PBS -l walltime=168:00:00
#PBS -q long
#PBS -o geant4_out.log
#PBS -e geant4_err.log

cd $PBS_O_WORKDIR

# Load Geant4
module load codes/geant4/11.1

# Set Geant4 data paths
export G4ENSDFSTATEDATA=/gscratch/apps/root/geant4/install/share/Geant4/data/G4ENSDFSTATE2.3
export G4LEVELGAMMADATA=/gscratch/apps/root/geant4/install/share/Geant4/data/PhotonEvaporation5.7
export G4LEDATA=/gscratch/apps/root/geant4/install/share/Geant4/data/G4EMLOW8.2
export G4PARTICLEXSDATA=/gscratch/apps/root/geant4/install/share/Geant4/data/G4PARTICLEXS4.0

# Record timing
START=$(date +%s)
echo "Started at: $(date)"

./sim run.mac

END=$(date +%s)
echo "Finished at: $(date)"
echo "Duration: $((END-START)) seconds"

Example 5: Targeting a Specific Working Node

#!/bin/bash
# Use this when you've checked available nodes and want to avoid a problematic one
# First run: checkfree long (see Section 8)
# Then choose a working node from the output

#PBS -N my_simulation
#PBS -l nodes=gpc31:ppn=1       # Targeting gpc31 specifically
#PBS -l walltime=720:00:00
#PBS -q long
#PBS -o output.log
#PBS -e error.log

cd $PBS_O_WORKDIR

echo "Running on node: $(hostname)"
./my_program

10. Troubleshooting

Job Stuck in Queue (Q status)

Possible Cause                      Solution
---------------------------------   -------------------------------------------------------------------------
Requested resources not available   Use the free-cores script (Section 8), then reduce your core/node request
Targeting a busy node               Remove the specific node requirement and let the scheduler choose
Queue is full                       Wait, or try a different queue
Exceeded user limits                Check qstat -Qf for max_user_res.ncpus limits

Job Held (H status)

# Check why job is held
qstat -f JOB_ID | grep -E "(comment|run_count|Exit_status)"

# Example output:
comment = job held, too many failed attempts to run
run_count = 21
Exit_status = -3

# This means the node has issues! Steps to fix:
# 1. Delete the held job
qdel JOB_ID

# 2. Check which nodes are free (see Section 8)
# 3. Either let scheduler choose or pick a different node
# 4. Resubmit
qsub job.sh

Job Failed (Exit_status ≠ 0)

# Check exit status
qstat -f JOB_ID | grep Exit_status

# Common exit codes:
# 0    = Success
# 1    = General error in your program
# -3   = Job couldn't start (node/environment issue)
# 137  = Killed (memory limit exceeded or SIGKILL)
# 265  = Walltime exceeded

# Check error log
cat error.log

Node Shows "Free" But Job Keeps Failing

# Check the node status
pbsnodes gpc25

# Even if state = free, the node might have issues!
# Check last_used_time: if it is old, the node might be problematic

# Solution: Use a different node or let scheduler choose
#PBS -l nodes=1:ppn=1           # Let scheduler choose
# OR
#PBS -l nodes=gpc30:ppn=1       # Target known working node

💡 Pro Tip: Test Before Long Runs

Before submitting a long job, always test with a short run first:

  1. Submit to default queue with short walltime
  2. Check if it starts and runs correctly
  3. Then submit the full job to long or infinity
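Options passed on the qsub command line generally override the #PBS directives inside the script, so you can reuse your production script for the test:

# Test run: same script, but short walltime in the default queue
qsub -l walltime=00:30:00 -q default job.sh

# If it runs correctly, submit unchanged for the full run
qsub job.sh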

11. Quick Reference Commands

Job Submission & Control

Command                               Description
-----------------------------------   ----------------------------
qsub job.sh                           Submit a job script
qsub -I -l nodes=1:ppn=1 -q default   Start an interactive session
qdel JOB_ID                           Delete/cancel a job
qhold JOB_ID                          Hold a queued job
qrls JOB_ID                           Release a held job

Job Monitoring

Command           Description
---------------   ---------------------------
qstat             Show all jobs in the queue
qstat -u $USER    Show only your jobs
qstat -f JOB_ID   Detailed job information
qstat -Q          Show queue summary
qstat -Qf         Detailed queue information

Node & Resource Information

Command              Description
------------------   ----------------------------------------------------------------------------
pbsnodes -a          Show the status of all nodes
pbsnodes NODE_NAME   Show the status of a specific node
checkfree long       Check free cores in the long queue (after adding the function to ~/.bashrc)

Module Management

Command              Description
------------------   ------------------------
module avail         List available modules
module load NAME     Load a module
module unload NAME   Unload a module
module list          Show loaded modules
module purge         Unload all modules
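Many module implementations print module avail to standard error, so pipe both streams when searching for a package:

# Search available modules for anything mentioning python
module avail 2>&1 | grep -i python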

PBS Environment Variables

Variable         Description
--------------   ------------------------------------------
$PBS_O_WORKDIR   Directory where qsub was executed
$PBS_JOBID       Unique job identifier
$PBS_NODEFILE    File containing the list of assigned nodes
$PBS_JOBNAME     Name of the job
$PBS_QUEUE       Queue the job is running in
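Echoing these variables at the top of every job script makes logs much easier to interpret later:

# Record the job's identity and placement in the log
echo "Job:   $PBS_JOBID ($PBS_JOBNAME)"
echo "Queue: $PBS_QUEUE"
echo "Dir:   $PBS_O_WORKDIR"
echo "Nodes: $(sort -u $PBS_NODEFILE | tr '\n' ' ')"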

Final Tips

✅ Checklist Before Submitting

  1. Is my executable compiled and working?
  2. Are all input files in place?
  3. Is my code parallel? If NOT, use ppn=1!
  4. Did I request the correct number of cores for my program?
  5. Is my walltime estimate realistic (with buffer)?
  6. Did I choose the appropriate queue?
  7. Are my log file paths correct?
  8. Did I include cd $PBS_O_WORKDIR in my script?
  9. Did I check that the target node (if specified) is working?

💀 Remember: Don't Waste Resources!

  • Use ppn=1 for serial (non-parallel) jobs
  • Only request cores your program actually uses
  • Choose the shortest queue that fits your job
  • Clean up temporary files after jobs complete
  • Be considerate of other researchers

📧 Need Help?

For technical issues, contact the HPC support team at: helpdesk-hpc@iisermohali.ac.in

For community discussions: hpc-community@iisermohali.ac.in