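When you SSH into the cluster, you land on either login1 or login2. GPU node access only works from login1. If you're on login2 and try to request a GPU node, the command will hang or fail.
Run hostname to check which node you are on. If it shows login2, type exit and SSH again until you land on login1.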
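The login-node check can be scripted. The following is a minimal sketch, assuming the login nodes report the short hostnames login1 and login2 (as the prompts in this guide suggest). The function name check_login_node is hypothetical and not part of the cluster's tooling.

```shell
# Hypothetical helper: report whether the current node can request GPU nodes.
# Assumption: the short hostname of the working login node is exactly "login1".
check_login_node() {
    # Use the first argument if given (handy for testing), else ask the system.
    local host="${1:-$(hostname -s)}"
    if [ "$host" = "login1" ]; then
        echo "OK: on login1, GPU requests should work"
        return 0
    fi
    echo "WARNING: on $host; exit and SSH again until you land on login1"
    return 1
}
```

Calling check_login_node with no argument right after logging in prints one of the two messages and returns a non-zero status when you are not on login1.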
Use an interactive PBS session to get a shell on a GPU node. This works only from login1.
# From login1 only — will NOT work from login2
[user@login1 ~]$ qsub -I -q gpulong
| Queue | GPU Nodes | Current Status | Use For |
|---|---|---|---|
| gpushort | gpu1 | ⚠️ Currently down (may be revived) | Training jobs, production runs |
| gpulong | gpu2, gpu3 | ✅ Active | Training jobs, production runs |
Once the job starts, you'll get a shell on an available GPU node (gpu2 or gpu3). Your prompt will change to something like [user@gpu2 ~]$.
If you want to request a specific GPU node (e.g., gpu2 or gpu3), add the -l host= flag:
# Request gpu2 specifically
[user@login1 ~]$ qsub -I -q gpulong -l host=gpu2
# Request gpu3 specifically
[user@login1 ~]$ qsub -I -q gpulong -l host=gpu3
This is useful if you know a particular node is free (check nvidia-smi first) or if you need to resume work on the same node.
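To avoid mistyping the node name, the two forms of the request can be wrapped in a small function. This is only a sketch: gpu_session_cmd is a hypothetical name, and it assumes gpulong with gpu2 and gpu3 are the only active choices, as in the queue table above. It prints the command rather than running it, so nothing is submitted by accident.

```shell
# Hypothetical helper: print the qsub command for an interactive GPU session.
# Optional argument: a node name (gpu2 or gpu3) to pin with -l host=.
gpu_session_cmd() {
    local node="${1:-}"
    case "$node" in
        "")        echo "qsub -I -q gpulong" ;;
        gpu2|gpu3) echo "qsub -I -q gpulong -l host=$node" ;;
        *)         echo "unknown node: $node (expected gpu2 or gpu3)" >&2
                   return 1 ;;
    esac
}
```

For example, gpu_session_cmd gpu3 prints the command that requests gpu3. Copy it to the prompt on login1 to submit it.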
| Job Type | Recommended Approach |
|---|---|
| Small/quick jobs (< 30 min, testing, debugging) | Just run directly in the interactive shell. Keep your terminal open. If it disconnects, you can restart quickly. |
| Long jobs (training, production runs) | Always use tmux so your job survives SSH disconnects, terminal closes, or network drops. |
tmux is a terminal multiplexer. It lets you create sessions that keep running even after you disconnect. You can reattach later to check progress or interact with your job.
If tmux is not available on your account, you'll need to install it yourself in your home directory. Search online for "install tmux without root" or ask ChatGPT for step-by-step help.
[user@gpu2 ~]$ tmux new -s my_training
You'll see a green status bar at the bottom. You're now inside the tmux session.
[user@gpu2 ~]$ python train.py
# or whatever command you need
Press Ctrl + B, release both, then press D.
# You'll see: [detached]
[user@gpu2 ~]$ # back to normal shell, job still runs in tmux
[user@gpu2 ~]$ tmux attach -t my_training
tmux ls # list all your tmux sessions
tmux kill-session -t my_training # end a session when done
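The create and reattach commands above can be combined so that one call does the right thing in both cases. A minimal sketch, using only the standard tmux subcommands has-session, attach, and new. The function name tmux_resume is hypothetical.

```shell
# Hypothetical helper: attach to a tmux session if it exists, else create it.
tmux_resume() {
    local name="${1:?usage: tmux_resume SESSION_NAME}"
    # has-session exits 0 only when a session with this name is running.
    if tmux has-session -t "$name" 2>/dev/null; then
        tmux attach -t "$name"
    else
        tmux new -s "$name"
    fi
}
```

On a GPU node, tmux_resume my_training starts the session the first time and reattaches to it on later calls.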
# Step 1: From login1, get interactive GPU access
[user@login1 ~]$ qsub -I -q gpulong
qsub: waiting for job 12345.iisermhpc1 to start
qsub: job 12345.iisermhpc1 ready
# Step 2: You're now on gpu2
[user@gpu2 ~]$ tmux new -s resnet_train
# Step 3: Inside tmux, run your job
[user@gpu2 ~]$ python train_resnet.py --epochs 100
Epoch 1/100 - loss: 2.341 - acc: 0.12
Epoch 2/100 - loss: 1.987 - acc: 0.24
...
# Step 4: Detach (Ctrl+B, then D)
# [detached]
# Step 5: Later, reattach to check
[you@laptop ~]$ ssh user@hpc.iisermohali.ac.in   # from your own machine; make sure you land on login1
[user@login1 ~]$ qsub -I -q gpulong -l host=gpu2   # request the same GPU node
[user@gpu2 ~]$ tmux attach -t resnet_train
Epoch 47/100 - loss: 0.412 - acc: 0.86
...
| Command | What it does |
|---|---|
| tmux new -s name | Create new session named "name" |
| tmux ls | List all your tmux sessions |
| tmux attach -t name | Reattach to session "name" |
| tmux kill-session -t name | End session "name" |
| Ctrl+B then D | Detach from current session |
| Ctrl+B then % | Split pane vertically |
| Ctrl+B then " | Split pane horizontally |
| Ctrl+B then Arrow | Navigate between panes |
Before starting your job, always check if GPUs are already in use. Use nvidia-smi to see real-time GPU status.
[user@gpu3 ~]$ nvidia-smi
| Column | What to look for |
|---|---|
| GPU-Util | Percentage of GPU currently in use. 0% = idle, 90%+ = heavily used. |
| Memory-Usage | How much VRAM is occupied. If near 15360MiB, GPU is full. |
| Processes | Lists active jobs. "No running processes" = GPU is free. |
| Temp / Pwr | High temps (>80°C) or power caps may indicate heavy load. |
Use watch to refresh nvidia-smi every few seconds:
watch -n 2 nvidia-smi
Press Ctrl + C to stop watching.
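For a quick yes/no answer instead of reading the full table, nvidia-smi can emit machine-readable output through its --query-gpu and --format options. The sketch below filters that output for GPUs that look idle. The function names and the thresholds (under 10% utilization, under 500 MiB used) are assumptions, not cluster policy.

```shell
# Hypothetical helper: read "index, util%, mem_used_MiB, mem_total_MiB" lines
# on stdin and print the index of each GPU that looks idle.
# Assumed thresholds: utilization below 10% and less than 500 MiB in use.
idle_gpus() {
    awk -F', *' '$2 + 0 < 10 && $3 + 0 < 500 { print $1 }'
}

# Run on a GPU node: query nvidia-smi in CSV form and filter the result.
list_idle_gpus() {
    nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total \
               --format=csv,noheader,nounits | idle_gpus
}
```

If list_idle_gpus prints nothing, every GPU on the node is busy by these thresholds. A GPU can still appear idle between two bursts of someone else's job, so check the Processes section of plain nvidia-smi before starting work.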
GPU nodes are a shared, limited resource. This interactive access method gives you direct control — use it thoughtfully.
- Use nvidia-smi and ps aux to check before starting work.
- Run nvidia-smi before starting a job.
- Use tmux for long jobs so they survive disconnections.
- Give tmux sessions descriptive names (e.g., myproject_train).
- Run tmux kill-session when your job is done.
- Type exit when finished.
- [...] nvidia-smi and process lists next time.

Despite the queue names gpushort and gpulong, GPU nodes do not enforce walltime limits in the current configuration. Your interactive session will continue until:
This is why using tmux is critical — your job won't be auto-killed by a timer, but it will stop if your SSH connection drops without tmux.