HPC stands for High Performance Computing. An HPC cluster is a group of powerful computers connected over a fast network. Researchers use it to run jobs that are too big or too slow for a personal laptop.
You connect to the cluster over SSH, write a PBS script that describes your job (how many cores, how much memory, how long it will run), and submit it. The PBS scheduler then places your job on a suitable compute node and runs it.
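A job description of the kind mentioned above might look like the following minimal sketch. It assumes a Torque-style PBS setup where cores are requested with nodes=1:ppn=N. The job name, the resource values, the mem request syntax, and the file name job.pbs are illustrative assumptions, so check your site's documentation for the directives it actually accepts.

```shell
#!/bin/bash
# Hypothetical job script (job.pbs). Every name and value here is a placeholder.

# Job name (assumed)
#PBS -N example_job
# One node, 4 cores on that node
#PBS -l nodes=1:ppn=4
# Memory request (exact syntax may differ per site)
#PBS -l mem=8gb
# Maximum run time, HH:MM:SS
#PBS -l walltime=01:00:00
# Queue to submit to
#PBS -q default

# PBS sets PBS_O_WORKDIR to the directory qsub was run from.
# Fall back to the current directory when the script runs outside the scheduler.
WORKDIR="${PBS_O_WORKDIR:-$PWD}"
cd "$WORKDIR" || exit 1

# uname -n prints the name of the machine the job is running on.
echo "Job started on $(uname -n) in $WORKDIR"
```

Saved as job.pbs, a script like this would be submitted from a login node with qsub job.pbs.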
| Node Type | Names | Purpose |
|---|---|---|
| Login Nodes | login1, login2 | SSH in, edit files, submit jobs |
| CPU Compute | gpc1–gpc32, bmc1–bmc7 | CPU batch jobs |
| GPU Compute | gpu1–gpu3 | GPU-accelerated workloads |
A node is one computer inside the cluster. A core is a single processing unit inside that computer. Each node has many cores that can work at the same time.
If you set ppn=4 in your PBS script, your job uses 4 of those cores. On a 52-core node, the other 48 cores remain free for other users' jobs running on the same node.
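The arithmetic behind that example can be sketched in a few lines of shell. The helper name cores_left is hypothetical, and the 52-core node size is only inferred from the numbers in the example (4 cores used plus 48 left over), not taken from a hardware specification.

```shell
# Hypothetical helper: cores left on a node after one job claims its share.
cores_left() {
  local total_cores=$1   # cores on the node (52 is inferred from the example)
  local ppn=$2           # cores the job requests on that node via ppn
  echo $(( total_cores - ppn ))
}

cores_left 52 4   # prints 48
```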
When you SSH into the cluster, you land on a login node. It is a regular computer shared by everyone who is logged in at that moment, and it is meant for light tasks only.
The login node is not meant for running actual computations. For that, you use the PBS scheduler to submit your job to a compute node. You will learn more about this in the PBS section.
| ✅ Allowed on Login Node | ❌ Not Allowed on Login Node |
|---|---|
| Editing scripts with nano or vim | Running Python training scripts |
| Submitting jobs with qsub | Running simulations directly |
| Checking jobs with qstat | Compiling large codebases |
| Creating folders and small files | Processing large datasets |
| Transferring files with scp | Running any heavy program |
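As a rough illustration of the allowed column, a typical light session on a login node might look like the sketch below. The folder name my_project and the file name job.pbs are placeholders. The qstat call is guarded so the snippet also runs on a machine without PBS installed.

```shell
# Placeholder project folder; override PROJECT_DIR to use a different location.
PROJECT_DIR="${PROJECT_DIR:-$PWD/my_project}"

# Creating folders and small files is a light task.
mkdir -p "$PROJECT_DIR"

# Editing and submitting are interactive or PBS-specific, so they are shown as comments:
#   nano "$PROJECT_DIR/job.pbs"
#   qsub "$PROJECT_DIR/job.pbs"

# Checking your own jobs is also fine. Only call qstat where it exists.
if command -v qstat >/dev/null 2>&1; then
  qstat -u "${USER:-$(id -un)}"
else
  echo "qstat not found: this machine is not a PBS login node"
fi
```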
You can run top on the login node to see the current CPU and memory usage. If the load is very high, it means someone is running something heavy on the login node and you should inform the admin.
In the output above, the load average is 168 and CPU usage is 99.7%. This is very high. You can also see that four users are running jobs directly on the login node with CPU usage above 1000% each. This is wrong and will slow down the entire system.
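The same check can be scripted without the interactive top screen. The sketch below assumes a Linux login node, where /proc/loadavg and nproc are available. The rule that the load is too high once it exceeds the number of cores is a common rule of thumb, not a documented policy of this cluster.

```shell
# First field of /proc/loadavg is the 1-minute load average.
load=$(cut -d ' ' -f 1 /proc/loadavg)

# Number of cores available on this machine.
cores=$(nproc)

# awk does the floating-point comparison that plain shell arithmetic cannot.
status=$(awk -v l="$load" -v c="$cores" \
  'BEGIN { if (l > c) print "overloaded"; else print "ok" }')

echo "load=$load cores=$cores status=$status"

# One-shot, non-interactive snapshot of the busiest processes, if top is installed.
if command -v top >/dev/null 2>&1; then
  top -b -n 1 | head -n 15
fi
```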
Compute nodes are dedicated machines reserved for running jobs. Each node has its own CPUs, large RAM, and in the case of GPU nodes, graphics cards. You do not SSH into them directly. The PBS scheduler sends your job there automatically after you submit it from the login node.
You can choose which queue to submit your job to depending on how long your job will run and how many cores it needs. Each queue has different limits.
| Queue | Max Walltime | Max Cores per User | Use For |
|---|---|---|---|
| default | 8 hours | 200 cores | Quick jobs, testing, short calculations |
| short | 72 hours (3 days) | 100 cores | Medium length production jobs |
| long | 1080 hours (45 days) | 100 cores | Long running simulations |
| infinity | 4380 hours (about 6 months) | 50 cores | Very long calculations, use sparingly |
Use the default queue when testing your script. Only move to long or infinity once you are sure the job runs correctly. Submitting a broken job to a long queue wastes your allocation and blocks resources for others.
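The walltime limits in the queue table can be turned into a small lookup. The sketch below is a hypothetical helper, not a cluster-provided tool. It takes the expected run time in whole hours and ignores the per-user core limits.

```shell
# Hypothetical helper: suggest the smallest queue whose Max Walltime fits the job.
pick_queue() {
  local hours=$1   # expected run time in whole hours
  if   [ "$hours" -le 8 ];    then echo "default"
  elif [ "$hours" -le 72 ];   then echo "short"
  elif [ "$hours" -le 1080 ]; then echo "long"
  elif [ "$hours" -le 4380 ]; then echo "infinity"
  else
    echo "no queue allows ${hours} hours" >&2
    return 1
  fi
}

pick_queue 5      # prints: default
pick_queue 100    # prints: long
```

The suggested name could then be passed to qsub through its -q option, for example qsub -q short job.pbs, where job.pbs is a placeholder file name.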