Overview

Login Node

When you log in to Komondor, you arrive at the login node (v01). Besides being the gateway to the supercomputer system, the login node is where you manage your projects on the supercomputer. Management tasks include:

  • uploading programs and data to the storage;

  • compiling and installing software;

  • preparing your application to be run on the supercomputer;

  • submitting computation jobs to the compute nodes using the Slurm scheduler;

  • monitoring and handling submitted jobs;

  • checking job efficiency to improve subsequent jobs;

  • arranging, downloading, and backing up computation results.

Important

Small management tasks that do not require significant resources may be executed on the login node; resource-intensive computation tasks, however, are subject to interruption without prior notice.
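To illustrate the submit-from-the-login-node workflow, a minimal Slurm batch script might look like the following. This is a generic sketch: the program name is a placeholder, the resource values are illustrative, and Komondor may require additional options (e.g. an accounting project) not shown here.

```bash
#!/bin/bash
#SBATCH --job-name=example      # job name shown in the queue
#SBATCH --partition=cpu         # default CPU partition
#SBATCH --ntasks=1              # number of tasks (processes)
#SBATCH --cpus-per-task=4       # CPU cores per task
#SBATCH --mem=8G                # memory for the whole job
#SBATCH --time=00:30:00         # walltime limit (HH:MM:SS)

srun ./my_program               # placeholder for your application
```

Saved as, say, `job.sh`, it would be submitted from the login node with `sbatch job.sh`; the computation itself then runs on a compute node, not on the login node.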

Compute Nodes

Actual computation tasks must be run on the compute nodes, which provide the computing power of the supercomputer. Based on the amount and type of resources they offer, there are four types of compute nodes in Komondor: CPU-only, GPU, AI, and BigData. Slurm organizes these resources into partitions (queues), which serve different computing requirements.

Job Queues (Partitions)

The main compute partitions of Komondor are:

  • cpu (default)

  • gpu

  • ai

  • bigdata

| Partition | Compute nodes | CPUs / node | CPU cores / node | GPUs / node | Memory / node |
|---|---|---|---|---|---|
| “CPU” (cpu, default) | 184 | 2 AMD CPUs | 128 | N/A | 256 GB RAM |
| “GPU” (gpu) | 58 | 1 AMD CPU | 64 | 4 A100 GPUs | 256 GB RAM |
| “AI” (ai) | 4 | 2 AMD CPUs | 128 | 8 A100 GPUs | 512 GB RAM |
| “BigData” (bigdata) | 1 | 16 Intel CPUs | 288 | N/A | 12 TB RAM |

Special-purpose Slurm partitions

In addition to the main compute partitions above, Komondor provides a few specialized Slurm partitions for interactive work and for improving overall utilization.

| Partition | Intended use | Default time | Max time | Notes |
|---|---|---|---|---|
| test | Short interactive tests, debugging, quick experiments (including GPU), JupyterHub notebooks | 00:10:00 | 01:00:00 | Dedicated GPU node; MaxMemPerCPU=4000 MB; oversubscription enabled (OverSubscribe=FORCE:2) |
| cpu-short | Short CPU-only jobs that may run on otherwise idle GPU nodes | 01:00:00 | 02:00:00 | Hidden partition; MaxMemPerCPU=4000 MB; uses GPU nodes (GPUs are not allocated unless requested) |
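For quick interactive work on the test partition, one plausible pattern uses standard Slurm interactive allocation; the option values below are illustrative, and the time requests stay within the partition's limits stated above.

```bash
# request an interactive shell on the test partition for 30 minutes
srun --partition=test --time=00:30:00 --cpus-per-task=2 --pty bash

# quick GPU sanity check on the dedicated GPU node
srun --partition=test --time=00:10:00 --gres=gpu:1 --pty nvidia-smi
```

When the allocation starts, the shell (or `nvidia-smi`) runs on the test node rather than the login node, and is released when you exit or the time limit expires.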

Short jobs and the cpu-short partition

To reduce waiting time and increase utilization, short CPU jobs submitted to the default cpu partition can be automatically eligible to run on GPU nodes via the hidden cpu-short partition. This happens when the job:

  • requests a walltime of 2 hours or less, and

  • requests 64 CPU cores or less in total.

This is most useful when the CPU partition is saturated while GPU nodes have available CPU capacity. You do not need to submit directly to cpu-short; the partition list is extended automatically by a Slurm job submission (Lua) policy. Just set an accurate time limit for short jobs.
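The actual routing is done server-side by the Slurm submission (Lua) policy and is not shown here; purely to make the two conditions concrete, the hypothetical shell function below mirrors the stated eligibility check (walltime at most 120 minutes and at most 64 cores in total).

```bash
#!/bin/bash
# Hypothetical illustration of the cpu-short eligibility conditions.
# Arguments: requested walltime in minutes, total CPU cores requested.
eligible_for_cpu_short() {
  local walltime_minutes=$1 total_cores=$2
  if [ "$walltime_minutes" -le 120 ] && [ "$total_cores" -le 64 ]; then
    echo "eligible"
  else
    echo "not eligible"
  fi
}

eligible_for_cpu_short 90 32    # -> eligible (short and small enough)
eligible_for_cpu_short 180 32   # -> not eligible (walltime over 2 hours)
eligible_for_cpu_short 90 128   # -> not eligible (more than 64 cores)
```

In practice this means a job submitted with, e.g., `--time=01:30:00 --ntasks=32` to the cpu partition may also be scheduled on an idle GPU node, with no change to the job script.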

You can instruct the Slurm scheduler to allocate the necessary resources and launch your tasks on the compute nodes using Slurm commands and special directives in your job script. Slurm queues all submitted jobs (from all users) according to their calculated priority and starts them on a schedule based on available resources.
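The typical command-line workflow for submitting and monitoring jobs looks roughly like the following; `job.sh` and the job ID 12345 are placeholders, and `seff` is a commonly installed (but optional) Slurm efficiency utility.

```bash
sbatch job.sh             # submit the job script; prints the assigned job ID
squeue -u "$USER"         # list your pending and running jobs
scontrol show job 12345   # detailed state of one job
scancel 12345             # cancel a job
sacct -j 12345            # accounting data after the job finishes
seff 12345                # CPU/memory efficiency summary, if available
```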