Profiling hardware utilization

Suppose a job has 4 GPUs allocated, but according to jobstats only 1 GPU is in use. In cases like this, it is worth inspecting the job’s GPU usage with tools other than jobstats and nvidia-smi. Below we show how to profile hardware utilization with several such tools.

The following examples work with AI_env/v2 (module load AI_env/v2).
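Before profiling, it can help to confirm from inside the job how many GPUs PyTorch actually sees (a quick sanity check; this sketch only assumes torch is importable from the loaded environment):

```python
import torch

# Report whether CUDA is usable and how many of the allocated GPUs PyTorch can see.
print('CUDA available:', torch.cuda.is_available())
print('Visible GPUs:', torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```

If the reported count is lower than the number of GPUs requested from Slurm, the problem is in the job setup rather than in the training code.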

Torch profiler

  • Import tools for profiler:

from torch.profiler import profile, record_function, ProfilerActivity
  • To record the GPU and CPU activities, wrap the code in a context manager as follows:

...
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True, pin_memory=True)

# Profile the forward and backward passes and the optimizer step of the first training batch, separately from the rest of the epoch.
activities = [ProfilerActivity.CPU, ProfilerActivity.CUDA]
with profile(activities=activities, record_shapes=True) as prof:
    images, labels = next(iter(train_loader))
    images = images.to(f'cuda:{dev[0]}', non_blocking=True)
    labels = labels.to(f'cuda:{dev[1]}', non_blocking=True)
    outputs = model(images)
    loss = criterion(outputs, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
prof.export_chrome_trace('trace.json') # save the profiler results to a JSON file

train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True, pin_memory=True) # re-create the loader so the epoch starts from a fresh iterator

# Continue training as always.
start = datetime.now()
for epoch in range(num_epochs):
   ...
[Image: torch_tracing]

Code example available here: https://git.einfra.hu/hpc-public/AI_examples/-/tree/main/multi_GPU_tracing?ref_type=heads
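The same pattern can be tried out in a self-contained, CPU-only sketch before attaching it to a real training script. The model, batch, and optimizer below are placeholders, not part of the example repository; the sketch also shows `record_function`, which is imported above, for labelling regions in the trace, and `key_averages()` for a quick operator-level summary without opening the JSON trace:

```python
import torch
import torch.nn as nn
from torch.profiler import profile, record_function, ProfilerActivity

# Placeholder model, loss, and data so the sketch runs anywhere without a GPU.
model = nn.Linear(16, 4)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
images = torch.randn(8, 16)
labels = torch.randint(0, 4, (8,))

with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    with record_function("forward"):        # label this region in the trace
        outputs = model(images)
        loss = criterion(outputs, labels)
    with record_function("backward"):
        optimizer.zero_grad()
        loss.backward()
    with record_function("optimizer_step"):
        optimizer.step()

# Print the most expensive operators, then save the trace for chrome://tracing.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
prof.export_chrome_trace("trace.json")
```

On the cluster, add `ProfilerActivity.CUDA` to `activities` to capture GPU kernels as well, as in the excerpt above.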

Torch TensorBoard Profiler

  • Import tools for profiler:

from torch.profiler import profile, record_function, ProfilerActivity
  • To record the GPU and CPU activities, wrap the code in a context manager as follows:

...
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True, pin_memory=True)

# profile the first training batch
activities = [ProfilerActivity.CPU, ProfilerActivity.CUDA]
with profile(activities=activities,
        on_trace_ready=torch.profiler.tensorboard_trace_handler('log'), # save the log files in the 'log' folder
        record_shapes=True,
        profile_memory=True,
        with_stack=True) as prof:
    images, labels = next(iter(train_loader))
    images = images.to(f'cuda:{dev[0]}', non_blocking=True)
    labels = labels.to(f'cuda:{dev[1]}', non_blocking=True)
    outputs = model(images)
    loss = criterion(outputs, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True, pin_memory=True) # re-create the loader so the epoch starts from a fresh iterator

start = datetime.now()
for epoch in range(num_epochs): # continue training as usual
...
  • To visualize the results:
    1. Forward the port: ssh -L 6006:localhost:6006 <yourusername>@komondor.hpc.kifu.hu

    2. After connecting, load the AI_env module: module load AI_env/v2

    3. Launch TensorBoard on Komondor: tensorboard --logdir=log

    4. On your local computer, open the following URL in the browser: localhost:6006

[Image: torch_tracing]

Code example available here: https://git.einfra.hu/hpc-public/AI_examples/-/tree/main/multi_GPU_TB_tracing?ref_type=heads
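Profiling only the first batch can be misleading, since early iterations include warm-up costs. The profiler also supports a schedule, so TensorBoard shows steady-state iterations instead. The sketch below is a hedged, CPU-only illustration with placeholder model and data (not taken from the example repository); on the cluster you would add `ProfilerActivity.CUDA` and use the real training loop:

```python
import torch
import torch.nn as nn
from torch.profiler import (profile, schedule, ProfilerActivity,
                            tensorboard_trace_handler)

# Placeholder model and optimizer so the sketch runs without a GPU.
model = nn.Linear(16, 4)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

with profile(activities=[ProfilerActivity.CPU],
             # skip 1 step, warm up for 1, then record 2 steps once
             schedule=schedule(wait=1, warmup=1, active=2, repeat=1),
             on_trace_ready=tensorboard_trace_handler('log')) as prof:
    for step in range(4):
        images = torch.randn(8, 16)            # stand-in for one batch
        labels = torch.randint(0, 4, (8,))
        outputs = model(images)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        prof.step()  # advance the profiler schedule after each batch
```

The resulting files in the 'log' folder can be viewed with the same TensorBoard steps as above.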

Complete documentation of the PyTorch TensorBoard Profiler: https://docs.pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html

Official PyTorch documentation: https://pytorch.org/docs/stable/index.html