Debug MPI+CUDA Apps with CUDA-GDB

1 minute read

Compiling with host -g and device debug symbols -G is slow. Also, the launcher command would be very complex. The workaround is attaching the debugger later:

#include <unistd.h>
if (rank == 0) {
    char name[255]; 
    gethostname(name, sizeof(name));
    bool attached;
    printf("rank %d: pid %d on %s ready to attach\n", rank, getpid(), name);
    while (!attached) { sleep(5); }

Launch process, sleep on particular rank:

$ srun -n 4 ./jacobi -niter 10
rank 0: pid 28920 on jwb0001.juwels ready to attac

Then attach from another terminal (may need more flags)

srun -n 1 --jobid ${JOBID} --pty bash -i
cuda-gdb --attach 28920

Wake up the sleeping process and continue debugging normally

(cuda-gdb) set var attached=true

Automatically wait for attach on exception without code changes, use option CUDA_DEVICE_WAITS_ON_EXCEPTION:

$ CUDA_DEVICE_WAITS_ON_EXCEPTION=1 srun ./jacobi -niter 10
Single GPU jacobi relaxation: 10 iterations on 16384 x 16384 mesh with norm check every 1 iterations
jwb0129.juwels: The application encountered a device error and CUDA_DEVICE_WAITS_ON_EXCEPTION is set. You 
can now attach a debugger to the application (PID 31562) for inspection.

Then, attach the debugger as before:

$ cuda-gdb --pid 31562
CUDA Exception: Warp Illegal Address
The exception was triggered at PC 0x508ca70 (
Thread 1 "jacobi" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 0, grid 4, block (0,0,0), thread (0,20,0), device 0, sm 0, warp 21, lane 0]
0x000000000508ca80 in jacobi_kernel<32, 32><<<(512,512,1),(32,32,1)>>> (/*...*/) at
88 real foo = *((real*)nullptr);

For more info, see here and there