Debug MPI+CUDA Apps with CUDA-GDB
Compiling with host -g
and device debug symbols -G
is slow. Also, the launcher command would be very complex. The workaround is attaching the debugger later:
#include <unistd.h>
if (rank == 0) {
char name[255];
gethostname(name, sizeof(name));
bool attached;
printf("rank %d: pid %d on %s ready to attach\n", rank, getpid(), name);
while (!attached) { sleep(5); }
}
Launch process, sleep on particular rank:
$ srun -n 4 ./jacobi -niter 10
rank 0: pid 28920 on jwb0001.juwels ready to attac
Then attach from another terminal (may need more flags)
srun -n 1 --jobid ${JOBID} --pty bash -i
cuda-gdb --attach 28920
Wake up the sleeping process and continue debugging normally
(cuda-gdb) set var attached=true
Automatically wait for attach on exception without code changes, use option CUDA_DEVICE_WAITS_ON_EXCEPTION
:
$ CUDA_DEVICE_WAITS_ON_EXCEPTION=1 srun ./jacobi -niter 10
Single GPU jacobi relaxation: 10 iterations on 16384 x 16384 mesh with norm check every 1 iterations
jwb0129.juwels: The application encountered a device error and CUDA_DEVICE_WAITS_ON_EXCEPTION is set. You
can now attach a debugger to the application (PID 31562) for inspection.
Then, attach the debugger as before:
$ cuda-gdb --pid 31562
CUDA Exception: Warp Illegal Address
The exception was triggered at PC 0x508ca70 (jacobi_kernels.cu:88)
Thread 1 "jacobi" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 0, grid 4, block (0,0,0), thread (0,20,0), device 0, sm 0, warp 21, lane 0]
0x000000000508ca80 in jacobi_kernel<32, 32><<<(512,512,1),(32,32,1)>>> (/*...*/) at jacobi_kernels.cu:88
88 real foo = *((real*)nullptr);