OpenMPI Tips and Tricks


Building MPI applications

It is a good idea to run commands like mpicc --showme:compile at build time, rather than hard-coding flags, to find out what is required for compiling and linking. For instance, GNU Make allows running commands and assigning their output to variables:

CC=mpicc
MPI_COMPILE_FLAGS = $(shell mpicc --showme:compile)
MPI_LINK_FLAGS = $(shell mpicc --showme:link)

my_app: my_app.c
        $(CC) $(MPI_COMPILE_FLAGS) my_app.c $(MPI_LINK_FLAGS) -o my_app
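
The same wrapper output can be captured directly in a shell script as well; a minimal sketch (my_app.c is a placeholder source file):

MPI_COMPILE_FLAGS=$(mpicc --showme:compile)
MPI_LINK_FLAGS=$(mpicc --showme:link)
cc $MPI_COMPILE_FLAGS my_app.c $MPI_LINK_FLAGS -o my_app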

Debugging multiple processes

  • If you want to start the application using GDB:
# Run a single process under gdb in an xterm window
$ xterm -e gdb --args ./program args

# Launch each MPI rank in its own xterm running gdb
$ mpirun -n 4 xterm -e gdb my_mpi_application

If you want to debug multi-node applications, set and export the $DISPLAY variable before running the app so each xterm can reach your X server, for example:
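
A minimal sketch, assuming X is forwarded back to a login node named login01 (host name and display number are placeholders):

$ mpirun -n 4 -x DISPLAY=login01:0.0 xterm -e gdb ./my_mpi_application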

  • If you want to attach GDB to an already running process, add the following snippet to the source, then attach to the process(es) you are interested in:
// At the beginning of the code, or anywhere before the section you want to debug.
// Requires <stdio.h> and <unistd.h> (for gethostname, getpid, and sleep).
{ 
    volatile int i = 0;
    char hostname[256];
    gethostname(hostname, sizeof(hostname));
    printf("PID %d on %s ready for attach\n", getpid(), hostname);
    fflush(stdout);
    while (0 == i)
        sleep(5);
}

After running the app, use gdb --pid [pid] with the printed PID to attach to the process.

Once attached, set the variable i to a non-zero value so the process breaks out of the while loop:

(gdb) set var i = 7
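
Then let the rank run past the spin loop:

(gdb) continue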

Profile with Nsight Systems

If the MPI build is CUDA-aware, profiling with nsys is available:

nsys profile --gpu-metrics-device=0 --trace=mpi,ucx,cuda -o reportName.%q{SLURM_PROCID} \
  ./program args
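
The resulting report files can then be summarized on the command line or opened in the GUI; a minimal sketch, assuming the default .nsys-rep extension and rank 0's report:

nsys stats reportName.0.nsys-rep   # print summary tables for rank 0
nsys-ui reportName.0.nsys-rep      # open the timeline in the Nsight Systems GUI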

Find and use the topology of the system

nvidia-smi topo -m

Use the reported topology to bind each rank to its matching GPU, NIC, and CPU cores, e.g. with a case statement on the Slurm local rank (usage sketch after the script):

case ${SLURM_LOCALID} in
0)
    export CUDA_VISIBLE_DEVICES=0
    export UCX_NET_DEVICES=mlx5_1:1
    CPU_BIND=18-23
    ;;
1)
    export CUDA_VISIBLE_DEVICES=1
    export UCX_NET_DEVICES=mlx5_0:1
    CPU_BIND=6-11
    ;;
2)
    export CUDA_VISIBLE_DEVICES=2
    export UCX_NET_DEVICES=mlx5_3:1
    CPU_BIND=42-47
    ;;
3)
    export CUDA_VISIBLE_DEVICES=3
    export UCX_NET_DEVICES=mlx5_2:1
    CPU_BIND=30-35
    ;;
esac

# Run the target command under numactl, bound to the cores selected above
numactl --physcpubind=${CPU_BIND} "$@"
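
If the case statement above is saved as a wrapper script (here called bind.sh, a hypothetical name), each rank can be launched through it under Slurm; a minimal sketch:

srun --ntasks-per-node=4 ./bind.sh ./my_app args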

Another way to bind the application and its memory to a NUMA node:

numactl --cpunodebind=0 --membind=0 ./app
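
To check which CPUs and how much memory belong to each NUMA node in the first place, numactl can print the hardware layout:

numactl --hardware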

Use compute-sanitizer

Option --log-file writes the sanitizer's output to a per-process log, while --save writes a report file that can be read back later with --read.

mpirun -np 4 compute-sanitizer \
  --log-file report.%q{SLURM_PROCID}.log \
  --save report.%q{SLURM_PROCID}.compute-sanitizer \
  ./program arguments

Compile with -lineinfo to generate line correlation for device code.
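
A minimal sketch, assuming the device code is compiled with nvcc (my_kernels.cu is a placeholder):

nvcc -g -lineinfo -c my_kernels.cu -o my_kernels.o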

Read the saved report file using

compute-sanitizer --read <save file>

Useful runtime options

# Open MPI MCA options (passed to mpirun)
--mca btl ^vader,tcp,openib,uct
--mca btl_openib_want_cuda_gdr 1
--mca btl_openib_if_include mlx5_1
--mca opal_common_ucx_tls all
--mca opal_common_ucx_devices all
--mca pml_ucx_tls any
--mca pml_ucx_devices any
--mca pml_monitoring_enable <x>        # x = 0, 1, or 2

# Verbosity options, useful when debugging transport and CUDA support selection
--mca btl_base_verbose 100
--mca pml_base_verbose 100
--mca mtl_base_verbose 10
--mca pml_ucx_verbose 10
--mca opal_cuda_verbose 100
--mca coll_base_verbose 100
--mca mpi_common_cuda_verbose 100
--mca btl_smcuda_cuda_ipc_verbose 100

# MCA options can also be set as environment variables
export OMPI_MCA_coll_cuda_priority=80

# UCX options (exported to the ranks with -x)
-x UCX_NET_DEVICES=mlx5_0:1
-x UCX_TLS=rc,sm,cuda_copy,gdr_copy,cuda_ipc
-x UCX_RNDV_THRESH=0 -x UCX_ZCOPY_THRESH=0 --validation
-x UCX_LOG_LEVEL=trace -x UCX_LOG_FILE=ucx.%p.log
-x UCX_LOG_LEVEL=info -x UCX_PROTO_ENABLE=y -x UCX_PROTO_INFO=y
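
These flags go on the mpirun command line (MCA options can also be set as OMPI_MCA_* environment variables); a minimal sketch combining a few of them (./my_app is a placeholder):

mpirun -np 4 \
  --mca pml_ucx_verbose 10 \
  -x UCX_TLS=rc,sm,cuda_copy,gdr_copy,cuda_ipc \
  -x UCX_LOG_LEVEL=info -x UCX_PROTO_INFO=y \
  ./my_app args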