Enabling MPS


MPS

MPS (Multi-Process Service) is a feature of the NVIDIA driver that allows multiple CUDA applications to share a single GPU. This feature is useful when individual applications do not fully utilize the GPU. MPS can improve GPU utilization and reduce the number of context switches between applications.
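For example (a minimal sketch with made-up program names), two CUDA applications launched side by side can share the GPU through MPS, with their kernels scheduled concurrently instead of time-slicing between separate contexts:

# Two hypothetical CUDA applications sharing GPU 0 under MPS:
./train_model_a &
./train_model_b &
wait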

As the man page of nvidia-cuda-mps-control suggests:

When CUDA is first initialized in a program, the CUDA driver attempts to connect to the MPS control daemon. If the connection attempt fails, the program continues to run as it normally would without MPS. If however, the connection attempt to the control daemon succeeds, the CUDA driver then requests the daemon to start an MPS server on its behalf. If there’s an MPS server already running, and the user id of that server process matches that of the requesting client process, the control daemon simply notifies the client process of it, which then proceeds to connect to the server. If there’s no MPS server already running on the system, the control daemon launches an MPS server with the same user id (UID) as that of the requesting client process. If there’s an MPS server already running, but with a different user id than that of the client process, the control daemon requests the existing server to shutdown as soon as all its clients are done. Once the existing server has terminated, the control daemon launches a new server with the user id same as that of the queued client process.
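A hypothetical illustration of that flow (the user names and applications are made up):

nvidia-cuda-mps-control -d      # Control daemon started, here by root.
sudo -u alice ./cuda_app_a &    # First client: the daemon launches an MPS server owned by alice.
sudo -u alice ./cuda_app_b &    # Same UID: this client attaches to alice's existing server.
sudo -u bob ./cuda_app_c &      # Different UID: a new server for bob is launched once
                                # alice's clients are done.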

To enable MPS, run the following commands:

nvidia-smi -i 0 -c EXCLUSIVE_PROCESS  # Set GPU 0 to exclusive-process compute mode.
nvidia-smi -i 0 -r                    # Reset GPU 0 to apply the new setting.

nvidia-cuda-mps-control -d            # Start the MPS control daemon.

As the NVIDIA documentation suggests, the first command sets the GPU to exclusive-process mode to make sure that other processes can only access the GPU through the MPS daemon. The second command resets the GPU with the new setting; I am still not sure whether the setting is applied without a reset! The third command starts the MPS control daemon. All commands should be run as root.
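You can verify that the new compute mode took effect with a quick nvidia-smi query:

# Should report the exclusive-process mode for GPU 0.
nvidia-smi -i 0 --query-gpu=compute_mode --format=csv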

To see the status of MPS, run the following command:

ps -ef | grep mps # See if the MPS daemon is running.
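The control daemon also accepts commands on its stdin; for example, get_server_list prints the PIDs of any MPS servers it has started:

# Ask the control daemon for the PIDs of the running MPS server processes.
echo get_server_list | nvidia-cuda-mps-control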

To shut down MPS, run the following commands:

echo quit | nvidia-cuda-mps-control  # Ask the control daemon to quit.
# Make sure to change the GPU mode back to default.
nvidia-smi -i 0 -c DEFAULT
nvidia-smi -i 0 -r

CUDA Visible Devices

Some notes on CUDA_VISIBLE_DEVICES, from here:

CUDA_VISIBLE_DEVICES is used to specify which GPUs should be visible to a CUDA application. Only the devices whose index or UUID is present in the sequence are visible to CUDA applications, and they are enumerated in the order of the sequence.

When CUDA_VISIBLE_DEVICES is set before launching the control daemon, the devices will be remapped by the MPS server. This means that if your system has devices 0, 1 and 2, and if CUDA_VISIBLE_DEVICES is set to “0,2”, then when a client connects to the server it will see the remapped devices: device 0 and device 1. Therefore, keeping CUDA_VISIBLE_DEVICES set to “0,2” when launching the client would lead to an error.
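For example (a minimal sketch, assuming a machine with at least three GPUs and using the placeholder application name my_awesome_app from later in this post):

# Start the control daemon with only physical devices 0 and 2 visible (run as root).
CUDA_VISIBLE_DEVICES=0,2 nvidia-cuda-mps-control -d

# Clients see those two GPUs remapped as devices 0 and 1, so select the second one with:
CUDA_VISIBLE_DEVICES=1 my_awesome_app  # CUDA_VISIBLE_DEVICES=2 would lead to an error here.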

Enabling MPS Separately on Each Device

To enable MPS on each device separately, you can do the following:

#!/bin/bash
set -eux

# Set the number of available GPUs
GPUs=4

for ((i=0;i<$GPUs;i++)); do

  # Set the GPU to exclusive-process mode and create its pipe and log directories.
  nvidia-smi -i $i -c EXCLUSIVE_PROCESS
  mkdir -p /tmp/mps_$i
  mkdir -p /tmp/mps_log_$i

  # Start a dedicated MPS control daemon for this GPU.
  export CUDA_VISIBLE_DEVICES=$i
  export CUDA_MPS_PIPE_DIRECTORY=/tmp/mps_$i
  export CUDA_MPS_LOG_DIRECTORY=/tmp/mps_log_$i
  nvidia-cuda-mps-control -d

done

ps -ef | grep mps

To run an application behind a specific MPS daemon, use this:

# Selecting MPS server on device 0 
CUDA_MPS_PIPE_DIRECTORY=/tmp/mps_0 CUDA_MPS_LOG_DIRECTORY=/tmp/mps_log_0 my_awesome_app
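Any of the per-device daemons can be selected the same way; note that the client does not need to set CUDA_VISIBLE_DEVICES, because each daemon was started with only its own GPU visible, which the client sees remapped to device 0:

# Selecting MPS server on device 2
CUDA_MPS_PIPE_DIRECTORY=/tmp/mps_2 CUDA_MPS_LOG_DIRECTORY=/tmp/mps_log_2 my_awesome_app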

And to shut down MPS, run the following:

#!/bin/bash
set -eux

# Set the number of available GPUs
GPUs=4

for ((i=0;i<$GPUs;i++)); do
  # Ask the control daemon that owns this GPU's pipe directory to quit.
  export CUDA_MPS_PIPE_DIRECTORY=/tmp/mps_$i
  echo "quit" | nvidia-cuda-mps-control

  # Restore the default compute mode.
  nvidia-smi -i $i -c DEFAULT

  # Clean up the per-GPU pipe and log directories.
  rm -rf /tmp/mps_log_$i
  rm -rf /tmp/mps_$i
done

ps -ef | grep mps

Limitations

Some notes on CUDA compute capability and MPS limitations, from the same man page:

The MPS server creates the shared GPU context, and manages its clients. An MPS server can support a finite amount of CUDA contexts determined by the hardware architecture it is running on. For compute capability SM 3.5 through SM 6.0 the limit is 16 clients per GPU at a time. Compute capability SM 7.0 has a limit of 48. MPS is transparent to CUDA programs, with all the complexity of communication between the client process, the server and the control daemon hidden within the driver binaries. Currently, CUDA MPS is available on 64-bit Linux only, requires a device that supports Unified Virtual Address (UVA) and has compute capability SM 3.5 or higher. Applications requiring pre-CUDA 4.0 APIs are not supported under CUDA MPS. Certain capabilities are only available starting with compute capability SM 7.0.
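To check which compute capability (and therefore which client limit) applies to your GPUs, recent versions of nvidia-smi can report it directly (the compute_cap query field may not be available on older drivers):

# Print the name and compute capability of every GPU, e.g. "Tesla V100-SXM2-16GB, 7.0".
nvidia-smi --query-gpu=name,compute_cap --format=csv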

For more info, see here.