Enabling MPS

MPS

MPS (Multi-Process Service) is a feature of the NVIDIA driver that allows multiple CUDA applications to share a single GPU. This feature is useful when individual applications do not fully utilize the GPU. MPS can improve GPU utilization and reduce the number of context switches between applications.

As the man page of nvidia-cuda-mps-control suggests:

When CUDA is first initialized in a program, the CUDA driver attempts to connect to the MPS control daemon. If the connection attempt fails, the program continues to run as it normally would without MPS. If however, the connection attempt to the control daemon succeeds, the CUDA driver then requests the daemon to start an MPS server on its behalf. If there’s an MPS server already running, and the user id of that server process matches that of the requesting client process, the control daemon simply notifies the client process of it, which then proceeds to connect to the server. If there’s no MPS server already running on the system, the control daemon launches an MPS server with the same user id (UID) as that of the requesting client process. If there’s an MPS server already running, but with a different user id than that of the client process, the control daemon requests the existing server to shutdown as soon as all its clients are done. Once the existing server has terminated, the control daemon launches a new server with the user id same as that of the queued client process.

To enable MPS, run the following commands:

nvidia-smi -i 0 -c EXCLUSIVE_PROCESS # Set GPU 0 to exclusive-process compute mode.
nvidia-smi -i 0 -r                   # Reset GPU 0 so the new compute mode takes effect.

nvidia-cuda-mps-control -d           # Start the MPS control daemon.

As the NVIDIA documentation suggests, the first command sets the GPU to exclusive-process mode in order to make sure that other processes can only access the GPU through the MPS server. The second command resets the GPU so that the new setting takes effect. I am still not sure whether the setting is applied without a reset! The third command starts the MPS control daemon. All commands should be run as root.
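
Once the daemon is running, CUDA programs started by the same user connect to the MPS server automatically; no application changes are required. As a minimal sketch (my_cuda_app is just a placeholder binary), two processes sharing the GPU through MPS could be launched like this:

# Run as a regular user; both processes transparently share the GPU via the MPS server.
./my_cuda_app &
./my_cuda_app &
wait # Wait for both clients to finish.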

To see the status of MPS, run the following command:

ps -ef | grep mps # See if the MPS daemon is running.
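
The control daemon also accepts management commands on its stdin. For example, get_server_list should print the PIDs of the MPS servers that are currently running:

echo get_server_list | nvidia-cuda-mps-control # Prints one PID per running MPS server.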

To shutdown MPS, run the following command:

echo quit | nvidia-cuda-mps-control
# Make sure to change the GPU mode back to default.
nvidia-smi -i 0 -c DEFAULT
nvidia-smi -i 0 -r # Reset GPU 0 again so the default mode takes effect.

CUDA Visible Devices

Some notes on CUDA_VISIBLE_DEVICES, from here:

CUDA_VISIBLE_DEVICES is used to specify which GPU’s should be visible to a CUDA application. Only the devices whose index or UUID is present in the sequence are visible to CUDA applications and they are enumerated in the order of the sequence.

When CUDA_VISIBLE_DEVICES is set before launching the control daemon, the devices will be remapped by the MPS server. This means that if your system has devices 0, 1 and 2, and if CUDA_VISIBLE_DEVICES is set to “0,2”, then when a client connects to the server it will see the remapped devices - device 0 and a device 1. Therefore, keeping CUDA_VISIBLE_DEVICES set to “0,2” when launching the client would lead to an error.
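
As a concrete sketch of the scenario above (my_cuda_app is again a placeholder client binary):

# Server side (as root): expose only physical devices 0 and 2 to the MPS daemon.
export CUDA_VISIBLE_DEVICES=0,2
nvidia-cuda-mps-control -d

# Client side: the server has remapped the devices, so the client refers to them as 0 and 1.
CUDA_VISIBLE_DEVICES=0 ./my_cuda_app # runs on physical device 0
CUDA_VISIBLE_DEVICES=1 ./my_cuda_app # runs on physical device 2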

Limitations

Some notes on compute capability and MPS limitations, from the same man page:

The MPS server creates the shared GPU context, and manages its clients. An MPS server can support a finite amount of CUDA contexts determined by the hardware architecture it is running on. For compute capability SM 3.5 through SM 6.0 the limit is 16 clients per GPU at a time. Compute capability SM 7.0 has a limit of 48. MPS is transparent to CUDA programs, with all the complexity of communication between the client process, the server and the control daemon hidden within the driver binaries. Currently, CUDA MPS is available on 64-bit Linux only, requires a device that supports Unified Virtual Address (UVA) and has compute capability SM 3.5 or higher. Applications requiring pre-CUDA 4.0 APIs are not supported under CUDA MPS. Certain capabilities are only available starting with compute capability SM 7.0.
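
To check whether a GPU meets the compute capability requirement, newer nvidia-smi versions expose a compute_cap query field (older drivers may not support it; the deviceQuery CUDA sample reports the same information):

nvidia-smi --query-gpu=name,compute_cap --format=csv # Prints the GPU name and its compute capability.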

For more info, see here.