
Use cuda_visible_devices Environment Variable to Select GPUs in CUDA Applications

A Guide to Understanding cuda_visible_devices in GPU Computing

If you’ve ever worked with CUDA or GPU computing, chances are you’ve come across the term “cuda_visible_devices” but may not fully understand what it refers to or how it impacts your work. In this article, I’ll attempt to demystify cuda_visible_devices and answer some of the key questions users have when researching this topic.

What is cuda_visible_devices?

In simple terms, cuda_visible_devices refers to the CUDA_VISIBLE_DEVICES environment variable, which controls the list of GPU devices that are visible or accessible to CUDA applications on your system. When you run a CUDA program, it needs to know which GPUs it is allowed to use for parallel processing. The CUDA_VISIBLE_DEVICES variable contains this list of available GPUs.

By default, when the variable is not set, every GPU installed in your system is visible to CUDA. However, there may be situations where you want to restrict CUDA’s visibility to only certain GPUs. For example, if you have multiple users sharing a computer with multiple GPUs, you likely don’t want every program accessing every GPU. Setting CUDA_VISIBLE_DEVICES allows you to control this visibility.
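
To make this concrete, here is a minimal device-enumeration sketch using the CUDA runtime API (compile with nvcc; the program is just a placeholder example). It prints whatever devices CUDA can see, which is exactly the set controlled by CUDA_VISIBLE_DEVICES:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        // If CUDA_VISIBLE_DEVICES hides every GPU, this call fails with
        // "no CUDA-capable device is detected".
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("CUDA sees %d device(s)\n", count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // Indices are renumbered from 0 within the visible set, so device 0
        // here may be a different physical card than nvidia-smi's GPU 0.
        printf("  device %d: %s (%d multiprocessors)\n", i, prop.name, prop.multiProcessorCount);
    }
    return 0;
}

Running this with different CUDA_VISIBLE_DEVICES settings (all GPUs, a subset, or none) is an easy way to see the variable’s effect on a real program.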

How do I check my cuda_visible_devices?

To see which GPUs are installed and what index numbers they have, run:

nvidia-smi -L

This will display each GPU’s index number and description. (Note that nvidia-smi always lists the physical GPUs in the machine; it does not honor CUDA_VISIBLE_DEVICES. To check what the variable itself is currently set to, print it directly with echo $CUDA_VISIBLE_DEVICES.) For example, on a system with two GPUs it may show:


GPU 0: GeForce GTX 1080
GPU 1: Tesla V100


So in this case, both GPUs (GPU 0 and GPU 1) would be included in the cuda_visible_devices list by default.
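
If you want to confirm what a particular process actually sees (for example, one launched by a job scheduler that sets its own environment), you can also read the variable from inside the program. A minimal host-code sketch:

#include <cstdio>
#include <cstdlib>

int main() {
    // getenv returns NULL when the variable is not set at all, in which case
    // CUDA enumerates every GPU installed in the system.
    const char *visible = std::getenv("CUDA_VISIBLE_DEVICES");
    printf("CUDA_VISIBLE_DEVICES = %s\n", visible ? visible : "(not set)");
    return 0;
}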

How can I modify cuda_visible_devices?

If you need to restrict which GPUs are visible, you can modify the cuda_visible_devices variable like this:

  1. On Linux/UNIX systems, export the CUDA_VISIBLE_DEVICES environment variable before running your CUDA program:

     export CUDA_VISIBLE_DEVICES=0,1

  2. On Windows, set the CUDA_VISIBLE_DEVICES variable before launching your application, either in the system properties dialog or on the command line with set CUDA_VISIBLE_DEVICES=0,1.

This would make GPUs 0 and 1 the only visible devices, excluding any other GPUs from being used. You can also set it to a single GPU like CUDA_VISIBLE_DEVICES=0 to limit to just GPU 0.
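
It is also possible to set the variable from inside the program itself, as long as it happens before the very first CUDA call, because the runtime only reads CUDA_VISIBLE_DEVICES when it initializes. A sketch using the POSIX setenv call (on Windows you would use _putenv_s instead):

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main() {
    // Must run before any CUDA call: once the runtime has initialized,
    // changing the variable no longer affects this process.
    setenv("CUDA_VISIBLE_DEVICES", "0", /*overwrite=*/1);

    int count = 0;
    if (cudaGetDeviceCount(&count) == cudaSuccess) {
        printf("visible devices: %d\n", count);  // expect 1 on a multi-GPU machine
    }
    return 0;
}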

Some examples of using cuda_visible_devices

Here are a few real-world examples of when you may want to modify cuda_visible_devices:

Sharing resources across users

If a workstation has multiple GPUs but you only want certain users/programs to access specific ones, cuda_visible_devices allows isolating the GPUs.

Debugging code on a single GPU

When developing or troubleshooting CUDA code, it can help to restrict to a single GPU so you have full control over where work is running.


Resource contention management

If jobs are battling for GPU memory or you see performance drops due to oversubscription, limiting cuda_visible_devices can help optimize utilization.

Testing incompatible hardware

If some GPUs don’t support the full CUDA/driver version, excluding them avoids potential crashes or errors.

In my experience, having full control over cuda_visible_devices is critical for optimizing workflows when sharing systems or developing CUDA applications.

Any gotchas to watch out for?

While modifying cuda_visible_devices is generally straightforward, there are a few things to be aware of:

  • The variable only controls which GPUs CUDA code can use – it does not actually isolate the physical GPU hardware or resources.
  • Any changes to cuda_visible_devices only take effect for subsequently launched processes – it does not retroactively affect already-running code.
  • Make sure to set cuda_visible_devices before running your application or launching CUDA runtime calls.

Also, the behavior can differ when you are running inside NVIDIA/CUDA Docker containers rather than on bare-metal machines, since the container runtime may already restrict which GPUs are exposed to the container.

So in summary, do your cuda_visible_devices configuration early in the process and test that it is having the intended isolation effect to avoid hard-to-track-down bugs down the road.
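
To see the “set it before any CUDA call” rule from the list above in action, here is a small sketch: changing the variable after the runtime has already initialized does not change what the current process can see.

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main() {
    int before = 0;
    cudaGetDeviceCount(&before);   // first CUDA call: the runtime initializes here

    // Too late: the runtime captured the device list above, so this has no effect
    // on the current process (a child process launched from here would see it).
    setenv("CUDA_VISIBLE_DEVICES", "0", 1);

    int after = 0;
    cudaGetDeviceCount(&after);
    printf("before: %d, after: %d\n", before, after);   // the two counts match
    return 0;
}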

Any other tips?

A few more tips based on my CUDA coding experiences:

  • If a process hangs or is consuming excessive resources, try resetting cuda_visible_devices as a potential troubleshooting step.
  • When installing or updating CUDA, or troubleshooting GPU drivers, pay attention to how the change may affect which devices are visible.
  • Checking cuda_visible_devices before a run is essentially a quick “GPU device check” that confirms your assumptions about the available hardware.

Hope this guide has helped explain what cuda_visible_devices refers to and how you can leverage it to better optimize and isolate your CUDA applications. Feel free to reach out if you have any other questions!

Cuda Visible Devices Table

Device ID | Device Name      | Memory   | Multiprocessor Count
----------|------------------|----------|---------------------
0         | GeForce GTX 1060 | 6144 MB  | 15
1         | GeForce RTX 2080 | 8192 MB  | 68
2         | Tesla K80        | 12288 MB | 56
3         | Tesla P100       | 16384 MB | 100

FAQ

  1. What does cuda_visible_devices do?

    Basically, the cuda_visible_devices environment variable allows you to control which GPUs are available to CUDA applications. By default, all GPUs installed in your system will be visible to CUDA. But sometimes you may want certain applications to only see specific GPUs.

  2. How do I set cuda_visible_devices?

    To set the cuda_visible_devices variable, you simply assign a comma-separated list of GPU IDs to it. For example, to make only GPU 0 and 1 visible, you would do:
    export CUDA_VISIBLE_DEVICES=0,1
    Note that the variable name must be uppercase; environment variable names are case-sensitive on Linux.
    The GPU IDs start from 0 and increase consecutively.

  3. What are some uses of cuda_visible_devices?

    cuda_visible_devices is useful in a few different situations. It allows isolating applications to specific GPUs, for example keeping training and inference separated. It also lets you debug on a single GPU. It helps when testing software against different GPU combinations. Restricting GPU access may also improve performance or stability in some cases.

  4. Can I mix GPU and CPU in an application?

    Yes, many CUDA applications combine GPU and CPU processing. cuda_visible_devices only controls which GPUs the application can see; it has no effect on the CPU, which is always available. Even with no visible GPUs at all, the application can still run CPU code, so a program can fall back to the CPU for work that is not suited to the GPU, as sketched below.
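
    As a sketch of that fallback idea (the function name run_on_cpu is just a placeholder): if you launch the program with CUDA_VISIBLE_DEVICES set to an empty string, cudaGetDeviceCount reports no usable device and the code can drop down to a CPU path instead.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Placeholder CPU fallback for work that does not need a GPU.
    static void run_on_cpu() { printf("running on the CPU instead\n"); }

    int main() {
        int count = 0;
        cudaError_t err = cudaGetDeviceCount(&count);
        if (err != cudaSuccess || count == 0) {
            // e.g. launched as: CUDA_VISIBLE_DEVICES= ./app
            run_on_cpu();
            return 0;
        }
        printf("found %d GPU(s), using device 0\n", count);
        cudaSetDevice(0);
        // ... GPU kernels would be launched here ...
        return 0;
    }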

  5. What happens if I leave cuda_visible_devices unset?

    If the cuda_visible_devices variable is not set in your environment, CUDA will simply use all installed GPUs by default. Every CUDA application will then have access to every GPU in the system. This is usually fine, but sometimes you may want to limit an app to certain GPUs. Setting cuda_visible_devices gives you that control over what hardware your code can see and use.

  6. Are there any downsides to restricting GPU access?

    While restricting GPU access with cuda_visible_devices is useful in many situations, it does come with some potential downsides. Isolating an application to one GPU means the others sit idle, reducing utilization. It also makes your code less portable if it only works for a specific GPU setup. And debugging may become more complex if you can’t easily move work between devices. So it’s best to only impose GPU limits when there’s a real need rather than by default.
