How to run TensorFlow using NVIDIA CUDA and Docker on Windows 10 WSL?

26-Jun-20 . 8 mins read
tensorflow docker

Background

This is a follow-up to my earlier post, in which I described how to set up Docker and Python. I recently installed an NVIDIA GPU (RTX 2060 Super) in my machine and wanted to use it to develop deep learning models in TensorFlow. I use Docker and VS Code extensively, so I was looking for a setup that would fit into my existing workflow.

There were a few options:

  • Install Python and TensorFlow directly on Windows [The setup seems quite complicated, and I prefer a containerised approach for development]
  • Dual boot into Ubuntu and set up TensorFlow [Best and recommended, but it would mean switching back and forth between Ubuntu and Windows]
  • Use Docker Desktop for Windows and create TensorFlow containers [GPUs are currently not supported, hence the next option]
  • Use Windows Subsystem for Linux (WSL)

I decided to go with the last option. The timing was good, as Microsoft and NVIDIA had recently announced support for GPU acceleration in WSL 2. So, the plan is as follows:

  1. Enable WSL on Windows
  2. Install Ubuntu inside WSL
  3. Install Docker and the NVIDIA Container Toolkit in Ubuntu and create TensorFlow containers (with GPU support)
  4. Use the VS Code IDE for development

Please note that as of 26th Jun 20, most of these features are still in development. They worked for me, but try them at your own risk, as they did end up messing up some parts of my system. You will need to sign up for :

Also, under each step I have listed some CHECKS. I hope they help you make sure you are on the right track.

1. Windows Insider Preview Build 20150

First and foremost, the GPU support on WSL is not available in the Retail Build of Windows 10. I had to get the latest Windows 10 Insider Preview Build 20150. The details of all the builds are available in the Flight Hub. The installation process took an hour.

CHECK : Windows Version. In PowerShell (PS) or CMD, type winver

2. Install WSL

Steps to be done in PS/CMD:

  1. Enable WSL1 : dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart
  2. Enable WSL2 : dism.exe /online /enable-feature /featurename:VirtualMachinePlatform /all /norestart
  3. Restart Computer
  4. Default to version 2 : wsl.exe --set-default-version 2

3. Linux Kernel

After running the above command, I got this message : WSL 2 requires an update to its kernel component. For information please visit https://aka.ms/wsl2kernel.

The page has instructions for installing the WSL2 Linux Kernel MSI. After installing it in Win10, this command worked : wsl.exe --set-default-version 2

CHECK : Linux Kernel version should be 4.19.121 or higher. On PS/CMD type wsl cat /proc/version
Output : Linux version 4.19.121-microsoft-standard (oe-user@oe-host) (gcc version 8.2.0 (GCC)) #1 SMP Fri Jun 19 21:06:10 UTC 2020
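
The kernel check can also be done programmatically. Below is a minimal sketch that parses the text of /proc/version and compares it with the 4.19.121 minimum mentioned above. The function name kernel_version is my own, for illustration only.

```python
import re

MIN_KERNEL = (4, 19, 121)  # minimum kernel version stated in this post


def kernel_version(proc_version: str) -> tuple:
    """Return (major, minor, patch) parsed from the text of /proc/version."""
    match = re.search(r"Linux version (\d+)\.(\d+)\.(\d+)", proc_version)
    if match is None:
        raise ValueError("no kernel version found")
    return tuple(int(part) for part in match.groups())


# The output shown above, used here as sample input.
sample = ("Linux version 4.19.121-microsoft-standard (oe-user@oe-host) "
          "(gcc version 8.2.0 (GCC)) #1 SMP Fri Jun 19 21:06:10 UTC 2020")
print(kernel_version(sample) >= MIN_KERNEL)
```

Comparing tuples of integers avoids the trap of comparing version strings, where "4.19.84" would sort after "4.19.121". Inside Ubuntu, the same function could be fed the contents of /proc/version read with open("/proc/version").read().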

If you don't have the latest kernel, go to Windows Settings \ Update & Security \ Windows Update \ Check for updates. Ensure that “Receive Updates” is turned on.

As I understand it, in future releases both the installation and the update of the WSL Linux kernel will be handled by these two commands: wsl --install and wsl --update

As per Microsoft :

We’ve removed the Linux kernel from the Windows OS image and instead will be delivering it to your machine via Windows Update, the same way that 3rd party drivers (like graphics, or touchpad drivers) are installed and updated on your machine today. When automatic install and update of the Linux kernel is added you’ll start getting automatic updates to your kernel right away.

4. Install Ubuntu 18.04

  • You can get the distro from the Microsoft store, to be installed on WSL
  • I prefer the Debian distribution as it is lighter than Ubuntu, so I spent quite some time setting it up, but it did not work.
  • In the end, I went ahead with Ubuntu 18.04 (I didn’t choose 20.04 as I had read somewhere that TensorFlow has issues with that version. I might be wrong)

CHECK : Version of Ubuntu. On PS/CMD type wsl -l -v
Output :
NAME STATE VERSION
* Ubuntu-18.04 Stopped 2
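
If you want to script this check, the table can be parsed into records. This is only a sketch: parse_wsl_list is a hypothetical helper, and it assumes the three-column layout shown above. Also note that, as far as I know, output captured from wsl.exe is usually UTF-16 encoded, so it may need decoding before it is passed in.

```python
def parse_wsl_list(text: str) -> list:
    """Parse the table printed by `wsl -l -v` into a list of dicts."""
    distros = []
    for line in text.splitlines()[1:]:  # skip the header row
        is_default = line.lstrip().startswith("*")  # '*' marks the default distro
        fields = line.replace("*", " ").split()
        if len(fields) != 3:
            continue  # ignore blank or unexpected lines
        name, state, version = fields
        distros.append({"name": name, "state": state,
                        "version": int(version), "default": is_default})
    return distros


# The output shown above, used here as sample input.
sample = "NAME STATE VERSION\n* Ubuntu-18.04 Stopped 2\n"
print(parse_wsl_list(sample))
```

For GPU support, the entry for Ubuntu-18.04 should report version 2.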

5. Install CUDA drivers on Win 10 (host)

The CUDA drivers for WSL - Public Preview can be installed from here

CHECK : The CUDA driver version should be 455.41 or higher. On PS/CMD type nvidia-smi
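
The driver check can be scripted as well. A minimal sketch, assuming the version string is obtained with nvidia-smi --query-gpu=driver_version --format=csv,noheader, which prints one line per GPU. The helper names are hypothetical.

```python
MIN_DRIVER = (455, 41)  # minimum driver version stated in this post


def parse_driver_version(text: str) -> tuple:
    """Turn a driver string such as '455.41' into a comparable tuple of ints.

    Only the first line is used, since nvidia-smi prints one line per GPU.
    """
    first_line = text.strip().splitlines()[0]
    return tuple(int(part) for part in first_line.strip().split("."))


def driver_is_new_enough(text: str, minimum: tuple = MIN_DRIVER) -> bool:
    """Return True if the reported driver is at least the minimum version."""
    return parse_driver_version(text) >= minimum


print(driver_is_new_enough("455.41\n"))
```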

6. Install Docker

  • Enter the WSL by typing wsl on PS/CMD. This will activate Ubuntu.
  • Install docker using curl https://get.docker.com | sh
    • As far as I understand, the NVIDIA Container Toolkit does not work with Docker Desktop for Windows, so I ignored this warning from the install script: “WSL DETECTED: We recommend using Docker Desktop for Windows. You may press Ctrl+C now to abort this script.”

CHECK : Docker version. At the Ubuntu prompt, type docker -v

Docker Desktop for Windows confusion

  • The Docker Desktop for Windows is different from the docker that I installed in Step 6
  • Prior to the release of WSL2, it would use Hyper-V virtualisation for creating the Linux VMs in Windows
  • After I installed WSL2, I enabled the WSL2 integration in Docker Desktop and let it use the WSL backend
  • But this started interfering with the docker that I had installed in WSL/Ubuntu
    • In particular, it would create its own Linux distros : docker-desktop and docker-desktop-data
    • And the images/containers that I created using the WSL/Ubuntu dockerd went missing
  • I found these solutions to work:
    • Disable the Docker Desktop integration with WSL2 (so that it continues to use the Hyper-V backend), then restart the computer
    • Quit Docker Desktop and unregister docker-desktop and docker-desktop-data using
      • wslconfig /u docker-desktop and wslconfig /u docker-desktop-data
      • Check wsl -l -v
    • Setting the default distro to Ubuntu with wsl --set-default Ubuntu-18.04 did not help

7. Install NVIDIA Container Toolkit

The commands below install the toolkit. Source : CUDA on WSL

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container-experimental.list | sudo tee /etc/apt/sources.list.d/libnvidia-container-experimental.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2

The documentation gives the steps below to start the docker daemon, but on my machine they gave this error: docker: unrecognized service

$ sudo service docker stop
$ sudo service docker start

This started Docker for me: sudo dockerd

If it doesn't work, you could try sudo systemctl start docker

CHECK

  • To check whether the installation was successful:
    • $ docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
    • This creates a Docker container, runs a benchmark on the GPU and shows some details about your GPU
  • Some additional test examples are also available here
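
The same --gpus all pattern applies to a TensorFlow container. The sketch below only builds and prints the command lines. The tensorflow/tensorflow:latest-gpu image from Docker Hub is my assumption (the steps above do not name an image), and gpu_run_command is a hypothetical helper.

```python
import shlex


def gpu_run_command(image, *args):
    """Build the argument list for `docker run --gpus all IMAGE ARGS...`."""
    return ["docker", "run", "--gpus", "all", image, *args]


# The CUDA sample used in the CHECK above.
nbody = gpu_run_command("nvcr.io/nvidia/k8s/cuda-sample:nbody",
                        "nbody", "-gpu", "-benchmark")

# Assumption: the tensorflow/tensorflow:latest-gpu image, not named in this post.
tf_check = gpu_run_command(
    "tensorflow/tensorflow:latest-gpu", "python", "-c",
    'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))')

print(shlex.join(nbody))
print(shlex.join(tf_check))
# To execute instead of print: subprocess.run(tf_check, check=True)
```

If the toolkit is working, the second command should list at least one GPU device from inside the container.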

8. Use VS Code for development

One of the challenges I faced was connecting VS Code to the container, which was running on Ubuntu, which was in turn running inside WSL. The Remote-Containers extension was unable to reach the container, and the Remote-WSL extension only reached Ubuntu.

With some help from the VS Code community and the docker documentation here and here, I managed to get it working in the two steps below:

  • In WSL/Ubuntu, I started the docker daemon using the -H flag
    • sudo dockerd -H unix:///var/run/docker.sock -H tcp://0.0.0.0:2375
    • This makes the docker daemon listen on TCP port 2375 in addition to the Unix socket (the TCP port is unencrypted and unauthenticated, so this is not a secure setup)
  • And in VS Code settings.json, I added this line : "docker.host":"tcp://localhost:2375"

Then, in VS Code, I could use the Remote-Containers: Attach to Running Container command to connect to the container running in WSL/Ubuntu
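
If the attach step fails, it can help to first confirm that the daemon is reachable on the TCP port. A minimal sketch using only the standard library: docker_tcp_reachable is a hypothetical helper, and the defaults match the localhost:2375 setting above.

```python
import socket


def docker_tcp_reachable(host: str = "localhost", port: int = 2375,
                         timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


print(docker_tcp_reachable())
```

This only shows that something is listening on the port. It does not verify that the listener is actually the docker daemon.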

Issues faced

  • The boot loader menu for my Ubuntu - Windows 10 dual boot disappeared (Most likely after the installation of the Insider Build 20150)
  • The CUDA toolkit doesn’t seem to work on the Debian distro (I might have to give it another try)
  • The Docker Desktop for Windows (with WSL2 backend) should not be running as it interferes with the docker daemon in Ubuntu
  • Some useful wsl options
    • wsl --set-default Ubuntu-18.04 to change the default distro
    • wsl --shutdown to shut down the Linux kernel
  • I also had another, non-CUDA display adapter in my machine, a GeForce GT710. I realised TensorFlow was crashing at tf.test.gpu_device_name(), so I disabled that adapter and things seemed to work well after that.
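
A possible alternative to disabling the second adapter is to hide it from CUDA with the CUDA_VISIBLE_DEVICES environment variable. This is not what I did above, and I have not verified that it avoids the crash in this setup, so treat it as a sketch. The device index 0 is an assumption, and restrict_visible_gpus is a hypothetical helper.

```python
import os


def restrict_visible_gpus(indices) -> str:
    """Set CUDA_VISIBLE_DEVICES so only the given GPU indices are exposed."""
    value = ",".join(str(i) for i in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Assumption: index 0 is the RTX 2060 Super on this machine.
restrict_visible_gpus([0])

# The variable must be set before TensorFlow initialises CUDA,
# so import TensorFlow only after this point:
# import tensorflow as tf
# print(tf.test.gpu_device_name())
```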

Summary

This was just a start, and I could run into issues as I continue using TensorFlow. But in summary, it was a good learning experience, even though I ended up spending a couple of days setting it up. I am not an expert in either Docker or CUDA, so if you see any issues with the blog, please leave a comment and I will fix them. You can also reach out to the NVIDIA team; they are highly engaged and reply promptly to queries on their forums. Cheers!

