Tuesday, August 17, 2010

CUDA 3.1 on Ubuntu 9.10 for Tesla c870

CUDA toolkit version 3.1 on Ubuntu 9.10 : [32 bit]
For Tesla c870 with Intel Onboard video (Integrated Graphics)

This is based heavily on the information from here:
http://forums.nvidia.com/index.php?showtopic=171590

[If you want CUDA 3.1 to run on Ubuntu 10.04, follow the above link to the Nvidia forums]

Note : Windows XP does not like dual video cards from different manufacturers. I believe Vista and later do support it, but that is why I had to go to Ubuntu.

You will need:
a) Ubuntu 9.10 (This is the latest supported Ubuntu package from NVIDIA for CUDA 3.1)
b) A compatible card that supports CUDA. I'm using the Tesla c870 in this case.

My requirements:
1. The Tesla cards do not provide Video Out - hence the card needed to work in conjunction with the onboard Intel video card.
2. Multiple users need to be able to compile and run the code, so it needs to be installed to a commonly accessible folder.

**Installation and Configuration**

==Part A==Getting Started==

1. Install Ubuntu.
2. Run the Update Manager and get all the updates to date.
3. Get g++ installed using "sudo apt-get install g++" (w/o quotes)

This will install g++ compiler for gcc-4.4.1 (which is the default for Ubuntu 9.10)

==Part B == Installing the driver==

Get the latest NVIDIA Developer driver from
http://developer.nvidia.com/object/cuda_3_1_downloads.html

You need version 256.40 or later. Earlier versions will not work. (Ubuntu 9.10 comes with 175 and 195 in the restricted driver set; these did not work for me.)

When I wrote this, the driver was packaged in a file called
devdriver_3.1_linux_32_256.40.run

You will need to install this after stopping gdm (the GNOME Display Manager). To do this:

a) From the desktop, press CTRL+Alt+F1 - this should take you to a tty prompt.
b) Login with your username.
c) Type in "sudo /etc/init.d/gdm stop" (w/o quotes) to stop the gdm.
d) Wait 1 minute while the X server unloads from memory.
e) To launch the installation type "sudo ./devdriver_3.1_linux_32_256.40.run" (w/o quotes).
f) The installation should start - accept all the default options and you should be good to go.

If the installer does not start in step (e), make sure the file is executable (Type "chmod +x devdriver_3.1_linux_32_256.40.run" (no quotes) at the command prompt to make it executable, then repeat step (e))

g) Once the driver has finished installation, reboot the system. This can be done easily from the command line - use the command - "sudo reboot"

h) Ubuntu should reboot and bring you back to the Gnome Desktop at this point. Login normally.
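
Once you are back at the desktop, it can be worth checking that the driver that actually loaded is 256.40 or later. The snippet below is only a sketch: the helper names are mine, and it assumes the loaded driver reports itself in /proc/driver/nvidia/version on a line containing "Kernel Module" followed by the version number, and that your sort supports -V (GNU coreutils).

```shell
# Succeeds if version $1 is at least version $2 (relies on GNU sort -V).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the driver version found in the given file
# (default: /proc/driver/nvidia/version; the line format is an assumption).
installed_driver_version() {
    sed -n 's/.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p' \
        "${1:-/proc/driver/nvidia/version}" | head -n 1
}

if [ -r /proc/driver/nvidia/version ]; then
    ver=$(installed_driver_version)
    if version_at_least "$ver" 256.40; then
        echo "Driver $ver is new enough for CUDA 3.1"
    else
        echo "Driver $ver is older than 256.40, repeat Part B"
    fi
else
    echo "The nvidia kernel module does not appear to be loaded"
fi
```

If the last message shows up on a Tesla-only setup, "sudo modprobe nvidia" may be needed first, since nothing loads the module for you when the card is not driving a display.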

==Part C==Installing the Toolkit==

1. Now get the Toolkit from the CUDA site. When I wrote this, the toolkit was in a package called cudatoolkit_3.1_linux_32_ubuntu9.10.run

2. Before running this, there are a couple of packages you need to get. Otherwise during the make process, you will hit errors.

3. Open Terminal, and get the following packages
#(Fixes "cannot find -lXi" error)
* libxext-dev
* libxi-dev
* x11proto-xext-dev

#(Fixes "cannot find -lXmu" error)
* libice-dev
* libsm-dev
* libxt-dev
* libxmu-headers
* libxmu-dev

#(Fixes "cannot find -lglut" error)
* freeglut3-dev
* libglut3-dev

You can get each of these using "sudo apt-get install [package-name]" (w/o quotes) where package-name is one of the above. Or you can use one large apt-get statement and get all of them in a single command like below.

"sudo apt-get install libxext-dev libxi-dev x11proto-xext-dev libice-dev
libsm-dev libxt-dev libxmu-headers libxmu-dev freeglut3-dev libglut3-dev"

(no quotes, remove any linebreaks that the HTML inserts)

You may not need all of them, and some of them may already be installed on your machine depending on what other software you have installed.
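
If you would rather see which of the packages are still missing before installing anything, something along these lines should work. It is a rough sketch: missing_packages is a name I made up, and it relies on "dpkg -s" returning a non-zero status for packages dpkg does not know about (a package that was removed but not purged may still be reported as present).

```shell
# Print each package from the argument list that dpkg does not report as installed.
missing_packages() {
    for pkg in "$@"; do
        dpkg -s "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}

if command -v dpkg >/dev/null 2>&1; then
    missing=$(missing_packages libxext-dev libxi-dev x11proto-xext-dev \
        libice-dev libsm-dev libxt-dev libxmu-headers libxmu-dev \
        freeglut3-dev libglut3-dev)
    if [ -n "$missing" ]; then
        echo "Still missing:"
        echo "$missing"
    else
        echo "All listed packages are installed."
    fi
fi
```

Anything it lists can then be installed with "sudo apt-get install" as described above.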

4. Once you have the above, install the CUDA toolkit.
4.a. Command = "sudo ./cudatoolkit_3.1_linux_32_ubuntu9.10.run" (w/o quotes)
4.b If it doesn't execute make sure it is executable, using the command "chmod +x cudatoolkit_3.1_linux_32_ubuntu9.10.run"

5. Accept the default installation path /usr/local/cuda, or change it as you wish, but keep track of it

6. Let it run to completion. It will exit with a message saying that the installation was successful as well as messages to add certain PATHS to your profile

7. Open the GLOBAL .bashrc file (located at /etc/bash.bashrc) and add the two path statements listed below to the end of the file. You will have to be root to edit /etc/*, so open the file using the sudo command with your favorite text editor such as "sudo vi /etc/bash.bashrc"

export LD_LIBRARY_PATH=/usr/local/cuda/lib;
export PATH=$PATH:/usr/local/cuda/bin/;

(Single lines, terminate with a semicolon, no space on either side of the = signs, no line breaks.)

This is important for global use. If you don't want it globally, set these two statements inside your .bashrc (~/.bashrc).

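8. Back at the command line, type in "source /etc/bash.bashrc" to read in the updated paths. (Do not put sudo in front of it: source is a shell builtin, so "sudo source" fails, and the paths need to be set in your own shell anyway.) Test it using "echo $LD_LIBRARY_PATH"; it should return the above path.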
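
For a slightly more thorough check than the echo, the sketch below tests whether the two CUDA directories are present in the current shell's variables. path_contains is a helper name of my own, and the directories assume the default /usr/local/cuda install path from step 5.

```shell
# Succeeds if directory $1 is an entry in the colon-separated list $2.
# A trailing slash on either side is ignored.
path_contains() {
    dir=${1%/}
    case ":$2:" in
        *":$dir:"*|*":$dir/:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if path_contains /usr/local/cuda/bin "$PATH"; then
    echo "PATH includes the CUDA bin directory"
else
    echo "PATH is missing /usr/local/cuda/bin"
fi

if path_contains /usr/local/cuda/lib "${LD_LIBRARY_PATH:-}"; then
    echo "LD_LIBRARY_PATH includes the CUDA lib directory"
else
    echo "LD_LIBRARY_PATH is missing /usr/local/cuda/lib"
fi
```

If both lines report success, "nvcc --version" should also run from any directory.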

== Part D == Installing the SDK code samples ==

1. Get the SDK code samples from NVIDIA's site. When I wrote this, the SDK was inside a file called gpucomputingsdk_3.1_linux.run
2. Run it "sudo ./gpucomputingsdk_3.1_linux.run". Again do the chmod +x thing if it is not executable.

3. DO NOT ACCEPT the default install path. The default install path is something like "~/NVIDIA_COMPUTING_SDK" which will put it in your home-directory. This is a bad idea if you want global access. Make it something accessible to all, I used "/opt/cudasdk".

4. It will ask you for the path to CUDA. If you installed CUDA to the default path in Part C, step 5, it will be in /usr/local/cuda

5. Let it run till it completes.

6. Once it completes, you need to compile the code samples for your machine. To do this navigate to the SDK path (path you specified in Step-3 of Part D). Then navigate to the subfolder "C". You should see a Makefile. To make the examples, run "sudo make"

7. The make process may throw up a bunch of warnings, but all files should make without errors. If there were errors, see if all the packages you should have installed in Part-C, Step-3 are present.

8. If you have to run make again, always do a "make clean" before running make.

9. At this point, you have almost got everything to work. If you have a CUDA enabled graphics card, you should be ready to go. If you are using a Tesla card like I am, you need a few more steps.

== Part E == Getting the Tesla card available on boot ==

1. Because the Tesla card is not driving a display, nothing creates its device nodes in /dev/ for you, so you need to create them on each boot. (If you are using a regular graphics card, you don't need this: the X server is already using the card, so the nodes will be present.)

2. Make a file scriptname.sh (I call it cuda_start.sh) with the following (remove any line breaks that the HTML inserts) __Update the entry Provides: to match the name of the file__

===begin copy below this====

#!/bin/sh
### BEGIN INIT INFO
# Provides: cuda_start
# Required-Start: $syslog
# Required-Stop: $syslog
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Mounts CUDA device as a node in /dev
# Description: CUDA card requires entry in /dev, force create since NVIDIA is not video card
### END INIT INFO

modprobe nvidia
N3D=`/usr/bin/lspci | grep -i NVIDIA | grep "3D controller" | wc -l`
NVGA=`/usr/bin/lspci | grep -i NVIDIA | grep "VGA compatible controller" | wc -l`

N=`expr $N3D + $NVGA - 1`
for i in `seq 0 $N`; do
mknod -m 666 /dev/nvidia$i c 195 $i
RETVAL=$?
[ "$RETVAL" = 0 ] || exit $RETVAL
done

mknod -m 666 /dev/nvidiactl c 195 255

===end copy above this====

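3. The script loads the nvidia kernel module, counts the NVIDIA devices that lspci reports, and creates one character device node per device (/dev/nvidia0, /dev/nvidia1, and so on) plus the control node /dev/nvidiactl. It is based on the code sample I found from user (mfatica) here : http://forums.nvidia.com/index.php?showtopic=49769&st=0&p=272085&#entry272085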

4. Place this inside /etc/init.d/ so that it can be called by init when starting up. If you don't want to do this, you'll have to call the script every time you reboot the machine.

5. If you are placing it in /etc/init.d, then
5.a Make it executable (sudo chmod +x /etc/init.d/scriptname.sh)
5.b Run update-rc.d so that the script gets called at boot:
"sudo update-rc.d scriptname.sh defaults"

Method from here
http://embraceubuntu.com/2005/09/07/adding-a-startup-script-to-be-run-at-bootup/
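
To see what the init script will do on your machine before you reboot, here is a small sketch that mirrors its counting logic and then checks whether the nodes exist. The function names are mine, and it assumes (as the script does) that lspci labels the cards as "3D controller" or "VGA compatible controller".

```shell
# Count NVIDIA devices in lspci-style output read from stdin,
# using the same two class names the init script greps for.
count_nvidia_devices() {
    grep -i nvidia | grep -c -e "3D controller" -e "VGA compatible controller"
}

# Print the device nodes the init script would create for $1 devices.
expected_nodes() {
    i=0
    while [ "$i" -lt "$1" ]; do
        echo "/dev/nvidia$i"
        i=$((i + 1))
    done
    echo "/dev/nvidiactl"
}

if command -v lspci >/dev/null 2>&1; then
    n=$(lspci | count_nvidia_devices) || true
    for node in $(expected_nodes "${n:-0}"); do
        if [ -c "$node" ]; then
            echo "ok: $node"
        else
            echo "missing: $node"
        fi
    done
fi
```

After a reboot with the init script in place, every line should say "ok".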

==Part F== Global access and making it all work

We are almost there, a few final touches:
Remember I wanted it to be globally accessible. I have a group on my machine called 'examplegroup', and I want all of its members to have access to the CUDA toolkit and SDK, so I need to set appropriate permissions.

1. Navigate to /usr/local.
2. Set group to examplegroup using the chgrp command
"sudo chgrp -R examplegroup cuda" (w/o quotes; the folder name is lowercase, and Linux is case-sensitive)

This sets the group of the folder cuda (and, because of the -R, everything inside it) to examplegroup.

3. Give the group read, write and execute permissions
"sudo chmod -R g=rwx cuda"
(w/o quotes, the g=rwx means to assign permission r(read), w(write), x(execute) to the group, -R flag for recursive )

4. Navigate to where you installed the SDK (Part-D, Step-3, for me it is /opt/cudasdk)

cd /opt/cudasdk

5. Set group to examplegroup
"sudo chgrp -R examplegroup *"

(w/o quotes, the * means everything in the folder, the -R makes it recursive)

6. Give the group read/write/execute permissions
"sudo chmod -R g=rwx *" (w/o quotes, again * = everything, -R = recursive)

7. You are done!

8. Log out and reboot the machine. This is so that your init script gets called and your device comes up. (You don't strictly need to reboot; you can run the script by hand with sudo instead, but I'm not going into that)
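
If you want to confirm the permissions took, a quick check along these lines may help. It is a sketch only: group_has_rwx is my own helper, it relies on GNU stat ("stat -c"), and the two paths are the ones I used above.

```shell
# Succeeds if the group permission bits of $1 are rwx
# (an s in the execute slot, i.e. setgid, is also accepted).
group_has_rwx() {
    perms=$(stat -c %A "$1") || return 2
    case "$perms" in
        ????rw[xs]*) return 0 ;;
        *) return 1 ;;
    esac
}

for d in /usr/local/cuda /opt/cudasdk; do
    if [ -e "$d" ]; then
        if group_has_rwx "$d"; then
            echo "$d: group $(stat -c %G "$d") has rwx"
        else
            echo "$d: group permissions are not rwx"
        fi
    fi
done
```

This only looks at the top-level folders; the -R in the commands above is what takes care of everything underneath.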

== Part G == Test installation ==

1. When you reboot and come back, navigate to the SDK folder and then to C/bin/linux/release [For me that was cd /opt/cudasdk/C/bin/linux/release]

2. Run the deviceQuery program (./deviceQuery). You should get some glorious output

==snippet of my deviceQuery output==
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

There is 1 device supporting CUDA

Device 0: "Tesla C870"
CUDA Driver Version: 3.10
CUDA Runtime Version: 3.10
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 0
Total amount of global memory: 1610416128 bytes
Number of multiprocessors: 16
Number of cores: 128
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
== end snippet of deviceQuery

3. Run a benchmark test ("./nbody -benchmark -n=100")
==nbody output==

Run "nbody -benchmark [-n=]" to measure perfomance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64 (use double precision floating point values for simulation)

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> Compute 1.0 CUDA device: [Tesla C870]
100 bodies, total time for 10 iterations: 0.326 ms
= 0.307 billion interactions per second
= 6.133 single-precision GFLOP/s at 20 flops per interaction
==end nbody output==

4. There is no step.4.
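
One thing worth glancing at in the deviceQuery output is that the driver and runtime versions agree (both 3.10 in my case). A rough way to script that check is sketched below. versions_match is a name I made up, the path is the one I used for the SDK, and it assumes the two lines are labelled exactly as in my output above.

```shell
# Read deviceQuery output on stdin; succeed only if both version lines
# are present and carry the same value.
versions_match() {
    awk -F': *' '
        /CUDA Driver Version/  { drv = $2 }
        /CUDA Runtime Version/ { rt = $2 }
        END { exit !(drv != "" && (drv "") == (rt "")) }
    '
}

dq=/opt/cudasdk/C/bin/linux/release/deviceQuery
if [ -x "$dq" ]; then
    # stdin comes from /dev/null in case the program waits for a key press
    if "$dq" < /dev/null | versions_match; then
        echo "Driver and runtime versions agree"
    else
        echo "Driver and runtime versions differ or were not reported"
    fi
fi
```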

At this point your installation is ready to go.

Good luck!

=== Update : Apr 21, 2011 ===
An unexpected power outage caused the card to stop working. On rebooting, the card refused to run deviceQuery and complained of a potential mismatch between the API and the Runtime library.

After lots of debugging with the help of my colleagues and the department IT folks, we traced it to a corrupted driver package. The solution was to re-install the driver such that it overwrites the existing installation. To do this, follow Part-B "Driver Installation" again step-by-step and it will reinstall the driver for you.
You must do this step from a local console. Part-B driver installation cannot be reliably done from a remote SSH login.