Internship at NVIDIA 2014
Chapter 3: Internship at NVIDIA
Internship at NVIDIA was a new experience for me in two ways. First, I had to arrange housing and travel all by myself. Second, I was writing code for a non-PC device, which comes with some surprises. This chapter describes what an internship at NVIDIA might feel like.
First day at NVIDIA
The first day started with the usual welcome presentation explaining what NVIDIA is and how the basics work. Then there was a series of semi-boring but informative presentations about all the legal stuff, insurance, stock, etc.
The cool part was that in the middle of the program the NVIDIA CEO Jen-Hsun Huang himself came to welcome us. As far as I know, we were the only group of interns to have the honor of being welcomed by the CEO. That was pretty cool, but a few moments of his speech seemed to be a little too much "business talk".
After all the presentations, the interns were picked up by their mentors or managers. Every intern had a manager and a mentor. The manager was usually the manager of the whole team where you were interning, and the mentor was a software engineer.
I was working in the NVIDIA Tegra team (mobile GPUs). First, my manager introduced me to everyone on the team. Then I came to my cubicle, where a Windows computer had been prepared for me. The funny part was when my mentor said:
Here in the mobile team we work only on Linux. Here is a flash drive with Ubuntu.
So my first task was to format the computer and install Ubuntu. Later I learned that the whole of NVIDIA uses Windows and the mobile division is the only one that uses Linux. The reason for Linux is the need to compile Android and other UNIX-based embedded systems.
The second big task was to compile the OpenGL driver locally and push it to a test device. The catch here was similar to the Windows/Linux problem. The whole of NVIDIA uses Perforce for version control, but Android is under Git. This means that the mobile team is somewhere in the middle, using both at the same time.
The initial sync of the repository took more than 3 hours and I left the compilation running overnight (it took more than 5 hours). It took so long because I was compiling the whole Android image that I had to flash onto my device. An incremental build of the OpenGL driver took just a few minutes.
First week at NVIDIA
The next day my mentor explained to me everything about NVIDIA's code base, how branches work, and how mobile chips evolved over time. That was very interesting and important, because without this overview I would have been completely lost. I also met with my manager and we discussed my project and what was expected from me. My first project was to fix "simple" low-priority bugs in the OpenGL driver and in its testing framework. This helped me to understand the driver and the workflow.
Some of those "simple" bugs actually turned out to be very difficult for me. I was fixing a unit test, but the driver was throwing a GL error. However, according to the OpenGL specs, everything seemed to be just OK. I was thinking, could it be a bug in the driver itself? That sounded very unlikely. So I asked my mentor and he said:
Oh yes, an old GPU architecture XXYY does not support this feature.
All the tests are run on all versions of GPUs on many operating systems. It turned out that one old GPU does not support a certain feature but all the others do. I just added an alternative path for that old generation and the test was fixed. The scary part is that if you do not have this knowledge, you could never solve this issue. This experience taught me that just understanding OpenGL is not enough. The knowledge you need has to go much deeper than that. My mentor and co-workers always amazed me by their knowledge of all the details.
I spent the rest of the first week working closely with my mentor on understanding how everything works, and I continued fixing the bugs assigned to me.
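The text above does not say how the test detected the old architecture. One common approach in OpenGL code, shown here only as a minimal sketch and not as what NVIDIA's test framework actually did, is to check the space-separated extension string (as returned by `glGetString(GL_EXTENSIONS)` on contexts that support it) and take a fallback path when a feature is missing. The extension name `GL_XX_hypothetical_feature` and the names `has_extension`, `test_path`, and `choose_test_path` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name` appears as a whole token in the space-separated
 * list `ext_list`, 0 otherwise. A plain strstr() is not enough, because
 * "GL_A" would wrongly match inside "GL_ABC". */
int has_extension(const char *ext_list, const char *name)
{
    size_t len;
    const char *p;

    /* Passing no name is a programmer error in this sketch. */
    assert(name != NULL && name[0] != '\0');

    if (ext_list == NULL)          /* e.g. no string available */
        return 0;
    if (strchr(name, ' ') != NULL) /* a name never contains spaces */
        return 0;

    len = strlen(name);
    p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        int ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return 1;
        p += len; /* keep searching after this partial match */
    }
    return 0;
}

/* Hypothetical test paths: the normal one and the one for old hardware. */
typedef enum { PATH_DEFAULT, PATH_FALLBACK } test_path;

/* Pick the fallback path when the (hypothetical) feature is missing. */
test_path choose_test_path(const char *ext_list)
{
    return has_extension(ext_list, "GL_XX_hypothetical_feature")
               ? PATH_DEFAULT
               : PATH_FALLBACK;
}
```

A test would call `choose_test_path` once with the extension string of the device it runs on and then run the matching variant, so the same test source works on every GPU generation.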
Coding at NVIDIA
After a few weeks I was given a real project for the rest of my internship. It was quite a complicated change to the OpenGL driver and it took me around two months to complete. The final change was around 6,000 lines (added, changed, and deleted).
I was coding in the Qt Creator IDE. It was suggested to me by my mentor and it is actually quite similar to Visual Studio. Of course there were some guys coding in Vim or similar console text editors. I used Qt Creator only as a clever text editor. All the other tasks, such as compilation or execution, were done in a console.
Since I was developing a mobile OpenGL driver, I had a testing device. It was a bunch of circuit boards with a Tegra SoC mounted on transparent acrylic glass, with a touchscreen connected by a cable. It looked like a slightly messier Jetson TK1 board, but there was an NVIDIA claw logo engraved in the glass that looked really cool. When I was running tests on it, you could hear funny sounds of different frequencies. There was also a cooler mounted on top of the Tegra chip, and I saw that some guys were even using external fans to "keep it cool" during extensive testing.
Developing a driver for a non-PC device was a completely new experience for me. It comes with a few aspects that make it really different from ordinary software coding. For example, the test device was running a customized NVIDIA version of Android, which had to be compiled and flashed onto it first. Sometimes when I was messing with the driver I managed to crash the OS, or at least freeze it. The funny part is that if I tried to force-restart the device, the boot failed since the driver was "broken". The only way to fix it was to re-flash the OS.
Another unique aspect of coding the OpenGL driver is portability. Most students usually do not pay attention to the portability of their C/C++ code at all, which is quite a shame (I would totally try to compile my students' code on all platforms if I were their C++ teacher! :). Anyway, I was not very experienced in writing portable code and, as you can imagine, the driver has to be portable across every operating system and platform you can think of. This includes all versions of Windows, most Linux distributions, Mac OS X, and some embedded systems like QNX that are used in the automotive sector. The problem is that you cannot possibly test your code locally on all platforms. Once I had a test failure and it turned out that the only failing system was Windows 2000.
The hard part of developing for so many platforms is that sometimes you have to "guess" what is wrong. Such guessing requires a lot of experience. One of those "guessing" problems involved pointer alignment. Some tests were just crashing without any errors, and all I had done was fix some allocations/deallocations. I had to ask my mentor, and after a while we realized that the alignment provided by the new allocation routines might not be the same as before, so I fixed the alignment and it worked. The problem was that the original allocations were not explicitly aligned; they just happened to be fine.
Another funny story from coding at NVIDIA is about "monster" files, files that are just crazy. For example, I was using the "Go to definition" command in my IDE very often, and once I pressed the key and the IDE froze. It was some macro definition, so I jumped to my web browser and tried to look up the definition in the web-based source search tool, and the Chrome browser froze as well. In the meantime my IDE loaded the file and it was a header file with more than 70,000 lines of code! Then I searched the whole driver source on my drive and found that the longest file was more than 100,000 lines (35 MB)! Those files were usually generated, but there were also very long files that were not generated, or the specification file used for the generation was even longer.
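To illustrate what "explicitly aligned" means, here is a minimal sketch of an allocation helper that guarantees alignment instead of relying on whatever the underlying allocator happens to return. It is not the driver's code; the names `alloc_aligned`, `free_aligned`, and `is_aligned` are hypothetical, and the approach (over-allocate with `malloc`, round the pointer up, remember the raw pointer just in front of the block) is just one portable way to do it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if `p` is a multiple of `alignment` (a power of two). */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Allocates `size` bytes whose address is a multiple of `alignment`.
 * `alignment` must be a nonzero power of two. Returns NULL on failure.
 * The result must be released with free_aligned(), not free(). */
void *alloc_aligned(size_t size, size_t alignment)
{
    size_t extra, misalign;
    unsigned char *raw, *p;

    if (alignment == 0 || (alignment & (alignment - 1)) != 0)
        return NULL;

    /* Room for worst-case padding plus a slot for the raw pointer. */
    extra = (alignment - 1) + sizeof(void *);
    if (size > SIZE_MAX - extra)
        return NULL;

    raw = malloc(size + extra);
    if (raw == NULL)
        return NULL;

    /* Skip the slot, then round up to the next aligned address. */
    p = raw + sizeof(void *);
    misalign = (size_t)((uintptr_t)p & (alignment - 1));
    if (misalign != 0)
        p += alignment - misalign;
    assert(is_aligned(p, alignment));

    /* Store the raw pointer right before the block. memcpy is used
     * because that slot itself may not be aligned for a void *. */
    memcpy(p - sizeof(void *), &raw, sizeof raw);
    return p;
}

/* Frees a block obtained from alloc_aligned(). NULL is allowed. */
void free_aligned(void *ptr)
{
    void *raw;

    if (ptr == NULL)
        return;
    memcpy(&raw, (unsigned char *)ptr - sizeof(void *), sizeof raw);
    free(raw);
}
```

With a helper like this, code that needs, say, 64-byte alignment asks for it explicitly with `alloc_aligned(size, 64)`, so swapping the underlying allocator cannot silently change the alignment the rest of the code depends on.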
Sometimes I was able to find some funny comments in the code.
The best one was probably: "// WTF is this?".
I also saw a little conversation in the code like this:
"// We don't need this anymore
// Actually we still need this because of XXYY..."
I also found a macro called TRACE_QUAKE in the driver. I am not sure what its purpose was, but maybe there is a built-in Quake in the driver. You just have to find the correct sequence of driver commands to run it :D
In the middle of my internship I presented some of my findings at a team meeting, but I did not present at the end. The end of my internship came so soon, but I kind of finished my main task.
Meeting "famous" people
In the middle of my project I was having a conversation with some NVIDIA employees and I noticed that they had quite strong opinions about certain things. Just out of curiosity I searched their names. For one of them, the first thing that came up was a Wikipedia page. I thought that it was probably somebody else with the same name, but the first line on the wiki was: "a graphics software engineer working at NVIDIA". And then I realized that it was actually him.
For anybody who knows the computer graphics field: I was talking with Mark Kilgard, the author of GLUT and lead author of NVIDIA's path rendering library. I was ashamed that I did not realize who he was right away. Another quite "famous" person I had a chance to work with was Gregory Roth. You can see his name on some OpenGL and GLSL specs.
NVIDIA is a relatively old company; it has been around since the first GPUs. You probably would not be surprised that there are some famous people from the area of computer graphics still working there. I just did not realize this fact (and I am really bad at remembering names).
All-hands meetings
A great part of my internship was the so-called all-hands meetings. These are company-wide monthly meetings where you get to know the road maps, what people are working on, what is coming, and much more. Every team has its own, so there were Tegra all-hands, desktop all-hands, etc. For non-NVIDIA people like me it's like a little window into the future. For example, I learned about (and saw) new Maxwell-based GPUs months before they were actually released. A demo with four 4K screens connected to a single GPU was really impressive.
NVIDIA works closely with other companies that deal directly and indirectly with its products. Once they were presenting highly confidential information about Windows Threshold, a "Windows 9" release that will have some cool features that depend on the driver.