Instructions for setting up OpenCL lab


So, starting next semester, you are finally introducing OpenCL into your university's curriculum… and creating the next generation of software developers! Before the students are back from vacation, you should have your lab set up so that you can start the OpenCL lab sessions right away. Many instructors struggle with setting up the lab, so I have written a guide to help professors and lab instructors set up an OpenCL lab: Download Instructions_for_setting_up_OpenCl_Lab

If you think the guide lacks something, or that some features do not work as described, kindly let me know. I take no responsibility for any information loss or damage during installation; this is simply how I did it, and it worked well. I will keep improving the guide and will upload new versions.

HB, in collaboration with AMD, organized India’s First Faculty Enablement Program on OpenCL


The news was also covered here:

http://www.varindia.com/1june2011_8.htm

http://www.efytimes.com/e1/63848/fullnews.htm

In science and engineering we often talk about space and time coordinates (x, y, t). I think Coimbatore Institute of Technology (CIT, our host institute) was at the best (x, y, t) coordinate for India’s First Faculty Enablement Program on GPU Computing using OpenCL, held from May 12th to 14th, 2011.

This was the first such program in India since the Khronos Group introduced the OpenCL API. The event was a stepping stone for HPC R&D in India: it was attended by professors and lecturers from at least 25 top colleges in South India. These professors are in turn introducing courses on parallel/GPU computing using OpenCL at their universities. From the coming semesters, thousands of students in India will be learning the art of GPU computing with OpenCL. With this in view, AMD has already launched a dedicated page on its website, the OpenCL Zone. It aims to promote GPU computing and to help researchers and professionals solve their problems in the minimum possible time. Everybody seems enthusiastic about developments in GPU computing, especially with OpenCL. Interestingly, NVIDIA’s CEO Jen Hsun Huang said in an interview a few months ago that NVIDIA is one of the most enthusiastic supporters of OpenCL, despite its big success with NVIDIA CUDA.

For the past 3 years I have been involved in GPU computing (NVIDIA CUDA and OpenCL) education and project consulting. I have mostly worked with professionals and students engaged in research who wanted to accelerate their code at little or no extra expense. This event, however, was very important for me, because it was the first time I would address close to 60 professors from different universities in one place. In my college days I was taught and mentored by some of the noblest teachers, so I have great regard for the teaching community in general. I have great regard even for the few professors who can be compared to “Professor Virus” (remember 3 Idiots?) :-), because after all I learnt at least something from them.
It was my first trip to Coimbatore. As expected, the weather was very pleasant, and at last I got some relief from the scorching heat of my hometown, Dilli or Delhi. Before I left New Delhi, Google had told me a lot about the city, which is famous for its textile industry.


I reached Coimbatore Airport at 11:30 in the morning. It is a small airport, mostly staffed by South Indians. People were very courteous and generous, though they spoke either English or their regional language (Tamil, I guess). My friend Velu Jayaprakash, Developer Relations Manager at AMD, Bangalore, welcomed me at the airport. Together we went to The Residency, where we were to stay for the next 3 days.

After some rest and lunch at the hotel, we visited the college to set up the lab for the next day's OpenCL training. We reached CIT at around 3 PM. We met the staff, including the HoDs and other key people at CIT. Then we went straight to the lab where the hands-on sessions were to be held the next day.
We set up the lab and tested it by running some sample code from the AMD APP SDK and some standalone OpenCL code.

We had a single system (running RHEL 5.5) with an AMD GPU, which we connected over a LAN to other PCs running Windows. The Windows clients could access the GPU machine using PuTTY (an SSH client). We created several user accounts and tested our code from the clients. The system worked fine.

At 7 PM we were back at the hotel. After dinner I went through the PPT slides for Day 1, which was to be an introductory session on parallel computing, GPU computing and OpenCL.

It's 10 AM in the CIT seminar hall. The inauguration ceremony has started, and the chief guests are addressing the participants.
It's 1 PM now, and we are all rushing to the dining hall for lunch. This is one of the most memorable lunches I have ever had. We were served dal, chawal (rice), chutney and other delicious dishes on a big banana leaf. This is the custom in South India, and I liked it a lot.

Lunch is over, and it's time for the much-awaited technical session. My friend Velu opened the session with a short presentation on GPU computing. His slides were excellent. Thanks to the power of the AMD Fusion APU in his laptop, they were full of very appealing graphics and animations.

Velu introduced me to the audience, and now the floor is mine! I started with a brief description of why parallel computing is needed and gradually moved on to the technical details of GPU computing. I was very happy that GPU computing was not a new term for many of the professors.

The content for Day-1 was:
What is Parallel Computing?
–Necessity or Luxury?
–Opportunities Vs Challenges
Fundamentals of GPU Computing
–Why use GPUs
–Basic differences between GPUs and CPUs
–The APU (Accelerated Processing Unit)
What is OpenCL?
–Introduction to the language
–Getting started with APP SDK 2.4: installation, configuration, etc.
–Sample programs walk through

Hands-on Session

We deviated somewhat from this plan. We were supposed to have a hands-on session on the first day, but we did not have enough time, so we postponed it to Day 2.

I started the session by asking the audience a question:

Problem: We want to write a C program that computes the squares of the first 1 million integers as quickly as possible. Which processor would you choose to program?
– A single-core CPU running at 5 GHz, or
– A dual-core CPU running at 2.5 GHz

The audience gave many interesting answers, and finally I got the answer I wanted to hear: a dual-core CPU running at 2.5 GHz. Every square is independent of the others, so the work splits evenly between the two cores. A single-core processor running at 5 GHz would turn far more of its input energy into heat, which you obviously don't want: dynamic power grows with clock frequency and with the square of the supply voltage, and higher clocks require higher voltages. But do we really know how to program a multicore processor? This is where OpenCL comes in. It makes programmers' lives easier by providing the necessary abstractions and portability.
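To make the dual-core answer concrete, here is a minimal C sketch of the data decomposition behind it. It is our own illustration, not code from the workshop. The range of integers is split into contiguous chunks, one per core, and each chunk can be computed without looking at the others. The function names `square_chunk` and `squares_split` are hypothetical. The chunks run one after another here; on a real dual-core machine each call would go to its own thread. Note that the results need a 64-bit type, because 1,000,000 squared (10^12) does not fit in a 32-bit `int`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Square the 1-based integers begin+1 .. end, storing (i + 1)^2 in out[i]
   for every index i in [begin, end). A chunk never touches another chunk's
   elements, so chunks can run on different cores at the same time. */
void square_chunk(size_t begin, size_t end, int64_t *out)
{
    for (size_t i = begin; i < end; ++i) {
        int64_t v = (int64_t)(i + 1);
        out[i] = v * v;  /* 64-bit: 1,000,000^2 = 10^12 overflows a 32-bit int */
    }
}

/* Data decomposition: split n elements into `parts` contiguous chunks
   (parts = 2 for the dual-core CPU in the question); the last chunk takes
   any remainder. The chunks run sequentially in this sketch; on real
   hardware each call would be handed to its own core or thread. */
void squares_split(size_t n, size_t parts, int64_t *out)
{
    assert(parts > 0);
    size_t chunk = n / parts;
    for (size_t p = 0; p < parts; ++p) {
        size_t begin = p * chunk;
        size_t end = (p == parts - 1) ? n : begin + chunk;
        square_chunk(begin, end, out);
    }
}
```

Calling `squares_split(1000000, 2, out)` gives each "core" 500,000 squares. In OpenCL the same idea is taken to the extreme: each element becomes its own work-item.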

I then went into more detail and discussed a couple of examples where parallel computing can be extremely helpful. In particular, an example of reduction and its parallel implementation was enough to convince my audience that parallel computing can do wonders if applied properly. It is also important to remember that, although parallel computing can do wonders, we must find the areas where it should be applied. A word processor, for example, may not be a good candidate for porting to GPUs, since people are happy with the speed at which word processors process data (at least at present). A video transcoder for Blu-ray HD movies, however, is something that should be ported to the hundreds (or even thousands) of processor cores on a GPU. Such applications take many hours to process data serially. Run in parallel on GPUs using OpenCL, the process can be accelerated many times over.
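The reduction example mentioned above can be sketched as a pairwise (tree) reduction. The following C function is a minimal illustration of the technique, not the code shown at the workshop; the name `tree_reduce_sum` and the `passes` counter are our own. In each pass, element i absorbs element i + stride. All additions within a pass touch disjoint elements, so a GPU could perform them simultaneously. Summing n values then takes about log2(n) parallel passes instead of n - 1 sequential additions. The passes run sequentially here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In-place pairwise (tree) reduction returning the sum of data[0..n-1].
   Pass k combines elements 2^k apart; every addition inside a pass works on
   a different pair of elements, which is what makes a pass parallelizable.
   The number of passes, ceil(log2 n), is stored in *passes if non-NULL.
   Note: the input array is overwritten. */
int64_t tree_reduce_sum(int64_t *data, size_t n, size_t *passes)
{
    assert(data != NULL || n == 0);
    size_t count = 0;
    if (n == 0) {
        if (passes) *passes = 0;
        return 0;
    }
    for (size_t stride = 1; stride < n; stride *= 2) {
        /* Independent additions: a GPU would assign one work-item per pair. */
        for (size_t i = 0; i + stride < n; i += 2 * stride)
            data[i] += data[i + stride];
        ++count;
    }
    if (passes) *passes = count;
    return data[0];
}
```

For 100 elements this takes 7 passes instead of 99 dependent additions. On a GPU, the same pattern is usually applied within each work-group using local memory, and the per-group partial sums are combined afterwards.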

Anyway, the first day went well. By the end of it I could see a lot of enthusiasm in the audience. Everyone agreed that GPU computing using OpenCL is a game changer, a paradigm shift, a breakthrough technology and a must-have for technology leaders and strategists. They also agreed that this is the right time to invest time and money (money?? OpenCL is free, but yes, time is money!).


The next day we held the hands-on session, and the participants learned the flow of a typical OpenCL program:

Writing the kernel
Resource setup (platform, device, context, command queue)
Building the OpenCL program
Memory transfer (host to device)
Setting kernel arguments
Launching the kernel
Memory transfer (device to host)
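To make the seven stages concrete without requiring a GPU, here is a CPU-only C sketch that mirrors them for a vector-addition kernel. It does not call OpenCL. Each stage is a plain C step, and the comments name the OpenCL host API calls that would normally perform it. For example, `clEnqueueWriteBuffer` handles the host-to-device copy and `clEnqueueNDRangeKernel` handles the launch. The `vec_add_kernel` function, the `kernel_args` struct and the loop standing in for the device are illustrative assumptions, not part of any SDK.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stage 1, the kernel. In OpenCL C it would read
   __kernel void vec_add(__global const float *a, __global const float *b,
                         __global float *c) { size_t gid = get_global_id(0); ... }
   Here the global id is passed in explicitly. */
static void vec_add_kernel(size_t gid, const float *a, const float *b, float *c)
{
    c[gid] = a[gid] + b[gid];
}

/* Stand-in for the arguments that clSetKernelArg would bind. */
struct kernel_args {
    const float *a;
    const float *b;
    float *c;
};

/* Runs the whole host-side flow on the CPU. Returns 0 on success. */
int emulate_opencl_flow(const float *host_a, const float *host_b,
                        float *host_c, size_t n)
{
    if (n == 0) return 0;
    assert(host_a && host_b && host_c);
    size_t bytes = n * sizeof(float);

    /* Stage 2, resource setup: clGetPlatformIDs, clGetDeviceIDs,
       clCreateContext, clCreateCommandQueue, then clCreateBuffer.
       Here the "device" buffers are just separate heap allocations. */
    float *dev_a = malloc(bytes), *dev_b = malloc(bytes), *dev_c = malloc(bytes);
    if (!dev_a || !dev_b || !dev_c) {
        free(dev_a); free(dev_b); free(dev_c);
        return -1;
    }

    /* Stage 3, build: clCreateProgramWithSource, clBuildProgram and
       clCreateKernel compile the kernel at run time. Nothing to do here,
       because vec_add_kernel was compiled together with the host code. */

    /* Stage 4, host -> device transfer (clEnqueueWriteBuffer). */
    memcpy(dev_a, host_a, bytes);
    memcpy(dev_b, host_b, bytes);

    /* Stage 5, kernel arguments (one clSetKernelArg call per argument). */
    struct kernel_args args = { dev_a, dev_b, dev_c };

    /* Stage 6, launch (clEnqueueNDRangeKernel with global size n): the device
       runs one work-item per gid; this loop runs them one after another. */
    for (size_t gid = 0; gid < n; ++gid)
        vec_add_kernel(gid, args.a, args.b, args.c);

    /* Stage 7, device -> host transfer (clEnqueueReadBuffer). */
    memcpy(host_c, dev_c, bytes);

    free(dev_a); free(dev_b); free(dev_c);
    return 0;
}
```

In a real program, stages 2 and 3 are typically done once. Stages 4 to 7 can then be repeated for each new batch of data.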

We also demonstrated various performance optimization tools: kernel analyzers, the AMD APP Profiler and CodeAnalyst.

The third day was scheduled only for general discussions, the syllabus and some adventures. Even so, we held one more hands-on session in the first half, because a server problem had kept a few people from running OpenCL code on Day 2. By the end of the first half of Day 3, all the participants were equipped with the following:

1-  What is GPU Computing?

2-  Why OpenCL?

3-  Getting Started with OpenCL

4-  Memory Model (Clever techniques for speeding up matrix multiplication)

5-  Running samples from AMD APP SDK 2.4

6-  Familiarity with performance optimization tools

7-  Compiling and running an OpenCL program in Linux

8-  Concept of work items and work groups

9-  How an OpenCL kernel is written

10- Research opportunities
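Items 8 and 9 come down to index arithmetic. In OpenCL 1.x, when a work-group size is given, the global work size passed to `clEnqueueNDRangeKernel` must be a multiple of it. Inside a kernel, with no global offset, `get_global_id(0)` equals `get_group_id(0) * get_local_size(0) + get_local_id(0)`. The C sketch below reproduces that arithmetic on the CPU. The helper names are hypothetical. The bounds check in `run_1d` is the usual guard a kernel needs once the global size has been rounded up past the real problem size.

```c
#include <assert.h>
#include <stddef.h>

/* Round the problem size n up to a multiple of the work-group size,
   as OpenCL 1.x requires for the global work size. */
size_t round_up_global(size_t n, size_t local_size)
{
    assert(local_size > 0);
    return ((n + local_size - 1) / local_size) * local_size;
}

/* The relation the kernel built-ins satisfy (zero global offset):
   get_global_id(0) == get_group_id(0) * get_local_size(0) + get_local_id(0) */
size_t global_id(size_t group_id, size_t local_size, size_t local_id)
{
    return group_id * local_size + local_id;
}

/* Emulate a 1-D NDRange: visit every work-item of every work-group and apply
   a "kernel" that doubles one element. Work-items whose global id is past n
   (the padding added by rounding up) do nothing.
   Returns how many work-items did useful work. */
size_t run_1d(int *data, size_t n, size_t local_size)
{
    size_t global = round_up_global(n, local_size);
    size_t groups = global / local_size;
    size_t active = 0;
    for (size_t g = 0; g < groups; ++g) {
        for (size_t l = 0; l < local_size; ++l) {
            size_t gid = global_id(g, local_size, l);
            if (gid >= n)          /* bounds guard, as in a real kernel */
                continue;
            data[gid] *= 2;
            ++active;
        }
    }
    return active;
}
```

For example, 1000 elements with a work-group size of 64 need a global size of 1024, which gives 16 work-groups. The last 24 work-items fail the bounds check and do no work.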


In this workshop we did not discuss optimization much, apart from a brief overview of using local memory to speed up matrix multiplication.
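The local-memory technique can be illustrated with the classic tiled matrix multiplication. In an OpenCL kernel, each work-group would copy a TILE x TILE block of A and of B into `__local` arrays and synchronize with `barrier(CLK_LOCAL_MEM_FENCE)`. It would then reuse those values TILE times instead of reloading them from global memory. The C sketch below is a CPU emulation of that scheme, not the kernel used at the workshop. Small stack arrays play the role of local memory. For simplicity it assumes square n x n row-major matrices with n a multiple of `TILE`; both `TILE = 4` and these layout choices are our own assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 4  /* assumed work-group tile width */

/* C = A * B for square n x n row-major matrices, n a multiple of TILE.
   The two outer loops pick one "work-group", i.e. one TILE x TILE block of C.
   For each step along k, one tile of A and one tile of B are copied into
   small arrays (the role of __local memory), and every "work-item" (r, c)
   of the block accumulates its dot-product piece from those tiles. */
void matmul_tiled(const float *A, const float *B, float *C, size_t n)
{
    assert(n % TILE == 0);
    for (size_t bi = 0; bi < n; bi += TILE) {
        for (size_t bj = 0; bj < n; bj += TILE) {
            float acc[TILE][TILE] = {{0}};   /* per-work-item private sums */
            for (size_t bk = 0; bk < n; bk += TILE) {
                float As[TILE][TILE], Bs[TILE][TILE];  /* "local memory" */
                /* Cooperative load: each work-item copies one element of each tile. */
                for (size_t r = 0; r < TILE; ++r)
                    for (size_t c = 0; c < TILE; ++c) {
                        As[r][c] = A[(bi + r) * n + (bk + c)];
                        Bs[r][c] = B[(bk + r) * n + (bj + c)];
                    }
                /* A kernel would call barrier(CLK_LOCAL_MEM_FENCE) here. */
                for (size_t r = 0; r < TILE; ++r)
                    for (size_t c = 0; c < TILE; ++c)
                        for (size_t k = 0; k < TILE; ++k)
                            acc[r][c] += As[r][k] * Bs[k][c];
                /* ...and again here, before the tiles are overwritten. */
            }
            for (size_t r = 0; r < TILE; ++r)
                for (size_t c = 0; c < TILE; ++c)
                    C[(bi + r) * n + (bj + c)] = acc[r][c];
        }
    }
}
```

In the naive version, every element of A is read from global memory n times. Here it is loaded only n / TILE times (once per block column of C), and all other uses hit the fast tile copy. That reuse is where the speed-up from local memory comes from.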

After the sessions we had a detailed discussion on the syllabus and on the research areas we should focus on.
At the end, we distributed certificates and gifts to the participants and had a group photo session.

Overall, it was a very nice experience interacting with the professors. It was also an opportunity for me to learn about their recent research and development activities. Throughout the workshop, and ever since it was first being planned, Velu’s invaluable suggestions and help made this event a success.

We got very good feedback from the participants. That said, this was an introductory workshop. Our regular workshops cover topics from elementary to advanced and focus more on optimization.
In future workshops we plan to follow this flow:

Module One

1- Review of a few important C and C++ concepts
C Programming basics
Linux basics; building, compiling and running a C program on Linux
User-defined data types
Array
Pointer
Dynamic memory allocation: malloc(), free(), etc.
General concepts of Object Oriented Programming

2- Introduction to Multi-core/GPU Computing
Why GPU/Parallel Computing?
Heterogeneous computing
Basic differences between GPUs and CPUs
The Accelerated Processing Unit (APU)
The processor of 2020
Alternatives to Parallel Computing

3- Getting started with OpenCL programming (Lab)
The software development environment and tools
Requirements
Installing on Windows
Installing on Linux
The first program: Hello World!
Compilation (on Linux and Windows)

Module Two

1- Introduction to parallel programming
Algorithms
Task and data decomposition
Parallel computing
Software models
Hardware architectures
CPU-GPU Communication (PCI Express vs. PCI)
Parallel Programming Challenges

2- OpenCL Architecture
Comparison with other programming models
Platform Model
Execution Model
Memory Model
Programming Model

3- OpenCL programming in detail (Lab Exercises)
Image Convolution
Matrix Multiplication

Module Three

1- Architecture of some recent CPUs and GPUs
Intel Dual-Core Processors
NVIDIA Fermi
AMD Fusion
Cell Broadband Engine

2- OpenCL C programming language in detail
Supported features
Restrictions
3- Understanding GPU memories
Bank Conflicts
Memory Coalescing
4- AMD Accelerated Parallel Processing Math Libraries (APPML)
5- Lab exercises based on the above concepts

Module Four

1- Advanced concepts
Debugging
Event Timing and Profiling
Threading and Scheduling
Programming Multiple Devices
2- General Optimization Tips
3- Lab exercises
Matrix Transpose
Gaussian Noise
Sobel Edge Detection

Module Five
Full-day hands-on/demonstration sessions and revision
Brief demonstrations on the following:
FFT
N-body Simulation
Application in Artificial Neural Network
Image/Video Processing
Revision

Assignments

Note that we hold hands-on sessions every day to make the workshop more interactive and useful. For further information, please visit www.hbconsort.com or send specific queries to info@hbeongpgpu.com