Computer Vision -1: Getting comfortable with technology



This story goes back about a year. I was delivering a lecture on open source and computer programming with 'C', and ended up explaining computer vision to the class. This article narrates those events: how I explained computer vision to the students, how they can implement it by learning one new library, OpenCV, and how a 'C' programming lecture turned into a discussion forum where students could understand computer vision and learn it by experimenting. It is the first in a series of articles meant to put readers at ease with computer vision algorithms.


Me: Have you ever clicked a photograph?

The students started shouting, "Yes sir, many times..."

Me: Ok. So what is the difference between a human eye and a camera?

Ankur: They both do the same thing.

Me: Good. Now tell me how you differentiate between a glass and a cup?

Ankur: Sir, it's our mind which analyses the images which it receives and then tells us the difference.

Me: How does a one-year-old child do the same?

Ankur: Donno sir.

Me: We train the child and help him in differentiating things.

Me: How many of you know how a robot sees and analyses its surroundings?

Abhishek: It's the cameras installed in robots.

Me: Then how does it distinguish between a human and a cup?

Abhishek: I think it is by the comparison of the pictures stored in the memory of the robot, which it matches and gives us the difference.

Me: Wow!! You seem to have some understanding of it.

 

Then I started the topic of computer vision. It is a field of science, which extracts information from images and gives vision to machines.


Then I told the class that the topic would be covered in three subsequent lectures, divided into three sections. First I will teach you about computer vision and image processing, and then I will tell you how you can implement computer vision in the 'C' programming language using external libraries.


Getting a feel of artificial vision


So, students! Let me give you the basic definition of computer vision: it is the field of science that deals with processing captured images and extracting meaningful information from them.

Let's take an example. Think of a street with CCTV cameras installed. The cameras are there to watch the traffic and to record any untoward incident on that street, but the surveillance is manual. Suppose a speeding car rams into another car and the culprit drives away. We then have to wait for the incident to be reported to the police, who will act by analyzing the captured video or by patrolling the area and asking people about the incident. This consumes a lot of time.

One solution to this problem is to install a vision system that analyses such situations. When an incident of this kind happens, the system can note the number plate of the car, a photograph of the car or something peculiar about the incident, and raise an alert. An SMS can be sent to the police control room and also to local police patrol vehicles. This will help the police capture the culprit easily, in less time and at less cost.


This is one application of computer vision. There are numerous others, such as monitoring production in an industry, managing traffic and counting people for visual surveillance.


I hope you now have a fair idea of computer vision. Figure 1 shows one application of a vision algorithm called template matching. Figure 1(a) displays the result of the template matching algorithm: the red rectangle marks the region where the match was found. Figure 1(b) displays the image that is to be found in figure 1(a).




Figure 1: (a) A collage of different images (b) Image to be found


Understanding the library


Me: Students, are you ready for the show?

All jumped with joy!!

Me: I will now tell you about the vision library. We call it “OpenCV”. OpenCV is an open-source computer vision library for real-time image processing. It was originally developed by Intel, where the project started in 1999. Its current release is 2.2 as of March 2011. The library includes basic image processing functions, segmentation functions, machine learning algorithms, camera calibration, gesture recognition, ego-motion, face recognition, object identification and much more. To learn OpenCV you must be comfortable with one of the programming languages C, C++ or Python, and you must have a basic understanding of the underlying concepts such as matrices, structures and function calls. If you want to learn more about OpenCV, you can start from http://opencv.willowgarage.com/wiki/ or http://sourceforge.net/projects/opencvlibrary/, and if you are interested in other resources, they are given in the references.


Let's get our hands dirty!!

Amit: On which operating systems can we run OpenCV?

Me: It can run on all major operating systems like Linux, Windows, and Mac.

Amit: Will you be telling us how to install it?

Me: Yes, certainly.

Students: Sir! Some of us know Linux and some of us know only Windows. Will you be explaining both operating systems to us?

Me: No. I will only explain how to install it on Linux. If you want to install it on a Windows operating system, you can read http://opencv.willowgarage.com/wiki/InstallGuide.


Let's first see how we can install it on the Linux operating system. I will use Ubuntu 10.10 (Maverick Meerkat). An Internet connection is a prerequisite. I will now demonstrate the steps one by one.

Steps:

First, click System -> Administration -> Synaptic Package Manager and supply the administrator password when prompted.




Figure 2: Synaptic Package Manager


Type OpenCV in the quick search textbox. It will then show the packages as in figure 2. Click on the following packages and mark them for installation:

  1. opencv-doc [optional]

  2. libcv2.1

  3. libhighgui2.1

  4. libcvaux2.1

  5. libcv-dev

  6. libcvaux-dev

  7. libhighgui-dev


After marking all of them, click Apply to install them. When the installation is complete, you can run the command shown in figure 3 to verify it.



Figure 3: Terminal Command for verification




This shows the directories that contain your include files, which helps you locate all the header files. One command that reports them, assuming pkg-config is installed, is pkg-config --cflags opencv.


Structure

OpenCV bundles a large set of image processing algorithms to help programmers. As programmers, we must understand its structure, which is described in figure 4. It has four components: computer vision algorithms (cv), machine learning algorithms (ml), highgui and cxcore. The cv component provides image filtering, image transformations, feature detection, motion analysis, structural analysis, object detection and camera calibration, whereas the highgui component provides functions for creating the user interface and for reading and writing images and videos. The cxcore component contains the basic data structures for images and video, drawing functions and much more.

We will now study the basic components of an image and how they can be altered with the help of OpenCV. An image is a collection of information stored in the form of pixels. A colour picture can be stored in formats such as RGB, CMYK or HSV. The number of channels is the number of colour components stored for each pixel. RGB (red, green and blue) is a 3-channel image, where each channel is typically stored as an 8-bit value. CMYK (cyan, magenta, yellow and black) is a 4-channel image. HSV (hue, saturation and value) is also a 3-channel image; it describes the same colours as RGB, but separates the colour itself (hue) and its purity (saturation) from its brightness (value). A fourth channel appears when an alpha (transparency) channel is added to RGB, which gives RGBA.



Figure 4: Structure of OpenCV


I hope you now have a good understanding of computer vision, OpenCV and its components. We will now study our first OpenCV program and learn how to execute it.


My first program with OpenCV!!

Hey all, let's now begin with our first program in OpenCV. We used gedit to write it.

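--------------------------------CODE----------------------------------

#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include "highgui.h"  /* cvLoadImage, cvNamedWindow, cvShowImage, ... */

int main(int argc, char** argv)
{
    IplImage *image;

    /* Expect exactly one argument: the image file to display */
    if (argc != 2)
    {
        printf("Syntax: disp_image image-name\n");
        exit(-1);
    }

    /* Load the image as stored on disk; cvLoadImage returns NULL on failure */
    image = cvLoadImage(argv[1], CV_LOAD_IMAGE_UNCHANGED);
    if (image == NULL)
    {
        printf("Could not load image %s\n", argv[1]);
        exit(-1);
    }

    cvNamedWindow("Output1", CV_WINDOW_AUTOSIZE);  /* window sized to the image */
    cvShowImage("Output1", image);                 /* draw the image in the window */
    cvWaitKey(0);                                  /* wait until a key is pressed */

    cvReleaseImage(&image);                        /* free the image memory */
    cvDestroyWindow("Output1");
    return 0;
}

------------------------CODE----------------------------------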


Listing 1: Image display program

Listing 1 shows our first OpenCV program, which displays an image. We will explain it line by line. As discussed earlier, highgui.h is the header file that gives us access to the functions used for reading and displaying images, such as cvLoadImage, cvNamedWindow, cvShowImage, cvWaitKey, cvReleaseImage and cvDestroyWindow. Let’s understand these functions one by one.


  1. IplImage* cvLoadImage(const char* filename, int iscolor = CV_LOAD_IMAGE_COLOR) takes two arguments: the file name and iscolor. iscolor has three possible values, namely CV_LOAD_IMAGE_COLOR, CV_LOAD_IMAGE_GRAYSCALE and CV_LOAD_IMAGE_UNCHANGED. It loads the image from the file, allocates memory for it and returns a pointer to it.

  2. int cvNamedWindow(const char* name, int flags) takes two arguments: the name of the window, which is user-defined, and a flag. The CV_WINDOW_AUTOSIZE flag sets the width and height of the window according to the image size. The function creates a window with the given name as its title.

  3. void cvShowImage(const char* name, const CvArr* image) takes two arguments: the name of the window and the image structure. It displays the image in that window.

  4. void cvReleaseImage(IplImage **image) takes the address of the image pointer as its argument. It deallocates the memory held by the image.

  5. void cvDestroyWindow(const char* name) takes the name of the window to be destroyed.

  6. int cvWaitKey(int delay=0) takes a delay in milliseconds. If we supply zero, the program waits until the user presses a key.


The program works like this: first, we declare a placeholder for the image, i.e. a pointer to an IplImage structure. Next, we load the image without changing anything in it. We could load it as a grayscale image instead by changing the iscolor option to CV_LOAD_IMAGE_GRAYSCALE. Then we create a window titled Output1 and set its flag to auto-size, so that the window is automatically resized to fit the image shown in it. Thereafter, we display the image and wait till the user presses any key. In the end, we release the memory occupied by the image and destroy the window.


How to compile and execute the program?

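We used the command shown in figure 5 to create a binary executable file. A typical command of this kind, assuming the program is saved as disp_image.c and pkg-config can find OpenCV, is gcc disp_image.c -o disp_image `pkg-config --cflags --libs opencv`.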



Figure 5: Command to compile and create a binary output file.






The output of the program is given in figure 6.



Figure 6: Output of Listing 1



Let's get back to our conversation:-

Me: This example showed you how to display a picture in just 7 lines.

Amit: Sir! Can we do some basic operations on images, like smoothing, converting an image to grayscale or inverting it?

Me: Yes, we can do it very easily. In my subsequent lectures, I will demonstrate these things. So students, I hope you now have a good understanding of computer vision and OpenCV.


In my next article, I will tell readers how to perform filtering on images, along with some interesting facts.



Gaurav Parashar (Faculty)

Computer Science Department

KIET Group of Institutions
