AI on the Edge LESSON 27: Track Objects of Interest in OpenCV Using Contours
Hey everyone, Paul McWhorter here from TopTechBoy.com. Welcome back to our channel, where we learn to build real, intelligent systems on edge hardware. Grab yourself a nice hot cup of coffee or a cold glass of iced tea, because today we are taking a massive leap forward in our computer vision journey.
Up until now, we have learned how to configure our cameras, calculate a smoothed frame rate, and isolate specific objects based on color using the HSV color space. We built beautiful masks and composite images that show only our target color. But let’s be honest with ourselves: a mask is just a collection of white pixels on a black screen. The computer doesn’t actually know where the object is, how big it is, or how to follow it if it moves.
In this lesson, we are going to fix that. We are going to teach the machine to look at our mask, isolate the single biggest shape of interest, ignore the background noise, and draw a bounding box around that shape that follows it in real time. This is true object tracking.
The Core Concept: What is a Contour?
Think of a contour as a mathematical boundary line. When OpenCV looks at a binary mask (where your target object is white and everything else is black), a contour is the continuous line that traces the outer edge of that white shape.
The beauty of contours is that they turn a chaotic cloud of thousands of isolated pixels into structured, manageable shapes, each stored as an ordered list of boundary points. Once OpenCV finds these shapes, it can calculate their geometric properties, such as their area, perimeter, and center (centroid).
The Three Steps to Algorithmic Object Tracking
To turn a raw camera frame into a fully tracked target, our script follows a strict three-part engineering pipeline inside our main execution loop:
1. Extracting Every Boundary
First, we pass our binary mask into OpenCV’s contour detection function, cv2.findContours. We configure it to use external retrieval (cv2.RETR_EXTERNAL), meaning it ignores any holes inside the object and traces only the outermost boundary. It returns a list of every contour it finds in the frame.
2. Hunting for the Largest Target
In the real world, your camera view is never perfectly clean. Even with an excellent HSV color mask, you will get random speckles, reflections, or background noise showing up as tiny white dots on your mask. If we tried to track everything, our program would lose its mind. To solve this, we use Python’s built-in max() function, with cv2.contourArea as the key, to scan our list of contours and pick out the one with the largest area in pixels.
3. Setting an Area Noise Floor
Even after finding the largest contour, what happens if your object completely leaves the camera view? The largest remaining “object” might be a tiny, single-pixel speck of static noise on the edge of the screen. To prevent our tracking box from jumping around erratically, we set a minimum-area threshold, a noise floor. In our script it is 150 pixels: if the area of the largest contour isn’t big enough to confidently be our target, we ignore it completely.
Drawing the Bounding Box
Once we have successfully isolated our valid, large contour, we don’t just want to draw a messy, squiggly line around it. We want clean coordinates that an automation system or a robotic pan-tilt kit could actually use to follow the target.
We pass our largest contour into a bounding rectangle function, cv2.boundingRect. OpenCV calculates the smallest upright rectangle that fully encloses the shape and returns four numbers:
- x: The horizontal pixel coordinate of the box’s top-left corner.
- y: The vertical pixel coordinate of the box’s top-left corner (in image coordinates, y increases downward).
- w: The width of the box in pixels.
- h: The height of the box in pixels.
With those four numbers locked down, we use OpenCV’s cv2.rectangle drawing function to overlay a crisp green rectangle directly onto our live color camera feed. Now, as you move your object around the room, the box follows it dynamically, tracking its position frame by frame in real time.
Note: you will have to tune the lower (LC) and upper (UC) HSV limits for your own object of interest, as we showed last week.
```python
import cv2
import time
from picamera2 import Picamera2
from fusion_hat.pwm import PWM

piCam = Picamera2()
W = 1280
H = 720
tStart = time.time()
fps = 0

# PWM outputs driving the red, green, and blue channels of the LED
redPin = 5
greenPin = 6
bluePin = 7
redLED = PWM(redPin)
greenLED = PWM(greenPin)
blueLED = PWM(bluePin)

# Camera setup: 1280x720 at up to 60 fps
RES = (W, H)
piCam.preview_configuration.main.size = RES
piCam.preview_configuration.main.format = "RGB888"
piCam.preview_configuration.controls.FrameRate = 60
piCam.preview_configuration.align()
piCam.configure("preview")
piCam.start()

# On-screen text settings, scaled to the frame size
textLowerLeft = (int(W*.01), int(H*.06))
fontFace = cv2.FONT_HERSHEY_SIMPLEX
fontThickness = int(W/425)
fontScale = H*.0015
fontColor = (0, 0, 255)
xPos = 0
textLowerLeft1 = (int(W*.01), int(H*.06)*2)
textLowerLeft2 = (int(W*.01), int(H*.06)*3)
yPos = 0
valR = 0
valG = 0
valB = 0
Hue = 0
Sat = 0
Val = 0

# Lower and upper HSV limits for the target color (tune for your object)
LC = (25, 100, 100)
UC = (32, 255, 255)
frame = None

def mouseAction(event, x, y, flags, param):
    global frame, xPos, yPos, Hue, Sat, Val
    if event == cv2.EVENT_MOUSEMOVE:  # EVENT_MOUSEMOVE == 0
        xPos = x
        yPos = y
        if frame is not None:
            # Mirror the color under the cursor on the RGB LED
            valB, valG, valR = frame[y, x]
            redLED.pulse_width_percent(int(valR/255*100))
            greenLED.pulse_width_percent(int(valG/255*100/2))
            blueLED.pulse_width_percent(int(valB/255*100/4))
            # Read the HSV value under the cursor, used for tuning LC and UC
            frameHSV = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            Hue, Sat, Val = frameHSV[y, x]

cv2.namedWindow('Camera', cv2.WINDOW_GUI_NORMAL)
cv2.moveWindow('Camera', 0, 65)
cv2.resizeWindow('Camera', W, H)
cv2.namedWindow('Mask', cv2.WINDOW_GUI_NORMAL)
cv2.moveWindow('Mask', W, 65)
cv2.resizeWindow('Mask', int(W/2), int(H/2))
cv2.namedWindow('Composite', cv2.WINDOW_GUI_NORMAL)
cv2.moveWindow('Composite', W, 65+int(H/2)+25)
cv2.resizeWindow('Composite', int(W/2), int(H/2))
cv2.setMouseCallback('Camera', mouseAction)

while True:
    # Exponentially smoothed frame rate
    deltaT = time.time() - tStart
    tStart = time.time()
    fps = fps*.95 + (1/deltaT)*.05

    frame = piCam.capture_array()
    frame = cv2.flip(frame, -1)  # flip both axes (rotate 180 degrees)
    frameHSV = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(frameHSV, LC, UC)
    composite = cv2.bitwise_and(frame, frame, mask=mask)

    # Step 1: outer boundaries of every white region in the mask
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        #cv2.drawContours(frame, contours, -1, (255,0,0), 3)
        # Step 2: keep only the largest contour
        largestContour = max(contours, key=cv2.contourArea)
        area = cv2.contourArea(largestContour)
        # Step 3: ignore anything at or below the noise floor
        if area > 150:
            # drawContours expects a list of contours, so wrap the single contour in []
            #cv2.drawContours(frame, [largestContour], -1, (255,0,0), 3)
            x, y, w, h = cv2.boundingRect(largestContour)
            cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 3)

    myText = "FPS: "+str(round(fps, 1))
    cv2.putText(frame, myText, textLowerLeft, fontFace, fontScale, fontColor, fontThickness)
    text1 = "Mouse Pos: "+str((xPos, yPos))
    text2 = "Pixel Color: "+str((Hue, Sat, Val))
    cv2.putText(frame, text1, textLowerLeft1, fontFace, fontScale, fontColor, fontThickness)
    cv2.putText(frame, text2, textLowerLeft2, fontFace, fontScale, fontColor, fontThickness)

    cv2.imshow("Camera", frame)
    cv2.imshow("Composite", composite)
    cv2.imshow("Mask", mask)
    if cv2.waitKey(1) == ord('q'):
        break

cv2.destroyAllWindows()
redLED.pulse_width_percent(0)
greenLED.pulse_width_percent(0)
blueLED.pulse_width_percent(0)
print('Program Terminated')
```
