Tag Archives: ROI

AI on the Edge LESSON 27: Track Objects of Interest in OpenCV Using Contours

June 25, 2026 admin

Hey everyone, Paul McWhorter here from TopTechBoy.com. Welcome back to our channel, where we learn to build real, intelligent systems on edge hardware. Grab yourself a nice hot cup of coffee or a cold glass of iced tea, because today we are taking a massive leap forward in our computer vision journey.

Up until now, we have learned how to configure our cameras, calculate frame rates smoothly, and isolate specific objects based on color using the HSV color space. We built beautiful masks and composite images that show only our target color. But let’s be honest with ourselves: a mask is just a collection of white pixels on a black screen. The computer doesn’t actually know where the object is, how big it is, or how to follow it if it moves.

In this lesson, we are going to fix that. We are going to teach the machine to look at our mask, isolate the single biggest shape of interest, ignore the background noise, and draw a real-time bounding box around it. This is true object tracking.

The Core Concept: What is a Contour?

Think of a contour as a mathematical boundary line. When OpenCV looks at a binary mask (where your target object is white and everything else is black), a contour is the continuous line that traces the outer edge of that white shape.

The beauty of contours is that they turn a chaotic cloud of thousands of isolated pixels into structured, manageable vector shapes. Once OpenCV finds these shapes, it can calculate their physical properties, such as their area, perimeter, and exact center.
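To make "area, perimeter, and exact center" concrete, here is a minimal plain-Python sketch that computes those three quantities from a list of corner points using the shoelace formula. The function name `polygon_properties` and the sample rectangle are made up for illustration. In a real script you would normally call OpenCV's own `cv2.contourArea`, `cv2.arcLength`, and `cv2.moments` rather than this helper.

```python
import math

def polygon_properties(points):
    """Return (area, perimeter, center) for a closed polygon.

    points is a list of (x, y) vertices in order, similar to the corner
    points of a simplified contour. center is None for a zero-area shape.
    """
    n = len(points)
    twice_area = 0.0   # signed, accumulated by the shoelace formula
    perimeter = 0.0
    cx_sum = 0.0
    cy_sum = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]   # wrap around to close the shape
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx_sum += (x0 + x1) * cross
        cy_sum += (y0 + y1) * cross
        perimeter += math.hypot(x1 - x0, y1 - y0)
    if twice_area == 0:
        return 0.0, perimeter, None
    center = (cx_sum / (3.0 * twice_area), cy_sum / (3.0 * twice_area))
    return abs(twice_area) / 2.0, perimeter, center

# Hypothetical rectangle with corners at pixel centers (40,30) and (99,79)
rect = [(40, 30), (99, 30), (99, 79), (40, 79)]
area, perimeter, center = polygon_properties(rect)
print(area, perimeter, center)   # 2891.0 216.0 (69.5, 54.5)
```

Notice that the area is measured between the corner points, so a blob that is 60 pixels wide and 50 pixels tall comes out as 59 x 49 = 2891 rather than 3000.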

The Three Steps to Algorithmic Object Tracking

To turn a raw camera frame into a fully tracked target, our script follows a strict three-part engineering pipeline inside our main execution loop:

1. Extracting Every Boundary

First, we pass our binary mask into OpenCV’s contour detection function, cv2.findContours. We configure it to use external retrieval (cv2.RETR_EXTERNAL), meaning it will ignore any holes inside the object and only trace the outermost boundary. It returns a list of every single contour it finds in the frame.

2. Hunting for the Largest Target

In the real world, your camera view is never perfectly clean. Even with an excellent HSV color mask, you will get random speckles, reflections, or background noise showing up as tiny white dots on your mask. If we tried to track everything, our program would lose its mind. To solve this, we use Python’s built-in max() function, with cv2.contourArea as the key, to scan our list of contours and extract the largest one by area.

3. Setting an Area Noise Floor

Even after finding the largest contour, what happens if your object completely leaves the camera view? The largest remaining “object” might be a tiny speck of static noise on the edge of the screen. To prevent our tracking box from jumping around erratically, we establish a minimum-area threshold, a noise floor. If the area of the largest contour is not above that threshold (the script below uses 150), we ignore it completely.

Drawing the Bounding Box

Once we have successfully isolated our valid, large contour, we don’t just want to draw a messy, squiggly line around it. We want clean coordinates that an automation system or a robotic pan-tilt kit could actually use to follow the target.

We pass our largest contour into a bounding rectangle function. OpenCV automatically calculates the exact mathematical limits of that shape and returns four precise numbers:

- X: The column of the box's left edge, in pixels from the left of the frame.
- Y: The row of the box's top edge, in pixels from the top of the frame.
- W: The total width of the object in pixels.
- H: The total height of the object in pixels.

With those four dimensions locked down, we use a standard drawing function to overlay a crisp, green rectangle directly onto our live color camera feed. Now, as you move your object around the room, the box follows it dynamically, tracking its position in real time at high frame rates.
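If you want to hand those four numbers to something like a pan-tilt kit, a common next step is to turn the box into a center point and an offset from the middle of the frame. The sketch below is one hedged way to do that. The function names `box_center` and `tracking_error` and the -1 to 1 normalization are assumptions for illustration and are not part of the lesson code.

```python
def box_center(x, y, w, h):
    """Center of a bounding box given its top-left corner and size."""
    return x + w / 2, y + h / 2

def tracking_error(box, frame_size):
    """Offset of the box center from the frame center, scaled to -1..1.

    A negative x error means the target is left of center,
    a negative y error means it is above center.
    """
    x, y, w, h = box
    frame_w, frame_h = frame_size
    cx, cy = box_center(x, y, w, h)
    err_x = (cx - frame_w / 2) / (frame_w / 2)
    err_y = (cy - frame_h / 2) / (frame_h / 2)
    return err_x, err_y

# Box centered in a 1280x720 frame
print(tracking_error((590, 310, 100, 100), (1280, 720)))  # (0.0, 0.0)
# Box in the upper-right part of the frame
print(tracking_error((960, 0, 320, 360), (1280, 720)))    # (0.75, -0.5)
```

A servo loop could then nudge the camera in the direction that drives both errors toward zero.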

Note that you will have to tune the LC and UC parameters (the lower and upper HSV bounds passed to cv2.inRange) for your object of interest, as we showed last week.

import cv2
import time
from picamera2 import Picamera2
from fusion_hat.pwm import PWM
piCam = Picamera2()
W=1280
H=720
tStart = time.time()
fps = 0

redPin = 5
greenPin = 6
bluePin = 7
redLED = PWM(redPin)
greenLED = PWM(greenPin)
blueLED = PWM(bluePin)

RES = (W,H)
piCam.preview_configuration.main.size = RES
piCam.preview_configuration.main.format = "RGB888"
piCam.preview_configuration.controls.FrameRate=60
piCam.preview_configuration.align()
piCam.configure("preview")
piCam.start()

textLowerLeft = (int(W*.01),int(H*.06))
fontFace = cv2.FONT_HERSHEY_SIMPLEX
fontThickness = int(W/425)
fontScale = H*.0015
fontColor = (0,0,255)
xPos = 0
textLowerLeft1 = (int(W*.01),int(H*.06)*2)
textLowerLeft2 = (int(W*.01),int(H*.06)*3)
yPos = 0
valR = 0
valG = 0
valB = 0

Hue = 0
Sat = 0
Val = 0

LC = (25,100,100)
UC = (32,255,255)

frame = None
def mouseAction(event, x, y, flags, param):
    global frame, xPos, yPos, Hue, Sat, Val
    if event == cv2.EVENT_MOUSEMOVE:  # event code 0: the mouse moved
        xPos = x
        yPos = y
        if frame is not None:
            valB, valG, valR = frame[y,x]
            redLED.pulse_width_percent(int(valR/255*100))
            greenLED.pulse_width_percent(int(valG/255*100/2))
            blueLED.pulse_width_percent(int(valB/255*100/4))
            frameHSV = cv2.cvtColor(frame,cv2.COLOR_BGR2HSV)
            Hue, Sat, Val =frameHSV[y,x]
cv2.namedWindow('Camera',cv2.WINDOW_GUI_NORMAL)
cv2.moveWindow('Camera',0,65)
cv2.resizeWindow('Camera',W,H)

cv2.namedWindow('Mask',cv2.WINDOW_GUI_NORMAL)
cv2.moveWindow('Mask',W,65)
cv2.resizeWindow('Mask',int(W/2),int(H/2))

cv2.namedWindow('Composite',cv2.WINDOW_GUI_NORMAL)
cv2.moveWindow('Composite',W,65+int(H/2)+25)
cv2.resizeWindow('Composite',int(W/2),int(H/2))

cv2.setMouseCallback('Camera',mouseAction)

while True:
    deltaT = time.time() - tStart
    tStart=time.time()
    fps = fps*.95 + (1/deltaT)*.05
    frame= piCam.capture_array()
    frame=cv2.flip(frame,-1)
    
    frameHSV = cv2.cvtColor(frame,cv2.COLOR_BGR2HSV)
    mask=cv2.inRange(frameHSV,LC,UC)
    composite = cv2.bitwise_and(frame, frame, mask=mask)
    
    # Trace only the outer boundary of each white region in the mask
    contours, _ =cv2.findContours(mask,cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        #cv2.drawContours(frame,contours,-1,(255,0,0),3)
        # Keep only the contour that encloses the largest area
        largestContour = max(contours, key = cv2.contourArea)
        area = cv2.contourArea(largestContour)
        # Noise floor: skip specks too small to be the target
        if area>150:
            #cv2.drawContours(frame,[largestContour],-1,(255,0,0),3)
            x, y, w, h = cv2.boundingRect(largestContour)
            cv2.rectangle(frame, (x,y),(x+w,y+h),(0,255,0),3)
    myText = "FPS: "+str(round(fps,1))
    cv2.putText(frame,myText,textLowerLeft,fontFace,fontScale,fontColor,fontThickness)
    
    text1 = "Mouse Pos: "+str((xPos,yPos))
    text2 = "Pixel Color: "+str((Hue,Sat,Val))
    cv2.putText(frame,text1,textLowerLeft1,fontFace,fontScale,fontColor,fontThickness)    
    cv2.putText(frame,text2,textLowerLeft2,fontFace,fontScale,fontColor,fontThickness)    
    cv2.imshow("Camera", frame)
    cv2.imshow("Composite",composite)
    cv2.imshow("Mask",mask)

    if cv2.waitKey(1)==ord('q'):
        break
cv2.destroyAllWindows()
redLED.pulse_width_percent(0)
greenLED.pulse_width_percent(0)
blueLED.pulse_width_percent(0)
print('Program Terminated')
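Since the script above prints the Hue, Sat, and Val under the mouse, one way to get a starting point for LC and UC is to hover over your object and build a range around the values you read. This is only a sketch: the helper name `hsv_range` and the tolerance defaults are assumptions you would tune yourself, and it does not handle red hues that wrap around the ends of the hue scale. The limits 179 and 255 are the 8-bit HSV ranges OpenCV uses.

```python
def hsv_range(hue, sat, val, hue_tol=5, sat_tol=80, val_tol=80):
    """Build (lower, upper) HSV bounds around one sampled pixel.

    Values are converted to plain ints first, because pixel values read
    from a frame are 8-bit and would wrap around when subtracted.
    """
    hue, sat, val = int(hue), int(sat), int(val)
    lower = (max(hue - hue_tol, 0), max(sat - sat_tol, 0), max(val - val_tol, 0))
    upper = (min(hue + hue_tol, 179), min(sat + sat_tol, 255), min(val + val_tol, 255))
    return lower, upper

# Hypothetical reading taken by hovering over a yellow object
LC, UC = hsv_range(28, 200, 220)
print(LC, UC)   # (23, 120, 140) (33, 255, 255)
```

Treat the result as a first guess, then widen or narrow it while watching the Mask window.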

AI On the Edge, Raspberry Pi

AI on the Edge LESSON 24: Processing Mouse Events in OpenCV on Pi 5

June 12, 2026 admin

Welcome back, everyone! In our last lesson, we learned how to use matrix slicing to hardcode a Region of Interest (ROI) into our frames. That was a great static approach, but today we are taking interactivity to a whole new level.

In this lesson, you are going to learn how to catch Mouse Events inside your OpenCV windows. Instead of guess-and-checking coordinates in your code, you will be able to click anywhere on your live video stream to instantly grab the precise (x, y) pixel coordinates and read the exact color value of the pixel right under your mouse pointer. This is the foundational mechanic you need to build interactive, point-and-click AI applications.

The Core Concept: Mouse Callbacks and Global Frames

To listen for mouse clicks or movement, OpenCV uses what is called a Callback Function. You tell OpenCV: “Hey, keep an eye on this specific window. If the user does anything with the mouse inside it, instantly jump over to my custom function and tell me what happened.”

We set this up using:

cv2.setMouseCallback('Camera', mouseAction)
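To see what OpenCV hands to the callback, it helps to call one by hand. In this sketch the event codes are written out as plain numbers that mirror OpenCV's constants, so it runs without a camera or a window. The names `mouse_action` and `state` are made up for illustration, and passing a dictionary through the optional third argument of `cv2.setMouseCallback` is shown here as an alternative to the global variables used in the lesson scripts.

```python
# Event codes mirror OpenCV's constants so the sketch runs without a window
EVENT_MOUSEMOVE = 0     # same value as cv2.EVENT_MOUSEMOVE
EVENT_LBUTTONDOWN = 1   # same value as cv2.EVENT_LBUTTONDOWN

def mouse_action(event, x, y, flags, param):
    """Callback with the five arguments OpenCV passes on every mouse event."""
    state = param   # whatever object was registered alongside the callback
    if event == EVENT_MOUSEMOVE:
        state["pos"] = (x, y)
    elif event == EVENT_LBUTTONDOWN:
        state["clicks"].append((x, y))

state = {"pos": (0, 0), "clicks": []}

# In a real script OpenCV makes these calls for you after:
#   cv2.setMouseCallback('Camera', mouse_action, state)
mouse_action(EVENT_MOUSEMOVE, 100, 50, 0, state)
mouse_action(EVENT_LBUTTONDOWN, 100, 50, 0, state)
mouse_action(EVENT_MOUSEMOVE, 120, 60, 0, state)

print(state)   # {'pos': (120, 60), 'clicks': [(100, 50)]}
```

This is also why the lesson code checks `event == 0`: that is the code for a mouse move, so the values update as you hover rather than only when you click.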

The [y, x] Matrix Inversion Trap

There is a massive mathematical trap that catches almost every beginner when they start mapping mouse clicks to image matrices:

OpenCV Mouse Coordinates: When you move your mouse, OpenCV tracks position using standard Cartesian geometry: (x, y), where x is the column (horizontal distance from the left) and y is the row (vertical distance from the top).
NumPy Array Coordinates: When you plug those numbers into your image array to inspect a pixel, NumPy expects matrix indexing: [row, column].

Because rows correspond to the height (y) and columns correspond to the width (x), you must always swap the coordinates when accessing the frame array:

frame[y, x]

If you try to pass frame[x, y], your program will either crash with an “index out of bounds” error or return data from the completely wrong part of the image!
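Both failure modes are easy to reproduce on a blank array with no camera attached. This is a small sketch using a made-up all-black 1280x720 frame and a hypothetical helper named `read_pixel`.

```python
import numpy as np

W, H = 1280, 720
frame = np.zeros((H, W, 3), dtype=np.uint8)   # rows first, then columns

def read_pixel(image, x, y):
    """Read the pixel at mouse position (x, y) using [row, column] order."""
    return image[y, x]

# Mark one pixel red (BGR order) at mouse position x=1000, y=200
frame[200, 1000] = (0, 0, 255)

print(read_pixel(frame, 1000, 200))   # [  0   0 255]

# Swapped order, case 1: x is larger than the frame height, so it crashes
try:
    frame[1000, 200]
    crashed = False
except IndexError as err:
    crashed = True
    print("IndexError:", err)

# Swapped order, case 2: both numbers fit, so you silently read the wrong pixel
frame[200, 300] = (255, 0, 0)
print(frame[300, 200])                # [0 0 0], not the pixel we just set
```

The second case is the nastier one, because nothing warns you that the data came from the wrong place.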

The Python Code Developed in This Lesson

Here is the complete, streamlined script we built during today’s tutorial. Copy this code into your workspace on your Raspberry Pi 5, fire it up, and watch your terminal output as you click around the video window.

We first developed this program as a simple example of processing mouse events. It prints each detected event, the cursor position, and the pixel value at that position:

import cv2
import time
from picamera2 import Picamera2
piCam = Picamera2()
W=1280
H=720
tStart = time.time()
fps = 0
RES = (W,H)
piCam.preview_configuration.main.size = RES
piCam.preview_configuration.main.format = "RGB888"
piCam.preview_configuration.controls.FrameRate=60
piCam.preview_configuration.align()
piCam.configure("preview")
piCam.start()

textLowerLeft = (int(W*.01),int(H*.05))
fontFace = cv2.FONT_HERSHEY_SIMPLEX
fontThickness = int(W/425)
fontScale = H*.0015
fontColor = (0,0,255)
frame = None
def mouseAction(event, x, y, flags, param):
    global frame
    if frame is not None:
        print("Event: ",event, (x,y), frame[y,x])

cv2.namedWindow('Camera')
cv2.setMouseCallback('Camera',mouseAction)

while True:
    deltaT = time.time() - tStart
    tStart=time.time()
    fps = fps*.95 + (1/deltaT)*.05
    frame= piCam.capture_array()
    frame=cv2.flip(frame,-1)
    myText = "FPS: "+str(round(fps,1))
    cv2.putText(frame,myText,textLowerLeft,fontFace,fontScale,fontColor,fontThickness)
    cv2.imshow("Camera", frame)
    cv2.moveWindow("Camera",0,60)
    if cv2.waitKey(1)==ord('q'):
        break
cv2.destroyAllWindows()
print('Program Terminated')

To make the program more useful, we developed this code, which monitors the position of the mouse cursor and reports the color of the pixel the mouse points at. The values are printed as labels on the OpenCV frame:

import cv2
import time
from picamera2 import Picamera2
piCam = Picamera2()
W=1280
H=720
tStart = time.time()
fps = 0
RES = (W,H)
piCam.preview_configuration.main.size = RES
piCam.preview_configuration.main.format = "RGB888"
piCam.preview_configuration.controls.FrameRate=60
piCam.preview_configuration.align()
piCam.configure("preview")
piCam.start()

textLowerLeft = (int(W*.01),int(H*.06))
fontFace = cv2.FONT_HERSHEY_SIMPLEX
fontThickness = int(W/425)
fontScale = H*.0015
fontColor = (0,0,255)
xPos = 0
textLowerLeft1 = (int(W*.01),int(H*.06)*2)
textLowerLeft2 = (int(W*.01),int(H*.06)*3)
yPos = 0
valR = 0
valG = 0
valB = 0
frame = None
def mouseAction(event, x, y, flags, param):
    global frame, xPos, yPos, valR, valG, valB
    if event == cv2.EVENT_MOUSEMOVE:  # event code 0: the mouse moved
        xPos = x
        yPos = y
        if frame is not None:
            valB, valG, valR = frame[y,x]

cv2.namedWindow('Camera')
cv2.setMouseCallback('Camera',mouseAction)

while True:
    deltaT = time.time() - tStart
    tStart=time.time()
    fps = fps*.95 + (1/deltaT)*.05
    frame= piCam.capture_array()
    frame=cv2.flip(frame,-1)
    myText = "FPS: "+str(round(fps,1))
    cv2.putText(frame,myText,textLowerLeft,fontFace,fontScale,fontColor,fontThickness)
    
    text1 = "Mouse Pos: "+str((xPos,yPos))
    text2 = "Pixel Color: "+str((valR,valG,valB))
    cv2.putText(frame,text1,textLowerLeft1,fontFace,fontScale,fontColor,fontThickness)    
    cv2.putText(frame,text2,textLowerLeft2,fontFace,fontScale,fontColor,fontThickness)    
    cv2.imshow("Camera", frame)
    cv2.moveWindow("Camera",0,60)
    if cv2.waitKey(1)==ord('q'):
        break
cv2.destroyAllWindows()
print('Program Terminated')

We can now take the project to the next level by setting the LED color to the color pointed at by the cursor in the OpenCV window. We will use the standard circuit from the earlier lessons.

Fusion Hat Circuit Diagram — This is the circuit we will use moving forward in the class

This is the code we developed to set the LED color based on the color of the pixel under the cursor in the OpenCV window.

import cv2
import time
from picamera2 import Picamera2
from fusion_hat.pwm import PWM
piCam = Picamera2()
W=1280
H=720
tStart = time.time()
fps = 0

redPin = 5
greenPin = 6
bluePin = 7
redLED = PWM(redPin)
greenLED = PWM(greenPin)
blueLED = PWM(bluePin)

RES = (W,H)
piCam.preview_configuration.main.size = RES
piCam.preview_configuration.main.format = "RGB888"
piCam.preview_configuration.controls.FrameRate=60
piCam.preview_configuration.align()
piCam.configure("preview")
piCam.start()

textLowerLeft = (int(W*.01),int(H*.06))
fontFace = cv2.FONT_HERSHEY_SIMPLEX
fontThickness = int(W/425)
fontScale = H*.0015
fontColor = (0,0,255)
xPos = 0
textLowerLeft1 = (int(W*.01),int(H*.06)*2)
textLowerLeft2 = (int(W*.01),int(H*.06)*3)
yPos = 0
valR = 0
valG = 0
valB = 0
frame = None
def mouseAction(event, x, y, flags, param):
    global frame, xPos, yPos, valR, valG, valB
    if event == cv2.EVENT_MOUSEMOVE:  # event code 0: the mouse moved
        xPos = x
        yPos = y
        if frame is not None:
            valB, valG, valR = frame[y,x]
            redLED.pulse_width_percent(int(valR/255*100))
            greenLED.pulse_width_percent(int(valG/255*100/2))
            blueLED.pulse_width_percent(int(valB/255*100/4))

cv2.namedWindow('Camera')
cv2.setMouseCallback('Camera',mouseAction)

while True:
    deltaT = time.time() - tStart
    tStart=time.time()
    fps = fps*.95 + (1/deltaT)*.05
    frame= piCam.capture_array()
    frame=cv2.flip(frame,-1)
    myText = "FPS: "+str(round(fps,1))
    cv2.putText(frame,myText,textLowerLeft,fontFace,fontScale,fontColor,fontThickness)
    
    text1 = "Mouse Pos: "+str((xPos,yPos))
    text2 = "Pixel Color: "+str((valR,valG,valB))
    cv2.putText(frame,text1,textLowerLeft1,fontFace,fontScale,fontColor,fontThickness)    
    cv2.putText(frame,text2,textLowerLeft2,fontFace,fontScale,fontColor,fontThickness)    
    cv2.imshow("Camera", frame)
    cv2.moveWindow("Camera",0,60)
    if cv2.waitKey(1)==ord('q'):
        break
cv2.destroyAllWindows()
redLED.pulse_width_percent(0)
greenLED.pulse_width_percent(0)
blueLED.pulse_width_percent(0)
print('Program Terminated')
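The LED lines in the callback above all follow one pattern: scale a 0 to 255 channel value to a 0 to 100 percent duty cycle, then divide green by 2 and blue by 4. That pattern can be pulled out and checked on its own. The helper name `to_duty` and the `balance` parameter are assumptions for illustration. The post does not say why green and blue are reduced, so the comment about evening out the LED's channels is a guess.

```python
def to_duty(value, balance=1.0):
    """Map a 0-255 color channel to a 0-100 PWM percentage.

    balance divides the result, as the script does with 2 for green and
    4 for blue (presumably to even out the brightness of the LED channels).
    """
    return int(int(value) / 255 * 100 / balance)

# A pure white pixel (255, 255, 255) gives these three duty cycles
print(to_duty(255), to_duty(255, 2), to_duty(255, 4))   # 100 50 25
# A black pixel turns every channel off
print(to_duty(0), to_duty(0, 2), to_duty(0, 4))         # 0 0 0
```

In the real script each result would be passed to `pulse_width_percent` on the matching PWM object.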

Homework Assignment

Alright, it’s time to put this knowledge to work. Your homework assignment is to turn this simple reporting tool into an interactive, dynamic ROI selector. First, create a text display under the FPS readout on the frame that shows the pixel location the mouse is pointing at and the RGB value at that position. Then build the selector:

- Start with your clean 1280×720 live camera stream.
- Modify your mouseAction callback function to look for specific mouse clicks.
- The Target Mechanic: When you press the left button on the video window, store those coordinates as your upper-left corner. When you release the button, store those coordinates as your lower-right corner. While you are selecting, draw a live box outline over your ROI.
- Using those two coordinate pairs, use matrix slicing to pull a clean Region of Interest (ROI) out of the frame and display it in a separate, standalone window called “Target ROI”.
- Safety Requirement: Make sure your code can handle a drag in any direction without crashing (e.g., if the user releases the button higher or further left than where they pressed it, write the conditional logic to sort the indices properly before slicing).

Get your black coffee ready, write your logic step-by-step from scratch, and do not copy code you can’t explain. Post your homework solution video on YouTube and drop a link in the comments section below so I can see who is running with the big dogs!

AI On the Edge, Tutorial

AI on the Edge LESSON 23: Creating Regions of Interest (ROI) in OpenCV with Slicing

June 11, 2026 admin

Welcome back, everyone! In this lesson, we are stepping into a foundational aspect of computer vision: manipulation of specific regions within a video frame.

Up to this point, we have been grabbing the full frame from our camera and performing operations on the entire image. But in real-world edge AI and robotics applications, processing every single pixel of a high-resolution frame is an absolute waste of compute power. If you want to detect a license plate, track a face, or monitor a specific sensor layout on a machine, you don’t need to look at the sky or the floor. You need to isolate a Region of Interest (ROI).

In this lesson, you will learn how to use Python’s powerful matrix slicing capabilities to chop up a frame, isolate specific quadrants, manipulate pixels inside an ROI, and display multiple synchronized windows across your desktop without overloading your system.

The Core Concept: Image Slicing and ROIs

In OpenCV, an image frame isn’t just a visual picture—it is a standard NumPy array. A color frame is a 3D matrix structured by rows, columns, and color channels: [Rows, Columns, Channels] or [Height, Width, Color].

Because it is a standard array, we can use standard Python slicing notation to isolate any rectangular box we want:

$$\text{ROI} = \text{frame}[\text{row}_{\text{start}}:\text{row}_{\text{end}}, \, \text{col}_{\text{start}}:\text{col}_{\text{end}}]$$
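The slicing rule is easy to check on a blank array before pointing it at a live frame. This sketch uses a made-up all-black 640x360 frame and prints the shapes that come out of a few slices. The variable names are for illustration only.

```python
import numpy as np

W, H = 640, 360
frame = np.zeros((H, W, 3), dtype=np.uint8)   # [rows, columns, channels]

# Center region: rows 90..269 and columns 160..479
roi = frame[int(H * .25):int(H * .75), int(W * .25):int(W * .75)]
print(frame.shape, roi.shape)   # (360, 640, 3) (180, 320, 3)

# The end index of a slice is exclusive, so W keeps the last column
right_half = frame[0:H, int(W / 2):W]
print(right_half.shape)         # (360, 320, 3)

# Stopping at W-1 quietly drops one column
short_half = frame[0:H, int(W / 2):W - 1]
print(short_half.shape)         # (360, 319, 3)
```

Rows always come first in the brackets, so the height range goes before the width range.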

The `.copy()` Trap

When you slice a piece of an array like ROI = frame[0:100, 0:100], NumPy does not create a new image in your RAM. It creates a view that points back to the original frame's data. If you modify pixels inside that ROI, you will accidentally alter your original main camera frame!

To isolate a region and modify it independently without bleeding back into your primary frame, you must explicitly use the .copy() method:

ROI = frame[int(H*.25):int(H*.75), int(W*.25):int(W*.75)].copy()
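Here is a tiny demonstration of the difference, using a made-up 4x4 black image so the effect is easy to see. The variable names are hypothetical.

```python
import numpy as np

# Case 1: a plain slice is a view, so edits leak back into the frame
frame = np.zeros((4, 4, 3), dtype=np.uint8)
view = frame[0:2, 0:2]
view[:] = 255
print(frame[0, 0])                       # [255 255 255]
print(np.shares_memory(frame, view))     # True

# Case 2: .copy() gives the ROI its own memory, so the frame stays untouched
frame2 = np.zeros((4, 4, 3), dtype=np.uint8)
roi = frame2[0:2, 0:2].copy()
roi[:] = 255
print(frame2[0, 0])                      # [0 0 0]
print(np.shares_memory(frame2, roi))     # False
```

The view costs no extra memory, which is handy for read-only display, but reach for .copy() whenever you plan to draw on or alter the region.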

Below is the complete code script we built during the video tutorial. Copy this code exactly into your Python environment, verify your geometry setups, and run it.

import cv2
import time
from picamera2 import Picamera2
piCam = Picamera2()
W=640
H=360
tStart = time.time()
fps = 0
RES = (W,H)
piCam.preview_configuration.main.size = RES
piCam.preview_configuration.main.format = "RGB888"
piCam.preview_configuration.controls.FrameRate=60
piCam.preview_configuration.align()
piCam.configure("preview")
piCam.start()

textLowerLeft = (int(W*.01),int(H*.05))
fontFace = cv2.FONT_HERSHEY_SIMPLEX
fontThickness = int(W/425)
fontScale = H*.0015
fontColor = (0,0,255)

topBar = 65
windowWaste = 25

cv2.namedWindow("Camera", cv2.WINDOW_GUI_NORMAL)
cv2.resizeWindow("Camera", W, H)
cv2.moveWindow("Camera",0,topBar)

cv2.namedWindow("Camera Small", cv2.WINDOW_GUI_NORMAL)
cv2.resizeWindow("Camera Small",  int(W/2), int(H/2))
cv2.moveWindow("Camera Small",0,topBar+windowWaste+H)
    
cv2.namedWindow("Gray Small", cv2.WINDOW_GUI_NORMAL)
cv2.resizeWindow("Gray Small", int(W/2), int(H/2))
cv2.moveWindow("Gray Small",int(W/2),topBar+windowWaste+H)

quadrants = ["upperLeft","upperRight","lowerLeft","lowerRight"]
x=0
for quadrant in quadrants:
    cv2.namedWindow(quadrant, cv2.WINDOW_GUI_NORMAL)
    cv2.resizeWindow(quadrant,int(W/4),int(H/4))
    cv2.moveWindow(quadrant,W,topBar+int(x*(windowWaste+H/4)))
    x=x+1
    
while True:
    deltaT = time.time() - tStart
    tStart=time.time()
    fps = fps*.95 + (1/deltaT)*.05
    frame= piCam.capture_array()
    frame=cv2.flip(frame,-1)
    
    print(frame[int(H/2),int(W/2)])
    frame[int(H/2):int(H/2)+10,int(W/2):int(W/2)+10] = [0,0,255]
    
    ROI = frame[int(H*.25):int(H*.75),int(W*.25):int(W*.75)].copy()

    ROI[int(.25*H*.5):int(.75*H*.5),int(.25*W*.5):int(.75*W*.5)] = [0,0,0]
    ROIgray = cv2.cvtColor(ROI,cv2.COLOR_BGR2GRAY) 
 
    frameSmall=cv2.resize(frame,(int(W/2),int(H/2)))
    
    # Slice end indices are exclusive, so use H and W to keep the last row and column
    upperLeft = frame[0:int(H/2),0:int(W/2)]
    upperRight = frame[0:int(H/2),int(W/2):W]
    lowerLeft = frame[int(H/2):H,0:int(W/2)]
    lowerRight = frame[int(H/2):H,int(W/2):W]
    
    quadDict = {
        "upperLeft" : upperLeft,
        "upperRight" : upperRight,
        "lowerLeft" : lowerLeft,
        "lowerRight" : lowerRight
        }
    
    

    myText = "FPS: "+str(round(fps,1))
    cv2.putText(frame,myText,textLowerLeft,fontFace,fontScale,fontColor,fontThickness)
    cv2.imshow("Camera", frame)
    cv2.imshow("Camera Small",ROI)
    cv2.imshow("Gray Small",ROIgray)
    for name, image in quadDict.items():
        cv2.imshow(name,image)
    

    if cv2.waitKey(1)==ord('q'):
        break
cv2.destroyAllWindows()
print('Program Terminated')

Homework Assignment

Alright, it is time to earn your stripes and see if you can fly with the big dogs. Your homework assignment is to take this foundation and build a dynamic tracking target box using the array geometry principles we just learned.

Create a single main camera window ($640 \times 360$).
Draw an independent rectangular ROI box that starts in the dead center of the screen.
Read the keyboard with cv2.waitKey, and program the system so that the Arrow Keys (or ‘i’, ‘j’, ‘k’, ‘l’) update position variables that move the ROI box smoothly around the screen in real time.
Crucial Constraint: Do not let your boundary indices drift off the array! You must write conditional boundaries so that if your moving target hits the edge of the $640 \times 360$ frame, it locks at the frame border instead of causing an out-of-bounds index crash.
In a separate output window, display only the contents of the moving target box, converted to grayscale, in real time.
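
To get you pointed in the right direction on the Crucial Constraint, here is a minimal sketch of just the boundary logic, not the full homework solution. The box size (BOX_W, BOX_H), the step size (STEP), and the helper names clamp and move_box are all assumptions made up for this illustration, and a blank NumPy array stands in for the camera frame so it runs without a Pi. Keep in mind that NumPy tends to quietly shorten or wrap a slice that runs past the edge, so the trouble usually shows up later, when an empty ROI gets handed to OpenCV.

```python
import numpy as np

W, H = 640, 360          # frame size from the assignment
BOX_W, BOX_H = 160, 90   # hypothetical box size, not specified in the lesson
STEP = 10                # hypothetical pixels moved per key press


def clamp(value, low, high):
    """Hold value inside the closed range [low, high]."""
    return max(low, min(value, high))


def move_box(x, y, key):
    """Return the new top-left corner of the box after one key press."""
    if key == ord('j'):      # left
        x -= STEP
    elif key == ord('l'):    # right
        x += STEP
    elif key == ord('i'):    # up
        y -= STEP
    elif key == ord('k'):    # down
        y += STEP
    # The whole box must fit inside the frame, so the largest legal
    # top-left corner is (W - BOX_W, H - BOX_H).
    return clamp(x, 0, W - BOX_W), clamp(y, 0, H - BOX_H)


# Stand-in for a camera frame so the sketch runs without hardware.
frame = np.zeros((H, W, 3), dtype=np.uint8)

# Start with the box centered in the frame.
x, y = (W - BOX_W) // 2, (H - BOX_H) // 2

# Simulate holding down 'l' far longer than the frame is wide.
for _ in range(100):
    x, y = move_box(x, y, ord('l'))

# Rows (y) come first in the slice, then columns (x).
roi = frame[y:y + BOX_H, x:x + BOX_W].copy()
print(x, y, roi.shape)   # 480 135 (90, 160, 3)
```

In a real script the key value would come from cv2.waitKey(1) inside the main loop. Clamping the top-left corner, rather than checking all four edges one at a time, is just one way to do it; four if statements would work as well.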

Grab your morning coffee, fire up your code editor, write the script from scratch, and do not copy-paste code you don’t understand. Leave a link to your homework solution video in the YouTube comments section so I can see your progress!

AI on the Edge LESSON 22: Understanding Pictures and Video Frames as a Data Structure

June 10, 2026 admin

Hey guys, Paul McWhorter here with TopTechBoy.com, and today we are diving into the heart of computer vision. We’ve been playing around with getting images from the camera, but have you ever stopped to actually look at what a picture is when it’s inside your computer’s memory?

If you want to be a master of AI on the Edge, you have to stop thinking about images as “pictures” and start seeing them as what they really are: a massive, organized grid of numbers.

What is a Picture, Really?

In this lesson, we are peeling back the curtain on how OpenCV and Python handle video frames. When we call piCam.capture_array(), we aren’t just taking a snapshot; we are pulling a data array into memory.

Think of it like a giant spreadsheet where every single cell is a pixel.

Dimensions: Your image has a height and a width, which correspond to the number of rows and columns in that array. Remember that the row designator comes first, then the column: [R, C].
The Depth (The RGB Channels): It’s not just a flat 2D grid! Each “cell” in that grid is actually a little sub-array containing three values, one for each color channel: Red, Green, and Blue. That is why we call it a 3D data structure. In the code in this lesson the values sit in Blue, Green, Red order, which is why [0, 0, 255] shows up as solid red.
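
The row-first, three-channel layout described above can be checked with a quick sketch. A blank NumPy array stands in for the frame returned by piCam.capture_array(), which is an assumption made so the example runs without a camera; the variable names row, col, and swapped_order_failed are made up for this illustration.

```python
import numpy as np

W, H = 640, 360

# Stand-in for piCam.capture_array(): a black frame with the same layout.
frame = np.zeros((H, W, 3), dtype=np.uint8)

print(frame.shape)   # (360, 640, 3): rows, columns, channels
print(frame.dtype)   # uint8, so every value is in the range 0 to 255

# Row 100 (counted down from the top), column 500 (counted across).
row, col = 100, 500
frame[row, col] = [0, 0, 255]

print(frame[row, col])      # the three channel values at that one pixel
print(frame[row, col, 2])   # just the last channel of that pixel

# Swapping the order asks for row 500, which does not exist in a
# 360-row array, so NumPy raises an IndexError.
try:
    frame[col, row]
    swapped_order_failed = False
except IndexError as err:
    swapped_order_failed = True
    print("Swapped order fails:", err)
```

Notice that the shape reports the height first. That is the same [R, C] ordering you have to use every time you index or slice the frame.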

Manipulating Data, Not Just Pixels

The magic happens when you realize you can reach into that array and change those numbers directly.

In the code we developed today, we aren’t just displaying video; we are performing data science on video frames. We explored how to:

Access individual pixels: By referencing specific coordinates in our frame array, we can pull out the color data for a single spot.
Draw shapes by modifying arrays: Notice how we don’t need a “draw square” function to put a box on the screen? We simply tell a slice of that array to equal [0, 0, 255]. We are literally changing the color values of those pixels to solid red.
Regions of Interest (ROI): This is critical for AI. You don’t always need to look at the whole frame. We learned how to “slice” the array to isolate a Region of Interest. By carving out a smaller piece of that memory, we can perform operations, like converting to grayscale, on just that section, which takes far less work than processing the whole frame. In this lesson’s code the ROI is half the width and half the height of the frame, so it holds only a quarter of the pixels.
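
The three operations in the list above can be sketched on a stand-in array. This is a minimal illustration, assuming a uniformly gray NumPy array in place of a live camera frame; it uses the same slice boundaries as the lesson code but writes them with integer division.

```python
import numpy as np

W, H = 640, 360

# Stand-in frame: every channel of every pixel set to 50.
frame = np.full((H, W, 3), 50, dtype=np.uint8)

# 1. Access one pixel: row first, then column.
print(frame[H // 2, W // 2].tolist())   # [50, 50, 50]

# 2. "Draw" a 10 x 10 square by assigning a color to a slice.
frame[H // 2:H // 2 + 10, W // 2:W // 2 + 10] = [0, 0, 255]

# 3. Slice out the middle of the frame as a Region of Interest.
#    .copy() gives the ROI its own memory, separate from the frame.
roi = frame[H // 4:3 * H // 4, W // 4:3 * W // 4].copy()

# Because of .copy(), blacking out the ROI leaves the frame untouched.
roi[:] = [0, 0, 0]

print(roi.shape)                        # (180, 320, 3)
print(roi.size / frame.size)            # 0.25
print(frame[H // 2, W // 2].tolist())   # [0, 0, 255]
```

Passing roi to cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY), as the lesson code does, would return a 2D array of shape (180, 320), with one brightness value per pixel instead of three channel values.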

Why Does This Matter?

If you want to build a robot that recognizes objects or tracks faces, you need to understand this structure. AI models don’t “see” a cat; they see a mathematical representation of that cat’s pixel values. By learning how to slice, manipulate, and convert these arrays, you are learning the fundamental language of machine learning.

We are building the foundation here, folks. Once you get comfortable with how to manipulate these arrays, we are going to start doing some really cool stuff with image processing and filtering.

Dive into that code, change those array values, and see what happens when you mess with the dimensions! Don’t just run it—experiment with it.

I’ll see you guys in the next lesson!

In this lesson we developed the following code:

import cv2
import time
from picamera2 import Picamera2
piCam = Picamera2()
W=640
H=360
tStart = time.time()
fps = 0
RES = (W,H)
piCam.preview_configuration.main.size = RES
piCam.preview_configuration.main.format = "RGB888"
piCam.preview_configuration.controls.FrameRate=60
piCam.preview_configuration.align()
piCam.configure("preview")
piCam.start()

textLowerLeft = (int(W*.01),int(H*.05))
fontFace = cv2.FONT_HERSHEY_SIMPLEX
fontThickness = int(W/425)
fontScale = H*.0015
fontColor = (0,0,255)

topBar = 65
windowWaste = 25

cv2.namedWindow("Camera", cv2.WINDOW_GUI_NORMAL)
cv2.resizeWindow("Camera", W, H)
cv2.moveWindow("Camera",0,topBar)

cv2.namedWindow("Camera Small", cv2.WINDOW_GUI_NORMAL)
cv2.resizeWindow("Camera Small",  int(W/2), int(H/2))
cv2.moveWindow("Camera Small",0,topBar+windowWaste+H)
    
cv2.namedWindow("Gray Small", cv2.WINDOW_GUI_NORMAL)
cv2.resizeWindow("Gray Small", int(W/2), int(H/2))
cv2.moveWindow("Gray Small",int(W/2),topBar+windowWaste+H)


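while True:
    # Time since the previous pass through the loop
    deltaT = time.time() - tStart
    tStart=time.time()
    # Low-pass filter the frame rate: 95% old value, 5% new reading
    fps = fps*.95 + (1/deltaT)*.05
    frame= piCam.capture_array()
    # Flip code -1 flips the frame around both axes
    frame=cv2.flip(frame,-1)

    # Print the three channel values of the center pixel
    print(frame[int(H/2),int(W/2)])
    # Paint a 10x10 block at the center of the frame red
    frame[int(H/2):int(H/2)+10,int(W/2):int(W/2)+10] = [0,0,255]

    # Copy out the middle of the frame (rows first, then columns)
    ROI = frame[int(H*.25):int(H*.75),int(W*.25):int(W*.75)].copy()
    # Black out the middle of the ROI
    ROI[int(.25*H*.5):int(.75*H*.5),int(.25*W*.5):int(.75*W*.5)] = [0,0,0]
    ROIgray = cv2.cvtColor(ROI,cv2.COLOR_BGR2GRAY)

    # Half-size copy of the full frame (computed here but not displayed)
    frameSmall=cv2.resize(frame,(int(W/2),int(H/2)))

    myText = "FPS: "+str(round(fps,1))
    cv2.putText(frame,myText,textLowerLeft,fontFace,fontScale,fontColor,fontThickness)
    cv2.imshow("Camera", frame)
    cv2.imshow("Camera Small",ROI)
    cv2.imshow("Gray Small",ROIgray)

    # Quit when the 'q' key is pressed
    if cv2.waitKey(1)==ord('q'):
        break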
cv2.destroyAllWindows()
print('Program Terminated')

Making The World a Better Place One High Tech Project at a Time. Enjoy!