Welcome back, everyone! In this lesson, we are stepping into a foundational aspect of computer vision: manipulation of specific regions within a video frame.
Up to this point, we have been grabbing the full frame from our camera and performing operations on the entire image. But in real-world edge AI and robotics applications, processing every single pixel of a high-resolution frame is an absolute waste of compute power. If you want to detect a license plate, track a face, or monitor a specific sensor layout on a machine, you don’t need to look at the sky or the floor. You need to isolate a Region of Interest (ROI).
In this lesson, you will learn how to use NumPy’s array slicing in Python to chop up a frame, isolate specific quadrants, manipulate pixels inside an ROI, and display multiple synchronized windows across your desktop without overloading your system.
The Core Concept: Image Slicing and ROIs
In OpenCV, an image frame isn’t just a visual picture—it is a standard NumPy array. A color frame is a 3D matrix structured by rows, columns, and color channels: [Rows, Columns, Channels] or [Height, Width, Color].
Because it is a standard array, we can use standard Python slicing notation to isolate any rectangular box we want:
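As a minimal sketch of that notation, the snippet below builds a synthetic 640 x 360 frame with NumPy (a stand-in for a camera frame, so no camera is needed) and slices out one box. The corner names x1, y1, x2, y2 are illustrative and do not come from the lesson's script. Note that rows (y) come first, and that the end index of each slice is exclusive.

```python
import numpy as np

W, H = 640, 360
# Synthetic stand-in for a camera frame: H rows, W columns, 3 color channels
frame = np.zeros((H, W, 3), dtype=np.uint8)

# Hypothetical box corners: top-left (x1, y1), bottom-right (x2, y2)
x1, y1, x2, y2 = 100, 50, 300, 200

# Rows (y) are sliced first, columns (x) second; end indices are exclusive
box = frame[y1:y2, x1:x2]

print(frame.shape)  # (360, 640, 3)
print(box.shape)    # (150, 200, 3)
```

The box is 150 rows tall and 200 columns wide because each slice covers end minus start elements.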
The .copy() Trap
When you slice a NumPy array, as in ROI = frame[0:100, 0:100], NumPy does not create a new image in your RAM. It creates a view that points back to the original frame's memory. If you modify pixels inside that ROI, you will accidentally alter your original main camera frame!
To isolate a region and modify it independently without bleeding back into your primary frame, you must explicitly use the .copy() method:
ROI = frame[int(H*.25):int(H*.75), int(W*.25):int(W*.75)].copy()
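The difference between a view and a copy can be checked without a camera. The following is a small sketch on a synthetic all-black frame; the variable names view and independent are illustrative only.

```python
import numpy as np

H, W = 360, 640
frame = np.zeros((H, W, 3), dtype=np.uint8)  # synthetic all-black frame

view = frame[0:100, 0:100]                # basic slice: shares memory with frame
independent = frame[0:100, 0:100].copy()  # separate buffer

view[:] = [0, 0, 255]          # writes through to frame
independent[:] = [255, 0, 0]   # does not touch frame

print(np.shares_memory(frame, view))         # True
print(np.shares_memory(frame, independent))  # False
print(frame[0, 0].tolist())                  # [0, 0, 255]
```

The pixel at frame[0, 0] picked up the value written through the view, while the write to the copy left the frame alone.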
Below is the complete script we built during the video tutorial. Copy this code exactly into your Python environment, check that the window geometry values suit your desktop, and run it.
import cv2
import time
from picamera2 import Picamera2

piCam = Picamera2()
W=640
H=360
tStart = time.time()
fps = 0
RES = (W,H)
piCam.preview_configuration.main.size = RES
# Picamera2's "RGB888" delivers pixels in BGR order, which is what OpenCV expects
piCam.preview_configuration.main.format = "RGB888"
piCam.preview_configuration.controls.FrameRate=60
piCam.preview_configuration.align()
piCam.configure("preview")
piCam.start()

# Text overlay settings, scaled to the frame size
textLowerLeft = (int(W*.01),int(H*.05))
fontFace = cv2.FONT_HERSHEY_SIMPLEX
fontThickness = int(W/425)
fontScale = H*.0015
fontColor = (0,0,255)

# Desktop layout offsets in pixels: top menu bar and window title bar
topBar = 65
windowWaste = 25

cv2.namedWindow("Camera", cv2.WINDOW_GUI_NORMAL)
cv2.resizeWindow("Camera", W, H)
cv2.moveWindow("Camera",0,topBar)

cv2.namedWindow("Camera Small", cv2.WINDOW_GUI_NORMAL)
cv2.resizeWindow("Camera Small", int(W/2), int(H/2))
cv2.moveWindow("Camera Small",0,topBar+windowWaste+H)

cv2.namedWindow("Gray Small", cv2.WINDOW_GUI_NORMAL)
cv2.resizeWindow("Gray Small", int(W/2), int(H/2))
cv2.moveWindow("Gray Small",int(W/2),topBar+windowWaste+H)

# One small window per quadrant, stacked to the right of the main window
quadrants = ["upperLeft","upperRight","lowerLeft","lowerRight"]
x=0
for quadrant in quadrants:
    cv2.namedWindow(quadrant, cv2.WINDOW_GUI_NORMAL)
    cv2.resizeWindow(quadrant,int(W/4),int(H/4))
    cv2.moveWindow(quadrant,W,topBar+int(x*(windowWaste+H/4)))
    x=x+1

while True:
    # Smoothed frame rate: 95% previous estimate, 5% newest measurement
    deltaT = time.time() - tStart
    tStart=time.time()
    fps = fps*.95 + (1/deltaT)*.05

    frame= piCam.capture_array()
    frame=cv2.flip(frame,-1)

    # Print the center pixel, then paint a 10 x 10 red square at the center
    print(frame[int(H/2),int(W/2)])
    frame[int(H/2):int(H/2)+10,int(W/2):int(W/2)+10] = [0,0,255]

    # Independent copy of the center half of the frame, with its own center blacked out
    ROI = frame[int(H*.25):int(H*.75),int(W*.25):int(W*.75)].copy()
    ROI[int(.25*H*.5):int(.75*H*.5),int(.25*W*.5):int(.75*W*.5)] = [0,0,0]
    ROIgray = cv2.cvtColor(ROI,cv2.COLOR_BGR2GRAY)
    frameSmall=cv2.resize(frame,(int(W/2),int(H/2)))

    # Quadrants are views into frame. Slice ends are exclusive, so the end
    # index is H or W (H-1 or W-1 would drop the last row or column).
    upperLeft = frame[0:int(H/2),0:int(W/2)]
    upperRight = frame[0:int(H/2),int(W/2):W]
    lowerLeft = frame[int(H/2):H,0:int(W/2)]
    lowerRight = frame[int(H/2):H,int(W/2):W]
    quadDict = {
        "upperLeft" : upperLeft,
        "upperRight" : upperRight,
        "lowerLeft" : lowerLeft,
        "lowerRight" : lowerRight
    }

    myText = "FPS: "+str(round(fps,1))
    cv2.putText(frame,myText,textLowerLeft,fontFace,fontScale,fontColor,fontThickness)
    cv2.imshow("Camera", frame)
    cv2.imshow("Camera Small",ROI)
    cv2.imshow("Gray Small",ROIgray)
    for name, image in quadDict.items():
        cv2.imshow(name,image)
    if cv2.waitKey(1)==ord('q'):
        break

cv2.destroyAllWindows()
print('Program Terminated')
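The quadrant geometry can be checked on its own, without a camera. This is a rough sketch on a synthetic frame, with midY and midX as illustrative names. Because slice end indices are exclusive, slicing up to H and W covers every row and column, whereas stopping at H-1 or W-1 would leave the last one out.

```python
import numpy as np

W, H = 640, 360
frame = np.zeros((H, W, 3), dtype=np.uint8)  # synthetic frame, no camera needed
midY, midX = H // 2, W // 2

quadDict = {
    "upperLeft":  frame[0:midY, 0:midX],
    "upperRight": frame[0:midY, midX:W],
    "lowerLeft":  frame[midY:H, 0:midX],
    "lowerRight": frame[midY:H, midX:W],
}

for name, image in quadDict.items():
    print(name, image.shape)  # each quadrant is (180, 320, 3)

# The four quadrants together account for every pixel in the frame
totalPixels = sum(q.shape[0] * q.shape[1] for q in quadDict.values())
print(totalPixels == H * W)  # True
```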
Homework Assignment
Alright, it is time to earn your stripes and see if you can fly with the big dogs. Your homework assignment is to take this foundation and build a dynamic tracking target box using the array geometry principles we just learned.
- Create a single main camera window (640 x 360).
- Draw an independent rectangular ROI box that starts in the dead center of the screen.
- Using keyboard input (cv2.waitKey), program the system so that the Arrow Keys (or ‘i’, ‘j’, ‘k’, ‘l’) update variables that move the ROI box around the screen in real time.
- Crucial Constraint: Do not let your boundary indices drift off the array! You must write conditional boundaries so that if your moving target hits the edge of the 640 x 360 frame, it locks at the frame border and prevents an out-of-bounds index crash.
- In a separate output window, display only the contents of the moving target box, in real time and in grayscale.
Grab your morning coffee, fire up your code editor, write the script from scratch, and do not copy-paste code you don’t understand. Leave a link to your homework solution video in the YouTube comments section so I can see your progress!
