Tag Archives: Fusion HAT+

AI on the Edge LESSON 23: Creating Regions of Interest (ROI) in OpenCV with Slicing

Welcome back, everyone! In this lesson, we are stepping into a foundational aspect of computer vision: manipulation of specific regions within a video frame.

Up to this point, we have been grabbing the full frame from our camera and performing operations on the entire image. But in real-world edge AI and robotics applications, processing every single pixel of a high-resolution frame is an absolute waste of compute power. If you want to detect a license plate, track a face, or monitor a specific sensor layout on a machine, you don’t need to look at the sky or the floor. You need to isolate a Region of Interest (ROI).

In this lesson, you will learn how to use NumPy’s powerful array slicing to chop up a frame, isolate specific quadrants, manipulate pixels inside an ROI, and display multiple synchronized windows across your desktop without overloading your system.

The Core Concept: Image Slicing and ROIs

In OpenCV, an image frame isn’t just a visual picture—it is a standard NumPy array. A color frame is a 3D matrix structured by rows, columns, and color channels: [Rows, Columns, Channels] or [Height, Width, Color].

Because it is a standard array, we can use standard Python slicing notation to isolate any rectangular box we want:

ROI = frame[rowstart : rowend,  colstart : colend]
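
As a minimal sketch of that notation, the snippet below uses a blank NumPy array as a stand-in for a 640 x 360 camera frame and slices out two quadrants. The sizes and variable names are illustrative assumptions, not taken from the lesson code.

```python
import numpy as np

# Stand-in for a camera frame: 360 rows (height), 640 columns (width), 3 channels.
height, width = 360, 640
frame = np.zeros((height, width, 3), dtype=np.uint8)

# Rows come first, then columns: this is the upper-left quadrant.
upper_left = frame[0:height // 2, 0:width // 2]

# The lower-right quadrant starts at the midpoint of each axis and runs to the end.
lower_right = frame[height // 2:height, width // 2:width]

print(upper_left.shape)   # (180, 320, 3)
print(lower_right.shape)  # (180, 320, 3)
```

Notice that the channel axis is left out of the slice, so every ROI keeps all three color values for each pixel.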

The .copy() Trap

When you slice a piece of a NumPy array like ROI = frame[0:100, 0:100], NumPy does not create a new image in your RAM. It creates a view that points back to the original frame’s memory. If you modify pixels inside that ROI, you will accidentally alter your original main camera frame!

To isolate a region and modify it independently without bleeding back into your primary frame, you must explicitly call the .copy() method on the slice, for example ROI = frame[0:100, 0:100].copy().
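
Here is a small sketch of the trap in action, using a blank NumPy array in place of a real camera frame. The variable names are hypothetical. It writes white pixels into a plain slice and then into a copied slice, and checks whether the original frame changed each time.

```python
import numpy as np

frame = np.zeros((360, 640, 3), dtype=np.uint8)  # all-black stand-in frame

# A plain slice is a view: writing to it also changes `frame`.
roi_view = frame[0:100, 0:100]
roi_view[:] = 255
view_changed_frame = bool(frame[0, 0, 0] == 255)

# Reset the frame, then slice with .copy(): the ROI now owns its own memory.
frame[:] = 0
roi_copy = frame[0:100, 0:100].copy()
roi_copy[:] = 255
copy_changed_frame = bool(frame[0, 0, 0] == 255)

print(view_changed_frame)  # True
print(copy_changed_frame)  # False
```

If you are ever unsure whether two arrays share memory, np.shares_memory(frame, roi) will tell you.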

Below is the complete code script we built during the video tutorial. Copy this code exactly into your Python environment, verify your geometry setups, and run it.

Homework Assignment

Alright, it is time to earn your stripes and see if you can fly with the big dogs. Your homework assignment is to take this foundation and build a dynamic tracking target box using the array geometry principles we just learned.

  1. Create a single main camera window (640 x 360).

  2. Draw an independent rectangular ROI box that starts directly in the dead center of the screen.

  3. Using keyboard input from cv2.waitKey, program the system so that the Arrow Keys (or ‘i’, ‘j’, ‘k’, ‘l’) update variables that move the ROI box smoothly around the screen in real time.

  4. Crucial Constraint: Do not let your slice indices drift off the array! You must write boundary checks so that if your moving target hits the edge of the 640 x 360 frame, it locks at the frame border instead of running past the edge of the array.

  5. In a separate output window, display only the contents of the moving target box, converted to grayscale, in real time.
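
As a hint for the boundary constraint in step 4 (not a full solution), one possible approach is to clamp the top-left corner of the box so the slice can never leave the frame. The function name and parameters below are hypothetical.

```python
def clamp_box(x, y, box_w, box_h, frame_w=640, frame_h=360):
    """Return (x, y) adjusted so a box_w x box_h box lies fully inside the frame."""
    x = max(0, min(x, frame_w - box_w))
    y = max(0, min(y, frame_h - box_h))
    return x, y


# A 100 x 80 box pushed past the right edge and above the top edge.
print(clamp_box(700, -15, 100, 80))  # (540, 0)
```

The rest of the assignment, including the key handling and the grayscale window, is still yours to write from scratch.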

Grab your morning coffee, fire up your code editor, write the script from scratch, and do not copy-paste code you don’t understand. Leave a link to your homework solution video in the YouTube comments section so I can see your progress!

AI on the Edge LESSON 22: Understanding Pictures and Video Frames as a Data Structure

Hey guys, Paul McWhorter here with TopTechBoy.com, and today we are diving into the heart of computer vision. We’ve been playing around with getting images from the camera, but have you ever stopped to actually look at what a picture is when it’s inside your computer’s memory?

If you want to be a master of AI on the Edge, you have to stop thinking about images as “pictures” and start seeing them as what they really are: a massive, organized grid of numbers.

What is a Picture, Really?

In this lesson, we are peeling back the curtain on how OpenCV and Python handle video frames. When we call piCam.capture_array(), we aren’t just taking a snapshot; we are pulling a data array into memory.

Think of it like a giant spreadsheet where every single cell is a pixel.

  • Dimensions: Your image has a height and a width, which correspond to the number of rows and columns in that array. It is important to remember that the row designator comes first, then the column: [R, C].

  • The Depth (The Color Channels): It’s not just a flat 2D grid! Each “cell” in that grid is actually a little sub-array containing three values, one each for blue, green, and red. OpenCV stores them in B, G, R order, which is why [0, 0, 255] comes out as solid red. That is why we call it a 3D data structure.
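
To see the row-first ordering for yourself, here is a rough sketch using a blank array as a stand-in for a 640 x 360 frame. In the lesson the array comes from the camera, so the size here is an assumption.

```python
import numpy as np

# Stand-in for a 640 x 360 color frame: (rows, columns, channels).
frame = np.zeros((360, 640, 3), dtype=np.uint8)

rows, columns, channels = frame.shape
print(rows, columns, channels)  # 360 640 3

# [row, column] order: the bottom-right pixel is row 359, column 639.
bottom_right = frame[359, 639]
print(bottom_right.shape)  # (3,)

# Swapping the order asks for row 639, which does not exist.
try:
    frame[639, 359]
    swapped_order_works = True
except IndexError:
    swapped_order_works = False
print(swapped_order_works)  # False
```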

Manipulating Data, Not Just Pixels

The magic happens when you realize you can reach into that array and change those numbers directly.

In the code we developed today, we aren’t just displaying video; we are performing data science on video frames. We explored how to:

  1. Access individual pixels: By referencing specific coordinates in our frame array, we can pull out the color data for a single spot.

  2. Draw shapes by modifying arrays: Notice how we don’t need a “draw square” function to put a box on the screen? We simply tell a slice of that array to equal [0, 0, 255]. We are literally changing the color values of those pixels to solid red.

  3. Regions of Interest (ROI): This is critical for AI. You don’t always need to look at the whole frame. We learned how to “slice” the array to isolate a Region of Interest. By carving out a smaller piece of that memory, we can perform operations—like converting to grayscale—on just that section, which saves a massive amount of processing power.
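
The three steps above can be sketched roughly as follows. This is not the lesson's code: it uses a blank stand-in frame, and the grayscale step is a plain NumPy weighted sum used in place of OpenCV's cv2.cvtColor so the sketch runs without a camera or OpenCV installed.

```python
import numpy as np

frame = np.zeros((360, 640, 3), dtype=np.uint8)  # black stand-in frame

# 1. Read one pixel: [row, column] returns its three channel values.
print(frame[180, 320])  # [0 0 0]

# 2. "Draw" a solid red box by assigning to a slice (channel order B, G, R).
frame[100:200, 250:350] = [0, 0, 255]

# 3. Slice out an ROI (copied so the frame stays untouched) and gray it.
roi = frame[100:200, 250:350].copy()
b, g, r = roi[..., 0], roi[..., 1], roi[..., 2]
gray_roi = (0.114 * b + 0.587 * g + 0.299 * r).astype(np.uint8)

print(gray_roi.shape)  # (100, 100)
print(gray_roi[0, 0])  # 76
```

The grayscale ROI has only two dimensions because the three color values for each pixel collapse into a single brightness value.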

Why Does This Matter?

If you want to build a robot that recognizes objects or tracks faces, you need to understand this structure. AI models don’t “see” a cat; they see a mathematical representation of that cat’s pixel values. By learning how to slice, manipulate, and convert these arrays, you are learning the fundamental language of machine learning.

We are building the foundation here, folks. Once you get comfortable with how to manipulate these arrays, we are going to start doing some really cool stuff with image processing and filtering.

Dive into that code, change those array values, and see what happens when you mess with the dimensions! Don’t just run it—experiment with it.

I’ll see you guys in the next lesson!

In this lesson we developed the following code:

 

AI on the Edge LESSON 21: Managing Multiple Windows in OpenCV on the Raspberry Pi

Hey everyone, Paul McWhorter here!

Welcome back to the AI on the Edge series!

In today’s lesson, we’re going to take an important next step in computer vision. We’re going to learn how to create, position, resize, and manage multiple windows at the same time using OpenCV on the Raspberry Pi.

This might sound simple, but it’s actually a very big deal. Once you can comfortably work with multiple windows, you can start building much more powerful vision applications — like having a main camera view, a processed view, zoomed-in sections, and debug windows all running at once.

In this lesson we create:

  • One large main camera window
  • A smaller color preview
  • A small grayscale version
  • Five tiny grayscale windows stacked on the side

This gives you a clean, organized workspace while the camera is running.


What You Learned in This Lesson

  • How to create multiple named windows with cv2.namedWindow()
  • How to resize windows using cv2.resizeWindow()
  • How to precisely position windows on your screen with cv2.moveWindow()
  • How to work with different resolutions of the same image (full size, half size, quarter size)
  • Converting between color and grayscale while running live video
  • Keeping everything running smoothly with good FPS
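
A rough sketch of how such a layout might be put together is shown below. The window names, sizes, and screen coordinates are assumptions for illustration, not the values from the video. The position math is kept in its own function so you can try it without a camera or a desktop session.

```python
def stacked_positions(count, x, y_start, win_height, gap=30):
    """Return (x, y) screen positions for `count` windows stacked vertically."""
    return [(x, y_start + i * (win_height + gap)) for i in range(count)]


def show_layout(frame):
    """Create and place the windows. Needs OpenCV and a desktop session."""
    import cv2  # imported here so the layout math above works without OpenCV

    # WINDOW_NORMAL makes the window resizable with cv2.resizeWindow.
    cv2.namedWindow("Main", cv2.WINDOW_NORMAL)
    cv2.resizeWindow("Main", 1280, 720)
    cv2.moveWindow("Main", 0, 0)
    cv2.imshow("Main", frame)

    # One small grayscale image reused for the five tiny windows.
    tiny = cv2.cvtColor(cv2.resize(frame, (160, 90)), cv2.COLOR_BGR2GRAY)
    for i, (x, y) in enumerate(stacked_positions(5, 1300, 0, 90)):
        name = f"Tiny {i + 1}"
        cv2.namedWindow(name)
        cv2.moveWindow(name, x, y)
        cv2.imshow(name, tiny)
    cv2.waitKey(1)


print(stacked_positions(5, 1300, 0, 90))
# [(1300, 0), (1300, 120), (1300, 240), (1300, 360), (1300, 480)]
```

The gap value leaves room for each window's title bar, so you will probably need to tune it for your own desktop.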

Mastering multiple windows is one of those foundational skills that separates basic OpenCV projects from more professional and useful vision systems.


Pro Tip: Play around with the window positions and sizes after you get it working. Try making one window much larger, or experiment with different layouts. This is your workspace — make it comfortable!


Ready for the next step? In the next lesson, we’re going to start doing something really cool — we’ll begin combining live video with drawn graphics and start creating interactive vision projects.

Keep building, keep learning, and I’ll see you in the next video!

In the lesson, we develop the code below:

 

AI on the Edge LESSON 17: Decorating and Annotating Video Frames in OpenCV

Welcome to AI on the Edge – Lesson 17: Decorating and Annotating Video Frames in OpenCV. In this lesson we take our live video stream from the Raspberry Pi camera and start making it really useful and professional-looking. Now that we can grab frames and display them, it’s time to learn how to draw directly on top of those frames. We’re talking rectangles, lines, arrows, circles, and crisp text overlays — all the visual elements you’ll need when you start adding real AI like face detection or object recognition.
You’ll see exactly how to use OpenCV’s drawing functions to create clean, scalable annotations that look great whether you’re running at 320×180 for maximum speed or higher resolutions like 1280×720. We cover how to control line thickness, use filled shapes, position text properly, and most importantly, how to make all your drawings scale automatically with your chosen resolution so everything stays nicely proportioned.
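
One way to make drawings scale with the resolution, sketched here under my own assumptions rather than taken from the video, is to describe positions as fractions of the frame and convert them to pixels at draw time. The helper names, colors, and label text are hypothetical.

```python
def scaled_point(fx, fy, width, height):
    """Convert fractional coordinates (0.0 to 1.0) into integer pixel coordinates."""
    return int(fx * width), int(fy * height)


def annotate(frame):
    """Draw a box and a label whose geometry scales with the frame size."""
    import cv2  # imported here so scaled_point can be tried without OpenCV

    height, width = frame.shape[:2]
    thickness = max(1, width // 320)  # thicker lines on larger frames
    top_left = scaled_point(0.25, 0.25, width, height)
    bottom_right = scaled_point(0.75, 0.75, width, height)
    cv2.rectangle(frame, top_left, bottom_right, (0, 255, 0), thickness)
    cv2.putText(frame, "ROI", scaled_point(0.25, 0.22, width, height),
                cv2.FONT_HERSHEY_SIMPLEX, width / 640, (0, 0, 255), thickness)
    return frame


print(scaled_point(0.25, 0.25, 320, 180))   # (80, 45)
print(scaled_point(0.25, 0.25, 1280, 720))  # (320, 180)
```

The same fractional corner lands a quarter of the way in at both 320×180 and 1280×720, so the box keeps its proportions.
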
By the end of this lesson you’ll have the skills to draw bounding boxes around detected objects, add confidence scores, label people or items, draw tracking lines — basically anything you need to show what your AI is seeing. This is one of those foundational skills that you’ll use over and over again in your computer vision projects.
As always, I encourage you to type the code along with me in the video, then start playing with colors, sizes, positions, and messages. Change things around, break it, and make it your own. That’s the best way to really learn this stuff.
So fire up your Raspberry Pi 5, grab that camera, and let’s start turning raw video frames into clear, informative, and great-looking annotated output!

 

AI on the Edge LESSON 16: Control Pan/Tilt Camera Position Using Voice Commands

In AI on the Edge Lesson 16, we take a big step forward by combining voice recognition with physical motion. In this project, you will build a voice-controlled pan/tilt camera system. Using simple spoken commands such as “right,” “left,” “up,” “down,” and “quit,” you can move the Raspberry Pi camera in real time. This lesson brings together the Fusion HAT+ servo control, the Speech-to-Text (STT) capabilities we explored earlier, live video streaming with picamera2 and OpenCV, and multithreading to keep everything running smoothly.
The hardware setup is straightforward. We connect two servos to the Fusion HAT+ — one for pan (horizontal movement) on pin 2 and one for tilt (vertical movement) on pin 3. The Raspberry Pi Camera is mounted on a pan/tilt mechanism so it can physically follow your voice commands. We start the camera at a neutral position (pan = 0°, tilt = -20°) and define step sizes so the movement feels responsive but controlled.
The Python code uses two main threads: one for continuous voice listening and another for displaying the live video feed. In the listening thread, we create an STT object and continuously wait for voice input. When a command is recognized, we adjust the pan or tilt angle accordingly and immediately send the new position to the appropriate servo. The main loop captures frames from the Pi Camera, flips them for correct orientation, displays them in an OpenCV window, and checks for the ‘q’ key to exit gracefully.
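
The command handling in the listening thread might look something like the sketch below. It deliberately leaves out the STT and servo calls, since those belong to the Fusion HAT+ library, and only shows the angle bookkeeping. The step sizes, the sign of each direction, and the ±90° limit are assumptions.

```python
def apply_command(command, pan, tilt, x_delta=10, y_delta=10, limit=90):
    """Return the new (pan, tilt) after one spoken command.

    Step sizes, direction signs, and the angle limit are assumed values.
    """
    if command == "right":
        pan += x_delta
    elif command == "left":
        pan -= x_delta
    elif command == "up":
        tilt += y_delta
    elif command == "down":
        tilt -= y_delta
    # Keep both angles inside the assumed servo range.
    pan = max(-limit, min(limit, pan))
    tilt = max(-limit, min(limit, tilt))
    return pan, tilt


pan, tilt = 0, -20  # neutral starting position from the lesson
for word in ["right", "right", "up"]:
    pan, tilt = apply_command(word, pan, tilt)
print(pan, tilt)  # 20 -10
```

Depending on how your servos are mounted, you may need to flip the signs so that “right” really moves the camera to the right.
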
This project demonstrates several important concepts working together: real-time voice command processing, servo motor control, camera streaming with picamera2 at 1280×720 resolution and 60 fps, and proper use of threading so that listening and video display do not block each other. You will also notice how we use global variables carefully to share the current pan and tilt positions between the threads.
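
To illustrate the threading idea on its own, here is a hedged sketch with no camera, servos, or microphone. A queue stands in for recognized speech, and instead of bare global variables it shares the position through a dictionary guarded by a lock, which is a variation on the approach described above rather than the lesson's code.

```python
import queue
import threading

commands = queue.Queue()             # stand-in for recognized speech
position = {"pan": 0, "tilt": -20}   # shared between the two threads
lock = threading.Lock()


def listen():
    """Listener thread: consume commands until 'quit' arrives."""
    while True:
        word = commands.get()
        if word == "quit":
            break
        with lock:  # only one thread touches `position` at a time
            if word == "right":
                position["pan"] += 10
            elif word == "left":
                position["pan"] -= 10


listener = threading.Thread(target=listen, daemon=True)
listener.start()

# In the real project the main thread would be showing video frames here.
for word in ["right", "right", "left", "quit"]:
    commands.put(word)

listener.join()
with lock:
    print(position)  # {'pan': 10, 'tilt': -20}
```

Because the listener blocks on the queue rather than on the main loop, the video display would keep running while it waits for the next command.
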
By the end of this lesson, you will have a working voice-controlled camera that you can point anywhere you want just by talking to it. This is an excellent foundation for more advanced projects such as voice-controlled object tracking, security cameras, or interactive AI assistants that can both see and move.
The complete code is provided below, along with explanations of the key sections. Feel free to experiment with different step sizes (xDelta and yDelta), starting angles, or even add new voice commands once you are comfortable with the basic version.
This is the code developed in the video lesson: