Tag Archives: Fusion AI Lab Kit

AI on the Edge LESSON 28: Use Pan Tilt Camera to Track Object of Interest in OpenCV

In this video lesson we learn how to track our object of interest using the Pan Tilt camera. The camera will constantly adjust position to keep the Object of Interest in the center of the camera frame. We identify the Object of Interest based on Color, as we learned in earlier lessons. In this video, we developed the code below:

 

AI on the Edge LESSON 23: Creating Regions of Interest (ROI) in OpenCV with Slicing

Welcome back, everyone! In this lesson, we are stepping into a foundational aspect of computer vision: manipulation of specific regions within a video frame.

Up to this point, we have been grabbing the full frame from our camera and performing operations on the entire image. But in real-world edge AI and robotics applications, processing every single pixel of a high-resolution frame is an absolute waste of compute power. If you want to detect a license plate, track a face, or monitor a specific sensor layout on a machine, you don’t need to look at the sky or the floor. You need to isolate a Region of Interest (ROI).

In this lesson, you will learn how to use NumPy’s array slicing to chop up a frame, isolate specific quadrants, manipulate pixels inside an ROI, and display multiple synchronized windows across your desktop without bogging down your system.

The Core Concept: Image Slicing and ROIs

In OpenCV, an image frame isn’t just a visual picture—it is a standard NumPy array. A color frame is a 3D matrix structured by rows, columns, and color channels: [Rows, Columns, Channels] or [Height, Width, Color].

Because it is a standard array, we can use standard Python slicing notation to isolate any rectangular box we want:

ROI = frame[rowstart : rowend,  colstart : colend]
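To make the slicing pattern concrete, here is a minimal sketch that uses a blank NumPy array as a stand-in for a camera frame. The 640 x 360 frame size and the variable names are assumptions for illustration, not the lesson's actual script.

```python
import numpy as np

# Stand-in for a camera frame: 360 rows (height) x 640 columns (width) x 3 channels.
frame = np.zeros((360, 640, 3), dtype=np.uint8)

# Rows come first, then columns: this slice is the top-left quadrant.
row_start, row_end = 0, 180
col_start, col_end = 0, 320
roi = frame[row_start:row_end, col_start:col_end]

print(frame.shape)  # (360, 640, 3)
print(roi.shape)    # (180, 320, 3)
```

Notice that the channel axis is left alone, so the ROI keeps all three color values for every pixel inside the box.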

The .copy() Trap

When you slice a NumPy array, as in ROI = frame[0:100, 0:100], NumPy does not create a new image in your RAM. It creates a view that points back to the original frame's memory. If you modify pixels inside that ROI, you will accidentally alter your original main camera frame!

To isolate a region and modify it independently without bleeding back into your primary frame, you must explicitly use the .copy() method:
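As a small illustration of the difference, the sketch below paints a copy and then a view of the same region and checks what happens to the main frame. The blank stand-in frame and the variable names are assumptions; the colors are written in BGR order.

```python
import numpy as np

# Stand-in for a camera frame, all black.
frame = np.zeros((360, 640, 3), dtype=np.uint8)

# A plain slice is a view: it shares memory with frame.
roi_view = frame[0:100, 0:100]
# .copy() allocates a separate array that can be edited safely.
roi_copy = frame[0:100, 0:100].copy()

roi_copy[:, :] = [0, 0, 255]  # paint the copy red
copy_changed_frame = bool(frame[0:100, 0:100].any())  # frame is still black

roi_view[:, :] = [255, 0, 0]  # paint the view blue
view_changed_frame = bool(frame[0:100, 0:100].any())  # frame now has blue pixels

print(copy_changed_frame)  # False
print(view_changed_frame)  # True
```

If you are ever unsure whether two arrays share memory, np.shares_memory(frame, roi) will tell you.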

Below is the complete script we built during the video tutorial. Copy this code exactly into your Python environment, check that the frame and window dimensions match your setup, and run it.

Homework Assignment

Alright, it is time to earn your stripes and see if you can fly with the big dogs. Your homework assignment is to take this foundation and build a dynamic tracking target box using the array geometry principles we just learned.

  1. Create a single main camera window (640 x 360).

  2. Draw an independent rectangular ROI box that starts directly in the dead center of the screen.

  3. Using keyboard input from cv2.waitKey, program the script so that pressing the Arrow Keys (or ‘i’, ‘j’, ‘k’, ‘l’) updates variables that move the ROI box smoothly around the screen in real time.

  4. Crucial Constraint: Do not let your slice indices drift off the array! You must write boundary checks so that when your moving target hits the edge of the 640 x 360 frame, it locks at the frame border instead of causing an out-of-bounds crash.

  5. In a separate output window, display only the contents of the moving target box, converted to grayscale, in real time.
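As a hint for step 4 only, here is one possible way to sketch the boundary arithmetic. The helper name clamp_box and its parameters are hypothetical, and the keyboard handling and window code are still yours to write.

```python
def clamp_box(x, y, box_w, box_h, frame_w=640, frame_h=360):
    """Return the box's top-left corner (x, y), moved back inside the frame."""
    x = max(0, min(x, frame_w - box_w))
    y = max(0, min(y, frame_h - box_h))
    return x, y


print(clamp_box(700, -20, 100, 80))  # (540, 0)
print(clamp_box(270, 140, 100, 80))  # (270, 140)
```

Clamping before you slice matters because a NumPy slice that runs past the edge is silently truncated rather than raising an error, and a negative start index wraps around to the other side, so the ROI would shrink or jump instead of stopping at the border.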

Grab your morning coffee, fire up your code editor, write the script from scratch, and do not copy-paste code you don’t understand. Leave a link to your homework solution video in the YouTube comments section so I can see your progress!

AI on the Edge LESSON 22: Understanding Pictures and Video Frames as a Data Structure

Hey guys, Paul McWhorter here with TopTechBoy.com, and today we are diving into the heart of computer vision. We’ve been playing around with getting images from the camera, but have you ever stopped to actually look at what a picture is when it’s inside your computer’s memory?

If you want to be a master of AI on the Edge, you have to stop thinking about images as “pictures” and start seeing them as what they really are: a massive, organized grid of numbers.

What is a Picture, Really?

In this lesson, we are peeling back the curtain on how OpenCV and Python handle video frames. When we call piCam.capture_array(), we aren’t just taking a snapshot; we are pulling a data array into memory.

Think of it like a giant spreadsheet where every single cell is a pixel.

  • Dimensions: Your image has a height and a width, which correspond to the number of rows and columns in that array. It is important to remember that the row designator comes first, then the column: [R, C].

  • The Depth (The RGB Channels): It’s not just a flat 2D grid! Each “cell” in that grid is actually a little sub-array containing three values: Red, Green, and Blue. That is why we call it a 3D data structure.

Manipulating Data, Not Just Pixels

The magic happens when you realize you can reach into that array and change those numbers directly.

In the code we developed today, we aren’t just displaying video; we are performing data science on video frames. We explored how to:

  1. Access individual pixels: By referencing specific coordinates in our frame array, we can pull out the color data for a single spot.

  2. Draw shapes by modifying arrays: Notice how we don’t need a “draw square” function to put a box on the screen? We simply tell a slice of that array to equal [0, 0, 255]. Because OpenCV orders the channels Blue, Green, Red, we are literally changing the color values of those pixels to solid red.

  3. Regions of Interest (ROI): This is critical for AI. You don’t always need to look at the whole frame. We learned how to “slice” the array to isolate a Region of Interest. By carving out a smaller piece of that memory, we can perform operations—like converting to grayscale—on just that section, which saves a massive amount of processing power.
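The three steps above can be sketched with plain NumPy. This is not the lesson's script: the blank 640 x 360 stand-in frame and the coordinates are assumptions chosen for illustration, and the color is written in BGR order.

```python
import numpy as np

# Stand-in for a captured frame: 360 rows x 640 columns x 3 channels, all black.
frame = np.zeros((360, 640, 3), dtype=np.uint8)

# 1. Access one pixel: row first, then column.
pixel = frame[100, 200]

# 2. "Draw" a box by assigning a color to a slice (red in BGR order).
frame[50:150, 300:400] = [0, 0, 255]

# 3. Slice out a Region of Interest that overlaps the red box.
roi = frame[100:200, 350:450]

print(pixel.tolist())           # [0, 0, 0]
print(frame[60, 310].tolist())  # [0, 0, 255]
print(roi.shape)                # (100, 100, 3)
```

On the Pi, the same indexing works on a real frame returned by piCam.capture_array().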

Why Does This Matter?

If you want to build a robot that recognizes objects or tracks faces, you need to understand this structure. AI models don’t “see” a cat; they see a mathematical representation of that cat’s pixel values. By learning how to slice, manipulate, and convert these arrays, you are learning the fundamental language of machine learning.

We are building the foundation here, folks. Once you get comfortable with how to manipulate these arrays, we are going to start doing some really cool stuff with image processing and filtering.

Dive into that code, change those array values, and see what happens when you mess with the dimensions! Don’t just run it—experiment with it.

I’ll see you guys in the next lesson!

In this lesson we developed the following code:

 

AI on the Edge LESSON 21: Managing Multiple Windows in OpenCV on the Raspberry Pi

Hey everyone, Paul McWhorter here!

Welcome back to the AI on the Edge series!

In today’s lesson, we’re going to take an important next step in computer vision. We’re going to learn how to create, position, resize, and manage multiple windows at the same time using OpenCV on the Raspberry Pi.

This might sound simple, but it’s actually a very big deal. Once you can comfortably work with multiple windows, you can start building much more powerful vision applications — like having a main camera view, a processed view, zoomed-in sections, and debug windows all running at once.

In this lesson we create:

  • One large main camera window
  • A smaller color preview
  • A small grayscale version
  • Five tiny grayscale windows stacked on the side

This gives you a clean, organized workspace while the camera is running.


What You Learned in This Lesson

  • How to create multiple named windows with cv2.namedWindow()
  • How to resize windows using cv2.resizeWindow()
  • How to precisely position windows on your screen with cv2.moveWindow()
  • How to work with different resolutions of the same image (full size, half size, quarter size)
  • Converting between color and grayscale while running live video
  • Keeping everything running smoothly with good FPS

Mastering multiple windows is one of those foundational skills that separates basic OpenCV projects from more professional and useful vision systems.


Pro Tip: Play around with the window positions and sizes after you get it working. Try making one window much larger, or experiment with different layouts. This is your workspace — make it comfortable!


Ready for the next step? In the next lesson, we’re going to start doing something really cool — we’ll begin combining live video with drawn graphics and start creating interactive vision projects.

Keep building, keep learning, and I’ll see you in the next video!

In this lesson, we developed the code below:

 

AI on the Edge LESSON 20: Resizing, Moving, Converting and Tiling Video frames in OpenCV

Welcome back to the AI on the Edge class series! In this lesson, we are diving deep into some of the most critical foundational skills you need when working with video streams on edge devices: Resizing, Moving, Converting, and Tiling video frames using OpenCV.

When you are developing real-world AI applications on the edge, you rarely just display a single camera feed. You often need to manipulate frames to feed them into your AI models, look at grayscale versions for edge detection, or arrange multiple windows on your desktop neatly so you can monitor your data visually.

If you want to follow along exactly as we do in the video, make sure you have your Raspberry Pi 5 set up with your Camera Module.

What We Cover in This Lesson

  • Fixed FPS Estimation: We continue using our robust low-pass filter formula to compute a smooth, non-jittery frames-per-second value and display it directly on the video frame.

  • Creating Named Windows: Understanding how cv2.namedWindow() combined with cv2.WINDOW_GUI_NORMAL gives you absolute programmatic control over the placement of your displays.

  • Resizing & Moving Windows: How to accurately position multiple OpenCV windows on your screen using specific coordinates while accounting for operating system taskbars and window decoration margins.

  • Frame Manipulation: Using cv2.resize() to scale down video frames and cv2.cvtColor() to transform the color space from BGR to grayscale.

  • Window Tiling: Arranging a main camera view, a scaled-down color view, and a scaled-down grayscale view in a perfect grid layout on your desktop.
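For the FPS estimate mentioned in the first bullet, a low-pass filter can be sketched as a weighted blend of the previous estimate and the newest reading. The 0.1 weight, the function name, and the simulated frame times below are assumptions, and the lesson's own formula may use different constants.

```python
def update_fps(fps_filtered, dt, alpha=0.1):
    """Blend the instantaneous FPS (1 / dt) into the running estimate."""
    fps_now = 1.0 / dt
    return (1.0 - alpha) * fps_filtered + alpha * fps_now


fps = 30.0
# Simulated frame times in seconds, including one slow 100 ms frame.
for dt in [1 / 30, 1 / 30, 0.1, 1 / 30]:
    fps = update_fps(fps, dt)
    print(round(fps, 2))
```

The slow frame only pulls the estimate down by a couple of frames per second instead of dropping it straight to 10, which is what keeps the on-screen number from jumping around. In a live loop, dt would be the measured time between two frames.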

The Complete Lesson 20 Code

Below is the complete Python code we developed during this lesson. It sets up your hardware camera stream, calculates running performance metrics, processes three distinct variations of the video feed, and tiles them cleanly on your screen.