Category Archives: AI On the Edge

Welcome back, everyone! In this lesson, we are stepping into a foundational aspect of computer vision: manipulation of specific regions within a video frame.

Up to this point, we have been grabbing the full frame from our camera and performing operations on the entire image. But in real-world edge AI and robotics applications, processing every single pixel of a high-resolution frame is an absolute waste of compute power. If you want to detect a license plate, track a face, or monitor a specific sensor layout on a machine, you don’t need to look at the sky or the floor. You need to isolate a Region of Interest (ROI).

In this lesson, you will learn how to use Python’s powerful matrix slicing capabilities to chop up a frame, isolate specific quadrants, manipulate pixels inside an ROI, and display multiple synchronized windows across your desktop without crashing your system footprint.

The Core Concept: Image Slicing and ROIs

In OpenCV, an image frame isn’t just a visual picture—it is a standard NumPy array. A color frame is a 3D matrix structured by rows, columns, and color channels: [Rows, Columns, Channels] or [Height, Width, Color].

Because it is a standard array, we can use standard Python slicing notation to isolate any rectangular box we want:

\text{ROI} = \text{frame}[\text{row}_{\text{start}}:\text{row}_{\text{end}}, \, \text{col}_{\text{start}}:\text{col}_{\text{end}}]

The `.copy()` Trap

When you slice a piece of an array in Python like ROI = frame[0:100, 0:100], Python does not create a new image in your RAM. It creates a view or a pointer back to the original frame. If you modify pixels inside that ROI, you will accidentally alter your original main camera frame!

To isolate a region and modify it independently without bleeding back into your primary frame, you must explicitly use the .copy() method:

ROI = frame[<span class="hljs-built_in">int</span>(H*<span class="hljs-number">.25</span>):<span class="hljs-built_in">int</span>(H*<span class="hljs-number">.75</span>), <span class="hljs-built_in">int</span>(W*<span class="hljs-number">.25</span>):<span class="hljs-built_in">int</span>(W*<span class="hljs-number">.75</span>)].copy()

				1

						ROI = frame[<span class="hljs-built_in">int</span>(H*<span class="hljs-number">.25</span>):<span class="hljs-built_in">int</span>(H*<span class="hljs-number">.75</span>), <span class="hljs-built_in">int</span>(W*<span class="hljs-number">.25</span>):<span class="hljs-built_in">int</span>(W*<span class="hljs-number">.75</span>)].copy()

Below is the complete code script we built during the video tutorial. Copy this code exactly into your Python environment, verify your geometry setups, and run it.

import cv2
import time
from picamera2 import Picamera2
piCam = Picamera2()
W=640
H=360
tStart = time.time()
fps = 0
RES = (W,H)
piCam.preview_configuration.main.size = RES
piCam.preview_configuration.main.format = "RGB888"
piCam.preview_configuration.controls.FrameRate=60
piCam.preview_configuration.align()
piCam.configure("preview")
piCam.start()

textLowerLeft = (int(W*.01),int(H*.05))
fontFace = cv2.FONT_HERSHEY_SIMPLEX
fontThickness = int(W/425)
fontScale = H*.0015
fontColor = (0,0,255)

topBar = 65
windowWaste = 25

cv2.namedWindow("Camera", cv2.WINDOW_GUI_NORMAL)
cv2.resizeWindow("Camera", W, H)
cv2.moveWindow("Camera",0,topBar)

cv2.namedWindow("Camera Small", cv2.WINDOW_GUI_NORMAL)
cv2.resizeWindow("Camera Small",  int(W/2), int(H/2))
cv2.moveWindow("Camera Small",0,topBar+windowWaste+H)
    
cv2.namedWindow("Gray Small", cv2.WINDOW_GUI_NORMAL)
cv2.resizeWindow("Gray Small", int(W/2), int(H/2))
cv2.moveWindow("Gray Small",int(W/2),topBar+windowWaste+H)

quadrants = ["upperLeft","upperRight","lowerLeft","lowerRight"]
x=0
for quadrant in quadrants:
    cv2.namedWindow(quadrant, cv2.WINDOW_GUI_NORMAL)
    cv2.resizeWindow(quadrant,int(W/4),int(H/4))
    cv2.moveWindow(quadrant,W,topBar+int(x*(windowWaste+H/4)))
    x=x+1
    
while True:
    deltaT = time.time() - tStart
    tStart=time.time()
    fps = fps*.95 + (1/deltaT)*.05
    frame= piCam.capture_array()
    frame=cv2.flip(frame,-1)
    
    print(frame[int(H/2),int(W/2)])
    frame[int(H/2):int(H/2)+10,int(W/2):int(W/2)+10] = [0,0,255]
    
    ROI = frame[int(H*.25):int(H*.75),int(W*.25):int(W*.75)].copy()

    ROI[int(.25*H*.5):int(.75*H*.5),int(.25*W*.5):int(.75*W*.5)] = [0,0,0]
    ROIgray = cv2.cvtColor(ROI,cv2.COLOR_BGR2GRAY) 
 
    frameSmall=cv2.resize(frame,(int(W/2),int(H/2)))
    
    upperLeft = frame[0:int(H/2),0:int(W/2)]
    upperRight = frame[0:int(H/2),int(W/2):W-1]
    lowerLeft = frame[int(H/2):H-1,0:int(W/2)]
    lowerRight = frame[int(H/2):H-1,int(W/2):W-1]
    
    quadDict = {
        "upperLeft" : upperLeft,
        "upperRight" : upperRight,
        "lowerLeft" : lowerLeft,
        "lowerRight" : lowerRight
        }
    
    

    myText = "FPS: "+str(round(fps,1))
    cv2.putText(frame,myText,textLowerLeft,fontFace,fontScale,fontColor,fontThickness)
    cv2.imshow("Camera", frame)
    cv2.imshow("Camera Small",ROI)
    cv2.imshow("Gray Small",ROIgray)
    for name, image in quadDict.items():
        cv2.imshow(name,image)
    

    if cv2.waitKey(1)==ord('q'):
        break
cv2.destroyAllWindows()
print('Program Terminated')

import cv2

import time

from picamera2 import Picamera2

piCam = Picamera2()

W=640

H=360

tStart = time.time()

fps = 0

RES = (W,H)

piCam.preview_configuration.main.size = RES

piCam.preview_configuration.main.format = "RGB888"

piCam.preview_configuration.controls.FrameRate=60

piCam.preview_configuration.align()

piCam.configure("preview")

piCam.start()

textLowerLeft = (int(W*.01),int(H*.05))

fontFace = cv2.FONT_HERSHEY_SIMPLEX

fontThickness = int(W/425)

fontScale = H*.0015

fontColor = (0,0,255)

topBar = 65

windowWaste = 25

cv2.namedWindow("Camera", cv2.WINDOW_GUI_NORMAL)

cv2.resizeWindow("Camera", W, H)

cv2.moveWindow("Camera",0,topBar)

cv2.namedWindow("Camera Small", cv2.WINDOW_GUI_NORMAL)

cv2.resizeWindow("Camera Small", int(W/2), int(H/2))

cv2.moveWindow("Camera Small",0,topBar+windowWaste+H)

cv2.namedWindow("Gray Small", cv2.WINDOW_GUI_NORMAL)

cv2.resizeWindow("Gray Small", int(W/2), int(H/2))

cv2.moveWindow("Gray Small",int(W/2),topBar+windowWaste+H)

quadrants = ["upperLeft","upperRight","lowerLeft","lowerRight"]

x=0

for quadrant in quadrants:

cv2.namedWindow(quadrant, cv2.WINDOW_GUI_NORMAL)

cv2.resizeWindow(quadrant,int(W/4),int(H/4))

cv2.moveWindow(quadrant,W,topBar+int(x*(windowWaste+H/4)))

x=x+1

while True:

deltaT = time.time() - tStart

tStart=time.time()

fps = fps*.95 + (1/deltaT)*.05

frame= piCam.capture_array()

frame=cv2.flip(frame,-1)

print(frame[int(H/2),int(W/2)])

frame[int(H/2):int(H/2)+10,int(W/2):int(W/2)+10] = [0,0,255]

ROI = frame[int(H*.25):int(H*.75),int(W*.25):int(W*.75)].copy()

ROI[int(.25*H*.5):int(.75*H*.5),int(.25*W*.5):int(.75*W*.5)] = [0,0,0]

ROIgray = cv2.cvtColor(ROI,cv2.COLOR_BGR2GRAY)

frameSmall=cv2.resize(frame,(int(W/2),int(H/2)))

upperLeft = frame[0:int(H/2),0:int(W/2)]

upperRight = frame[0:int(H/2),int(W/2):W-1]

lowerLeft = frame[int(H/2):H-1,0:int(W/2)]

lowerRight = frame[int(H/2):H-1,int(W/2):W-1]

quadDict = {

"upperLeft" : upperLeft,

"upperRight" : upperRight,

"lowerLeft" : lowerLeft,

"lowerRight" : lowerRight

}

myText = "FPS: "+str(round(fps,1))

cv2.putText(frame,myText,textLowerLeft,fontFace,fontScale,fontColor,fontThickness)

cv2.imshow("Camera", frame)

cv2.imshow("Camera Small",ROI)

cv2.imshow("Gray Small",ROIgray)

for name, image in quadDict.items():

cv2.imshow(name,image)

if cv2.waitKey(1)==ord('q'):

break

cv2.destroyAllWindows()

print('Program Terminated')

Homework Assignment

Alright, it is time to earn your stripes and see if you can fly with the big dogs. Your homework assignment is to take this foundation and build a dynamic tracking target box using the array geometry principles we just learned.

Create a single main camera window ( $640 \times 360$ ).
Draw an independent rectangular ROI box that starts directly in the dead center of the screen.
Using your keyboard parameters (cv2.waitKey), program the system so that using the Arrow Keys (or ‘i’, ‘j’, ‘k’, ‘l’) smoothly updates variables to move the ROI box dynamically around the screen in real-time.
Crucial Constraint: Do not let your boundary indices drift off the array! You must write conditional boundaries so that if your moving target hits the edge of your $640 \times 360$ boundary layout, it locks at the frame border and prevents an out-of-bounds index crash.
In a separate output window, display only the contents of the moving target box in real-time grayscaled format.

Grab your morning coffee, fire up your code editor, write the script from scratch, and do not copy-paste code you don’t understand. Leave a link to your homework solution video in the YouTube comments section so I can see your progress!

AI On the Edge

AI on the Edge LESSON 22: Understanding Pictures and Video Frames as a Data Structure

June 10, 2026 admin

Hey guys, Paul McWhorter here with TopTechBoy.com, and today we are diving into the heart of computer vision. We’ve been playing around with getting images from the camera, but have you ever stopped to actually look at what a picture is when it’s inside your computer’s memory?

If you want to be a master of AI on the Edge, you have to stop thinking about images as “pictures” and start seeing them as what they really are: a massive, organized grid of numbers.

What is a Picture, Really?

In this lesson, we are peeling back the curtain on how OpenCV and Python handle video frames. When we call piCam.capture_array(), we aren’t just taking a snapshot; we are pulling a data array into memory.

Think of it like a giant spreadsheet where every single cell is a pixel.

Dimensions: Your image has a width and a height, which correspond to the number of rows and columns in that array. It is important to remember the row designator comes first, then the column, [ R, C]
The Depth (The RGB Channels): It’s not just a flat 2D grid! Each “cell” in that grid is actually a little sub-array containing three values: Red, Green, and Blue. That is why we call it a 3D data structure.

Manipulating Data, Not Just Pixels

The magic happens when you realize you can reach into that array and change those numbers directly.

In the code we developed today, we aren’t just displaying video; we are performing data science on video frames. We explored how to:

Access individual pixels: By referencing specific coordinates in our frame array, we can pull out the color data for a single spot.
Draw shapes by modifying arrays: Notice how we don’t need a “draw square” function to put a box on the screen? We simply tell a slice of that array to equal [0, 0, 255]. We are literally changing the color values of those pixels to solid red.
Regions of Interest (ROI): This is critical for AI. You don’t always need to look at the whole frame. We learned how to “slice” the array to isolate a Region of Interest. By carving out a smaller piece of that memory, we can perform operations—like converting to grayscale—on just that section, which saves a massive amount of processing power.

Why Does This Matter?

If you want to build a robot that recognizes objects or tracks faces, you need to understand this structure. AI models don’t “see” a cat; they see a mathematical representation of that cat’s pixel values. By learning how to slice, manipulate, and convert these arrays, you are learning the fundamental language of machine learning.

We are building the foundation here, folks. Once you get comfortable with how to manipulate these arrays, we are going to start doing some really cool stuff with image processing and filtering.

Dive into that code, change those array values, and see what happens when you mess with the dimensions! Don’t just run it—experiment with it.

I’ll see you guys in the next lesson!

In this lesson we developed the following code:

import cv2
import time
from picamera2 import Picamera2
piCam = Picamera2()
W=640
H=360
tStart = time.time()
fps = 0
RES = (W,H)
piCam.preview_configuration.main.size = RES
piCam.preview_configuration.main.format = "RGB888"
piCam.preview_configuration.controls.FrameRate=60
piCam.preview_configuration.align()
piCam.configure("preview")
piCam.start()

textLowerLeft = (int(W*.01),int(H*.05))
fontFace = cv2.FONT_HERSHEY_SIMPLEX
fontThickness = int(W/425)
fontScale = H*.0015
fontColor = (0,0,255)

topBar = 65
windowWaste = 25

cv2.namedWindow("Camera", cv2.WINDOW_GUI_NORMAL)
cv2.resizeWindow("Camera", W, H)
cv2.moveWindow("Camera",0,topBar)

cv2.namedWindow("Camera Small", cv2.WINDOW_GUI_NORMAL)
cv2.resizeWindow("Camera Small",  int(W/2), int(H/2))
cv2.moveWindow("Camera Small",0,topBar+windowWaste+H)
    
cv2.namedWindow("Gray Small", cv2.WINDOW_GUI_NORMAL)
cv2.resizeWindow("Gray Small", int(W/2), int(H/2))
cv2.moveWindow("Gray Small",int(W/2),topBar+windowWaste+H)


while True:
    deltaT = time.time() - tStart
    tStart=time.time()
    fps = fps*.95 + (1/deltaT)*.05
    frame= piCam.capture_array()
    frame=cv2.flip(frame,-1)
    
    print(frame[int(H/2),int(W/2)])
    frame[int(H/2):int(H/2)+10,int(W/2):int(W/2)+10] = [0,0,255]
    
    ROI = frame[int(H*.25):int(H*.75),int(W*.25):int(W*.75)].copy()

    ROI[int(.25*H*.5):int(.75*H*.5),int(.25*W*.5):int(.75*W*.5)] = [0,0,0]
    ROIgray = cv2.cvtColor(ROI,cv2.COLOR_BGR2GRAY) 
 
    frameSmall=cv2.resize(frame,(int(W/2),int(H/2)))
    

    myText = "FPS: "+str(round(fps,1))
    cv2.putText(frame,myText,textLowerLeft,fontFace,fontScale,fontColor,fontThickness)
    cv2.imshow("Camera", frame)
    cv2.imshow("Camera Small",ROI)
    cv2.imshow("Gray Small",ROIgray)

    if cv2.waitKey(1)==ord('q'):
        break
cv2.destroyAllWindows()
print('Program Terminated')

import cv2

import time

from picamera2 import Picamera2

piCam = Picamera2()

W=640

H=360

tStart = time.time()

fps = 0

RES = (W,H)

piCam.preview_configuration.main.size = RES

piCam.preview_configuration.main.format = "RGB888"

piCam.preview_configuration.controls.FrameRate=60

piCam.preview_configuration.align()

piCam.configure("preview")

piCam.start()

textLowerLeft = (int(W*.01),int(H*.05))

fontFace = cv2.FONT_HERSHEY_SIMPLEX

fontThickness = int(W/425)

fontScale = H*.0015

fontColor = (0,0,255)

topBar = 65

windowWaste = 25

cv2.namedWindow("Camera", cv2.WINDOW_GUI_NORMAL)

cv2.resizeWindow("Camera", W, H)

cv2.moveWindow("Camera",0,topBar)

cv2.namedWindow("Camera Small", cv2.WINDOW_GUI_NORMAL)

cv2.resizeWindow("Camera Small", int(W/2), int(H/2))

cv2.moveWindow("Camera Small",0,topBar+windowWaste+H)

cv2.namedWindow("Gray Small", cv2.WINDOW_GUI_NORMAL)

cv2.resizeWindow("Gray Small", int(W/2), int(H/2))

cv2.moveWindow("Gray Small",int(W/2),topBar+windowWaste+H)

while True:

deltaT = time.time() - tStart

tStart=time.time()

fps = fps*.95 + (1/deltaT)*.05

frame= piCam.capture_array()

frame=cv2.flip(frame,-1)

print(frame[int(H/2),int(W/2)])

frame[int(H/2):int(H/2)+10,int(W/2):int(W/2)+10] = [0,0,255]

ROI = frame[int(H*.25):int(H*.75),int(W*.25):int(W*.75)].copy()

ROI[int(.25*H*.5):int(.75*H*.5),int(.25*W*.5):int(.75*W*.5)] = [0,0,0]

ROIgray = cv2.cvtColor(ROI,cv2.COLOR_BGR2GRAY)

frameSmall=cv2.resize(frame,(int(W/2),int(H/2)))

myText = "FPS: "+str(round(fps,1))

cv2.putText(frame,myText,textLowerLeft,fontFace,fontScale,fontColor,fontThickness)

cv2.imshow("Camera", frame)

cv2.imshow("Camera Small",ROI)

cv2.imshow("Gray Small",ROIgray)

if cv2.waitKey(1)==ord('q'):

break

cv2.destroyAllWindows()

print('Program Terminated')

AI On the Edge, NVIDIA

AI on the Bleeding Edge: Run Llama LLM Locally on GPU CUDA with NVIDIA Jetson Orin Nano on Jetpack 7.2

June 10, 2026 admin

Hey there, world! Paul McWhorter here. You know me—I don’t just want to use technology; I want to understand exactly how it works under the hood. Today, we’re taking the NVIDIA Jetson Orin Nano and making it “think” right here on our own hardware.

We are bypassing the heavy, automated installers to build llama.cpp from source. This is the gold standard for high-performance AI on edge devices. Let’s get to work!

Part 1: Standalone Llama.cpp Build

First, we need to prepare our engine. This step takes the raw source code from GitHub and compiles it specifically for the Orin Nano’s GPU using the CUDA toolkit.

# =====================================================================
# PART 1: STANDALONE LLAMA.CPP BUILD
# && means only go to next command if this one works
# \ is like hitting enter
# this allows us to make a script.
# =====================================================================
cd ~ && \
git clone https://github.com/ggerganov/llama.cpp && \
cd llama.cpp && \
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc && \
cmake --build build --config Release --parallel $(nproc) && \
mkdir -p ~/models && \
wget -O ~/models/qwen.gguf https://huggingface.co/Qwen/Qwen1.5-1.8B-Chat-GGUF/resolve/main/qwen1_5-1_8b-chat-q4_k_m.gguf && \
echo "=== Part 1 Complete: Standalone High-Performance Backend Built ==="

# =====================================================================

# PART 1: STANDALONE LLAMA.CPP BUILD

# && means only go to next command if this one works

# \ is like hitting enter

# this allows us to make a script.

# =====================================================================

cd ~ && \

git clone https://github.com/ggerganov/llama.cpp && \

cd llama.cpp && \

cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc && \

cmake --build build --config Release --parallel $(nproc) && \

mkdir -p ~/models && \

wget -O ~/models/qwen.gguf https://huggingface.co/Qwen/Qwen1.5-1.8B-Chat-GGUF/resolve/main/qwen1_5-1_8b-chat-q4_k_m.gguf && \

echo "=== Part 1 Complete: Standalone High-Performance Backend Built ==="

What’s happening here? We are cloning the project, creating a “build” blueprint that tells the compiler to use your GPU (CUDA), and then using your Orin’s full processing power (nproc) to assemble the program. We also create a folder to keep our “brain” files neat and tidy.

Part 2: Run in Web Interface

Now that the engine is ready, let’s launch the server and start chatting with our first model, Qwen.

# =====================================================================
# PART 2 RUN IN WEB INTERFACE
# =====================================================================
cd ~/llama.cpp
./build/bin/llama-server \
  -m ~/models/qwen.gguf \
  --n-gpu-layers 99 \
  --port 8080

# =====================================================================

# PART 2 RUN IN WEB INTERFACE

# =====================================================================

cd ~/llama.cpp

./build/bin/llama-server \

-m ~/models/qwen.gguf \

--n-gpu-layers 99 \

--port 8080

What’s happening here? We move into the folder where we built our engine and launch the server. The --n-gpu-layers 99 flag is the magic! It tells the system to push as many model layers as possible into the GPU memory. The --port 8080 defines the digital “door” our web browser will use to chat with the AI at http://localhost:8080.

Part 3: Download and Run a New Model

One of the best things about llama.cpp is how easy it is to swap out “brains.” Let’s download a more advanced model, clear our network port, and fire it up!

# =====================================================================
# PART 3 DOWNLOAD AND RUN A NEW MODEL
# =====================================================================
#Now try different model (RUN ONE COMMAND AT A TIME)
wget -O ~/models/phi-4-mini.gguf https://huggingface.co/bartowski/microsoft_Phi-4-mini-instruct-GGUF/resolve/main/microsoft_Phi-4-mini-instruct-Q4_K_M.gguf
#between runs it is good to kill the port 8080
fuser -k 8080/tcp
# Now Lets Run the phi-4-mini.gguf
cd ~/llama.cpp
./build/bin/llama-server \
  -m ~/models/phi-4-mini.gguf \
  --n-gpu-layers 99 \
  --port 8080

#

# =====================================================================

# PART 3 DOWNLOAD AND RUN A NEW MODEL

# =====================================================================

#Now try different model (RUN ONE COMMAND AT A TIME)

wget -O ~/models/phi-4-mini.gguf https://huggingface.co/bartowski/microsoft_Phi-4-mini-instruct-GGUF/resolve/main/microsoft_Phi-4-mini-instruct-Q4_K_M.gguf

#between runs it is good to kill the port 8080

fuser -k 8080/tcp

# Now Lets Run the phi-4-mini.gguf

cd ~/llama.cpp

./build/bin/llama-server \

-m ~/models/phi-4-mini.gguf \

--n-gpu-layers 99 \

--port 8080

What’s happening here? We download the phi-4-mini model, and then we use fuser -k 8080/tcp. This acts like a “master key”—if the previous server process didn’t close properly, this forces the port open so we don’t get any “address in use” errors. Then, we launch the server again, pointing it to our new model!

You’re in Control

Once that server is live, you’re not just watching AI happen—you’re running it! Keep an eye on your terminal logs, watch that GPU utilization jump, and remember: you are working on the absolute bleeding edge of local AI performance.

Buckle up, let’s do some exciting projects together. Drop those tokens-per-second scores in the comments!

A Long Way to Go

Guys our eventual goal is to get nemoclaw operating as an agent on the Jetson Orin Nano on Jetpack 7.2. Our first effort was to run Llama and Ollama on the Jetson Orin. We were successful with that but the challenge way, using the canned install commands, we ended up running on the CPU not the Cuda GPU. Today we have a major step forward as we are now running on GPU, with the core models. Next up, we will try to get it running under Olama, while still staying on the GPU.

WHAT HAPPENS ON YOUR DESKTOP STAYS ON YOUR DESKTOP!

OK, here is your homework. Download all the models we looked at last week using the method above. When complete, you should have these models:

Model	Model Family	Size / Parameter Count	Best Used For
`gemma3:1b`	Google Gemma 3	1 Billion	Ultra-fast responses, light footprint
`llama3.2:1b`	Meta Llama 3.2	1 Billion	High-efficiency conversational loops
`phi4-mini:3.8b`	Microsoft Phi-4	3.8 Billion	Heavy reasoning and coding logic
`qwen3:4b`	Alibaba Qwen 3	4 Billion	Structured data and multilingual logic
`qwen3.5:4b`	Alibaba Qwen 3.5	4 Billion	Advanced context processing
`gemma3:4b`	Google Gemma 3	4 Billion	Maximum analytical depth on Orin Nano

AI On the Edge, Tutorial

AI on the Edge LESSON 21: Managing Multiple Windows in OpenCV on the Raspberry Pi

June 5, 2026 admin

Hey everyone, Paul McWhorter here!

Welcome back to the AI on the Edge series!

In today’s lesson, we’re going to take an important next step in computer vision. We’re going to learn how to create, position, resize, and manage multiple windows at the same time using OpenCV on the Raspberry Pi.

This might sound simple, but it’s actually a very big deal. Once you can comfortably work with multiple windows, you can start building much more powerful vision applications — like having a main camera view, a processed view, zoomed-in sections, and debug windows all running at once.

In this lesson we create:

One large main camera window
A smaller color preview
A small grayscale version
Five tiny grayscale windows stacked on the side

This gives you a clean, organized workspace while the camera is running.

What You Learned in This Lesson

How to create multiple named windows with cv2.namedWindow()
How to resize windows using cv2.resizeWindow()
How to precisely position windows on your screen with cv2.moveWindow()
How to work with different resolutions of the same image (full size, half size, quarter size)
Converting between color and grayscale while running live video
Keeping everything running smoothly with good FPS

Mastering multiple windows is one of those foundational skills that separates basic OpenCV projects from more professional and useful vision systems.

Pro Tip: Play around with the window positions and sizes after you get it working. Try making one window much larger, or experiment with different layouts. This is your workspace — make it comfortable!

Ready for the next step? In the next lesson, we’re going to start doing something really cool — we’ll begin combining live video with drawn graphics and start creating interactive vision projects.

Keep building, keep learning, and I’ll see you in the next video!

In the lesson, we develop the code below:

import cv2
import time
from picamera2 import Picamera2
piCam = Picamera2()
W=640
H=360
tStart = time.time()
fps = 0
RES = (W,H)
piCam.preview_configuration.main.size = RES
piCam.preview_configuration.main.format = "RGB888"
piCam.preview_configuration.controls.FrameRate=60
piCam.preview_configuration.align()
piCam.configure("preview")
piCam.start()

textLowerLeft = (int(W*.01),int(H*.05))
fontFace = cv2.FONT_HERSHEY_SIMPLEX
fontThickness = int(W/425)
fontScale = H*.0015
fontColor = (0,0,255)

topBar = 65
windowWaste = 25

cv2.namedWindow("Camera", cv2.WINDOW_GUI_NORMAL)
cv2.resizeWindow("Camera", W, H)
cv2.moveWindow("Camera",0,topBar)

cv2.namedWindow("Camera Small", cv2.WINDOW_GUI_NORMAL)
cv2.resizeWindow("Camera Small",  int(W/2), int(H/2))
cv2.moveWindow("Camera Small",0,topBar+windowWaste+H)
    
cv2.namedWindow("Gray Small", cv2.WINDOW_GUI_NORMAL)
cv2.resizeWindow("Gray Small", int(W/2), int(H/2))
cv2.moveWindow("Gray Small",int(W/2),topBar+windowWaste+H)

grays = ["Gray1","Gray2","Gray3","Gray4","Gray5"]
x = 0
for gray in grays:
    print(gray)
    cv2.namedWindow(gray,cv2.WINDOW_GUI_NORMAL)
    cv2.resizeWindow(gray,int(W/4),int(H/4))
    cv2.moveWindow(gray,W,topBar + x*(int(H/4) + windowWaste))
    x=x+1

while True:
    deltaT = time.time() - tStart
    tStart=time.time()
    fps = fps*.95 + (1/deltaT)*.05
    frame= piCam.capture_array()
    frame=cv2.flip(frame,-1)
    frameSmall=cv2.resize(frame,(int(W/2),int(H/2)))
    frameGraySmall = cv2.cvtColor(frameSmall,cv2.COLOR_BGR2GRAY)

    frameTiny1 = cv2.resize(frameGraySmall,(int(W/4),int(H/4)))
    frameTiny2 = cv2.resize(frameGraySmall,(int(W/4),int(H/4)))
    frameTiny3= cv2.resize(frameGraySmall,(int(W/4),int(H/4)))
    frameTiny4 = cv2.resize(frameGraySmall,(int(W/4),int(H/4)))
    frameTiny5 = cv2.resize(frameGraySmall,(int(W/4),int(H/4)))
    

    myText = "FPS: "+str(round(fps,1))
    cv2.putText(frame,myText,textLowerLeft,fontFace,fontScale,fontColor,fontThickness)
    cv2.imshow("Camera", frame)
    cv2.imshow("Camera Small",frameSmall)
    cv2.imshow("Gray Small",frameGraySmall)
    
    cv2.imshow("Gray1",frameTiny1)
    cv2.imshow("Gray2",frameTiny2)
    cv2.imshow("Gray3",frameTiny3)
    cv2.imshow("Gray4",frameTiny4)
    cv2.imshow("Gray5",frameTiny5)


    if cv2.waitKey(1)==ord('q'):
        break
cv2.destroyAllWindows()
print('Program Terminated')

import cv2

import time

from picamera2 import Picamera2

piCam = Picamera2()

W=640

H=360

tStart = time.time()

fps = 0

RES = (W,H)

piCam.preview_configuration.main.size = RES

piCam.preview_configuration.main.format = "RGB888"

piCam.preview_configuration.controls.FrameRate=60

piCam.preview_configuration.align()

piCam.configure("preview")

piCam.start()

textLowerLeft = (int(W*.01),int(H*.05))

fontFace = cv2.FONT_HERSHEY_SIMPLEX

fontThickness = int(W/425)

fontScale = H*.0015

fontColor = (0,0,255)

topBar = 65

windowWaste = 25

cv2.namedWindow("Camera", cv2.WINDOW_GUI_NORMAL)

cv2.resizeWindow("Camera", W, H)

cv2.moveWindow("Camera",0,topBar)

cv2.namedWindow("Camera Small", cv2.WINDOW_GUI_NORMAL)

cv2.resizeWindow("Camera Small", int(W/2), int(H/2))

cv2.moveWindow("Camera Small",0,topBar+windowWaste+H)

cv2.namedWindow("Gray Small", cv2.WINDOW_GUI_NORMAL)

cv2.resizeWindow("Gray Small", int(W/2), int(H/2))

cv2.moveWindow("Gray Small",int(W/2),topBar+windowWaste+H)

grays = ["Gray1","Gray2","Gray3","Gray4","Gray5"]

x = 0

for gray in grays:

print(gray)

cv2.namedWindow(gray,cv2.WINDOW_GUI_NORMAL)

cv2.resizeWindow(gray,int(W/4),int(H/4))

cv2.moveWindow(gray,W,topBar + x*(int(H/4) + windowWaste))

x=x+1

while True:

deltaT = time.time() - tStart

tStart=time.time()

fps = fps*.95 + (1/deltaT)*.05

frame= piCam.capture_array()

frame=cv2.flip(frame,-1)

frameSmall=cv2.resize(frame,(int(W/2),int(H/2)))

frameGraySmall = cv2.cvtColor(frameSmall,cv2.COLOR_BGR2GRAY)

frameTiny1 = cv2.resize(frameGraySmall,(int(W/4),int(H/4)))

frameTiny2 = cv2.resize(frameGraySmall,(int(W/4),int(H/4)))

frameTiny3= cv2.resize(frameGraySmall,(int(W/4),int(H/4)))

frameTiny4 = cv2.resize(frameGraySmall,(int(W/4),int(H/4)))

frameTiny5 = cv2.resize(frameGraySmall,(int(W/4),int(H/4)))

myText = "FPS: "+str(round(fps,1))

cv2.putText(frame,myText,textLowerLeft,fontFace,fontScale,fontColor,fontThickness)

cv2.imshow("Camera", frame)

cv2.imshow("Camera Small",frameSmall)

cv2.imshow("Gray Small",frameGraySmall)

cv2.imshow("Gray1",frameTiny1)

cv2.imshow("Gray2",frameTiny2)

cv2.imshow("Gray3",frameTiny3)

cv2.imshow("Gray4",frameTiny4)

cv2.imshow("Gray5",frameTiny5)

if cv2.waitKey(1)==ord('q'):

break

cv2.destroyAllWindows()

print('Program Terminated')

AI On the Edge, Tutorial

AI on the Edge LESSON 20: Resizing, Moving, Converting and Tiling Video frames in OpenCV

June 1, 2026 admin

Welcome back to the AI on the Edge class series! In this lesson, we are diving deep into some of the most critical foundational skills you need when working with video streams on edge devices: Resizing, Moving, Converting, and Tiling video frames using OpenCV.

When you are developing real-world AI applications on the edge, you rarely just display a single camera feed. You often need to manipulate frames to feed them into your AI models, look at grayscale versions for edge detection, or arrange multiple windows on your desktop neatly so you can monitor your data visually.

If you want to follow along exactly as we do in the video, make sure you have your Raspberry Pi 5 set up with your Camera Module.

What We Cover in This Lesson

Fixed FPS Estimation: We continue using our robust low-pass filter formula to track smooth, non-jittery frames-per-second data directly on the video frame.
Creating Named Windows: Understanding how cv2.namedWindow() combined with cv2.WINDOW_GUI_NORMAL gives you absolute programmatic control over the placement of your displays.
Resizing & Moving Windows: How to accurately position multiple OpenCV windows on your screen using specific coordinates while accounting for operating system taskbars and window decorative margins.
Frame Manipulation: Using cv2.resize() to scale down video frames and cv2.cvtColor() to transform the color space from BGR to grayscale.
Window Tiling: Arranging a main camera view, a scaled-down color view, and a scaled-down grayscale view in a perfect grid layout on your desktop.

The Complete Lesson 20 Code

Below is the complete Python code we developed during this lesson. It sets up your hardware camera stream, calculates running performance metrics, processes three distinct variations of the video feed, and tiles them cleanly on your screen.

import cv2
import time
from picamera2 import Picamera2
piCam = Picamera2()
W=640
H=360
tStart = time.time()
fps = 0
RES = (W,H)
piCam.preview_configuration.main.size = RES
piCam.preview_configuration.main.format = "RGB888"
piCam.preview_configuration.controls.FrameRate=60
piCam.preview_configuration.align()
piCam.configure("preview")
piCam.start()

textLowerLeft = (int(W*.01),int(H*.05))
fontFace = cv2.FONT_HERSHEY_SIMPLEX
fontThickness = int(W/425)
fontScale = H*.0015
fontColor = (0,0,255)

topBar = 65
windowWaste = 25

cv2.namedWindow("Camera", cv2.WINDOW_GUI_NORMAL)
cv2.resizeWindow("Camera", W, H)
cv2.moveWindow("Camera",0,topBar)

cv2.namedWindow("Camera Small", cv2.WINDOW_GUI_NORMAL)
cv2.resizeWindow("Camera Small",  int(W/2), int(H/2))
cv2.moveWindow("Camera Small",0,topBar+windowWaste+H)
    
cv2.namedWindow("Gray Small", cv2.WINDOW_GUI_NORMAL)
cv2.resizeWindow("Gray Small", int(W/2), int(H/2))
cv2.moveWindow("Gray Small",int(W/2),topBar+windowWaste+H)


while True:
    deltaT = time.time() - tStart
    tStart=time.time()
    fps = fps*.95 + (1/deltaT)*.05
    frame= piCam.capture_array()
    frame=cv2.flip(frame,-1)
    frameSmall=cv2.resize(frame,(int(W/2),int(H/2)))
    frameGraySmall = cv2.cvtColor(frameSmall,cv2.COLOR_BGR2GRAY)

    myText = "FPS: "+str(round(fps,1))
    cv2.putText(frame,myText,textLowerLeft,fontFace,fontScale,fontColor,fontThickness)
    cv2.imshow("Camera", frame)
    cv2.imshow("Camera Small",frameSmall)
    cv2.imshow("Gray Small",frameGraySmall)

    if cv2.waitKey(1)==ord('q'):
        break
cv2.destroyAllWindows()
print('Program Terminated')

import cv2

import time

from picamera2 import Picamera2

piCam = Picamera2()

W=640

H=360

tStart = time.time()

fps = 0

RES = (W,H)

piCam.preview_configuration.main.size = RES

piCam.preview_configuration.main.format = "RGB888"

piCam.preview_configuration.controls.FrameRate=60

piCam.preview_configuration.align()

piCam.configure("preview")

piCam.start()

textLowerLeft = (int(W*.01),int(H*.05))

fontFace = cv2.FONT_HERSHEY_SIMPLEX

fontThickness = int(W/425)

fontScale = H*.0015

fontColor = (0,0,255)

topBar = 65

windowWaste = 25

cv2.namedWindow("Camera", cv2.WINDOW_GUI_NORMAL)

cv2.resizeWindow("Camera", W, H)

cv2.moveWindow("Camera",0,topBar)

cv2.namedWindow("Camera Small", cv2.WINDOW_GUI_NORMAL)

cv2.resizeWindow("Camera Small", int(W/2), int(H/2))

cv2.moveWindow("Camera Small",0,topBar+windowWaste+H)

cv2.namedWindow("Gray Small", cv2.WINDOW_GUI_NORMAL)

cv2.resizeWindow("Gray Small", int(W/2), int(H/2))

cv2.moveWindow("Gray Small",int(W/2),topBar+windowWaste+H)

while True:

deltaT = time.time() - tStart

tStart=time.time()

fps = fps*.95 + (1/deltaT)*.05

frame= piCam.capture_array()

frame=cv2.flip(frame,-1)

frameSmall=cv2.resize(frame,(int(W/2),int(H/2)))

frameGraySmall = cv2.cvtColor(frameSmall,cv2.COLOR_BGR2GRAY)

myText = "FPS: "+str(round(fps,1))

cv2.putText(frame,myText,textLowerLeft,fontFace,fontScale,fontColor,fontThickness)

cv2.imshow("Camera", frame)

cv2.imshow("Camera Small",frameSmall)

cv2.imshow("Gray Small",frameGraySmall)

if cv2.waitKey(1)==ord('q'):

break

cv2.destroyAllWindows()

print('Program Terminated')

Technology Tutorials

Category Archives: AI On the Edge

AI on the Edge LESSON 23: Creating Regions of Interest (ROI) in OpenCV with Slicing

The Core Concept: Image Slicing and ROIs

The `.copy()` Trap

Homework Assignment

AI on the Edge LESSON 22: Understanding Pictures and Video Frames as a Data Structure

What is a Picture, Really?

Manipulating Data, Not Just Pixels

Why Does This Matter?

AI on the Bleeding Edge: Run Llama LLM Locally on GPU CUDA with NVIDIA Jetson Orin Nano on Jetpack 7.2

Part 1: Standalone Llama.cpp Build

Part 2: Run in Web Interface

Part 3: Download and Run a New Model

You’re in Control

AI on the Edge LESSON 21: Managing Multiple Windows in OpenCV on the Raspberry Pi

What You Learned in This Lesson

AI on the Edge LESSON 20: Resizing, Moving, Converting and Tiling Video frames in OpenCV

What We Cover in This Lesson

The Complete Lesson 20 Code

Making The World a Better Place One High Tech Project at a Time. Enjoy!

The Core Concept: Image Slicing and ROIs

The .copy() Trap

Homework Assignment

What is a Picture, Really?

Manipulating Data, Not Just Pixels

Why Does This Matter?

Part 1: Standalone Llama.cpp Build

Part 2: Run in Web Interface

Part 3: Download and Run a New Model

You’re in Control

What You Learned in This Lesson

What We Cover in This Lesson

The Complete Lesson 20 Code

Making The World a Better Place One High Tech Project at a Time. Enjoy!

The `.copy()` Trap