AI on the Edge LESSON 23: Creating Regions of Interest (ROI) in OpenCV with Slicing

In this lesson we learn to become more comfortable creating Regions of Interests (ROIs). We also show some new methods to stack your windows and keep your windows organized and tightly packed. In this video lesson, we developed the following code:

 

AI on the Edge LESSON 22: Understanding Pictures and Video Frames as a Data Structure

Hey guys, Paul McWhorter here with TopTechBoy.com, and today we are diving into the heart of computer vision. We’ve been playing around with getting images from the camera, but have you ever stopped to actually look at what a picture is when it’s inside your computer’s memory?

If you want to be a master of AI on the Edge, you have to stop thinking about images as “pictures” and start seeing them as what they really are: a massive, organized grid of numbers.

What is a Picture, Really?

In this lesson, we are peeling back the curtain on how OpenCV and Python handle video frames. When we call piCam.capture_array(), we aren’t just taking a snapshot; we are pulling a data array into memory.

Think of it like a giant spreadsheet where every single cell is a pixel.

  • Dimensions: Your image has a width and a height, which correspond to the number of rows and columns in that array. It is important to remember the row designator comes first, then the column, [ R, C]

  • The Depth (The RGB Channels): It’s not just a flat 2D grid! Each “cell” in that grid is actually a little sub-array containing three values: Red, Green, and Blue. That is why we call it a 3D data structure.

Manipulating Data, Not Just Pixels

The magic happens when you realize you can reach into that array and change those numbers directly.

In the code we developed today, we aren’t just displaying video; we are performing data science on video frames. We explored how to:

  1. Access individual pixels: By referencing specific coordinates in our frame array, we can pull out the color data for a single spot.

  2. Draw shapes by modifying arrays: Notice how we don’t need a “draw square” function to put a box on the screen? We simply tell a slice of that array to equal [0, 0, 255]. We are literally changing the color values of those pixels to solid red.

  3. Regions of Interest (ROI): This is critical for AI. You don’t always need to look at the whole frame. We learned how to “slice” the array to isolate a Region of Interest. By carving out a smaller piece of that memory, we can perform operations—like converting to grayscale—on just that section, which saves a massive amount of processing power.

Why Does This Matter?

If you want to build a robot that recognizes objects or tracks faces, you need to understand this structure. AI models don’t “see” a cat; they see a mathematical representation of that cat’s pixel values. By learning how to slice, manipulate, and convert these arrays, you are learning the fundamental language of machine learning.

We are building the foundation here, folks. Once you get comfortable with how to manipulate these arrays, we are going to start doing some really cool stuff with image processing and filtering.

Dive into that code, change those array values, and see what happens when you mess with the dimensions! Don’t just run it—experiment with it.

I’ll see you guys in the next lesson!

In this lesson we developed the following code:

 

AI on the Bleeding Edge: Run Llama LLM Locally on GPU CUDA with NVIDIA Jetson Orin Nano on Jetpack 7.2

Hey there, world! Paul McWhorter here. You know me—I don’t just want to use technology; I want to understand exactly how it works under the hood. Today, we’re taking the NVIDIA Jetson Orin Nano and making it “think” right here on our own hardware.

We are bypassing the heavy, automated installers to build llama.cpp from source. This is the gold standard for high-performance AI on edge devices. Let’s get to work!

Part 1: Standalone Llama.cpp Build

First, we need to prepare our engine. This step takes the raw source code from GitHub and compiles it specifically for the Orin Nano’s GPU using the CUDA toolkit.

  • What’s happening here? We are cloning the project, creating a “build” blueprint that tells the compiler to use your GPU (CUDA), and then using your Orin’s full processing power (nproc) to assemble the program. We also create a folder to keep our “brain” files neat and tidy.

Part 2: Run in Web Interface

Now that the engine is ready, let’s launch the server and start chatting with our first model, Qwen.

  • What’s happening here? We move into the folder where we built our engine and launch the server. The --n-gpu-layers 99 flag is the magic! It tells the system to push as many model layers as possible into the GPU memory. The --port 8080 defines the digital “door” our web browser will use to chat with the AI at http://localhost:8080.

Part 3: Download and Run a New Model

One of the best things about llama.cpp is how easy it is to swap out “brains.” Let’s download a more advanced model, clear our network port, and fire it up!

  • What’s happening here? We download the phi-4-mini model, and then we use fuser -k 8080/tcp. This acts like a “master key”—if the previous server process didn’t close properly, this forces the port open so we don’t get any “address in use” errors. Then, we launch the server again, pointing it to our new model!

You’re in Control

Once that server is live, you’re not just watching AI happen—you’re running it! Keep an eye on your terminal logs, watch that GPU utilization jump, and remember: you are working on the absolute bleeding edge of local AI performance.

Buckle up, let’s do some exciting projects together. Drop those tokens-per-second scores in the comments!

A Long Way to Go

Guys our eventual goal is to get nemoclaw operating as an agent on the Jetson Orin Nano on Jetpack 7.2. Our first effort was to run Llama and Ollama on the Jetson Orin. We were successful with that but the challenge way, using the canned install commands, we ended up running on the CPU not the Cuda GPU. Today we have a major step forward as we are now running on GPU, with the core models. Next up, we will try to get it running under Olama, while still staying on the GPU.

WHAT HAPPENS ON YOUR DESKTOP STAYS ON YOUR DESKTOP!

OK, here is your homework. Download all the models we looked at last week using the method above. When complete, you should have these models:

Model Model Family Size / Parameter Count Best Used For
gemma3:1b Google Gemma 3 1 Billion Ultra-fast responses, light footprint
llama3.2:1b Meta Llama 3.2 1 Billion High-efficiency conversational loops
phi4-mini:3.8b Microsoft Phi-4 3.8 Billion Heavy reasoning and coding logic
qwen3:4b Alibaba Qwen 3 4 Billion Structured data and multilingual logic
 qwen3.5:4b Alibaba Qwen 3.5 4 Billion Advanced context processing
gemma3:4b Google Gemma 3 4 Billion Maximum analytical depth on Orin Nano

No Cloud. No Internet. No Problem. Two Commands for Local LLM on Jetson Orin Nano

Hey guys, welcome back to the channel. Paul McWhorter here from TopTechBoy.com. Today, we aren’t just messing around with simple circuits or basic scripts—we are going to take that NVIDIA Jetson Orin Nano we rescued from the brink of destruction in the last video, and we are going to turn it into a completely sovereign, local thinking machine.

I don’t know about you, but I am tired of Big Tech telling me I need a credit card, a monthly subscription, and a constant high-speed internet connection just to make an AI model reply to a prompt. Today, we are going to do it completely naked. We are going to cut the cord, pull the ethernet, and run cutting-edge Large Language Models entirely on the local physical silicon of your Jetson Orin Nano.

And we are going to do it in exactly two commands. One to build the engine room, and one to fire up the mind.

Let’s get started.

The Hardware Architecture

Before we drop the code into the terminal, let’s understand exactly what we are building today. We are dealing with three core components working together in a unified system.

  • The Model (The Fuel): This is your raw neural network file (like Google Gemma or Meta Llama). It contains the weights, vocabulary, and potential intelligence. On its own, it’s just a massive, inert file sitting on your storage drive.

  • Ollama (The Engine Room): This is the heavy lifter. Ollama is a local execution framework that takes that raw model file and boots it directly into the Jetson’s unified RAM and CUDA cores. It handles the brutal mathematical calculations required to generate tokens.

  • The Terminal Chat (The Dashboard): This is your interface. It provides the clean command-line text box for you to type your prompts and prints the model’s responses back to you in real time.

The Two-Command Installation

Go ahead and fire up your Jetson Orin Nano, open a fresh terminal window, and get ready to type. Remember: copying and pasting makes you weak. Type these out like a real engineer so your hands learn the muscle memory.

Command 1: Install the Ollama Engine

This command fetches the official automated bootstrapper script from Ollama and executes it locally to configure the background system service on your host OS.

Command 2: Fire Up the Local Model

Once the installation script finishes, your engine room is live. Now, tell Ollama to pull down the optimized 1-billion parameter Google Gemma model and launch an interactive local dialog loop instantly:

The moment you hit enter, your Jetson will download the model weights directly to your local drive, load them straight into the VRAM, and drop you into a clean prompt box. Type a question, hit enter, and watch your local silicon generate answers with zero cloud dependencies.

Choosing the Right Mind for Your Machine

The beautiful part about setting up Ollama is that you aren’t locked into just one model. Different models have different parameter sizes and strengths. On the 8GB Jetson Orin Nano, you want to balance model size against your available hardware headroom to keep your generation speeds crisp.

Here are the verified, hardware-accelerated local models you can experiment with right out of the box:

Launch Command Model Family Size / Parameter Count Best Used For
ollama run gemma3:1b Google Gemma 3 1 Billion Ultra-fast responses, light footprint
ollama run llama3.2:1b Meta Llama 3.2 1 Billion High-efficiency conversational loops
ollama run phi4-mini:3.8b Microsoft Phi-4 3.8 Billion Heavy reasoning and coding logic
ollama run qwen3:4b Alibaba Qwen 3 4 Billion Structured data and multilingual logic
ollama run qwen3.5:4b Alibaba Qwen 3.5 4 Billion Advanced context processing
ollama run gemma3:4b Google Gemma 3 4 Billion Maximum analytical depth on Orin Nano

⚠️ Paul’s Engineering Note on Headroom

The 1B (1-Billion parameter) models are incredibly light and will run at lightning speed on the Orin Nano. If you want to push the machine harder for more complex reasoning, step up to the 3.8B or 4B models. Just keep an eye on your system resources—running a 4B model pushes close to the limits of the Orin Nano’s 8GB unified memory architecture, especially if you are running a heavy graphical desktop environment in the background!

To exit out of any active terminal chat session and return to your standard command prompt, simply type:

Homework Assignment

Alright, you have the hardware running, you have the engine installed, and you know how to switch out the minds of your machine. Now it’s time for your homework.

I want you to install both the gemma3:1b model and the heavier gemma3:4b model on your Jetson Orin Nano. Run them both through a test sequence: ask them to write a simple Python script, and then ask them a complex logic riddle.

I want you to observe the difference in quality of thought versus speed of generation. Is the 4-billion parameter model smart enough to justify the extra computation time on your hardware, or does the 1-billion parameter model give you the snappy responsiveness you need for a real-time edge application?

Leave a comment down under the video showing your results, tell me which model you prefer running natively on your bench, and I will see you guys in the next lesson!

AI on the Edge LESSON 21: Managing Multiple Windows in OpenCV on the Raspberry Pi

Hey everyone, Paul McWhorter here!

Welcome back to the AI on the Edge series!

In today’s lesson, we’re going to take an important next step in computer vision. We’re going to learn how to create, position, resize, and manage multiple windows at the same time using OpenCV on the Raspberry Pi.

This might sound simple, but it’s actually a very big deal. Once you can comfortably work with multiple windows, you can start building much more powerful vision applications — like having a main camera view, a processed view, zoomed-in sections, and debug windows all running at once.

In this lesson we create:

  • One large main camera window
  • A smaller color preview
  • A small grayscale version
  • Five tiny grayscale windows stacked on the side

This gives you a clean, organized workspace while the camera is running.


What You Learned in This Lesson

  • How to create multiple named windows with cv2.namedWindow()
  • How to resize windows using cv2.resizeWindow()
  • How to precisely position windows on your screen with cv2.moveWindow()
  • How to work with different resolutions of the same image (full size, half size, quarter size)
  • Converting between color and grayscale while running live video
  • Keeping everything running smoothly with good FPS

Mastering multiple windows is one of those foundational skills that separates basic OpenCV projects from more professional and useful vision systems.


Pro Tip: Play around with the window positions and sizes after you get it working. Try making one window much larger, or experiment with different layouts. This is your workspace — make it comfortable!


Ready for the next step? In the next lesson, we’re going to start doing something really cool — we’ll begin combining live video with drawn graphics and start creating interactive vision projects.

Keep building, keep learning, and I’ll see you in the next video!

In the lesson, we develop the code below:

 

Making The World a Better Place One High Tech Project at a Time. Enjoy!