Edge AI on the NVIDIA Jetson Orin Nano: You are Running With the Big Dogs Now!

Welcome back. If you are watching this, you’re ready to stop playing with toys and start building real-world AI. Today, we are looking at the NVIDIA Jetson Orin Nano. Let’s get one thing straight: this is not a Raspberry Pi.

Under the hood, you are working with an Ampere-architecture GPU featuring 1,024 CUDA cores and 32 Tensor cores. You have a 6-core ARM Cortex-A78AE v8.2 64-bit CPU. Depending on how you configure your power mode, you are looking at anywhere from 20 to 40 TOPS of AI performance. This is raw, unadulterated horsepower that can process multi-stream video pipelines in real-time. In the 15W mode, you are managing a delicate balance of thermals and throughput; in the 25W mode, you are pushing the limits of the silicon itself. But this power comes with a price. You have been playing in an amusement park, but now, you’re going skydiving. The guardrails are gone.

The Skydiving Mindset: In the Pi or Arduino world, everything is ‘turn-key.’ You follow the recipe, you get the cake. It’s safe. It’s predictable. But when you are dealing with 40 TOPS of compute, the environment is fundamentally different. There are no guardrails here. If you don’t do the work, if you don’t check your own gear, you hit the ground.

There is a fundamental shift in responsibility when you move from consumer hobbyist boards to professional embedded silicon. You aren’t just a user anymore; you are an architect. If you’re looking for a guaranteed result because you clicked a link, go back to the Pi. If you’re looking to master high-performance silicon, welcome to the deep end. We are ‘Running with the Big Dogs’ now.

The Infrastructure Tax: Let’s start with the cost of entry. If you are trying to develop on an Orin using a Virtual Machine or a dual-boot setup on your Windows gaming laptop, stop. Just stop. You are setting yourself up for a failure that has nothing to do with the board and everything to do with your infrastructure.

I’ll give you a horror story. I tried to dual-boot my main workstation to make it ‘easier’ to access the Ubuntu environment needed for the SDK Manager. I triggered a BitLocker conflict. It didn’t just break the bootloader; it effectively bricked my NVMe drive so thoroughly that I had to dump the drive, buy a replacement, and reload my entire backup image from scratch.

That is the ‘Big Dog’ tax. Professionals don’t risk their primary workstation for a development tool. You build a dedicated, stand-alone Ubuntu machine. That is the cost of entry. If you can’t commit to a clean Linux environment, you aren’t ready for this hardware. The SDK Manager requires low-level USB access and partition control that hypervisors simply cannot handle reliably. You want to play with the big silicon? You bring the right infrastructure.

The Illusion of Instructions: You’ve probably heard people complain that my instructions didn’t work. Or they get angry at NVIDIA because the latest JetPack caused a kernel panic. I want to tell you the truth: You aren’t following instructions; you’re following suggestions.

Look at JetPack 7.2. Thousands of people followed the official documentation to the letter, and for half of them, it failed. The ‘Super Mode’ didn’t show up. And in the frantic attempt to force it to appear, many of them bricked their boards. When you brick an Orin—and you will—you don’t get a ‘reset’ button. You get a terminal, a flashing USB cable, and the SDK Manager.

When you’re flying a jet, you don’t blame the manual when the engine flame-outs. You check the instrumentation. The Jetson is your instrumentation. If it says ‘Over-Current,’ you don’t get mad at the manufacturer—you analyze your power budget. You are pushing hardware to its thermal and electrical limits. You are choosing your destiny with every power-mode configuration you change. This isn’t a software update; it’s a battlefield.

The Oracle of Delphi: Now, let’s talk about the NVIDIA forums. Think of those forums as the Oracle of Delphi. You do not walk into that house and demand service. If you post, ‘I followed the instructions and it broke, what a goat rodeo, you guys released a broken OS,’ you are done. You will be ignored, and you will lose all professional credibility.

Here is the 12-Hour Rule: Before you post, you spend 12 hours of deep-dive, log-file-reading, self-inflicted pain on your own. You read the dmesg output. You check your logs in /var/log/syslog. You look at jtop and you watch the power rails. If you can’t describe exactly what is happening, you aren’t ready for help.

When you do post, you provide a reproduction script. You provide data. You treat those engineers with the respect they deserve. And when they respond? You shut up and listen. They are the pilot; you are the co-pilot. You do not touch the controls. You follow their lead, you execute their tests, and you report the results. Any frustration you express makes you look like a hobbyist who doesn’t understand the complexity of what they are touching. You are a guest in their house. Earn your stay.

Log-Driven Development: If your terminal isn’t covered in log outputs, you aren’t debugging—you’re guessing. Guessing is for hobbyists. Engineers measure. In the Pi world, you just write code and it works. On the Jetson, you have to think like an architect. Is your code saturating the memory bandwidth? Is your model actually hitting the Tensor cores? If you treat the Orin like a general-purpose PC, you are wasting the most powerful tool on your desk. You have to learn the power envelope. You have to learn the thermal limitations. You are driving a Ferrari in first gear if you don’t understand what’s happening under the hood.”

The Verdict: So, here is my promise to you. You will brick it. You will want to throw it against the wall. But the moment you decide to solve the problem instead of blaming the manufacturer, that is the exact moment you stop being a hobbyist and start being an engineer. You want to run with the Big Dogs? Then stop whining about the guardrails and start learning how to read the logs. See you in the next lesson.

So the question for you now is, are you really ready to Run with the Big Dogs? Are you ready to jump into the deep end of the pool, or do you want to return to the wading pond?

AI on the Edge LESSON 24: Processing Mouse Events in OpenCV on Pi 5

Welcome back, everyone! In our last lesson, we learned how to use matrix slicing to hardcode a Region of Interest (ROI) into our frames. That was a great static approach, but today we are taking interactivity to a whole new level.

In this lesson, you are going to learn how to catch Mouse Events inside your OpenCV windows. Instead of guess-and-checking coordinates in your code, you will be able to click anywhere on your live video stream to instantly grab the precise (x, y) pixel coordinates and read the exact color value of the pixel right under your mouse pointer. This is the foundational mechanic you need to build interactive, point-and-click AI applications.

The Core Concept: Mouse Callbacks and Global Frames

To listen for mouse clicks or movement, OpenCV uses what is called a Callback Function. You tell OpenCV: “Hey, keep an eye on this specific window. If the user does anything with the mouse inside it, instantly jump over to my custom function and tell me what happened.”

We set this up using:

cv2.setMouseCallback(‘Camera’, mouseAction)

The [y, x] Matrix Inversion Trap

There is a massive mathematical trap that catches almost every beginner when they start mapping mouse clicks to image matrices:

  • OpenCV Mouse Coordinates: When you move your mouse, OpenCV tracks position using standard Cartesian geometry: (x, y), where x is the column (horizontal distance from the left) and y is the row (vertical distance from the top).

  • NumPy Array Coordinates: When you plug those numbers into your image array to inspect a pixel, NumPy expects matrix indexing: [row, column].

Because rows correspond to the height (y) and columns correspond to the width (x), you must always invert the coordinates when accessing the frame array:

If you try to pass frame[x, y], your program will either crash with an “index out of bounds” error or return data from the completely wrong part of the image!

The Python Code Developed in This Lesson

Here is the complete, streamlined script we built during today’s tutorial. Copy this code into your workspace on your Raspberry Pi 5, fire it up, and watch your terminal output as you click around the video window.

We first developed this program as a simple example of processing mouse clicks, and print the detected event:

In order to make the program more useful, we developed this code that monitors the position of the mouse cursor, and reports the color of the pixel the mouse points at. The values are printed as labels on the openCV frame:

We can now take the project to the next level by setting the LED color to the color pointed at by the cursor in the openCV window. We will be using our standard circuit we have used in the earlier lessons.

Fusion Hat Circuit Diagram
This is the circuit we will use moving forward in the class

This is the code we developed to set the LED color based on the pixel position of the cursor in the openCV window.

Homework Assignment

 

Alright, it’s time to put this knowledge to work. Your homework assignment is to turn this simple reporting tool into an interactive, dynamic ROI selector. The homework is to  first create a text display under the FPS on the frame that show RGB value at the pixel position the mouse is pointing at, and the pixel location.

 Your homework assignment is to turn this simple reporting tool into an interactive, dynamic ROI selector.

  1. Start with your clean 1280×720 live camera stream.

  2. Modify your mouseAction callback function to look for specific mouse clicks.

  3. The Target Mechanic: When you Left-Click on the video window, store those specific coordinates as your upper-left corner. When you release the click, store those coordinates as your lower-right corner. As you are selecting, draw a live box outline over your ROI

  4. Using those two dynamic coordinate sets, use matrix slicing to pull a clean Region of Interest (ROI) out of the frame and instantly display it in a completely separate, standalone window called “Target ROI”.

  5. Safety Requirement: Make sure your code can handle clicks in any order without crashing (e.g., if a user right-clicks higher or further left than their left-click, write the conditional logic to sort the indices properly before slicing).

Get your black coffee ready, write your logic step-by-step from scratch, and do not copy code you can’t explain. Post your homework solution video on YouTube and drop a link in the comments section below so I can see who is running with the big dogs!

AI on the Edge LESSON 23: Creating Regions of Interest (ROI) in OpenCV with Slicing

Welcome back, everyone! In this lesson, we are stepping into a foundational aspect of computer vision: manipulation of specific regions within a video frame.

Up to this point, we have been grabbing the full frame from our camera and performing operations on the entire image. But in real-world edge AI and robotics applications, processing every single pixel of a high-resolution frame is an absolute waste of compute power. If you want to detect a license plate, track a face, or monitor a specific sensor layout on a machine, you don’t need to look at the sky or the floor. You need to isolate a Region of Interest (ROI).

In this lesson, you will learn how to use Python’s powerful matrix slicing capabilities to chop up a frame, isolate specific quadrants, manipulate pixels inside an ROI, and display multiple synchronized windows across your desktop without crashing your system footprint.

The Core Concept: Image Slicing and ROIs

In OpenCV, an image frame isn’t just a visual picture—it is a standard NumPy array. A color frame is a 3D matrix structured by rows, columns, and color channels: [Rows, Columns, Channels] or [Height, Width, Color].

Because it is a standard array, we can use standard Python slicing notation to isolate any rectangular box we want:

ROI = frame[rowstart : rowend,  colstart : colend]

The .copy() Trap

When you slice a piece of an array in Python like ROI = frame[0:100, 0:100], Python does not create a new image in your RAM. It creates a view or a pointer back to the original frame. If you modify pixels inside that ROI, you will accidentally alter your original main camera frame!

To isolate a region and modify it independently without bleeding back into your primary frame, you must explicitly use the .copy() method:

Below is the complete code script we built during the video tutorial. Copy this code exactly into your Python environment, verify your geometry setups, and run it.

Homework Assignment

Alright, it is time to earn your stripes and see if you can fly with the big dogs. Your homework assignment is to take this foundation and build a dynamic tracking target box using the array geometry principles we just learned.

  1. Create a single main camera window (640 x 360).

  2. Draw an independent rectangular ROI box that starts directly in the dead center of the screen.

  3. Using your keyboard parameters (cv2.waitKey), program the system so that using the Arrow Keys (or ‘i’, ‘j’, ‘k’, ‘l’) smoothly updates variables to move the ROI box dynamically around the screen in real-time.

  4. Crucial Constraint: Do not let your boundary indices drift off the array! You must write conditional boundaries so that if your moving target hits the edge of your $640 \times 360$ boundary layout, it locks at the frame border and prevents an out-of-bounds index crash.

  5. In a separate output window, display only the contents of the moving target box in real-time grayscaled format.

Grab your morning coffee, fire up your code editor, write the script from scratch, and do not copy-paste code you don’t understand. Leave a link to your homework solution video in the YouTube comments section so I can see your progress!

AI on the Edge LESSON 22: Understanding Pictures and Video Frames as a Data Structure

Hey guys, Paul McWhorter here with TopTechBoy.com, and today we are diving into the heart of computer vision. We’ve been playing around with getting images from the camera, but have you ever stopped to actually look at what a picture is when it’s inside your computer’s memory?

If you want to be a master of AI on the Edge, you have to stop thinking about images as “pictures” and start seeing them as what they really are: a massive, organized grid of numbers.

What is a Picture, Really?

In this lesson, we are peeling back the curtain on how OpenCV and Python handle video frames. When we call piCam.capture_array(), we aren’t just taking a snapshot; we are pulling a data array into memory.

Think of it like a giant spreadsheet where every single cell is a pixel.

  • Dimensions: Your image has a width and a height, which correspond to the number of rows and columns in that array. It is important to remember the row designator comes first, then the column, [ R, C]

  • The Depth (The RGB Channels): It’s not just a flat 2D grid! Each “cell” in that grid is actually a little sub-array containing three values: Red, Green, and Blue. That is why we call it a 3D data structure.

Manipulating Data, Not Just Pixels

The magic happens when you realize you can reach into that array and change those numbers directly.

In the code we developed today, we aren’t just displaying video; we are performing data science on video frames. We explored how to:

  1. Access individual pixels: By referencing specific coordinates in our frame array, we can pull out the color data for a single spot.

  2. Draw shapes by modifying arrays: Notice how we don’t need a “draw square” function to put a box on the screen? We simply tell a slice of that array to equal [0, 0, 255]. We are literally changing the color values of those pixels to solid red.

  3. Regions of Interest (ROI): This is critical for AI. You don’t always need to look at the whole frame. We learned how to “slice” the array to isolate a Region of Interest. By carving out a smaller piece of that memory, we can perform operations—like converting to grayscale—on just that section, which saves a massive amount of processing power.

Why Does This Matter?

If you want to build a robot that recognizes objects or tracks faces, you need to understand this structure. AI models don’t “see” a cat; they see a mathematical representation of that cat’s pixel values. By learning how to slice, manipulate, and convert these arrays, you are learning the fundamental language of machine learning.

We are building the foundation here, folks. Once you get comfortable with how to manipulate these arrays, we are going to start doing some really cool stuff with image processing and filtering.

Dive into that code, change those array values, and see what happens when you mess with the dimensions! Don’t just run it—experiment with it.

I’ll see you guys in the next lesson!

In this lesson we developed the following code:

 

AI on the Bleeding Edge: Run Llama LLM Locally on GPU CUDA with NVIDIA Jetson Orin Nano on Jetpack 7.2

Hey there, world! Paul McWhorter here. You know me—I don’t just want to use technology; I want to understand exactly how it works under the hood. Today, we’re taking the NVIDIA Jetson Orin Nano and making it “think” right here on our own hardware.

We are bypassing the heavy, automated installers to build llama.cpp from source. This is the gold standard for high-performance AI on edge devices. Let’s get to work!

Part 1: Standalone Llama.cpp Build

First, we need to prepare our engine. This step takes the raw source code from GitHub and compiles it specifically for the Orin Nano’s GPU using the CUDA toolkit.

  • What’s happening here? We are cloning the project, creating a “build” blueprint that tells the compiler to use your GPU (CUDA), and then using your Orin’s full processing power (nproc) to assemble the program. We also create a folder to keep our “brain” files neat and tidy.

Part 2: Run in Web Interface

Now that the engine is ready, let’s launch the server and start chatting with our first model, Qwen.

  • What’s happening here? We move into the folder where we built our engine and launch the server. The --n-gpu-layers 99 flag is the magic! It tells the system to push as many model layers as possible into the GPU memory. The --port 8080 defines the digital “door” our web browser will use to chat with the AI at http://localhost:8080.

Part 3: Download and Run a New Model

One of the best things about llama.cpp is how easy it is to swap out “brains.” Let’s download a more advanced model, clear our network port, and fire it up!

  • What’s happening here? We download the phi-4-mini model, and then we use fuser -k 8080/tcp. This acts like a “master key”—if the previous server process didn’t close properly, this forces the port open so we don’t get any “address in use” errors. Then, we launch the server again, pointing it to our new model!

You’re in Control

Once that server is live, you’re not just watching AI happen—you’re running it! Keep an eye on your terminal logs, watch that GPU utilization jump, and remember: you are working on the absolute bleeding edge of local AI performance.

Buckle up, let’s do some exciting projects together. Drop those tokens-per-second scores in the comments!

A Long Way to Go

Guys our eventual goal is to get nemoclaw operating as an agent on the Jetson Orin Nano on Jetpack 7.2. Our first effort was to run Llama and Ollama on the Jetson Orin. We were successful with that but the challenge way, using the canned install commands, we ended up running on the CPU not the Cuda GPU. Today we have a major step forward as we are now running on GPU, with the core models. Next up, we will try to get it running under Olama, while still staying on the GPU.

WHAT HAPPENS ON YOUR DESKTOP STAYS ON YOUR DESKTOP!

OK, here is your homework. Download all the models we looked at last week using the method above. When complete, you should have these models:

Model Model Family Size / Parameter Count Best Used For
gemma3:1b Google Gemma 3 1 Billion Ultra-fast responses, light footprint
llama3.2:1b Meta Llama 3.2 1 Billion High-efficiency conversational loops
phi4-mini:3.8b Microsoft Phi-4 3.8 Billion Heavy reasoning and coding logic
qwen3:4b Alibaba Qwen 3 4 Billion Structured data and multilingual logic
 qwen3.5:4b Alibaba Qwen 3.5 4 Billion Advanced context processing
gemma3:4b Google Gemma 3 4 Billion Maximum analytical depth on Orin Nano

Making The World a Better Place One High Tech Project at a Time. Enjoy!