Category Archives: NVIDIA

Engineering Your Own Local Voice Assistant: No Cloud, No Compromise

Most “smart” voice assistants are just glorified remote controls for someone else’s server. Today, we’re changing that. We are going to build a local, offline voice command pipeline. This isn’t just about saving data; it’s about ownership. When you can control your hardware (like opening and closing a farm gate) without an internet connection, you have built a system that is robust, private, and yours to control forever. Today you are going to get speech-to-text (STT) up and running on your NVIDIA Jetson Orin Nano under JetPack 7.2.

The “Why” Behind the Setup

You might ask, “Why not just use a cloud API?” Because cloud APIs are fragile. They depend on a stable internet connection and on external servers, and they often log your data in privacy-invasive ways. By running Vosk locally, we keep the processing on your own hardware (like the NVIDIA Jetson). There is no network round trip, it keeps working on an off-grid homestead with no internet connection, and your audio never leaves your machine.

Part 1: Preparing the Environment

Before we can make the machine listen, we have to prepare the battlefield. We aren’t just downloading files; we are setting up a stable environment where your dependencies won’t conflict with your OS.

# Create a dedicated directory so our work stays organized
mkdir -p ~/STT
cd ~/STT

# We need the PortAudio headers. Why? Because Python’s 'pyaudio' is just a wrapper.
# Under the hood, it talks to the C-based PortAudio library. If the headers aren't there,
# the library won't compile, and your microphone will never wake up.
sudo apt update
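# python3-dev supplies Python.h (needed to compile pyaudio), and python3-venv supplies the venv module used below.
sudo apt install -y portaudio19-dev python3-dev python3-venv wget unzip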

# Download the lightweight, low-latency English model weights from Vosk
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip

# Unzip the archive and rename the folder cleanly to 'model' so the Python script finds it
unzip vosk-model-small-en-us-0.15.zip
mv vosk-model-small-en-us-0.15 model

# Clean up the downloaded zip file to keep the folder pristine
rm vosk-model-small-en-us-0.15.zip

# Setting up a Virtual Environment (venv) is the hallmark of a pro.
# It traps our project dependencies inside this folder so we don't break our system Python.
python3 -m venv sttVenv
source sttVenv/bin/activate
pip install --upgrade pip

# Vosk does the heavy lifting for speech recognition, and Pyaudio grabs the raw data.
pip install vosk pyaudio
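Before moving on, it is worth confirming that the packages landed inside sttVenv rather than in the system Python. Here is a minimal sketch using only the standard library (the file name you save it under is up to you): it reports whether the running interpreter is inside a virtual environment and whether vosk and pyaudio can be found, without actually importing them.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """True when this interpreter runs inside a venv (its prefix differs from the base install)."""
    return sys.prefix != sys.base_prefix


def check_modules(names):
    """Map each module name to True if Python can locate it, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside a virtual environment:", in_virtualenv())
    for name, found in check_modules(["vosk", "pyaudio"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it from the activated terminal with python3, or from Thonny once Thonny is using the venv interpreter. If it reports False for the virtual environment, you are running the system Python instead.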

This is IMPORTANT!

Now you can paste the code below. You also have to point Thonny at the virtual environment. Open Thonny, and under Run, choose Select interpreter. Then point it to /home/yourUserName/STT/sttVenv/bin/python3. My username is pjm, so my path is /home/pjm/STT/sttVenv/bin/python3; put your own username in the path instead. Here is what mine looked like:

Part 2: Solving the PipeWire Challenge

The biggest headache in modern Linux audio is PipeWire. If you try to open a microphone stream using a hardcoded sample rate that doesn’t match your hardware, PyAudio can refuse to open the stream, and in some setups the program crashes outright with a segfault. We use the validation script below to programmatically query the hardware, asking it “What sample rate are you running at?” before we even try to open a stream.
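Stripped of the PyAudio calls, the discovery step is just a search over device-info dictionaries. The sketch below isolates that rule in a pure function so you can test it without a microphone. The fake device list is hypothetical and only mimics the keys that PyAudio's get_device_info_by_index() returns; the maxInputChannels check (skip devices that cannot record) is an addition of this sketch.

```python
from typing import Iterable, Optional, Tuple


def pick_input_device(devices: Iterable[dict],
                      keyword: str = "pipewire") -> Tuple[Optional[int], Optional[int]]:
    """Return (index, default sample rate) of the first capture device whose name contains keyword."""
    for dev in devices:
        name = dev.get("name", "").lower()
        # Only accept devices that can actually record (at least one input channel).
        if keyword in name and dev.get("maxInputChannels", 0) > 0:
            return dev["index"], int(dev["defaultSampleRate"])
    # Nothing matched: mirror the (None, None) convention used by the lesson's script.
    return None, None


# Hypothetical device list shaped like PyAudio's output, for trying the logic without hardware.
fake_devices = [
    {"index": 0, "name": "HDMI output", "maxInputChannels": 0, "defaultSampleRate": 44100.0},
    {"index": 1, "name": "pipewire", "maxInputChannels": 64, "defaultSampleRate": 48000.0},
]
print(pick_input_device(fake_devices))  # (1, 48000)
```

With real hardware, build the list with p = pyaudio.PyAudio() and [p.get_device_info_by_index(i) for i in range(p.get_device_count())]. If no device name contains "pipewire", printing that list shows which keyword to try instead.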

# Import the PyAudio library to handle real-time microphone recording and audio stream manipulation.
import pyaudio
# Import the built-in JSON module to parse the structured text results returned by the speech recognition engine.
import json
# Import the audioop library to perform real-time mathematical operations and downsampling on raw audio byte data.
import audioop
# Import the core Vosk objects: Model loads the AI brain/weights, and KaldiRecognizer handles the acoustic processing.
from vosk import Model, KaldiRecognizer
# Import the standard OS module to reliably build absolute system file paths across different user directories.
import os
state=None

# Define a function to discover the specific hardware index and sample rate of the active system sound server.
def get_pipewire_index():
    # Instantiate a temporary PyAudio session object to query the operating system's device list.
    p = pyaudio.PyAudio()
    try:
        # Loop through every available audio device that PortAudio reports.
        for i in range(p.get_device_count()):
            # Retrieve a descriptive configuration dictionary for the current audio device index.
            dev = p.get_device_info_by_index(i)
            # Check that the name contains 'pipewire' (the modern Linux audio subsystem) and that the device can record.
            if "pipewire" in dev.get('name', '').lower() and dev.get('maxInputChannels', 0) > 0:
                # Return the valid device index integer and its default sample rate, converted cleanly to an integer.
                return i, int(dev.get('defaultSampleRate'))
        # Return empty values if the loop completes without finding a usable PipeWire device.
        return None, None
    finally:
        # Release this temporary session so it does not hold audio resources; the main script opens its own below.
        p.terminate()

# Execute the discovery function to capture both the correct system hardware device index and its native sample rate.
dev_index, native_rate = get_pipewire_index()

# Check if the hardware discovery handshake failed to locate an active PipeWire audio interface.
if dev_index is None:
    # Alert the user on screen that the script cannot communicate with the system's active audio server layer.
    print("Error: Could not find 'pipewire' device.")
    # Stop the script immediately with a non-zero exit code, since it is impossible to record audio without a valid device.
    raise SystemExit(1)

# Instantiate a fresh PyAudio session specifically dedicated to opening our incoming recording channel.
p = pyaudio.PyAudio()
# Open the active system recording stream channel using explicit audio hardware parameter configurations.
# PARAMETER MANUAL:
# - format=pyaudio.paInt16: Sets 16-bit Signed Integer PCM format. (LEAVE ALONE - required by the Vosk engine).
# - channels=1: Sets recording to Mono channel audio. (LEAVE ALONE - Vosk processes single-channel input).
# - rate=native_rate: Sets the sample rate to match the mic exactly. (LEAVE ALONE - prevents driver hardware failure).
# - input=True: Configures this stream as an audio capture device. (LEAVE ALONE - required to wake up the microphone).
# - input_device_index=dev_index: Points to our discovered target device. (LEAVE ALONE - ensures we grab the right mic).
# - frames_per_buffer=4096: Controls audio chunk size. (PLAY WITH: Lower values like 2048 reduce latency but risk buffer overflows; higher values like 8192 increase safety but introduce slight latency. If you change it, change the 4096 in stream.read() below to match.)
stream = p.open(format=pyaudio.paInt16, channels=1, rate=native_rate, 
                input=True, input_device_index=dev_index, frames_per_buffer=4096)

# Generate an absolute directory path targeting the exact folder containing your extracted acoustic model weights.
# PARAMETER MANUAL:
# - "STT", "model": Represents folder names. (PLAY WITH: Change these only if your physical folder layout uses different names).
model_path = os.path.join(os.path.expanduser("~"), "STT", "model")
# Load the heavy machine learning weights into system memory to initialize the Vosk voice processing brain.
model = Model(model_path)
# Initialize the primary speech recognizer object using the loaded model weights and target processing frequency.
# PARAMETER MANUAL:
# - model: The initialized AI brain variable. (LEAVE ALONE).
# - 16000: Specifies the target processing sample rate in Hertz. (LEAVE ALONE - Vosk models are mathematically trained for exactly 16000Hz).
recognizer = KaldiRecognizer(model, 16000)

# Print a status message to the terminal showing that the handshake succeeded and revealing the mic's operating frequency.
print(f"Listening on PipeWire at {native_rate}Hz...")
# Turn on the hardware recording valve to begin filling up the system input data buffers.
stream.start_stream()

# Set up an exception safety block to intercept manual user exits gracefully without leaving system resources locked up.
try:
    # Launch the infinite real-time audio capture loop to continuously monitor the microphone stream.
    while True:
        # Pull raw binary audio frame bytes directly out of the incoming hardware stream buffer memory block.
        # PARAMETER MANUAL:
        # - 4096: The chunk size to read, in frames. (Keep this equal to the stream's frames_per_buffer setting above.)
        # - exception_on_overflow=False: If the Orin momentarily falls behind, drop the overflowed audio instead of raising an OSError that would crash the script. (LEAVE ALONE).
        data = stream.read(4096, exception_on_overflow=False)
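        # Downsample the raw audio bytes to 16000Hz with audioop.ratecv. NOTE: audioop is deprecated since Python 3.11 and removed in 3.13;
        # on Python 3.13+, 'pip install audioop-lts' provides a drop-in replacement module.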
        # PARAMETER MANUAL:
        # - data: Raw input audio byte string. (LEAVE ALONE).
        # - 2: Size of sample width in bytes for 16-bit sound. (LEAVE ALONE).
        # - 1: Input audio channel count. (LEAVE ALONE).
        # - native_rate: The native frequency of your mic. (LEAVE ALONE).
        # - 16000: The required downsampled output frequency. (LEAVE ALONE).
        # - state: The digital filtering conversion state history object. (LEAVE ALONE).
        # CRITICAL NOTE: The 'state' variable must be initialized as 'state = None' before entering this 'try:' block, 
        # otherwise Python will throw a NameError on the very first loop iteration because 'state' does not exist yet.
        data_16k, state = audioop.ratecv(data, 2, 1, native_rate, 16000, state)
        
        # Feed the processed 16kHz audio data block into Vosk to evaluate if a complete spoken phrase has finished.
        if recognizer.AcceptWaveform(data_16k):
            # Parse the structured JSON text string returned by the acoustic model into a standard Python dictionary.
            res = json.loads(recognizer.Result())
            # Check the parsed result dictionary to ensure that the detected text sequence is not completely blank.
            if res.get("text"):
                # Cleanly output the decoded text command straight to the standard system terminal display screen.
                print("Command: "+res['text'])
# Intercept the user pressing Ctrl+C in the terminal to stop the continuous script loop cleanly.
except KeyboardInterrupt:
    # Print a clean closing status notification on a new line to confirm the signal was intercepted.
    print("\nStopped.")
# Execute mandatory clean up instructions that are guaranteed to run no matter how the script terminates or crashes.
finally:
    # Turn off the active recording stream data valve to stop pulling raw information from the audio driver layer.
    stream.stop_stream()
    # Close the recording stream channel completely to drop the active hardware pointer binding handles.
    stream.close()
    # Terminate the parent PyAudio engine instance to release all allocated audio device resources back to the OS.
    p.terminate()
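The frames_per_buffer comment in the script describes a latency trade-off, and the arithmetic behind it is short: each read covers frames / native_rate seconds of audio, and audioop.ratecv shrinks the chunk by a factor of 16000 / native_rate. A small sketch, assuming PipeWire reports 48000 Hz (substitute whatever rate the script prints for you); chunk_stats() is a hypothetical helper name.

```python
def chunk_stats(frames: int, native_rate: int, target_rate: int = 16000,
                sample_width: int = 2) -> dict:
    """Describe one mono, 16-bit chunk read from the mic before and after resampling."""
    return {
        # Bytes returned by stream.read(frames): one 2-byte sample per frame for mono audio.
        "bytes_read": frames * sample_width,
        # How much audio time one read covers; larger chunks mean more waiting per read.
        "duration_ms": 1000 * frames / native_rate,
        # Approximate frames left after ratecv converts native_rate down to target_rate.
        "frames_at_target": frames * target_rate / native_rate,
    }


# ASSUMPTION: PipeWire reports 48000 Hz; use the rate your own script prints.
for frames in (2048, 4096, 8192):
    s = chunk_stats(frames, native_rate=48000)
    print(f"{frames} frames: {s['bytes_read']} bytes, "
          f"{s['duration_ms']:.1f} ms, ~{s['frames_at_target']:.0f} frames at 16 kHz")
```

Doubling the chunk doubles how long the loop waits for each read, which is the latency the comment refers to; Vosk still receives the same audio, just in bigger pieces.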

Homework: Your Gate Controller

You now have a system that identifies audio input, resamples it to 16kHz, and outputs text. Your assignment: Transform this text output into an action.

I want you to add a conditional statement to the main loop. If the recognized text is “open”, print an ASCII art representation of an open gate. If it’s “close”, print the closed version. This is the first step in closing the loop between your AI and the physical world. Go get ’em, and don’t just copy the code—understand how the data flows from the microphone to your decision logic!

NVIDIA, NVIDIA Jetson Orin Nano

AI on the NVIDIA Jetson Orin Nano: Adding Text-to-Speech (TTS) with Piper

July 20, 2026 admin

In this lesson, we are building a local, offline voice pipeline using Piper. This engine runs natively on our Jetson Orin Nano hardware, providing fast and natural speech without needing an internet connection. To keep this simple, we will install everything into one specific folder so the software can easily find its own files.

Step 1: System Prep & Piper Installation

Open your terminal and run these commands one by one to create the workspace and download the required files. We are placing everything into the same directory to ensure the AI engine can always find the voice model.

# 1. Update and install dependencies (alsa-utils provides the aplay player used in Steps 2 and 3)
sudo apt update
sudo apt install -y wget tar alsa-utils libasound2-dev

# 2. Create the main project folder
mkdir -p ~/voiceAssistant/piper
cd ~/voiceAssistant/piper

# 3. Download and extract the Piper binary
wget https://github.com/rhasspy/piper/releases/download/2023.11.14-2/piper_linux_aarch64.tar.gz
tar -xvzf piper_linux_aarch64.tar.gz

# 4. Enter the folder where Piper was extracted
cd piper

# 5. Download the voice model files into this same folder
wget -O en_US-lessac-medium.onnx 'https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx?download=true'
wget -O en_US-lessac-medium.onnx.json 'https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json?download=true'

# 6. Grant execution permissions to the Piper engine
chmod +x piper

Step 2: Identify Your Audio Device

Because every setup is different, we need to tell the system which speaker to use. Run the following command in your terminal:

aplay -l

Look through the list for your speaker. You will see something like card 0 and device 0. If your card is 0 and device is 0, your identifier is plughw:0,0. You will use these numbers in the Python script below.
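If the list is long, a small parser can turn the aplay -l text into ready-made plughw strings. This is a sketch that assumes the usual "card N: ..., device M: ..." line format; playback_devices() is a hypothetical helper name.

```python
import re

# Matches lines such as: "card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]"
CARD_LINE = re.compile(r"^card (\d+): (.*?), device (\d+): (.*)$")


def playback_devices(aplay_output: str):
    """Turn `aplay -l` text into (plughw identifier, description) pairs."""
    devices = []
    for line in aplay_output.splitlines():
        match = CARD_LINE.match(line.strip())
        if match:
            card, card_desc, device, dev_desc = match.groups()
            devices.append((f"plughw:{card},{device}", f"{card_desc} / {dev_desc}"))
    return devices
```

To feed it live output, pass subprocess.run(["aplay", "-l"], capture_output=True, text=True).stdout and pick the identifier whose description matches your speaker.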

Step 3: The Python Pipeline Script

Create a new Python file and paste the code below. Because we installed everything into the same folder, this script will find the files immediately.

import os
import subprocess

# --- CONFIGURATION ---
# piperPath: The location of the downloaded Piper executable file.
# modelPath: The path to the downloaded AI voice model file (.onnx).
# audioDevice: The specific hardware address of the speaker (e.g., plughw:0,0), built from the numbers aplay -l showed you.
# os.path.expanduser("~") fills in your own home folder, so you do not need to type your username.
piperPath = os.path.expanduser("~/voiceAssistant/piper/piper/piper")
modelPath = os.path.expanduser("~/voiceAssistant/piper/piper/en_US-lessac-medium.onnx")
audioDevice = "plughw:0,0"

# --- BUILD THE COMMANDS ---
# piperCommand: This runs the AI engine. 
# --output-raw tells Piper to send out raw audio data instead of a .wav file.
piperCommand = piperPath + " --model " + modelPath + " --output-raw"

# aplayCommand: This is the Linux sound player that pipes audio to your speakers.
# -D: Specifies the device (hardware).
# -r 22050: Sets the sample rate. Piper requires 22050Hz for this model.
# -f S16_LE: Sets the format to 16-bit Signed Little Endian (the raw digital format).
# -t raw: Tells aplay that the incoming data has no header and is pure audio.
# -: The hyphen tells aplay to read input from 'stdin' (the data flowing from the pipe).
aplayCommand = "aplay -D " + audioDevice + " -r 22050 -f S16_LE -t raw -"

# fullCommand: We connect the two commands using a pipe '|'. 
# This takes the output of the AI (piperCommand) and feeds it directly 
# into the speaker input (aplayCommand).
fullCommand = piperCommand + " | " + aplayCommand

# --- RUN THE PIPELINE ---
# subprocess.Popen starts the shell pipeline (Piper feeding aplay) as a separate process.
# stdin=subprocess.PIPE gives us a handle we can write the text into.
speechProcess = subprocess.Popen(fullCommand, shell=True, stdin=subprocess.PIPE)

# speechMessage: The text we want the AI to read.
# Pipes carry bytes, not Python strings, so we encode the text as UTF-8 bytes.
speechMessage = "Hello students. The voice pipeline is verified and operational."
speakable = speechMessage.encode('utf-8')

# communicate: Sends the text into Piper's stdin, closes it so Piper knows the text is complete,
# and waits until the pipeline has finished speaking.
speechProcess.communicate(input=speakable)
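The script builds one long string and hands it to a shell with shell=True. An alternative, following the “replacing a shell pipeline” pattern from the Python subprocess documentation, connects the two processes directly so no shell is involved and paths with spaces need no quoting. This is a sketch, not the lesson’s method; build_commands() and speak() are hypothetical helper names, and the flags are copied from the script above.

```python
import subprocess
from typing import List, Tuple


def build_commands(piper_path: str, model_path: str, audio_device: str,
                   sample_rate: int = 22050) -> Tuple[List[str], List[str]]:
    """Build the Piper and aplay argument lists (same flags as the lesson's script)."""
    piper_cmd = [piper_path, "--model", model_path, "--output-raw"]
    aplay_cmd = ["aplay", "-D", audio_device, "-r", str(sample_rate),
                 "-f", "S16_LE", "-t", "raw", "-"]
    return piper_cmd, aplay_cmd


def speak(text: str, piper_cmd: List[str], aplay_cmd: List[str]) -> int:
    """Pipe text -> Piper -> aplay without a shell; returns aplay's exit code."""
    # Piper reads text on stdin and writes raw audio to a pipe we control.
    piper = subprocess.Popen(piper_cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # aplay reads Piper's audio straight from that pipe.
    player = subprocess.Popen(aplay_cmd, stdin=piper.stdout)
    # Close Python's copy of the read end so only aplay holds it.
    piper.stdout.close()
    # Send the text, then close stdin so Piper knows the input is complete.
    piper.stdin.write(text.encode("utf-8"))
    piper.stdin.close()
    piper.wait()
    return player.wait()
```

For example, call build_commands() with the expanded piperPath, modelPath, and audioDevice values from the script, then speak("Hello students.", piper_cmd, aplay_cmd).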

Step 4: Customizing Your Voice

Piper has dozens of voices available. To see the full library, visit the Piper Voices Repository. Download the .onnx and .onnx.json files for your preferred voice, place them in the ~/voiceAssistant/piper/piper/ folder, and update the modelPath variable in your script.
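One thing to watch when switching voices: the aplay -r 22050 setting has to match the voice’s output rate, and not every Piper voice uses 22050 Hz. Assuming your voice’s .onnx.json file stores the rate under audio.sample_rate (as the piper-voices config files do; check your own file), here is a sketch that reads the rate instead of hardcoding it. The helper names are hypothetical.

```python
import json
from pathlib import Path


def sample_rate_from_config(config_text: str, default: int = 22050) -> int:
    """Read audio.sample_rate from a Piper voice config, falling back to default if it is missing."""
    config = json.loads(config_text)
    return int(config.get("audio", {}).get("sample_rate", default))


def voice_sample_rate(model_path: str, default: int = 22050) -> int:
    """Look for the .onnx.json file sitting next to the .onnx model and read its sample rate."""
    config_path = Path(model_path + ".json")
    return sample_rate_from_config(config_path.read_text(encoding="utf-8"), default)
```

In the lesson’s script, rate = voice_sample_rate(modelPath) followed by str(rate) in place of 22050 when building aplayCommand keeps the playback speed correct for any voice.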

Homework: The Talking Echo Bot

Your assignment is to play around with different voice models and choose the ones you like best. Then modify the Python script so it becomes an interactive “Echo Bot.” Instead of hardcoding the message, use Python’s input() function to ask the user what to say. When the user types a sentence and presses Enter, your script should pipe that text into Piper and speak it back to you. Use a while True: loop to keep the program running so you can keep talking to your computer. Make a video of your working project and the voices you chose. In your video’s description, make sure to leave a link back to the video above. That way viewers can easily click from my video lesson over to your video, and then back to the class video.

NVIDIA, NVIDIA Jetson Orin Nano

Running Headless on the NVIDIA Jetson Orin Nano on JetPack 7.2: Run Big Local LLMs Like a Boss

July 17, 2026 admin

Hey guys, Paul McWhorter here from toptechboy.com. Today, we are going to look at how to stop fighting your hardware and start running local AI like an absolute boss.

The NVIDIA Jetson Orin Nano is an absolute masterpiece of edge-compute hardware. But it has one major design constraint that catches almost every beginner off guard: Unified Memory. On the Orin, your CPU and your GPU share the exact same physical pool of 8GB of LPDDR5 RAM. When you boot into that pretty Ubuntu GNOME desktop, the system instantly steals over 1.5 GB of your precious VRAM just to draw a GUI you aren’t even looking at while your code is running.

In this lesson, we are going to reclaim that stolen memory, optimize our storage, and run a massive 8-billion parameter model (LLaMA 3.1 8B) smoothly on the Orin Nano by learning how to properly run a clean, headless configuration.

Step 1: Find Your Jetson’s IP Address

Before we turn off the monitor, we need to know how to talk to the Orin over the network. If you don’t know its IP address, you can’t SSH in once the screen goes dark. While you are still in the graphical terminal, run this command:

ifconfig

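Look under your active connection interface (usually eth0 for Ethernet or wlan0 for Wi-Fi) for the inet address. It will look something like 192.168.1.15. Write this down! You will need it to remote in later. If the terminal reports that ifconfig is not found, either install it with sudo apt install net-tools or run ip addr, which shows the same inet address.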

Step 2: Disable and Remove the Default Swap File

By default, JetPack configures a slow, disk-based swap file on your NVMe drive. While swap space is great for general computing, it is an absolute performance killer for LLMs. If part of your model spills over and starts paging to a disk-based swap file, your tokens-per-second will drop to a crawl, and the constant writes will prematurely wear out your SSD.

We want our models running purely in ultra-fast LPDDR5 RAM. Let’s cleanly turn off and remove the swap file:

# 0. (Optional) See which swap areas are active right now
swapon --show

# 1. Turn off the active swap space
sudo swapoff -a

# 2. Delete the physical swap file from your drive
sudo rm /swapfile

# 3. Prevent it from mounting on next boot
# Open your fstab file:
sudo nano /etc/fstab

# Find the line containing '/swapfile' and add a '#' at the beginning to comment it out.
# Save and exit (Ctrl+O, Enter, Ctrl+X).
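To confirm that swap is really gone (right after swapoff, and again after the next reboot), you can read /proc/swaps, the kernel’s list of active swap areas and the same list swapon --show reports. A minimal parsing sketch; active_swaps() is a hypothetical helper name.

```python
def active_swaps(proc_swaps_text: str):
    """Return the swap files/devices listed in /proc/swaps text (the first line is a header)."""
    lines = proc_swaps_text.strip().splitlines()
    # Skip the "Filename Type Size Used Priority" header; the first column is the swap area.
    return [line.split()[0] for line in lines[1:] if line.strip()]
```

On the Orin, active_swaps(open("/proc/swaps").read()) should return an empty list once swap is fully disabled.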

Step 3: Configure a Clean Boot into the Terminal

Many people will tell you to run sudo systemctl isolate multi-user.target to turn off the GUI. Do not do this! That command aggressively tears down active background services (including Ollama, network managers, and local development scripts) because isolate immediately stops every running unit that the target does not pull in.

Instead, we want to tell systemd to boot the Orin straight into command-line mode from a fresh start. This allows all your network drivers, background scripts, and Ollama to initialize perfectly without a display manager eating your memory:

sudo systemctl set-default multi-user.target

Once you run this, restart your Orin to let the changes take effect cleanly:

sudo reboot

Step 4: How to Boot Back to the GUI (If Needed)

We are developers, which means we want to write and debug our scripts comfortably under the graphical desktop, and then deploy them headlessly. If you ever need to turn your monitor back on and return to the GNOME desktop, simply run this command over SSH:

sudo systemctl set-default graphical.target

Followed by a quick reboot (sudo reboot), and your desktop interface will return exactly as it was.

Step 5: The Test — Running LLaMA 3.1 8B in the GUI

To prove why this matters, let’s look at what happens when you try to force a large model to run while your monitor is plugged in and the graphical desktop is active. Open your terminal in the GUI and run:

ollama run llama3.1:8b --verbose


The Result: The model will either crash outright with an “Out of Memory” (OOM) error, or it will run painfully slowly, chugging out fewer than 2 tokens per second.

The “Why”: Where Did Your Memory Go?

An 8-billion-parameter model quantized to 4 bits needs roughly 4.7 GB of memory just to hold its weights. Add the context window (KV cache), and the requirement quickly climbs past 5.5 GB.
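Where do those numbers come from? Here is a back-of-envelope sketch in Python. The architecture figures come from the published LLaMA 3.1 8B configuration: about 8.03 billion parameters, 32 layers, 8 key/value heads, and head dimension 128. Two inputs are assumptions: roughly 4.7 effective bits per weight, and a KV cache stored in 16-bit floats. Real quantization formats vary slightly.

```python
"""Back-of-envelope memory estimate for LLaMA 3.1 8B (sketch, assumptions labeled)."""

# Architecture figures for LLaMA 3.1 8B (from the published model config):
PARAMS = 8.03e9      # total parameters
N_LAYERS = 32        # transformer layers
N_KV_HEADS = 8       # grouped-query attention: 8 key/value heads
HEAD_DIM = 128       # dimension per attention head

# Assumption: ~4.7 effective bits per weight (4-bit values plus per-block scales).
BITS_PER_WEIGHT = 4.7
# Assumption: the KV cache is stored in 16-bit floats (2 bytes per value).
KV_BYTES_PER_VALUE = 2


def weight_bytes(params: float = PARAMS, bits: float = BITS_PER_WEIGHT) -> float:
    """Memory needed just to hold the quantized weights."""
    return params * bits / 8


def kv_cache_bytes(context_tokens: int) -> int:
    """Keys + values (factor 2) for every layer, KV head, and head dimension, per token."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * KV_BYTES_PER_VALUE
    return per_token * context_tokens


if __name__ == "__main__":
    for ctx in (4096, 8192):
        total = weight_bytes() + kv_cache_bytes(ctx)
        print(f"context {ctx:>5}: weights {weight_bytes() / 1e9:.2f} GB "
              f"+ KV cache {kv_cache_bytes(ctx) / 1e9:.2f} GB = {total / 1e9:.2f} GB")
```

Under these assumptions, a 4,096-token context comes to about 5.25 GB and an 8,192-token context to about 5.79 GB. Ollama's compute buffers and the CUDA runtime add their own overhead on top, which is why the real footprint can climb past 5.5 GB.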

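Here is roughly how your 8 GB Orin Nano’s memory is divided when you run the GUI. The CPU and GPU share the same physical memory, so every gigabyte the desktop holds is a gigabyte the model cannot use: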

System State	Memory Allocation (Approximate)
OS Kernel & System Daemons	~1.2 GB
GNOME Desktop GUI (Monitor Active)	~1.6 GB
Available for AI (shared CPU/GPU memory)	~5.2 GB (Not enough for 8B models + context!)

Because the GUI steals 1.6 GB, your available memory drops below the critical threshold required to run LLaMA 3.1 8B. The moment your context grows, the system runs out of room, hits a bottleneck, or crashes.

Step 6: Reclaiming the Hardware (Headless Memory Profile)

Now let’s look at the memory profile when we boot the Orin Nano cleanly into the terminal without GDM3 starting up. If you SSH in and run free -h or check jtop, this is what you get:

System State	Memory Allocation (Approximate)
OS Kernel & System Daemons	~1.2 GB
GNOME Desktop GUI	0.0 GB (COMPLETELY RECLAIMED!)
Available for AI (shared CPU/GPU memory)	~6.8 GB (Plenty of headroom for 8B models!)

By going headless, we instantly reclaimed 1.6 GB of fast memory that the GPU can use. That is the difference between night and day when deploying edge AI models.
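The arithmetic behind the two tables is simple enough to script. This sketch plugs in the approximate figures from the tables and the ~5.5 GB requirement from the text. The constants are those estimates, not measurements, so substitute the numbers you see in free -h.

```python
"""Compare the two memory profiles from the tables above (sketch)."""

TOTAL_GB = 8.0            # Orin Nano 8GB: CPU and GPU share this memory
OS_AND_DAEMONS_GB = 1.2   # approximate figure from the tables
GUI_GB = 1.6              # approximate GNOME desktop footprint
MODEL_NEEDS_GB = 5.5      # LLaMA 3.1 8B weights + KV cache, per the estimate in the text


def available_gb(gui_running: bool) -> float:
    """Memory left for the model after the OS (and optionally the GUI) take their share."""
    used = OS_AND_DAEMONS_GB + (GUI_GB if gui_running else 0.0)
    return round(TOTAL_GB - used, 2)


if __name__ == "__main__":
    for gui in (True, False):
        free = available_gb(gui)
        verdict = "fits" if free >= MODEL_NEEDS_GB else "does NOT fit"
        label = "GUI running" if gui else "headless"
        print(f"{label:<12} {free:.1f} GB available -> 8B model {verdict}")
```

With the GUI running, 5.2 GB falls short of the ~5.5 GB the model needs. Headless, 6.8 GB clears it with room to spare.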

Step 7: Connect from Windows PowerShell

Now that your Orin boots headless, unplug the monitor, keyboard, and mouse. Walk back to your main Windows development machine, open PowerShell, and SSH directly into the Orin over your local network using the IP address you saved in Step 1:

ssh pjm@192.168.1.15


(Be sure to replace “pjm” with your actual Orin username and use your specific IP address!)
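If ssh just hangs, it helps to check whether the Orin’s SSH port answers at all. This Python sketch uses only the standard library and runs the same on Windows or Linux. It assumes the SSH server listens on the default port 22 and uses the example address from the ssh command above, so substitute your own.

```python
"""Check that the Orin's SSH port answers before you connect (sketch)."""
import socket

ORIN_IP = "192.168.1.15"   # example address from the ssh command; use your own
SSH_PORT = 22              # assumption: default SSH port


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    state = "reachable" if port_is_open(ORIN_IP, SSH_PORT) else "not reachable"
    print(f"SSH on {ORIN_IP}:{SSH_PORT} is {state}")
```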

Step 8: Run LLaMA 3.1 8B Like a Boss

With your GUI safely dead and your memory completely optimized, run the exact same model command inside your PowerShell session:

ollama run llama3.1:8b --verbose


The Payoff: With roughly 6.8 GB of shared memory now free, the entire model fits on the Orin’s GPU. Prompt evaluation finishes quickly, and the text streams out at an extremely usable speed without a single memory warning or system hiccup.

That is how you cleanly manage your hardware resources, develop efficiently, and run large local LLMs on the edge like an absolute boss.

If you enjoyed this write-up, leave a comment below, subscribe to the channel, and I will see you guys in the next lesson!

🎓 Homework: Show Your Work!

Alright guys, no excuses! If you want to truly master this hardware, you cannot just sit there and watch me do it; you have to get your hands dirty. For your homework today, I want to see you running your own LLaMA 3.1 8B model headlessly on your Orin Nano. Show the tokens per second you are getting with this big model. Create your own favorite query to show how well the model works. Show me that terminal proof and the memory savings!

Here is the plan:

Record a video of your setup successfully running the model headlessly.
Upload your video to YouTube.
In the description of your YouTube video, you must include a link back to this main tutorial video at the very top of your description.
Post a link to your homework video in the comments section on the video above, running your models like a boss.
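If you want a repeatable number for your homework video, you can also pull the rates straight from Ollama’s local REST API. This sketch sends one non-streaming request to /api/generate on the default port 11434. It then converts the token counts and durations Ollama reports (durations are in nanoseconds) into tokens per second. The function names and the prompt are only examples, so use your own favorite query, and run the script on the Orin itself.

```python
"""Measure tokens per second through Ollama's local REST API (sketch)."""
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"  # Ollama's default local endpoint


def rates(stats: dict) -> dict:
    """Convert Ollama's token counts and nanosecond durations into tokens per second."""
    def per_second(count_key: str, duration_key: str) -> float:
        duration_ns = stats.get(duration_key, 0)
        return stats.get(count_key, 0) / (duration_ns / 1e9) if duration_ns else 0.0

    return {
        "prompt_eval_rate": per_second("prompt_eval_count", "prompt_eval_duration"),
        "eval_rate": per_second("eval_count", "eval_duration"),
    }


def benchmark(model: str, prompt: str) -> dict:
    """Run one non-streaming generation and return its token rates."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=600) as response:
        return rates(json.load(response))


if __name__ == "__main__":
    try:
        print(benchmark("llama3.1:8b", "Explain what a KV cache is in two sentences."))
    except urllib.error.URLError as err:
        print("Could not reach Ollama:", err)
```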

Now, get to work! I am looking forward to seeing what you guys build.

NVIDIA

NVIDIA Jetson Orin Nano: Secret to Running Ollama on the GPU

June 18, 2026 admin

One of the biggest frustrations with the new Jetpack 7.2 release is finding out that a standard installation of Ollama—the gold standard for running local LLMs—completely ignores your powerful NVIDIA GPU and defaults to the CPU.

In this lesson, we aren’t just going to fix that; we are going to measure the “truth” behind the performance. We will use data to see exactly how much gain we get from the GPU and where the hardware starts to hit the thermal throttling wall.

The Problem: The “Canned” Installation

When you run a standard Ollama install on the Jetson Orin Nano, the system doesn’t automatically recognize the integrated GPU (iGPU). If you open the Jetson Power GUI (or jtop in a terminal), you will see your CPU cores pegged at 100% while the GPU sits idle. This leads to slow response times and a disappointing experience.

Let’s start with the standard ‘Canned’ installation. The good news is that it is very simple:

curl -fsSL https://ollama.com/install.sh | sh


To see exactly how your system is performing, run Ollama in verbose mode:

ollama run gemma3:1b --verbose


At this point you will have Ollama running a simple LLM locally on your Jetson Orin Nano. This is a huge step forward, but now we want to dig deeper and see how well this simple model actually performs. The first thing we do is open the Jetson Power GUI, found behind the NVIDIA icon in the upper-right corner of the top bar.

Pay close attention to the Prompt Eval Rate and Eval Rate (tokens per second). These are our baseline numbers.
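If you are collecting these numbers for several runs, a small parser saves retyping. This Python sketch assumes the verbose summary prints lines of the form “prompt eval rate: <number> tokens/s” and “eval rate: <number> tokens/s”. If your Ollama version formats them differently, adjust the regular expression.

```python
"""Pull the two rates out of `ollama run --verbose` output (sketch)."""
import re

# Assumed summary format, e.g. "eval rate:            <number> tokens/s".
# The ^ anchor keeps "eval rate" from matching inside "prompt eval rate".
RATE_LINE = re.compile(r"^(prompt eval rate|eval rate):\s+([\d.]+)\s+tokens/s", re.MULTILINE)


def parse_rates(verbose_output: str) -> dict:
    """Return {'prompt eval rate': float, 'eval rate': float} for whichever lines are present."""
    return {name: float(value) for name, value in RATE_LINE.findall(verbose_output)}
```

Paste the summary block into parse_rates(), or capture it to a file and read it in, then copy the results into the benchmark table.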

The “Secret Sauce” Solution

To force Ollama to use the Jetson’s CUDA cores, we have to manually override the system service configuration.

Step 1: Install the Nano Editor

Before we can edit system files, we need a reliable text editor. If you don’t have it yet, run this command:

sudo apt-get update && sudo apt-get install nano

Step 2: Create the Service Override

We need to tell the Ollama service exactly where to look for the GPU libraries. The override directory may not exist yet, so create it first, then use nano to open the following file:

sudo mkdir -p /etc/systemd/system/ollama.service.d

sudo nano /etc/systemd/system/ollama.service.d/override.conf

Step 3: Add the Configuration

Copy and paste the following block into that file. This is the “Secret Sauce” that enables the iGPU and points the system to the correct CUDA backend:

[Service]
Environment="OLLAMA_HOST=127.0.0.1:11434"
Environment="OLLAMA_CONTEXT_LENGTH=4096"
Environment="OLLAMA_IGPU_ENABLE=1"
Environment="GGML_BACKEND_PATH=/usr/local/lib/ollama/cuda_v13/libggml-cuda.so"
Environment="LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v13"


Note: Save the file by pressing Ctrl+O, Enter, and then Ctrl+X to exit.
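A typo in one of those paths may leave Ollama quietly running on the CPU, so it is worth checking the file before you reboot. This Python sketch reads the Environment="KEY=VALUE" lines from the override file. As in systemd, a later assignment of the same variable wins. It then reports any path in GGML_BACKEND_PATH or LD_LIBRARY_PATH that does not exist on disk. The function names are hypothetical.

```python
"""Sanity-check the Ollama override file before rebooting (sketch)."""
import os
import re

OVERRIDE = "/etc/systemd/system/ollama.service.d/override.conf"
# Matches lines of the form Environment="KEY=VALUE".
ENV_LINE = re.compile(r'^Environment="([^=]+)=(.*)"\s*$')


def parse_environment(text: str) -> dict:
    """Collect KEY=VALUE pairs from Environment= lines (later lines win, as in systemd)."""
    env = {}
    for line in text.splitlines():
        match = ENV_LINE.match(line.strip())
        if match:
            env[match.group(1)] = match.group(2)
    return env


def missing_paths(env: dict) -> list:
    """List paths referenced by the GPU-related variables that do not exist."""
    paths = []
    if "GGML_BACKEND_PATH" in env:
        paths.append(env["GGML_BACKEND_PATH"])
    paths += env.get("LD_LIBRARY_PATH", "").split(":")
    return [p for p in paths if p and not os.path.exists(p)]


if __name__ == "__main__":
    if os.path.exists(OVERRIDE):
        with open(OVERRIDE) as f:
            env = parse_environment(f.read())
        print("Variables:", env)
        print("Missing paths:", missing_paths(env) or "none")
    else:
        print("Override file not found:", OVERRIDE)
```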

Step 4: Reboot

For the changes to take effect, reboot the Orin:

sudo reboot
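After the reboot, load a model again and confirm where it landed. Ollama’s /api/ps endpoint lists each loaded model with its total size and the portion held by the GPU (size_vram). The ollama ps command shows the same split in its PROCESSOR column. This sketch assumes Ollama is listening on 127.0.0.1:11434, as set in the override file.

```python
"""After the reboot, confirm the loaded model actually sits on the GPU (sketch)."""
import json
import urllib.error
import urllib.request

PS_URL = "http://127.0.0.1:11434/api/ps"  # lists models currently loaded by Ollama


def gpu_share(model_entry: dict) -> float:
    """Fraction of a loaded model's memory that Ollama placed on the GPU (0.0 to 1.0)."""
    size = model_entry.get("size", 0)
    return model_entry.get("size_vram", 0) / size if size else 0.0


def main() -> None:
    try:
        with urllib.request.urlopen(PS_URL, timeout=5) as response:
            models = json.load(response).get("models", [])
    except urllib.error.URLError as err:
        print("Could not reach Ollama:", err)
        return
    if not models:
        print("No model loaded; start one with `ollama run gemma3:1b` first.")
    for entry in models:
        print(f"{entry.get('name')}: {gpu_share(entry):.0%} on GPU")


if __name__ == "__main__":
    main()
```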

Benchmarking the Results

Once you have the GPU engaged, the real work begins. In the video, we compare performance side by side across different Jetson power modes (10W, 15W, and MaxN), with a CPU-only run as the baseline.

Configuration	Prompt Eval Rate (tokens/s)	Eval Rate (tokens/s)	Throttling Observed?
CPU only (baseline)	[Your Data]	[Your Data]	Yes/No
GPU, 10W	[Your Data]	[Your Data]	Yes/No
GPU, 15W	[Your Data]	[Your Data]	Yes/No
GPU, MaxN	[Your Data]	[Your Data]	Yes/No

As we discovered, moving to the GPU provides a boost, but it also raises the heat output. Watch the full video to see the charts and understand which power level provides the best “sweet spot” for stable, long-term AI performance on your Jetson Orin Nano. This is an important first step: getting the heavy lifting onto the GPU. In future videos, we will explore how to get that work done well on the GPU.
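To fill in the “Throttling Observed?” column with something better than a guess, log temperatures while the benchmark runs. This sketch reads the standard Linux thermal sysfs interface (/sys/class/thermal/thermal_zone*/type and temp, reported in millidegrees Celsius). Zone names vary by board and JetPack release. The script only reports temperatures and does not detect clock throttling directly, so compare its output with what jtop shows.

```python
"""Log thermal-zone temperatures during a benchmark (sketch)."""
import glob
import os


def read_temperatures(root: str = "/sys/class/thermal") -> dict:
    """Map each thermal zone's type (e.g. 'gpu-thermal') to its temperature in °C."""
    temps = {}
    for zone in sorted(glob.glob(os.path.join(root, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "type")) as f:
                name = f.read().strip()
            with open(os.path.join(zone, "temp")) as f:
                temps[name] = int(f.read().strip()) / 1000  # sysfs reports millidegrees C
        except (OSError, ValueError):
            continue  # some zones cannot be read; skip them
    return temps


if __name__ == "__main__":
    for name, celsius in read_temperatures().items():
        print(f"{name:<16} {celsius:5.1f} °C")
```

Run it from a second SSH session while the model generates, for example with watch -n 2 python3 temps.py.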

NVIDIA

Edge AI on the NVIDIA Jetson Orin Nano: You are Running With the Big Dogs Now!

June 16, 2026 admin

Welcome back. If you are watching this, you’re ready to stop playing with toys and start building real-world AI. Today, we are looking at the NVIDIA Jetson Orin Nano. Let’s get one thing straight: this is not a Raspberry Pi.

Under the hood, you are working with an Ampere-architecture GPU featuring 1,024 CUDA cores and 32 Tensor cores. You have a 6-core ARM Cortex-A78AE v8.2 64-bit CPU. Depending on how you configure your power mode, you are looking at anywhere from 20 to 40 TOPS of AI performance. This is raw, unadulterated horsepower that can process multi-stream video pipelines in real-time. In the 15W mode, you are managing a delicate balance of thermals and throughput; in the 25W mode, you are pushing the limits of the silicon itself. But this power comes with a price. You have been playing in an amusement park, but now, you’re going skydiving. The guardrails are gone.

The Skydiving Mindset: In the Pi or Arduino world, everything is ‘turn-key.’ You follow the recipe, you get the cake. It’s safe. It’s predictable. But when you are dealing with 40 TOPS of compute, the environment is fundamentally different. There are no guardrails here. If you don’t do the work, if you don’t check your own gear, you hit the ground.

There is a fundamental shift in responsibility when you move from consumer hobbyist boards to professional embedded silicon. You aren’t just a user anymore; you are an architect. If you’re looking for a guaranteed result because you clicked a link, go back to the Pi. If you’re looking to master high-performance silicon, welcome to the deep end. We are ‘Running with the Big Dogs’ now.

The Infrastructure Tax: Let’s start with the cost of entry. If you are trying to develop on an Orin using a Virtual Machine or a dual-boot setup on your Windows gaming laptop, stop. Just stop. You are setting yourself up for a failure that has nothing to do with the board and everything to do with your infrastructure.

I’ll give you a horror story. I tried to dual-boot my main workstation to make it ‘easier’ to access the Ubuntu environment needed for the SDK Manager. I triggered a BitLocker conflict. It didn’t just break the bootloader; it effectively bricked my NVMe drive so thoroughly that I had to dump the drive, buy a replacement, and reload my entire backup image from scratch.

That is the ‘Big Dog’ tax. Professionals don’t risk their primary workstation for a development tool. You build a dedicated, stand-alone Ubuntu machine. That is the cost of entry. If you can’t commit to a clean Linux environment, you aren’t ready for this hardware. The SDK Manager requires low-level USB access and partition control that hypervisors simply cannot handle reliably. You want to play with the big silicon? You bring the right infrastructure.

The Illusion of Instructions: You’ve probably heard people complain that my instructions didn’t work. Or they get angry at NVIDIA because the latest JetPack caused a kernel panic. I want to tell you the truth: You aren’t following instructions; you’re following suggestions.

Look at JetPack 7.2. Thousands of people followed the official documentation to the letter, and for half of them, it failed. The ‘Super Mode’ didn’t show up. And in the frantic attempt to force it to appear, many of them bricked their boards. When you brick an Orin—and you will—you don’t get a ‘reset’ button. You get a terminal, a flashing USB cable, and the SDK Manager.

When you’re flying a jet, you don’t blame the manual when the engine flames out. You check the instrumentation. The Jetson is your instrumentation. If it says ‘Over-Current,’ you don’t get mad at the manufacturer; you analyze your power budget. You are pushing hardware to its thermal and electrical limits. You are choosing your destiny with every power-mode configuration you change. This isn’t a software update; it’s a battlefield.

The Oracle of Delphi: Now, let’s talk about the NVIDIA forums. Think of those forums as the Oracle of Delphi. You do not walk into that house and demand service. If you post, ‘I followed the instructions and it broke, what a goat rodeo, you guys released a broken OS,’ you are done. You will be ignored, and you will lose all professional credibility.

Here is the 12-Hour Rule: Before you post, you spend 12 hours of deep-dive, log-file-reading, self-inflicted pain on your own. You read the dmesg output. You check your logs in /var/log/syslog. You look at jtop and you watch the power rails. If you can’t describe exactly what is happening, you aren’t ready for help.
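A first pass over the logs can be scripted. This Python sketch flags lines in /var/log/syslog that mention a handful of assumed keywords: errors, failures, over-current, throttling, out-of-memory, and panics. The keyword list is a starting point, not an official list. On Ubuntu, reading syslog may require sudo or membership in the adm group.

```python
"""First-pass triage of system logs before you ask for help (sketch)."""
import re

# Assumed keywords worth a closer look; extend the list for your own failure.
PATTERN = re.compile(r"error|fail|over-?current|throttl|out of memory|\boom|panic", re.IGNORECASE)


def suspicious_lines(log_text: str, limit: int = 50) -> list:
    """Return up to `limit` log lines that mention one of the keywords."""
    hits = [line for line in log_text.splitlines() if PATTERN.search(line)]
    return hits[:limit]


if __name__ == "__main__":
    try:
        with open("/var/log/syslog", errors="replace") as f:
            for line in suspicious_lines(f.read()):
                print(line)
    except OSError as err:
        print("Could not read syslog:", err)
```

For kernel messages, save them first with sudo dmesg > dmesg.txt and pass that file’s text to suspicious_lines().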

When you do post, you provide a reproduction script. You provide data. You treat those engineers with the respect they deserve. And when they respond? You shut up and listen. They are the pilot; you are the co-pilot. You do not touch the controls. You follow their lead, you execute their tests, and you report the results. Any frustration you express makes you look like a hobbyist who doesn’t understand the complexity of what they are touching. You are a guest in their house. Earn your stay.

Log-Driven Development: If your terminal isn’t covered in log outputs, you aren’t debugging; you’re guessing. Guessing is for hobbyists. Engineers measure. In the Pi world, you just write code and it works. On the Jetson, you have to think like an architect. Is your code saturating the memory bandwidth? Is your model actually hitting the Tensor cores? If you treat the Orin like a general-purpose PC, you are wasting the most powerful tool on your desk. You have to learn the power envelope. You have to learn the thermal limitations. You are driving a Ferrari in first gear if you don’t understand what’s happening under the hood.

The Verdict: So, here is my promise to you. You will brick it. You will want to throw it against the wall. But the moment you decide to solve the problem instead of blaming the manufacturer, that is the exact moment you stop being a hobbyist and start being an engineer. You want to run with the Big Dogs? Then stop whining about the guardrails and start learning how to read the logs. See you in the next lesson.

So the question for you now is, are you really ready to Run with the Big Dogs? Are you ready to jump into the deep end of the pool, or do you want to return to the wading pond?


Making The World a Better Place One High Tech Project at a Time. Enjoy!