
NVIDIA Jetson Orin Nano: Secret to Running Ollama on the GPU

One of the biggest frustrations with the new JetPack 7.2 release is discovering that a standard installation of Ollama, the gold standard for running local LLMs, completely ignores your powerful NVIDIA GPU and falls back to the CPU.

In this lesson, we aren’t just going to fix that; we are also going to measure the real performance. We will use benchmark data to see exactly how much we gain from the GPU and where the hardware starts to hit the thermal-throttling wall.

The Problem: The “Canned” Installation

When you run a standard Ollama install on the Jetson Orin Nano, Ollama doesn’t automatically recognize the integrated GPU (iGPU). If you open the Jetson Power GUI (or jtop), you will see your CPU cores pegged at 100% while the GPU sits idle. The result is slow responses and a disappointing experience.

Let’s start with the standard ‘canned’ installation. The good news is that it is very simple:

To see exactly how your system is performing, run Ollama in verbose mode:

At this point you have Ollama running a simple LLM locally on your Jetson Orin Nano. This is a huge step forward, but now we want to dig deeper and see how well this model actually performs. The first thing to do is open the Jetson Power GUI, found behind the NVIDIA icon in the upper right of the menu bar.

Pay close attention to the Prompt Eval Rate and Eval Rate (tokens per second). These are our baseline numbers.
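If you would rather log these baseline numbers than copy them by hand, a small parser can pull the two rates out of the statistics block that verbose mode prints after each reply. This is a minimal sketch that assumes the rate lines look like "prompt eval rate:     100.00 tokens/s" and "eval rate:     10.00 tokens/s"; the exact wording may differ between Ollama versions, and the function name parse_verbose_rates is hypothetical.

```python
import re

# ASSUMPTION: Ollama's --verbose statistics print rate lines such as
# "prompt eval rate:     100.00 tokens/s". Adjust the pattern if your
# Ollama version formats them differently.
_RATE_LINE = re.compile(
    r"^\s*(prompt eval rate|eval rate):\s*([\d.]+)\s*tokens/s", re.MULTILINE
)


def parse_verbose_rates(text: str) -> dict:
    """Return {"prompt_eval_rate": float, "eval_rate": float} from verbose output.

    Hypothetical helper: a key is present only if its line was found.
    """
    rates = {}
    for label, value in _RATE_LINE.findall(text):
        key = label.replace(" ", "_")  # "prompt eval rate" -> "prompt_eval_rate"
        rates[key] = float(value)
    return rates


# Illustrative values only, not measurements from a Jetson.
sample = """\
total duration:       8.1s
prompt eval count:    26 token(s)
prompt eval rate:     100.00 tokens/s
eval count:           290 token(s)
eval rate:            10.00 tokens/s
"""
print(parse_verbose_rates(sample))
```

Paste the statistics from each run into the function (or pipe them in from a file) and you get numbers you can drop straight into a spreadsheet or the table further down.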

The “Secret Sauce” Solution

To force Ollama to use the Jetson’s CUDA cores, we have to manually override the system service configuration.

Step 1: Install the Nano Editor

Before we can edit system files, we need a reliable text editor. If you don’t have it yet, run this command:

Step 2: Create the Service Override

We need to tell the Ollama service exactly where to look for the GPU libraries. Use nano to open the following file:

Step 3: Add the Configuration

Copy and paste the following block into that file. This is the “Secret Sauce” that enables the iGPU and points the system to the correct CUDA backend:

Note: Save the file by pressing Ctrl+O, then Enter, and exit with Ctrl+X.

Step 4: Reboot

For the changes to take effect, reboot the Jetson.
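After the reboot, run a prompt again and check where the model actually landed. Besides watching GPU load in the Jetson Power GUI, you can ask Ollama directly: its GET /api/ps endpoint lists the models currently loaded, each with a total "size" and a "size_vram" field. The sketch below assumes Ollama is on its default address (localhost:11434) and treats size_vram / size as the GPU share, the same ratio the ollama ps command reports in its PROCESSOR column. On the Orin Nano’s unified memory this is Ollama’s own accounting, not a hardware measurement. The helper names are hypothetical.

```python
import json
import urllib.request

# ASSUMPTION: Ollama is listening on its default address and port.
OLLAMA_URL = "http://localhost:11434"


def gpu_share(model: dict) -> float:
    """Fraction of a loaded model's memory that Ollama reports as on the GPU."""
    size = model.get("size", 0)
    if size == 0:
        return 0.0
    return model.get("size_vram", 0) / size


def loaded_models(base_url: str = OLLAMA_URL) -> list:
    """Query GET /api/ps, which lists the models currently loaded in memory."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=5) as resp:
        return json.load(resp).get("models", [])


def report(models: list) -> None:
    """Print one line per loaded model with its reported GPU share."""
    for m in models:
        print(f"{m.get('name', '?')}: {gpu_share(m):.0%} on GPU")
```

Call report(loaded_models()) while a model is still loaded (shortly after a prompt). A value near 0% means Ollama is still running on the CPU, which is exactly the symptom the override is meant to fix.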

Benchmarking the Results

Once you have the GPU engaged, the real work begins. In the video, we look at a side-by-side comparison of performance across different Jetson Power Modes (10W, 15W, and MaxN).

Configuration          Prompt Eval Rate (tokens/s)   Eval Rate (tokens/s)   Throttling Observed?
CPU only (baseline)    [Your Data]                   [Your Data]            Yes/No
GPU, 10W               [Your Data]                   [Your Data]            Yes/No
GPU, 15W               [Your Data]                   [Your Data]            Yes/No
GPU, MaxN              [Your Data]                   [Your Data]            Yes/No
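To fill in the table consistently, you can request the same prompt through Ollama’s REST API instead of the interactive prompt. A non-streaming POST /api/generate reply includes prompt_eval_count, prompt_eval_duration, eval_count and eval_duration, with the durations in nanoseconds, so tokens per second is count / duration × 10^9. This is a sketch under those assumptions: the default address localhost:11434 is assumed, power modes are switched outside the script (for example from the Jetson Power GUI), and the helper names are hypothetical.

```python
import json
import urllib.request

# Ollama reports all *_duration fields in nanoseconds.
NS_PER_S = 1_000_000_000

# ASSUMPTION: Ollama is listening on its default address and port.
OLLAMA_URL = "http://localhost:11434"


def _rate(count, duration_ns):
    """Tokens per second, or None if the field is missing or zero."""
    if not count or not duration_ns:
        return None
    return count / duration_ns * NS_PER_S


def rates_from_response(resp: dict) -> tuple:
    """Return (prompt eval rate, eval rate) in tokens/s from a /api/generate reply."""
    prompt_rate = _rate(resp.get("prompt_eval_count"), resp.get("prompt_eval_duration"))
    eval_rate = _rate(resp.get("eval_count"), resp.get("eval_duration"))
    return prompt_rate, eval_rate


def generate(model: str, prompt: str, base_url: str = OLLAMA_URL) -> dict:
    """Send one non-streaming request to POST /api/generate and return the JSON reply."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    # Generous timeout: the first request also pays the model load time.
    with urllib.request.urlopen(req, timeout=600) as r:
        return json.load(r)


def format_row(label: str, resp: dict, throttled: bool) -> str:
    """One line of the benchmark table; "n/a" marks a rate Ollama did not report."""
    cells = [f"{r:.2f}" if r is not None else "n/a" for r in rates_from_response(resp)]
    return f"{label:<20} {cells[0]:>12} {cells[1]:>12}   {'Yes' if throttled else 'No'}"
```

For each power mode, let the board settle, then call something like format_row("GPU, 15W", generate("llama3.2", "Explain what a GPU does."), throttled=False). "llama3.2" is only an example, so use whichever model you pulled, and keep the prompt identical across rows so the numbers are comparable.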

As we discovered, moving to the GPU provides a boost, but it also increases heat output. Watch the full video to see the charts and find out which power level offers the best “sweet spot” for stable, long-term AI performance on your Jetson Orin Nano. This is an important first step: getting the heavy lifting onto the GPU. In future videos, we will explore how to get that work done well on the GPU.
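To back up the “Throttling Observed?” column with data, one option is to log temperatures while each benchmark runs. The sketch below reads the standard Linux thermal-zone files under /sys/class/thermal, where each zone has a "type" name and a "temp" value in millidegrees Celsius. Zone names vary by board and JetPack version, so none are hard-coded. Rising temperature is a hint, not proof, of throttling; the clock readings in jtop or the Jetson Power GUI are the direct evidence. The function names are hypothetical.

```python
import time
from pathlib import Path

# Standard Linux sysfs location for thermal zones.
THERMAL_ROOT = Path("/sys/class/thermal")


def read_zone_temps(root: Path = THERMAL_ROOT) -> dict:
    """Map each thermal zone's 'type' name to its temperature in degrees Celsius.

    sysfs reports 'temp' in millidegrees Celsius, hence the division by 1000.
    """
    temps = {}
    for zone in sorted(root.glob("thermal_zone*")):
        try:
            name = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # skip zones that are unreadable or report an error
        temps[name] = millideg / 1000.0
    return temps


def log_temps(seconds: float = 60, interval: float = 5.0, root: Path = THERMAL_ROOT) -> list:
    """Sample all zones every `interval` seconds for `seconds` seconds.

    Run this alongside a benchmark; returns a list of (timestamp, temps) pairs.
    """
    samples = []
    end = time.monotonic() + seconds
    while time.monotonic() < end:
        samples.append((time.monotonic(), read_zone_temps(root)))
        time.sleep(interval)
    return samples
```

Start log_temps in a second terminal at the same moment you start a benchmark. If the temperatures climb steadily while the eval rate from a long run drops, that is the thermal wall this lesson is looking for.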