NVIDIA Jetson Orin Nano: How to Create and Use Swap Space for Larger Local LLM Models

In our first LLM class, we learned that we could run several of the smaller LLM models on the Jetson Orin. In the last class, we moved those small models onto the GPU, and we saw faster performance as more of the workload shifted there. The next challenge was that the larger models could fail with EOF (End of File) errors, which usually means the model has crashed because the system ran out of memory. So, we need to work through this more methodically, and we need to run real benchmarks. Remember, we are running with the Big Dogs, and we face a tradeoff: run more slowly on the CPU, or switch to the GPU and face unpredictable throttling.

Our approach today is to deal with the memory issue. We will begin by turning off the GPU modifications, and just operate on the CPU. We will address the memory issue by creating swap space, and then we will benchmark our models running on the CPU, and complete a spreadsheet. We will need to begin by removing our configuration file that pointed us to the GPU:

sudo rm -r /etc/systemd/system/ollama.service.d/
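
Removing the directory by itself may not change the running service, since systemd normally keeps using the configuration it already loaded until it is told to reload. The sketch below wraps the two usual steps in a helper. The function name `reload_ollama` is hypothetical, and the service name `ollama` is an assumption taken from the path above.

```shell
# Hypothetical helper: make systemd forget the removed drop-in directory.
# Assumes the service is named "ollama", as the path above suggests.
reload_ollama() {
    sudo systemctl daemon-reload     # re-read unit files and drop-ins
    sudo systemctl restart ollama    # restart Ollama with its default settings
}
```

After defining it, run `reload_ollama` once in the same terminal.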

Now let's create some swap space by allowing the system to use the SD card or the NVMe drive as extra memory. Note, I am booting from an NVMe drive. If you are using an SD card, you will notice more performance degradation when using swap space.

These commands will create a swapfile:

# Create a 6GB file (this may take a few seconds)
sudo fallocate -l 6G /swapfile

# Set the correct permissions so only the system can use it
sudo chmod 600 /swapfile

# Tell the system this file is for swap
sudo mkswap /swapfile
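
Before creating the file, it can help to confirm that the drive has room for it. This is a minimal sketch, assuming the swapfile lives on the root filesystem as in the commands above. The variable names are only illustrative.

```shell
# Free space, in KiB, on the filesystem that holds / (where /swapfile lives).
avail_kb=$(df -Pk / | awk 'NR==2 {print $4}')
need_kb=$((6 * 1024 * 1024))   # 6 GiB, matching the swapfile size in this lesson

if [ "$avail_kb" -lt "$need_kb" ]; then
    echo "Only $((avail_kb / 1024)) MiB free on /; a 6G swapfile will not fit."
else
    echo "$((avail_kb / 1024)) MiB free on /; enough room for a 6G swapfile."
fi
```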

This command will activate the swapfile:

sudo swapon /swapfile

This command will turn off use of the swap space:

sudo swapoff /swapfile

This command shows you if the swapfile is being used:

swapon --show
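
For a number you can copy into a spreadsheet, the same information can be read from `/proc/meminfo`. This is a small sketch, and the variable names are just for illustration. If the total comes out larger than 6 GB, other swap devices may already be active on your system.

```shell
# Read swap totals (reported in KiB) from the kernel's memory summary.
swap_total_kb=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
swap_free_kb=$(awk '/^SwapFree:/ {print $2}' /proc/meminfo)
swap_used_kb=$((swap_total_kb - swap_free_kb))

echo "Swap total: $((swap_total_kb / 1024)) MiB"
echo "Swap used:  $((swap_used_kb / 1024)) MiB"
```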

Now let's activate the swapfile:

sudo swapon /swapfile

Now each of these models should run on our system, as the swapfile should prevent the EOF error. Larger models take a hit because they are bigger and hence run slower, and they take a further hit in speed because swap memory is slower than system RAM.

Model	Model Family	Size / Parameter Count	Best Used For
gemma3:1b	Google Gemma 3	1 Billion	Ultra-fast responses, light footprint
llama3.2:1b	Meta Llama 3.2	1 Billion	High-efficiency conversational loops
phi4-mini:3.8b	Microsoft Phi-4	3.8 Billion	Heavy reasoning and coding logic
qwen3:4b	Alibaba Qwen 3	4 Billion	Structured data and multilingual logic
qwen3.5:4b	Alibaba Qwen 3.5	4 Billion	Advanced context processing
gemma3:4b	Google Gemma 3	4 Billion	Maximum analytical depth on Orin Nano
nemotron-3-nano:4b	NVIDIA Nemotron 3	4 Billion	Edge-optimized reasoning and tool-use

For this lesson, we will just be using CPU computation. This keeps the comparison between models simple.
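
One way to collect the Prompt Rate and Eval. Rate numbers is to run each model with Ollama's `--verbose` flag, which prints a timing summary after the answer. The sketch below is only a starting point. The helper names `extract_rates` and `bench_model` are hypothetical, and it assumes the timing summary goes to stderr, which is why stderr is merged into stdout.

```shell
# Keep only the two rate lines from the timing summary.
extract_rates() {
    grep -E '^(prompt eval rate|eval rate):'
}

# Hypothetical helper: run one prompt against one model and print its rates.
# Assumes the timing summary is written to stderr, hence 2>&1.
bench_model() {
    model=$1
    prompt=$2
    echo "== $model =="
    ollama run --verbose "$model" "$prompt" 2>&1 | extract_rates
}
```

For example, `bench_model gemma3:1b "your test question"` prints the two rates for that model. Use the same question for every model so the rows stay comparable.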

Benchmarking Local LLM on NVIDIA Jetson Orin on Jetpack 7.2
Model	CPU/GPU	Power	Prompt Rate	Eval. Rate	Throttling	Correct Answer	Swap
gemma3:1b	CPU	MaxN
gemma3:1b	GPU	MaxN
gemma3:1b	GPU	25 Watts
gemma3:1b	GPU	15 Watts
llama3.2:1b	CPU	MaxN
llama3.2:1b	GPU	MaxN
llama3.2:1b	GPU	25 Watts
llama3.2:1b	GPU	15 Watts
phi4-mini:3.8b	CPU	MaxN
phi4-mini:3.8b	GPU	MaxN
phi4-mini:3.8b	GPU	25 Watts
phi4-mini:3.8b	GPU	15 Watts
qwen3:4b	CPU	MaxN
qwen3:4b	GPU	MaxN
qwen3:4b	GPU	25 Watts
qwen3:4b	GPU	15 Watts
qwen3.5:4b	CPU	MaxN
qwen3.5:4b	GPU	MaxN
qwen3.5:4b	GPU	25 Watts
qwen3.5:4b	GPU	15 Watts
gemma3:4b	CPU	MaxN
gemma3:4b	GPU	MaxN
gemma3:4b	GPU	25 Watts
gemma3:4b	GPU	15 Watts
nemotron-3-nano:4b	CPU	MaxN
nemotron-3-nano:4b	GPU	MaxN
nemotron-3-nano:4b	GPU	25 Watts
nemotron-3-nano:4b	GPU	15 Watts

Making The World a Better Place One High Tech Project at a Time. Enjoy!