In our first LLM class, we learned that we could run several of the smaller LLMs on the Jetson Orin. In the last class, we saw that we could make those small models run on the GPU, and we noticed faster performance as we moved the workload there. The next challenge was that the larger models could give EOF (End of File) errors, which usually mean the model has crashed because there was not enough memory. So we need to work through this more methodically and run real benchmarks. Remember, we are running with the Big Dogs, and we face a tradeoff: run more slowly on the CPU, or switch to the GPU and face unpredictable throttling.
Our approach today is to deal with the memory issue. We will begin by turning off the GPU modifications and operating on the CPU only. We will address the memory issue by creating swap space, then benchmark our models running on the CPU and record the results in a spreadsheet. We begin by removing the configuration override that pointed us to the GPU:
    sudo rm -r /etc/systemd/system/ollama.service.d/
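Removing the directory does not change the running service by itself: systemd generally needs to reload its unit files, and Ollama needs a restart, before the change takes effect. Here is a minimal sketch of that step, assuming Ollama was installed as a systemd service named `ollama` (which the `ollama.service.d` path suggests). The function name `restart_ollama_cpu` is made up for this example:

```shell
# Hypothetical helper: make systemd forget the removed override,
# then restart Ollama so it comes back up with its default settings.
# Assumes the service is named "ollama".
restart_ollama_cpu() {
    sudo systemctl daemon-reload && sudo systemctl restart ollama
}
```

Define it in your shell, then run `restart_ollama_cpu` once after removing the directory.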
Now let's create some swap space by allowing the system to use the SD card or the NVMe drive as extra memory. Note that I am booting from an NVMe drive. If you are using an SD card, you will notice more performance degradation when using swap space.
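Before creating the file, it may be worth confirming that the drive has room for it. The sketch below is one way to check, using the standard `df` tool. The helper name `has_free_gb` is invented for this example, and the 6 GB figure simply matches the swapfile size used in this class:

```shell
# Hypothetical helper: succeeds if the filesystem that holds PATH
# has at least NEED_GB gigabytes available.
# Usage: has_free_gb PATH NEED_GB
has_free_gb() {
    # df -Pk prints one portable-format line per filesystem, sizes in KB;
    # column 4 of the second line is the available space.
    avail_kb="$(df -Pk "$1" | awk 'NR==2 {print $4}')"
    [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# Example: check the root filesystem for room for a 6 GB swapfile
if has_free_gb / 6; then
    echo "Enough room for a 6 GB swapfile"
else
    echo "Less than 6 GB free; pick a smaller size or free up space"
fi
```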
This command will create a swapfile:
    # Create a 6GB file (this may take a few seconds)
    sudo fallocate -l 6G /swapfile
    # Set the correct permissions so only the system can use it
    sudo chmod 600 /swapfile
    # Tell the system this file is for swap
    sudo mkswap /swapfile
This command will activate the swapfile:
    sudo swapon /swapfile
This command will turn off use of the swap space:
    sudo swapoff /swapfile
This command shows you whether the swapfile is being used:
    swapon --show
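`swapon --show` reports usage in its USED column. If you would rather have a single number to copy into the benchmark spreadsheet, the kernel exposes the same information in `/proc/meminfo`. Here is a small sketch, assuming a Linux system. The helper name `swap_used_mb` is made up for this example:

```shell
# Hypothetical helper: print the swap currently in use, in whole MB.
# Reads /proc/meminfo by default; pass another file to try it on sample data.
swap_used_mb() {
    awk '/^SwapTotal:/ {total = $2}
         /^SwapFree:/  {free = $2}
         END           {printf "%d\n", (total - free) / 1024}' "${1:-/proc/meminfo}"
}
```

Running `swap_used_mb` before and after loading a large model gives a rough idea of how much of the model spilled into swap.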
Now let's activate the swapfile:
    sudo swapon /swapfile