AI in Your Pocket: Running an LLM from a USB Stick

Published February 14, 2025
Large language models (LLMs) like GPT, BERT, Llama, and Mistral are becoming widely used in applications ranging from chatbots to coding assistants. However, running these models typically requires powerful hardware, such as high-end GPUs or dedicated AI accelerators, which can be costly and power-hungry. In a recent video, Binh Pham from the YouTube channel Build With Binh demonstrates how to make LLMs more accessible: he built a USB stick that runs a local LLM entirely offline, using nothing more than a modified Raspberry Pi Zero W. The device presents itself to the host computer as an ordinary flash drive, and users interact with the LLM simply by creating a file with a specific name; the model treats the filename as a prompt, generates text, and writes it into the file. Instead of a traditional chat interface or API, this interaction method leverages Linux USB gadget mode and the mass-storage profile to detect and respond to file changes dynamically.
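The video doesn't publish the exact code, but the file-driven loop can be illustrated with a minimal sketch. This is a hypothetical simplification: it polls a directory for new files instead of hooking the gadget-mode mass-storage backing file, and the `respond` function is a stand-in for the actual llama.cpp call.

```python
import os
import time

def scan_for_prompts(root, seen):
    """Return paths of files that appeared since the last scan."""
    new = []
    for name in os.listdir(root):
        path = os.path.join(root, name)
        if os.path.isfile(path) and path not in seen:
            seen.add(path)
            new.append(path)
    return new

def respond(path):
    """Stand-in for the LLM: treat the filename (minus extension)
    as the prompt and write generated text into the file."""
    prompt = os.path.splitext(os.path.basename(path))[0]
    generated = f"(model output for prompt: {prompt!r})"  # llama.cpp would go here
    with open(path, "w") as f:
        f.write(generated)

def watch(root, interval=0.5, max_iterations=None):
    """Poll the shared directory and answer each new file once.
    A real gadget-mode setup would watch the backing image instead."""
    seen = set()
    i = 0
    while max_iterations is None or i < max_iterations:
        for path in scan_for_prompts(root, seen):
            respond(path)
        time.sleep(interval)
        i += 1
```

From the host's point of view, the workflow is just "create `write a haiku.txt` on the drive, wait, reopen it" — no app, driver, or network required.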

To achieve this, Binh modified the llama.cpp framework to compile for the Pi Zero's ARM1176JZF-S CPU, which lacks ARM's NEON instruction support. Standard builds of llama.cpp assume newer ARM architectures, making them incompatible with older devices. By adjusting the CMake build files, he got llama.cpp compiling and running models on the Pi Zero despite its 512 MB of RAM and single-core processor. Benchmarks on this tiny system reveal the trade-offs of such low-power hardware: Tiny-15M generates a token every 223 ms, while the larger Lamini-T5-Flan-77M takes 2.5 seconds per token and SmolLM2-136M requires 2.2 seconds per token. These results highlight both the feasibility and the limitations of running LLMs on ultra-low-power devices: while performance is slow compared to modern AI accelerators, the project points toward localized, privacy-focused AI solutions that do not rely on cloud connectivity.
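To put those latencies in perspective, a quick back-of-the-envelope conversion turns the reported per-token times into throughput and into the wall-clock cost of a modest 50-token reply (the 50-token length is an illustrative assumption, not from the video):

```python
# Per-token latencies reported in the article, in seconds per token
latencies = {
    "Tiny-15M": 0.223,
    "SmolLM2-136M": 2.2,
    "Lamini-T5-Flan-77M": 2.5,
}

def throughput(sec_per_token):
    """Tokens per second, given seconds per token."""
    return 1.0 / sec_per_token

def reply_time(sec_per_token, tokens=50):
    """Wall-clock seconds to generate a reply of `tokens` tokens."""
    return sec_per_token * tokens

for name, s in latencies.items():
    print(f"{name}: {throughput(s):.1f} tok/s, "
          f"{reply_time(s):.0f} s for a 50-token reply")
```

The contrast is stark: the 15M-parameter model manages roughly 4.5 tokens per second, while the ~100M-parameter models need about two minutes for the same 50-token reply.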