Key Facts
- ✓ Z80-μLM is a character-level language model with 2-bit quantized weights.
- ✓ The entire system fits into a 40KB .COM file.
- ✓ It runs on a Z80 processor with 64KB RAM.
- ✓ The model can play a stripped-down version of 20 Questions.
- ✓ Training used quantization-aware training with straight-through estimators.
Quick Summary
A new project demonstrates the viability of conversational AI on legacy hardware. Z80-μLM is a character-level language model designed to operate within the strict confines of a Z80 processor and 64KB of RAM. Unlike modern large language models, which require gigabytes of memory and powerful GPUs, it fits its inference code, model weights, and chat interface into a single 40KB .COM file, allowing it to run under the CP/M operating system on real hardware or in emulators.
The model utilizes 2-bit quantized weights with values limited to {-2, -1, 0, +1}. While it lacks the capacity for general-purpose writing tasks, it is capable of playing a simplified version of 20 Questions and engaging in brief, personality-driven conversations. The achievement highlights how extreme constraints can drive innovative engineering solutions in AI development.
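A minimal sketch of what snapping float weights onto this 2-bit grid might look like. The per-tensor scale and round-to-nearest rule here are assumptions for illustration, not the project's documented scheme:

```python
# Sketch: mapping float weights onto the 2-bit grid {-2, -1, 0, +1}.
# The scale factor and nearest-value rounding are assumptions, not the
# project's actual quantization scheme.

def quantize_2bit(weights, scale):
    """Map each float weight to the nearest value in {-2, -1, 0, +1}."""
    grid = [-2, -1, 0, 1]
    out = []
    for w in weights:
        scaled = w / scale
        out.append(min(grid, key=lambda g: abs(g - scaled)))
    return out

print(quantize_2bit([0.9, -1.7, 0.1, -0.4, 2.5], scale=1.0))
# -> [1, -2, 0, 0, 1]
```

Note that the grid is asymmetric (no +2), so each weight packs into exactly two bits; anything above the grid's range saturates to +1.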
Technical Architecture and Constraints
Developing an AI model that runs on hardware from the late 1970s required a complete rethinking of modern deep learning techniques. The developer faced the challenge of fitting inference logic, model weights, and a chat user interface into a 40KB binary. To achieve this, the project relies on trigram hashing, a technique that is tolerant of typos but sacrifices word order. Additionally, the system uses 16-bit integer math rather than the floating-point arithmetic standard in contemporary AI.
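The trigram-hashing trade-off can be made concrete with a short sketch. The table size and CRC32 hash below are illustrative assumptions, not the project's implementation; the point is that a typo changes only a few trigrams, while word order disappears entirely:

```python
import zlib

# Sketch: hashing character trigrams into a small fixed-size feature table.
# TABLE_SIZE and the CRC32 hash are illustrative assumptions. A typo
# changes only a few trigrams (typo tolerance), but the resulting bag of
# features carries no sequence information (word order is lost).

TABLE_SIZE = 256  # assumed; small enough to index with a single byte

def trigram_features(text):
    """Return the set of hashed character trigrams for `text`."""
    feats = set()
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        tri = padded[i:i + 3]
        feats.add(zlib.crc32(tri.encode()) % TABLE_SIZE)
    return feats

a = trigram_features("hello world")
b = trigram_features("hello wrold")  # typo: most trigrams survive
c = trigram_features("world hello")  # reordered: nearly the same bag
```

A misspelling like `wrold` still shares most of its trigrams with `world`, so matching degrades gracefully; but because the features form an unordered bag, `world hello` and `hello world` are almost indistinguishable.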
The architecture was shaped throughout by the Z80's hardware limitations. In particular, the developer had to respect the bounds of 16-bit integer accumulation: the Z80's native accumulator is only 8 bits wide, so wider sums run through 16-bit register pairs, and every intermediate value must stay within signed 16-bit range. The training process was designed around these constraints from the start, so the model never required post-training adjustments that risk quantization collapse.
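The 16-bit accumulation constraint can be illustrated with a toy dot product whose running sum is clamped to the signed 16-bit range. The saturating policy is an assumption; the project may wrap, rescale, or penalize overflow away during training instead:

```python
# Sketch: a dot product whose running sum is clamped to signed 16-bit
# range, mimicking accumulation in a Z80 register pair. Saturation is an
# assumption; the real system may wrap or rescale instead.

INT16_MIN, INT16_MAX = -32768, 32767

def dot_int16(weights, activations):
    """Multiply-accumulate with the running sum clamped to int16."""
    acc = 0
    for w, x in zip(weights, activations):
        acc += w * x
        acc = max(INT16_MIN, min(INT16_MAX, acc))  # saturate, don't wrap
    return acc

print(dot_int16([1, -2, 1], [100, 200, 50]))       # 100 - 400 + 50 = -250
print(dot_int16([1, 1], [30000, 30000]))           # saturates at 32767
```

Because the 2-bit weights are at most ±2, keeping sums inside this range is mainly a matter of bounding activation magnitudes, which is exactly what an overflow penalty during training encourages.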
Training Methodology 🧠
The key to Z80-μLM's success lies in its training approach: quantization-aware training. Rather than training a full-precision model and compressing it afterward, the developer ran two forward passes in parallel during training: one using standard floating-point numbers and another using integer-quantized values. This let the training signal score the model on how closely the quantized pass tracked the full-precision one, i.e. how well its knowledge survived quantization.
The training loop actively pushed the weights toward the 2-bit grid using straight-through estimators, which apply the quantization in the forward pass but let gradients flow through it as if it were the identity. To keep inference arithmetic safe, the loss also included overflow penalties mirroring the Z80's 16-bit accumulator limits. By the end of training, the model had fully adapted to its target hardware, eliminating the risk of post-hoc quantization collapse.
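The pieces above can be combined into a sketch of the dual-pass training loss. The layer shapes, the mean-squared mismatch term, and the penalty weight are all assumptions; only the overall structure (float pass, quantized pass, overflow penalty, identity-gradient quantizer) comes from the description:

```python
import numpy as np

# Sketch of the dual-pass QAT loss: one float forward pass, one 2-bit
# quantized pass, plus a penalty when values exceed what a 16-bit
# accumulator can hold. Shapes, the mismatch term, and the penalty
# weight are assumptions, not the project's actual loss.

GRID = np.array([-2, -1, 0, 1])

def quantize(w):
    # Snap each weight to the nearest 2-bit grid value. Under a
    # straight-through estimator, backprop treats this op as the
    # identity, so gradients still reach the float weights.
    idx = np.abs(w[..., None] - GRID).argmin(axis=-1)
    return GRID[idx].astype(np.float64)

def dual_pass_loss(w, x, penalty=0.01):
    y_float = x @ w               # full-precision pass
    y_quant = x @ quantize(w)     # 2-bit pass
    mismatch = np.mean((y_float - y_quant) ** 2)
    # Penalize magnitudes a signed 16-bit accumulator cannot represent.
    overflow = np.mean(np.maximum(np.abs(y_quant) - 32767, 0.0))
    return mismatch + penalty * overflow

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 4))
x = rng.normal(size=(16, 8))
loss = dual_pass_loss(w, x)
```

Minimizing the mismatch term pulls the float weights toward points where quantization costs nothing, which is why the finished model needs no post-training conversion step.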
Data Generation and Capabilities
To teach the model how to play its stripped-down version of 20 Questions, the developer needed a purpose-built dataset and used the Claude API to generate it. For a few dollars in API costs, the developer produced example exchanges in the simplified game format; this data is what allows the model to function as a conversational partner in its limited context.
Despite its small size, Z80-μLM is capable of maintaining the illusion of a conversation. It possesses a distinct personality and can engage in terse exchanges. However, its utility is strictly defined by its training data; it cannot generalize to tasks like email composition or complex reasoning, focusing instead on its specific conversational niche.
Conclusion
Z80-μLM represents a fascinating intersection of retro-computing and modern AI techniques. By strictly adhering to the limitations of 64KB RAM and a 40KB file size, the project proves that useful AI interactions are possible even on severely constrained hardware. The use of quantization-aware training and integer math offers a blueprint for future projects aiming to run AI on embedded systems or legacy devices. While it may not replace modern assistants, it stands as a significant technical achievement in code golf and efficient model design.