Tokenization from first principles

Byte-level BPE from first principles: what matters for speed and quality, how to implement it cleanly, and why a SuperBPE variant can lift sample efficiency.

October 7, 2025 · 16 min · 3224 words · George Grigorev