Hi, I’m George

Currently: exploring training and sample efficiency in LLM pre-training, building small coding LLMs, and developing finance LLM agents.

Previously, I worked at Together AI, running distributed training on hundreds of GPUs and building evals. At Snap, I developed multimodal LLMs that reached 100M+ users, pre-trained diffusion models, built AI applications end to end, and contributed to research. I work across text, image, and speech.

Highlights

  • Large-scale fine-tuning and evaluation of open-source LLMs (1B–405B parameters), with a focus on training efficiency and model quality.
  • Built distributed training platforms (60+ LLMs, 600+ GPUs) and LLM-as-a-judge evaluation pipelines (Flyte-orchestrated).
  • Long-context fine-tuning: custom sequence parallelism with flash_attn_varlen for 131k context (LLaMA 3.1 70B) and 16k context (LLaMA 3.1 405B).

See also: Publications

Tokenization from first principles

Byte-level BPE from first principles: what matters for speed and quality, how to implement it cleanly, and why a SuperBPE variant can lift sample efficiency.

October 7, 2025 · 16 min · 3224 words · George Grigorev