The Elegant Hack Powering Modern AI
Understanding how LLMs transform text into tokens, and why this seemingly simple process has profound implications for cost, context limits, and model behavior.
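To make the process concrete, here is a minimal sketch using OpenAI's open-source tiktoken library (assuming `pip install tiktoken`); the sample text and the choice of encoding are illustrative, not taken from the articles below.

```python
# Requires: pip install tiktoken
import tiktoken

# cl100k_base is the BPE vocabulary used by GPT-4 and GPT-3.5-turbo.
enc = tiktoken.get_encoding("cl100k_base")

text = "Tokenization is the elegant hack powering modern AI."
token_ids = enc.encode(text)  # a list of integer token IDs

print(len(text), "characters ->", len(token_ids), "tokens")

# Round-trip: decoding the IDs recovers the original string exactly.
assert enc.decode(token_ids) == text
```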
How a 1994 data compression algorithm became the foundation of modern AI. The untold story of Byte Pair Encoding's journey from the C Users Journal to GPT-4.
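For a taste of what that 1994 algorithm actually does, here is a toy sketch of the core BPE merge loop: repeatedly fuse the most frequent adjacent pair of symbols. This is a gloss on the idea, not Gage's original C implementation, and the sample string is the classic textbook example.

```python
from collections import Counter

def most_frequent_pair(symbols):
    """Count every adjacent pair and return the most common one."""
    return Counter(zip(symbols, symbols[1:])).most_common(1)[0][0]

def merge(symbols, pair):
    """Replace each non-overlapping occurrence of `pair` with one fused symbol."""
    fused, out, i = pair[0] + pair[1], [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(fused)
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out

# Start from single characters, as BPE does before any merges.
symbols = list("aaabdaaabac")
for step in range(3):
    pair = most_frequent_pair(symbols)
    symbols = merge(symbols, pair)
    print(f"step {step}: merged {pair} -> {symbols}")
```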
Tamil speakers pay 7x more tokens than English speakers to express the same meaning. The hidden cost of tokenization, and why morphology sets a compression ceiling.
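You can see this kind of disparity yourself by counting tokens for roughly equivalent sentences with tiktoken. The sentences below and the resulting ratio are illustrative; real measurements depend on the vocabulary and the text being compared.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# Sentences with roughly the same meaning ("hello, how are you?").
samples = {
    "English": "Hello, how are you?",
    "Tamil":   "வணக்கம், எப்படி இருக்கிறீர்கள்?",
}

for language, sentence in samples.items():
    n_tokens = len(enc.encode(sentence))
    print(f"{language:8s} {n_tokens:3d} tokens")
```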
Reddit usernames that break GPT. Invisible characters that bypass filters. The edge cases where tokenization fails spectacularly.
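As a quick demonstration of one such edge case (again assuming tiktoken): a zero-width space (U+200B) is invisible when rendered but changes the token sequence, which is exactly how hidden characters can slip past naive string filters.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

visible = "hello world"
hidden = "hello\u200b world"  # zero-width space hidden after "hello"

# The two strings render identically but are not equal.
print(visible == hidden)  # False
print(len(enc.encode(visible)), "tokens vs", len(enc.encode(hidden)), "tokens")
```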