A token is the basic unit of text processed by LLMs: a word, part of a word, or a punctuation mark. Context window limits are measured in tokens; in English, one token averages roughly 4 characters.
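The 4-characters-per-token rule of thumb gives a quick way to estimate token counts without a real tokenizer. A minimal sketch (the exact ratio varies by model and language, so treat this as an approximation):

```python
# Rough token-count estimate using the ~4 characters/token heuristic
# for English text. Real tokenizers differ by model; this is only
# a ballpark figure, not an exact count.
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    return max(1, round(len(text) / chars_per_token))

print(estimate_tokens("Hello"))         # 5 chars -> 1 token
print(estimate_tokens("Optimization"))  # 12 chars -> 3 tokens
```

For anything where the count matters (billing, context budgeting), use the tokenizer that matches your target model instead of this heuristic.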
What is a Token?
LLMs don't process text character by character; they process it in tokens, segments of text that the model treats as single units.
Tokenization Examples
- "Hello" = 1 token
- "Optimization" = 1-2 tokens
- "AI Labs Audit" = 3-4 tokens
Why Tokens Matter
For AEO, tokens influence:
- Context limits: how much content the model can consider at once
- Cost: API usage is billed per token, for both input and output
- Processing: rare or unusual words are split into more tokens than common ones
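Because API usage is billed per token, a cost estimate follows directly from token counts. A sketch using hypothetical per-million-token rates (actual pricing varies by provider and model, so check your provider's current rates):

```python
# Estimate API cost from token counts. The rates below
# ($0.50/M input, $1.50/M output) are hypothetical placeholders,
# not any provider's real pricing.
def estimate_cost(input_tokens: int, output_tokens: int,
                  in_price_per_m: float = 0.50,
                  out_price_per_m: float = 1.50) -> float:
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

cost = estimate_cost(input_tokens=2_000, output_tokens=500)
print(f"${cost:.5f}")  # 2,000 in + 500 out at the sample rates
```

Output tokens are typically priced higher than input tokens, which is why the two rates are kept separate here.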