Tokens

The basic units of text that LLMs process. Roughly 1 token = 4 characters or 0.75 words in English. Both input and output are measured in tokens.

In-Depth Explanation

Tokens are the fundamental units that LLMs process. Rather than working with characters or whole words directly, models break text into tokens: subword units that balance vocabulary size with expressiveness.

How tokenization works:

  • Text is split into tokens using a learned vocabulary
  • Common words often become single tokens
  • Rare words split into multiple tokens
  • Punctuation and spaces are also tokens
  • Different models use different tokenizers
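The mechanics above can be sketched with a toy greedy longest-match tokenizer. The tiny hand-picked vocabulary below is invented for illustration; real tokenizers (e.g. BPE) learn their vocabularies from large corpora and behave differently.

```python
# Toy greedy longest-match subword tokenizer (illustrative only).
# Real tokenizers learn their vocabularies from data; this tiny
# hand-picked vocabulary exists just to show the mechanics.
VOCAB = {"the", "hello", "ham", "bur", "ger"}

def tokenize(text: str, vocab: set[str] = VOCAB) -> list[str]:
    tokens = []
    i = 0
    while i < len(text):
        # Prefer the longest vocabulary entry matching at position i.
        for length in range(len(text) - i, 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # No vocabulary match: fall back to a single-character token,
            # mirroring how rare text splits into more, smaller tokens.
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("hamburger"))  # ['ham', 'bur', 'ger']
print(tokenize("the hello"))  # ['the', ' ', 'hello']
```

Note how a word outside the vocabulary ("hamburger") splits into several subword tokens, while a common word ("the") stays whole.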

Token rules of thumb (English):

  • 1 token ≈ 4 characters
  • 1 token ≈ 0.75 words
  • 100 tokens ≈ 75 words
  • 1000 words ≈ 1300 tokens
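These rules of thumb can be turned into a quick back-of-the-envelope estimator. The function below is an assumption-laden sketch: it averages the character-based and word-based rules, and the real count always depends on the specific tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text using the rules of thumb above.

    Averages the character-based (~4 chars/token) and word-based
    (~0.75 words/token) estimates. Approximate only: the true count
    depends on the model's actual tokenizer.
    """
    by_chars = len(text) / 4
    by_words = len(text.split()) / 0.75
    return round((by_chars + by_words) / 2)

print(estimate_tokens("word " * 1000))  # roughly 1300, matching the rule of thumb
```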

Why tokens matter:

  • Pricing: API costs are per-token
  • Limits: Context windows measured in tokens
  • Computation: Processing time scales with tokens
  • Output control: max_tokens limits generation length
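Because the prompt and the generated output share one context window, the usable max_tokens value shrinks as the prompt grows. A minimal sketch of that budget arithmetic, with hypothetical window and prompt sizes:

```python
def max_output_budget(context_window: int, prompt_tokens: int, reserve: int = 0) -> int:
    """How many output tokens fit after the prompt.

    The prompt and the output share one context window, so max_tokens
    can be at most the window size minus the prompt size (minus any
    safety margin). All numbers below are hypothetical.
    """
    return max(context_window - prompt_tokens - reserve, 0)

print(max_output_budget(8192, 6000))       # 2192 tokens left for output
print(max_output_budget(8192, 6000, 500))  # 1692 with a 500-token margin
```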

Tokenization quirks:

  • "Hello" = 1 token, "Hello!" = 2 tokens
  • Numbers can be surprising (384 might be 2 tokens)
  • Non-English text often uses more tokens
  • Code tokenizes differently than prose

Business Context

Token usage directly determines API costs. A typical 1000-word document is about 1300 tokens. Monitor token usage to control expenses.
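The per-token pricing described above can be estimated with simple arithmetic. The prices in this sketch are placeholders, not any provider's actual rates; input and output tokens are usually billed at different per-million prices, so check the current price list.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  price_in_per_1m: float, price_out_per_1m: float) -> float:
    """API cost in dollars given per-million-token prices.

    Input and output tokens are usually billed at different rates;
    the prices used below are hypothetical placeholders.
    """
    return (input_tokens * price_in_per_1m
            + output_tokens * price_out_per_1m) / 1_000_000

# A ~1000-word (~1300-token) document summarized into ~200 output tokens,
# at hypothetical prices of $3 / $15 per million tokens:
print(estimate_cost(1300, 200, 3.0, 15.0))  # 0.0069
```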

How Clever Ops Uses This

We help US businesses understand and optimize token usage. Efficient prompting and smart caching can reduce AI costs by 50% or more while maintaining quality.

Example Use Case

"The word "hamburger" is 3 tokens: "ham", "bur", "ger". The word "the" is 1 token. Understanding this helps predict costs."

Need Expert Help?

Understanding is the first step. Let our experts help you implement AI solutions for your business.