What are tokens in the AI world?

Mohit Rakhade
2 min read · Apr 22, 2024
(Image: ChatGPT tokenizer)

In this article, we will understand what exactly a token is.

In the context of OpenAI, a token is the fundamental unit of text that the language model uses to process and produce human-like text.

OpenAI language models such as GPT-3 are trained on large volumes of text data and use that training to produce new text. Each token in the input text represents a word, a piece of a word, or a punctuation mark, and the model processes the text one token at a time.

As an illustration, take the sentence “The curious cat chases the playful mouse.”

Deconstructing this into individual tokens gives us:

  • “The”
  • “curious”
  • “cat”
  • “chases”
  • “the”
  • “playful”
  • “mouse”
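
A real tokenizer may not split the sentence exactly like this: most OpenAI tokenizers attach a leading space to every word after the first and treat the final period as its own token. Here is a minimal sketch of how to inspect the actual split, assuming the open-source tiktoken library is installed (pip install tiktoken):

    # Inspect how a real OpenAI tokenizer splits the example sentence.
    import tiktoken

    # cl100k_base is the encoding used by newer OpenAI chat models.
    enc = tiktoken.get_encoding("cl100k_base")

    sentence = "The curious cat chases the playful mouse."
    token_ids = enc.encode(sentence)                # the integer IDs the model actually sees
    pieces = [enc.decode([t]) for t in token_ids]   # the text each ID stands for

    print(token_ids)
    print(pieces)  # note the leading spaces and the "." token; the exact split depends on the encoding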

During text generation, the language model uses an autoregressive technique: it predicts the next token by analysing the preceding tokens in the sequence. By predicting one token at a time, the model can produce cohesive, natural-sounding prose that mimics human writing.
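
To make that loop concrete, here is a toy Python sketch. The hard-coded next_word table is a made-up stand-in for the neural network; the point is only the shape of the loop: predict one token, append it, and feed the longer sequence back in.

    # Toy illustration of autoregressive generation (not a real language model).
    # `next_word` is a hypothetical stand-in for the model's next-token prediction.
    next_word = {
        "The": "curious",
        "curious": "cat",
        "cat": "chases",
        "chases": "the",
        "the": "playful",
        "playful": "mouse",
        "mouse": ".",
    }

    def generate(prompt_tokens, max_new_tokens=10):
        tokens = list(prompt_tokens)
        for _ in range(max_new_tokens):
            predicted = next_word.get(tokens[-1])  # look at the context, predict the next token
            if predicted is None:                  # nothing left to predict
                break
            tokens.append(predicted)               # append it and repeat with the longer sequence
        return tokens

    print(generate(["The"]))
    # ['The', 'curious', 'cat', 'chases', 'the', 'playful', 'mouse', '.']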

To put it briefly, tokens are the fundamental building blocks of text that OpenAI language models work with. They represent words, pieces of words, and punctuation marks during text processing and generation.

You never ask for tokens directly, but they are what the model actually works with: all the text an AI model takes in and produces is measured in tokens.

To understand this better, here are a few examples.

You can think of tokens as word fragments. The input is divided into tokens before the API processes the prompt. Tokens can include trailing spaces and even sub-words, so they are not split exactly where words begin or end. The following are some useful rules of thumb for estimating token lengths:

  • 1 token ~= 4 characters of English text
  • 1 token ~= ¾ of a word
  • 100 tokens ~= 75 words

or, equivalently:

  • 1–2 sentences ~= 30 tokens
  • 1 paragraph ~= 100 tokens
  • 1,500 words ~= 2,048 tokens
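
These are only rules of thumb; the exact count depends on the text, and the only way to know it is to run the text through the tokenizer. A short sketch, again assuming the tiktoken library, that compares word and token counts for a paragraph:

    # Sanity-check the rules of thumb, assuming `tiktoken` is installed.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    paragraph = (
        "Tokens are the basic building blocks of text for a language model. "
        "Before the model sees your prompt, the text is split into tokens, "
        "and the model then predicts the next token one step at a time."
    )

    n_words = len(paragraph.split())
    n_tokens = len(enc.encode(paragraph))

    print(f"{n_words} words -> {n_tokens} tokens")
    # Expect roughly 4 tokens for every 3 words of ordinary English text.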

There is also “A Short Lesson on Tokens and Text”; do check it out here:

If you have come this far and want to explore more, try OpenAI’s tokenizer: https://platform.openai.com/tokenizer

Thank you, hope this helped you… see you soon :)
