What does HackerNews think of tiktoken?

tiktoken is a fast BPE tokeniser for use with OpenAI's models.

Language: Python

Extracted from the GitHub repo tiktoken [1]

After you try to decode a string, the cached token list shows up on my computer in /tmp/data-gym-cache/9b5ad71b2ce5302211f9c61530b329a4922fc6a4

[1] https://github.com/openai/tiktoken
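For context, that path is tiktoken's download cache: on first use the library fetches the BPE vocabulary file and stores it under the system temp directory (the file name appears to be a hash of the source URL, and the location can be overridden with the TIKTOKEN_CACHE_DIR environment variable). A minimal sketch that would populate that cache, assuming the GPT-2 encoding; the token ids in the comment are illustrative:

    # Encoding/decoding with tiktoken. The first call downloads the BPE
    # vocabulary file and caches it (by default under the system temp dir,
    # e.g. /tmp/data-gym-cache/<hash>).
    import tiktoken

    enc = tiktoken.get_encoding("gpt2")

    tokens = enc.encode("hello world")   # e.g. [31373, 995]
    text = enc.decode(tokens)            # "hello world"
    assert text == "hello world"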

OpenAI have made their tokenizers public [1].

As someone has pointed out, with BPE you specify the vocab size, not the token size. It's a relatively simple algorithm; this Hugging Face course does a nice job of explaining it [2], and the original paper has a very readable Python example [3] (a minimal sketch along those lines follows the links below).

[1] https://github.com/openai/tiktoken

[2] https://huggingface.co/course/chapter6/5?fw=pt

[3] https://arxiv.org/abs/1508.07909
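To make the "you specify the vocab size" point concrete, here is a minimal BPE training loop in the spirit of the snippet in the paper [3]. The toy corpus, word frequencies, and merge count are made up for illustration; the number of merges (and hence the vocab size) is the parameter you choose.

    # Minimal BPE training sketch: repeatedly merge the most frequent
    # adjacent symbol pair until the chosen number of merges is reached.
    import re
    import collections

    def get_stats(vocab):
        """Count frequencies of adjacent symbol pairs across the corpus."""
        pairs = collections.defaultdict(int)
        for word, freq in vocab.items():
            symbols = word.split()
            for i in range(len(symbols) - 1):
                pairs[(symbols[i], symbols[i + 1])] += freq
        return pairs

    def merge_vocab(pair, v_in):
        """Merge every occurrence of the given symbol pair into one symbol."""
        v_out = {}
        bigram = re.escape(" ".join(pair))
        pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
        for word, freq in v_in.items():
            v_out[pattern.sub("".join(pair), word)] = freq
        return v_out

    # Toy corpus: words split into characters, with an end-of-word marker.
    vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
             "n e w e s t </w>": 6, "w i d e s t </w>": 3}

    num_merges = 10  # in practice: target vocab size minus the base alphabet size
    for _ in range(num_merges):
        pairs = get_stats(vocab)
        best = max(pairs, key=pairs.get)
        vocab = merge_vocab(best, vocab)
        print(best)  # e.g. ('e', 's'), ('es', 't'), ('est', '</w>'), ...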

It's worth noting that this is only for GPT-3. If you're using ChatGPT or GPT-4, both use a different tokenizer that's more robust and uses/generates about 10% fewer tokens (it's unclear how well it performs for non-English languages).

You can test it offline using tiktoken: https://github.com/openai/tiktoken
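For example, one way to check that rough 10% figure on your own text is to count tokens under both encodings with tiktoken. The encoding names below match tiktoken's registry at the time of writing (r50k_base for the original GPT-3 models, cl100k_base for ChatGPT/GPT-4); the saving is an average over typical English text, so individual inputs will vary.

    # Comparing token counts between the GPT-3-era encoding and the
    # ChatGPT/GPT-4 encoding.
    import tiktoken

    gpt3_enc = tiktoken.get_encoding("r50k_base")     # original GPT-3 models
    chat_enc = tiktoken.get_encoding("cl100k_base")   # gpt-3.5-turbo / gpt-4

    text = "Tokenization differs between model families, sometimes substantially."
    print(len(gpt3_enc.encode(text)), "tokens with r50k_base")
    print(len(chat_enc.encode(text)), "tokens with cl100k_base")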

Hi folks – I work at OpenAI and helped build this page; awesome to see it on here! Heads up that it's a bit out of date, as GPT-4 has a different tokenizer than GPT-3. I'd recommend checking out tiktoken (https://github.com/openai/tiktoken) or this other excellent app that a community member made (https://tiktokenizer.vercel.app)

OpenAI seems to use tiktoken [0]; it also covers GPT-4's token encoding.

[0] https://github.com/openai/tiktoken
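A small sketch of that: tiktoken's encoding_for_model looks up the encoding registered for a given model name, which shows that GPT-4 resolves to a different tokenizer than the GPT-3-era models. The mappings in the comments reflect the library at the time of writing and may change.

    # Model-to-encoding lookup with tiktoken.
    import tiktoken

    print(tiktoken.encoding_for_model("text-davinci-003").name)  # p50k_base
    print(tiktoken.encoding_for_model("gpt-4").name)             # cl100k_base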