What confusing pricing[1]:

> Prices are per 1,000 tokens. You can think of tokens as pieces of words, where 1,000 tokens is about 750 words. This paragraph is 35 tokens.

Further down, in the FAQ[2]:

> For English text, 1 token is approximately 4 characters or 0.75 words. As a point of reference, the collected works of Shakespeare are about 900,000 words or 1.2M tokens.

> To learn more about how tokens work and estimate your usage…

> Experiment with our interactive Tokenizer tool.

And it goes on. When most of the questions in your FAQ are about understanding your pricing, to the point that you need to offer a specialised tool for it, perhaps consider a different pricing model?

[1]: https://openai.com/api/pricing/

[2]: https://openai.com/api/pricing/#faq-token
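For what it's worth, those heuristics are enough to estimate a bill yourself. A minimal sketch, assuming the ~4 characters / ~0.75 words per token figures quoted above; the price constant is a hypothetical placeholder, not a real OpenAI rate:

```python
# Back-of-the-envelope estimator based on the heuristics quoted above:
# roughly 4 characters or 0.75 words per token for English text.
# PRICE_PER_1K is a made-up placeholder, not an actual OpenAI price.

PRICE_PER_1K = 0.02  # hypothetical USD per 1,000 tokens

def estimate_tokens(text: str) -> int:
    """Estimate the token count from character length (~4 chars/token)."""
    return max(1, round(len(text) / 4))

def estimate_cost(text: str, price_per_1k: float = PRICE_PER_1K) -> float:
    """Estimate the request cost in USD from the token estimate."""
    return estimate_tokens(text) / 1000 * price_per_1k

quote = ("Prices are per 1,000 tokens. You can think of tokens as pieces "
         "of words, where 1,000 tokens is about 750 words. This paragraph "
         "is 35 tokens.")
print(estimate_tokens(quote))          # ~35, close to the page's own count
print(f"${estimate_cost(quote):.6f}")  # hypothetical cost of that passage

# Sanity check against the FAQ's Shakespeare figure:
# ~900,000 words at ~0.75 words/token is about 1.2M tokens.
print(round(900_000 / 0.75))  # -> 1200000
```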

Haven't read the paper, but they are probably using something like sentencepiece with sub-word splitting and then charging by the number of resulting tokens.

https://github.com/google/sentencepiece
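To make the sub-word splitting concrete, here's a minimal sketch with the sentencepiece Python bindings; the corpus file, vocab size, and BPE model type are placeholder choices for illustration, not OpenAI's actual setup:

```python
# Minimal sub-word tokenization sketch with sentencepiece
# (pip install sentencepiece). 'corpus.txt' is a placeholder for any
# plain-text training file, one sentence per line; the vocab size and
# BPE model type are arbitrary illustration choices.
import sentencepiece as spm

# Train a small sub-word model on a local corpus.
spm.SentencePieceTrainer.train(
    input="corpus.txt", model_prefix="demo", vocab_size=1000, model_type="bpe"
)

sp = spm.SentencePieceProcessor(model_file="demo.model")

text = "Prices are per 1,000 tokens."
pieces = sp.encode(text, out_type=str)
print(pieces)                 # sub-word pieces, e.g. ['▁Pri', 'ces', '▁are', ...]
print(len(pieces), "tokens")  # the count you'd be billed for under such a scheme
```

Under a scheme like that, the billable unit depends on the learned vocabulary, which is exactly why word counts only map loosely onto token counts.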