Unfortunately, it was trained with a tokenizer that squashes repeated spaces into one. Basically, this means it can't do code generation or any other whitespace-sensitive text formatting.

They had one job, and this wasn't it.

From their project page, https://github.com/openlm-research/open_llama:

> For current version of OpenLLaMA models, our tokenizer is trained to merge multiple empty spaces into one before tokenization, similar to T5 tokenizer. Because of this, our tokenizer will not work with code generation tasks (e.g. HumanEval) since code involves many empty spaces. We are planning to open source long context models trained on more code data. Stay tuned.
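If you want to see the problem for yourself, here's a minimal sketch of checking whether a tokenizer round-trips indentation. It assumes the `openlm-research/open_llama_7b` checkpoint on Hugging Face plus the `transformers` and `sentencepiece` packages; swap in whatever checkpoint/tokenizer you actually have.

```python
# Minimal sketch: does the tokenizer preserve runs of spaces?
# Assumes: transformers + sentencepiece installed, and that the
# openlm-research/open_llama_7b repo is the right checkpoint name.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("openlm-research/open_llama_7b")

snippet = "def f(x):\n    return x + 1"   # 4-space indentation matters here
ids = tok(snippet)["input_ids"]
roundtrip = tok.decode(ids, skip_special_tokens=True)

print(repr(snippet))
print(repr(roundtrip))  # with a space-merging tokenizer, "    " collapses to " "
```

With a tokenizer that merges empty spaces (as their note describes), the decoded text loses the indentation, which is exactly why benchmarks like HumanEval fall apart.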

Sounds like they're working on it.