What does HackerNews think of jsonformer?

A Bulletproof Way to Generate Structured JSON from Language Models

Language: Jupyter Notebook

IMO, the main reasons include (but are definitely not limited to):

- You can fine-tune these models for very specific tasks, which GPT-4 might not be as good at.

- Open source models are free. You can use them as much as you want without worrying about a $xx,xxx bill at the end of the month, which makes tinkering with them easier.

- Smaller models like this can run on consumer hardware, even phones, and can run offline.

- Privacy, and not having to abide by a third party's terms. You don't have to deal with "As a large language model...", especially with uncensored models.

- Tools like jsonformer https://github.com/1rgs/jsonformer are not possible with OpenAI's API.

- It's also just really cool, let's be honest.

How does this compare in terms of latency, cost, and effectiveness to jsonformer? https://github.com/1rgs/jsonformer
You're correct in your interpretation of how the model works w.r.t. returning tokens one at a time. The model returns one token, and the entire context window gets shifted right by one to account for it when generating the next one.
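
As a rough illustration of that loop, here is a minimal greedy decoding sketch using Hugging Face transformers (gpt2 and the prompt are just stand-ins): each step recomputes the next-token scores over the context extended by the previously chosen token.

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  tokenizer = AutoTokenizer.from_pretrained("gpt2")   # stand-in model
  model = AutoModelForCausalLM.from_pretrained("gpt2")

  ids = tokenizer("The capital of France is", return_tensors="pt").input_ids
  with torch.no_grad():
      for _ in range(5):
          logits = model(ids).logits[0, -1]           # scores for the next token only
          next_id = torch.argmax(logits).reshape(1, 1)
          ids = torch.cat([ids, next_id], dim=1)      # the context grows by one token per step
  print(tokenizer.decode(ids[0]))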

As for model performance at different context sizes, it seems a bit complicated. From what I understand, even if models are tweaked (for example using the SuperHOT RoPE hack or sparse attention) to be able to use longer contexts, they still have to be fine-tuned on inputs of this increased length to actually utilize it, but even then performance seems to degrade as input length increases.

For your question about fine-tuning models to respond with only "yes" or "no", I recommend looking into how the jsonformer library works: https://github.com/1rgs/jsonformer . Essentially, you still let the model score every candidate token for the next position, but only accept the ones that satisfy certain criteria (such as the token for "yes" and the token for "no").
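
A minimal sketch of that trick with Hugging Face transformers (gpt2 and the prompt are made-up stand-ins): compute the next-token logits once and compare only the entries for the tokens you are willing to accept.

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  tokenizer = AutoTokenizer.from_pretrained("gpt2")   # stand-in model
  model = AutoModelForCausalLM.from_pretrained("gpt2")

  prompt = "Question: Is the sky blue? Answer (yes or no):"
  ids = tokenizer(prompt, return_tensors="pt").input_ids

  with torch.no_grad():
      logits = model(ids).logits[0, -1]               # scores for every candidate next token

  # Only compare the logits of the tokens we accept (" yes" and " no").
  allowed = {label: tokenizer.encode(" " + label)[0] for label in ("yes", "no")}
  answer = max(allowed, key=lambda label: logits[allowed[label]].item())
  print(answer)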

You can do this with the OpenAI API too, using tiktoken https://twitter.com/AAAzzam/status/1669753722828730378?t=d_W... . Be careful though, as results will differ depending on which tokens you select, since "YES", "Yes", "yes", etc. are all different tokens to the best of my knowledge.
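
A sketch of that approach (the model name, the prompt, and the use of the current openai Python client are assumptions for illustration): use tiktoken to look up the token ids for the variants you want to allow, then push their logits up via logit_bias and cap the reply at one token.

  import tiktoken
  from openai import OpenAI

  client = OpenAI()
  enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

  # "yes", "Yes", "no", "No" are distinct tokens, so bias every variant you want to allow.
  allowed_ids = {tid for word in ("yes", "Yes", "no", "No") for tid in enc.encode(word)}
  logit_bias = {str(tid): 100 for tid in allowed_ids}   # values near 100 effectively restrict sampling to these tokens

  resp = client.chat.completions.create(
      model="gpt-3.5-turbo",
      messages=[{"role": "user", "content": "Is the sky blue? Answer yes or no."}],
      max_tokens=1,
      logit_bias=logit_bias,
  )
  print(resp.choices[0].message.content)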

There are a lot of things that could be done to improve this:

1) It could use the JSONformer idea [0], where a model of the language determines which next tokens are valid; we only ask the LLM to supply a token when that language model gives us a choice, and when considering possible next tokens, we immediately discard any which are invalid given the model. This could go beyond mere syntax to actually considering the APIs/etc. which exist, so if the LLM has already generated the tokens "import java.util.", it could only generate a completion which is a public class (or subpackage) of "java.util.". Maybe something like language servers could help here.

2) For every output it generates, automatically compile and test it before showing it to the user. If the compile/test fails, give the model a chance to fix its mistake. If it gets stuck in a loop, or isn't getting anywhere after several attempts, fall back to the next most likely output and repeat. If after a while we still aren't getting anywhere, it can show the user its attempts, in case they give the user any ideas. (A rough version of this loop is sketched after this list.)

[0] https://github.com/1rgs/jsonformer
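
A rough sketch of the compile-and-retry loop from point 2. Everything model- and build-specific here is a placeholder: llm_generate stands in for a real model call, and the javac/java commands and the Main.java filename are only examples.

  import pathlib
  import subprocess
  import tempfile

  def llm_generate(prompt: str) -> str:
      """Hypothetical call into the model; returns one candidate program."""
      raise NotImplementedError

  def compile_and_test(source: str) -> tuple[bool, str]:
      """Write the candidate out, compile it, and run it; the commands are placeholders."""
      path = pathlib.Path(tempfile.mkdtemp()) / "Main.java"
      path.write_text(source)
      build = subprocess.run(["javac", str(path)], capture_output=True, text=True)
      if build.returncode != 0:
          return False, build.stderr
      tests = subprocess.run(["java", "-cp", str(path.parent), "Main"],
                             capture_output=True, text=True)
      return tests.returncode == 0, tests.stdout + tests.stderr

  def generate_with_retries(task: str, max_attempts: int = 5) -> str:
      attempts = []
      prompt = task
      for _ in range(max_attempts):
          candidate = llm_generate(prompt)
          ok, log = compile_and_test(candidate)
          if ok:
              return candidate                        # only show the user code that builds and passes
          attempts.append(candidate)
          prompt = task + "\n\nYour last attempt failed:\n" + log + "\nPlease fix it."
      # Nothing compiled and passed; surface the attempts so the user can salvage ideas from them.
      return "\n\n---\n\n".join(attempts)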

You simply sample tokens starting with the allowed characters and truncate if needed. It's pretty efficient; there's an implementation here: https://github.com/1rgs/jsonformer

This is the best implementation I've seen, but only for Hugging Face models: https://github.com/1rgs/jsonformer

The interface makes it look simple, but under the hood it follows a similar approach to jsonformer/clownfish [1], passing control of generation back and forth between a slow LLM and relatively fast Python.

Let's say you're halfway through generating a JSON blob with a name field and a job field, and have already generated

  {
    "name": "bob"
At this point, guidance will take over generation control from the model to generate the next text

  {
    "name": "bob",
    "job":
If the model had generated that, you'd be waiting 70 ms per token (informal benchmark on my M2 Air). A comma, followed by a newline, followed by "job": is 6 tokens, or 420 ms. But since guidance took over, you save all that time.

Then guidance passes control back to the model for generating the next field value.

  {
    "name": "bob",
    "job": "programmer"
"programmer" is 2 tokens and the closing " is 1 token, so this took 210 ms to generate. Guidance then takes over again to finish the blob

  {
    "name": "bob",
    "job": "programmer"
  }
[1] https://github.com/1rgs/jsonformer https://github.com/newhouseb/clownfish (Note: guidance is a much more general tool than these.)
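
To make the hand-off concrete, here is a plain-Python sketch of the idea (this is not guidance's actual API; llm_complete is a hypothetical model call that continues the prompt and stops before the given stop string):

  def llm_complete(prompt: str, stop: str) -> str:
      """Hypothetical LLM call: continue `prompt`, returning text up to (not including) `stop`."""
      raise NotImplementedError

  def fill_json(fields: list[str]) -> str:
      out = "{\n"
      for i, field in enumerate(fields):
          out += f'  "{field}": "'                    # fixed template text: emitted by Python, zero model latency
          out += llm_complete(out, stop='"')          # the model only generates the field value itself
          out += '"'
          out += ",\n" if i < len(fields) - 1 else "\n"
      return out + "}"

  # e.g. fill_json(["name", "job"]) pays per-token latency only for the quoted values.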

It could still trigger a false positive given that for the time being there’s no way to “prove” that the model will reply in any given way. There are some novel ideas but they require access to the raw model. [0] [1]

It can be made to, and I think I stumbled upon a core insight that makes simple format coercion reproducible without fine-tuning or logit shenanigans. So yes, this lets you both reduce false positives and constrain failures to false positives or to task boundaries.

There's also RLHF-derived coercion, which is hilarious. [2]

[0] https://github.com/1rgs/jsonformer

[1] https://news.ycombinator.com/item?id=35790092

[2] https://twitter.com/goodside/status/1657396491676164096