No, absolutely not. How can you trust the output from such a black box system? Who is to say that the LLM won't add or remove data points to make the chart "look good"? Heaven help us if decision makers start taking this output seriously. But of course they will, because the charts will look professional and plausible, because that's what the prompt requires.

"You are a helpful assistant highly skilled in writing PERFECT code for visualizations. Given some code template, you complete the template to generate a visualization given the dataset and the goal described. The code you write MUST FOLLOW VISUALIZATION BEST PRACTICES ie. meet the specified goal, apply the right transformation, use the right visualization type, use the right data encoding, and use the right aesthetics (e.g., ensure axis are legible). The transformations you apply MUST be correct and the fields you use MUST be correct. The visualization CODE MUST BE CORRECT and MUST NOT CONTAIN ANY SYNTAX OR LOGIC ERRORS. You MUST first generate a brief plan for how you would solve the task e.g. what transformations you would apply e.g. if you need to construct a new column, what fields you would use, what visualization type you would use, what aesthetics you would use, etc. YOU MUST ALWAYS return code using the provided code template. DO NOT add notes or explanations." (https://github.com/microsoft/lida/blob/main/lida/components/...)

They prompted that things MUST be correct, and the tool reports any transformations it applies to your data, so you get at least some insight into its logic that you can test against the data yourself.
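A rough sketch of what that testing could look like, assuming the LLM-generated code hands you back the transformed dataframe before plotting (the column names "region" and "sales" and the function name are placeholders, not LIDA's actual API): recompute the aggregate independently with pandas and compare.

    import pandas as pd

    def check_llm_transform(raw_df: pd.DataFrame, transformed_df: pd.DataFrame) -> None:
        # Hypothetical sanity checks on the dataframe produced by LLM-generated code.

        # The transform shouldn't invent rows out of thin air.
        assert len(transformed_df) <= len(raw_df), "transform added rows"

        # An aggregate we can recompute ourselves should match what gets plotted.
        expected = raw_df.groupby("region")["sales"].sum().sort_index()
        actual = transformed_df.set_index("region")["sales"].sort_index()
        pd.testing.assert_series_equal(expected, actual, check_names=False)

It doesn't prove the chart is right, but it catches the "added or removed data points" failure mode the parent comment worries about.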

Telling the LLM that it must do something is not a guarantee that it'll follow through.

True. This is an open area of research. Tools like guidance (or other implementations of constrained decoding with LLMs [1,2]) will likely help mitigate this problem; see the sketch after the references.

[1] A guidance language for controlling large language models. https://github.com/guidance-ai/guidance

[2] Knowledge Infused Decoding. https://arxiv.org/abs/2204.03084
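A minimal sketch of the idea using guidance's select/gen helpers (assuming a recent guidance release; the model name, prompt, and regex bound here are illustrative, not a definitive recipe). Instead of hoping the LLM obeys "MUST use the right visualization type", the decoder is only allowed to emit one of the listed options:

    from guidance import models, select, gen

    # Any transformers-compatible model works; "gpt2" is just a placeholder.
    lm = models.Transformers("gpt2")

    # Constrain the chart type to a fixed vocabulary rather than free-form text.
    lm += "Goal: total sales by region.\nBest chart type: "
    lm += select(["bar", "line", "scatter", "histogram"], name="chart_type")

    # Constrain the field name to a simple, bounded identifier pattern.
    lm += "\nX-axis field: " + gen(regex=r"[A-Za-z_]{1,20}", name="x_field")

    print(lm["chart_type"], lm["x_field"])

The constraint guarantees the output is well-formed (a valid chart type, a syntactically plausible field name); whether it is the *right* choice for the data is still on the model.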