Does MLC work for vision models? For example, the docs mention --max-seq-len MAX_ALLOWED_SEQUENCE_LENGTH as a command-line option, which seems to imply that it only accepts language models.

Also, the docs don't seem to say anything about the input model's format. PyTorch weights? ONNX?

Yup, here's their Web Stable Diffusion repo: https://github.com/mlc-ai/web-stable-diffusion

The input is a model (weights + runtime lib) compiled via the mlc-llm project: https://mlc.ai/mlc-llm/docs/compilation/compile_models.html