This is very exciting! (I had suspected it would slip to 114)

WebGPU implementations are still pretty immature, but certainly mature enough to get started with. I've been implementing a Rust + WebGPU ML runtime for the past few months and have enjoyed writing WGSL.

I recently got a 250M parameter LLM running in the browser without much optimisation and it performs pretty well! (https://twitter.com/fleetwood___/status/1638469392794091520)

That said, matmuls are still pretty handicapped in the browser, especially given the bounds checking it enforces. In my benchmarking I've struggled to hit 50% of theoretical FLOPS, and that drops to 30% once bounds checking comes in. (Benchmarks here: https://github.com/FL33TW00D/wgpu-mm)
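For anyone wondering how a "percent of theoretical FLOPS" figure is usually worked out, here is a rough sketch in Rust. It is not taken from the wgpu-mm repo; the function names, the matrix size, the timing and the peak figure are all made-up placeholders, and it just assumes the standard 2*M*N*K operation count for a dense matmul.

```rust
/// Floating-point operations for an (m x k) * (k x n) matmul:
/// each of the m * n outputs needs k multiplies and k adds.
fn matmul_flops(m: u64, n: u64, k: u64) -> f64 {
    2.0 * m as f64 * n as f64 * k as f64
}

/// Achieved throughput in GFLOP/s for one matmul that took `seconds`.
fn achieved_gflops(m: u64, n: u64, k: u64, seconds: f64) -> f64 {
    matmul_flops(m, n, k) / seconds / 1e9
}

/// Fraction of the device's theoretical peak that was actually reached.
fn utilisation(achieved_gflops: f64, peak_gflops: f64) -> f64 {
    achieved_gflops / peak_gflops
}

fn main() {
    // Hypothetical numbers for illustration only, not real measurements.
    let (m, n, k) = (1024, 1024, 1024);
    let seconds = 0.001; // assumed kernel time from a GPU timer
    let peak_gflops = 10_000.0; // assumed vendor-quoted FP32 peak

    let achieved = achieved_gflops(m, n, k, seconds);
    let fraction = utilisation(achieved, peak_gflops);
    println!("{achieved:.1} GFLOP/s, {:.1}% of peak", fraction * 100.0);
}
```

Running the same calculation on timings taken with and without bounds checking gives the two percentages being compared.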

I look forward to accessing shader cores as they mentioned in the post.

Looking forward to your WebGPU ML runtime! Also, why not contribute back to WONNX? (https://github.com/webonnx/wonnx)