[0]: Also maintainer of https://github.com/LaurentMazare/tch-rs
I wrote a forgettable llama implementation for https://github.com/LaurentMazare/tch-rs (rust bindings for pytorch's libtorch C++ API). Still not ideal, but at least you get the same GPU performance you would get with pytorch.
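For context, here's a minimal sketch of the device-selection pattern tch-rs encourages (not the llama code itself; the calls below are from the `tch` crate's public API, so double-check against the current docs):

```rust
// Minimal tch-rs sketch: the same code runs on GPU or CPU, and the
// kernels are libtorch's own, hence PyTorch-level performance.
use tch::{Device, Kind, Tensor};

fn main() {
    // Picks the first CUDA device when available, otherwise the CPU.
    let device = Device::cuda_if_available();
    let a = Tensor::randn(&[2, 3], (Kind::Float, device));
    let b = Tensor::randn(&[3, 4], (Kind::Float, device));
    let c = a.matmul(&b);
    c.print();
}
```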
...And then I spotted Candle, a new ML framework by the same author: https://github.com/huggingface/candle
It's all in Rust and self-contained, a huge undertaking, but it looks very promising. They already have a llama2 example!
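Candle's tensor API looks quite similar from the outside; a minimal sketch assuming the `candle-core` crate (names taken from its public API at the time of writing, worth verifying against the repo):

```rust
// Minimal Candle sketch: pure-Rust tensors, with optional CUDA support.
use candle_core::{Device, Tensor};

fn main() -> candle_core::Result<()> {
    // Falls back to the CPU if CUDA device 0 isn't available.
    let device = Device::cuda_if_available(0)?;
    let a = Tensor::randn(0f32, 1.0, (2, 3), &device)?;
    let b = Tensor::randn(0f32, 1.0, (3, 4), &device)?;
    let c = a.matmul(&b)?;
    println!("{c}");
    Ok(())
}
```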
https://github.com/LaurentMazare/tch-rs
I used this in the past to make a transformer-based syntax annotator, fully in Rust, no Python required.
Riff checks a known registry or reads package metadata.
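If I remember the announcement right, you can also declare native inputs explicitly in a `[package.metadata.riff]` table in Cargo.toml; treat the exact table and key names below as assumptions rather than gospel:

```toml
# Hypothetical Cargo.toml excerpt: extra native build inputs for riff
# to provide. Table/key names are assumptions; check riff's docs.
[package.metadata.riff]
build-inputs = ["openssl", "pkg-config"]
```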
> I'm using https://github.com/LaurentMazare/tch-rs/ (rust bindings for pytorch's c++ api). Depending on the hardware I'm building for I either need to download a cpu build of pytorch, or a cuda build of pytorch. How would this work with riff?
Riff doesn't track which GPU you use at this time, so it can't tell which build of `pytorch` to use, sorry. It might be something we make it aware of if we have enough evidence it would be helpful.
> Is there any hope of a good cross-compiling story? If so, that would be awesome (currently I just don't even try if there are significant C dependencies)!
Regarding cross-compiling: we've spoken about it, and I certainly believe it would be desirable. I imagine it would come in a later version. We've taken some steps to ensure we don't rule out that feature later.
> How thoroughly are dependencies cached? Back to the pytorch example: it's a fairly big (and thus slow) download, and I don't want that happening too frequently.
Dependencies are cached by Nix, so they should not become invalidated very often. If your tempdir gets erased we may lose the generated lock file (we've been discussing a more permanent place for these), which may force another download if there was an update.