What does HackerNews think of llama2.c?

Inference Llama 2 in one file of pure C

Language: Python

Tiny Stories trained models and alternative implementation based on llama 2: https://github.com/karpathy/llama2.c
Related. I built karpathy's llama2.c (https://github.com/karpathy/llama2.c) without modifications to WASM and ran it in the browser. It was a fun exercise to directly compare native vs. web performance. I'm getting 80% of native performance on my M1 MacBook Air and haven't spent any time optimizing the WASM side.

Demo: https://diegomarcos.com/llama2.c-web/

Code: https://github.com/dmarcos/llama2.c-web
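For reference, a minimal sketch of the kind of Emscripten invocation that can target WASM from run.c. The flags and file names here are illustrative assumptions, not taken from the llama2.c-web build:

```sh
# Hypothetical Emscripten build of run.c; flags are illustrative.
# ALLOW_MEMORY_GROWTH lets the WASM heap grow to hold the model weights;
# --preload-file bundles a checkpoint into Emscripten's virtual filesystem.
emcc run.c -O3 -o llama2.html \
  -s ALLOW_MEMORY_GROWTH=1 \
  --preload-file stories15M.bin
```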

It would be cool to have a `llama2.f`, like https://github.com/karpathy/llama2.c, to demo its capabilities
This is a fork of https://github.com/karpathy/llama2.c

karpathy's llama2.c is like llama.cpp, but it is written in plain C and the Python training code is available in the same repo. llama2.c's goal is to be an elegant single-file C implementation of inference and an elegant Python implementation of training.
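
To give a flavor of what "single-file C inference" means in practice, here is a small RMSNorm kernel of the kind run.c implements as a plain loop. This is a hedged sketch with illustrative names, not code copied from the repo:

```c
#include <math.h>

/* RMSNorm: normalize x by its root-mean-square, then apply a learned
 * per-channel scale. llama2.c keeps kernels like this as simple C loops.
 * Names and the epsilon value are illustrative, not taken from run.c. */
static void rmsnorm(float *out, const float *x, const float *weight, int size) {
    float ss = 0.0f;
    for (int i = 0; i < size; i++) {
        ss += x[i] * x[i];              /* sum of squares */
    }
    float scale = 1.0f / sqrtf(ss / size + 1e-5f);
    for (int i = 0; i < size; i++) {
        out[i] = weight[i] * (scale * x[i]);
    }
}
```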

His goal is for people to understand how Llama 2 and LLMs work, so he keeps it simple and sweet. As the project progresses, features and performance improvements will be added.

Currently it can run inference on the small TinyStories models trained by Karpathy at a fast pace. It can also run Meta's Llama 2 7B model, but at a very slow rate, around 1 token per second.
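
For the curious, this is roughly how the repo's README exercises one of the small checkpoints; the exact checkpoint file name may differ across versions of the repo:

```sh
# Build the single-file inference program and run a small TinyStories
# checkpoint; file names follow the README but may change upstream.
make run
./run stories15M.bin
```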

So currently this can be used for learning or as a tech preview.

Our friendly fork tries to make it portable, performant, and more usable (bells and whistles) over time. Since we mirror upstream closely, the inference capabilities of our fork are similar, though slightly faster when compiled with acceleration. What we try to do differently is make this bootable (not there yet) and portable.

Right now you can get binary portability: use the same run.com on any x86_64 machine running any OS and it will work (possible thanks to the Cosmopolitan toolchain). The other part that works is unikernels: boot this as a unikernel in VMs (possible thanks to the Unikraft unikernel and toolchain).
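
As a rough illustration of how the Cosmopolitan side works: the toolchain's compiler wrapper links against Cosmopolitan libc and emits a single "actually portable executable". The invocation below is a hedged sketch, not the fork's actual build script:

```sh
# Hypothetical Cosmopolitan build: cosmocc wraps the compiler and links
# Cosmopolitan libc, producing one run.com that starts on Linux, macOS,
# Windows, and the BSDs on x86_64. Flags are illustrative.
cosmocc -O3 -o run.com run.c -lm
```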

See our fork currently as a release early and release often toy tech demo. We plan to build it out into a useful product.

I really enjoyed Andrej Karpathy's llama2.c project (https://github.com/karpathy/llama2.c), which runs through creating and running a miniature Llama 2 architecture model from scratch.
Have you seen the recent work on TinyStories? https://arxiv.org/abs/2305.07759

It got some nice attention here: https://github.com/karpathy/llama2.c

I think there may be some applications in this limited space that are worth looking into. You won't replicate GPT-anything, but it may be possible to solve some nice problems much more efficiently than one would expect at first.

There was a post yesterday about a 500-line single-file C implementation of llama2 with no dependencies. The llama2 architecture is hard-coded. It shouldn't be too hard to port to Python.

Found the repo, couldn't easily find the HN thread.

https://github.com/karpathy/llama2.c