What does HackerNews think of serving?

A flexible, high-performance serving system for machine learning models

Language: C++

#18 in C++
#14 in Deep learning
#102 in Python
#8 in Tensorflow

Yet another TEDIOUS BATTLE: Python vs. C++/C stack.

This project gained popularity due to the HIGH DEMAND for running large models with 1B+ parameters, like `llama`. Python dominates the interface and training ecosystem, but prior to llama.cpp, non-ML professionals showed little interest in a fast C++ interface library. While existing solutions like tensorflow-serving [1] in C++ were sufficiently fast with GPU support, llama.cpp took the initiative to optimize for CPU and trim unnecessary code, essentially code-golfing and sacrificing some algorithm correctness for improved performance, which isn't favored by "ML research".

NOTE: In my opinion, a true pioneer was DarkNet, which implemented the YOLO model series and significantly outperformed others [2]. It's basically the same trick as llama.cpp.

[1] https://github.com/tensorflow/serving

[2] https://github.com/pjreddie/darknet

Most likely it's a model server running something like https://github.com/tensorflow/serving and if there isn't a lot of load, the resource could kill some of its tasks. I wouldn't imagine it's sitting around pondering deep thoughts.
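
For anyone wondering what talking to such a model server looks like, here is a rough sketch of a client for TensorFlow Serving's REST predict endpoint (`/v1/models/<name>:predict`). The host, the model name `my_model`, and the input shape are assumptions for illustration only, and port 8501 is just the REST port commonly used in the project's examples; adjust all of them to your own deployment.

```python
import json
import urllib.request


def build_predict_request(host, model_name, instances, port=8501):
    """Build (but do not send) a POST request for the REST predict endpoint.

    host, model_name and port are placeholders for your own deployment.
    """
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    # Row format: one entry in "instances" per input example.
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(host, model_name, instances, port=8501, timeout=5.0):
    """Send the request to a running server and return its predictions."""
    req = build_predict_request(host, model_name, instances, port)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["predictions"]
```

With a server actually running, `predict("localhost", "my_model", [[1.0, 2.0]])` would return whatever that model outputs for the single input row; without one, the call simply fails with a connection error.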
Who cares about this garbage if the tool isn't even open source? There are lots of ML deployment tools that are open source. I know haters will downvote my post, but it's the truth. If I can't actually fork and evaluate a tool, it is hyped up garbage to me.

Meanwhile, here is a list of open source ML deployment packages:

https://github.com/oracle/graphpipe

https://github.com/eliorc/denzel

https://github.com/tensorflow/serving

https://github.com/ucbrise/clipper

https://github.com/DLHub-Argonne/dlhub_sdk

https://github.com/kubeflow/pipelines

TensorFlow: https://www.tensorflow.org/

TensorFlow Serving: https://github.com/tensorflow/serving

ReCeption (actually they call it Inception v3. Not sure where I got the ReCeption name - though I'm sure I read it somewhere?): https://www.tensorflow.org/versions/r0.7/tutorials/image_rec...

Using an SVM on neural network extracted features: http://blog.christianperone.com/2015/08/convolutional-neural...

If you want a quick and dirty version here's some Python to create a web service that calls a Caffe based Image recognizer: https://gist.github.com/nlothian/c3519adb81b3452c1938
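
To give a feel for the general shape of such a service without any ML dependencies, here is a standard-library-only sketch. It is not the code from the gist: `classify` is a hypothetical stand-in where a real recognizer (Caffe or otherwise) would be called, and the handler names and port are made up for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def classify(image_bytes):
    """Hypothetical placeholder: a real service would run the model here."""
    return [{"label": "unknown", "score": 0.0, "bytes": len(image_bytes)}]


def handle_image(image_bytes, classifier=classify):
    """Turn raw request bytes into an (HTTP status, JSON-able payload) pair."""
    if not image_bytes:
        return 400, {"error": "empty request body"}
    return 200, {"predictions": classifier(image_bytes)}


class RecognizerHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client said it sent.
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_image(self.rfile.read(length))
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the server on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), RecognizerHandler).serve_forever()
```

Calling `serve()` and then POSTing image bytes to `http://127.0.0.1:8000/` returns a JSON body with a `predictions` list. Keeping `handle_image` separate from the HTTP handler makes it easy to swap in a real classifier and to test the logic without opening a socket.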