What does HackerNews think of ray?
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Ray (https://github.com/ray-project/ray) is available for anyone to use. You can also use Aviary (https://github.com/ray-project/aviary) to serve any of those models yourself.
From what I can see, if you want easier multiprocessing in Python (say, running things async), you have to use something like Ray Core[0]; then if you want multiple machines, you need Redis(?). Elixir/Erlang supports this out of the box.
Explorer[1] is an interesting approach: it uses Rust via Rustler (an Elixir library for calling Rust code) and uses Polars as its dataframe library. I think Rustler needs to be reworked for this use case, as it can be slow to return data. I made initial improvements that drastically improve encoding (https://github.com/elixir-nx/explorer/pull/282 and https://github.com/elixir-nx/explorer/pull/286; tl;dr 20+ seconds down to 3).
[0] https://github.com/ray-project/ray

[1] https://github.com/elixir-nx/explorer
edit: Ray (https://github.com/ray-project/ray) is also pretty easy to use and powerful for actual parallelism.
Also includes the best currently available hyperparameter tuning framework (Ray Tune)!
What the %$#@ is Ray?
I make a habit of doing this myself whenever I do a post like this. Sure, I was able to look up Ray from RISELab and figure this out myself, but I wish I didn't have to.
From the Ray homepage:
Ray is a high-performance distributed execution framework targeted at large-scale machine learning and reinforcement learning applications. It achieves scalability and fault tolerance by abstracting the control state of the system into a global control store and keeping all other components stateless. It uses a shared-memory distributed object store to handle large data efficiently, and a bottom-up hierarchical scheduling architecture to achieve low-latency, high-throughput scheduling. Its lightweight API, based on dynamic task graphs and actors, expresses a wide range of applications in a flexible manner.
Check out the following links!
Codebase: https://github.com/ray-project/ray

Documentation: http://ray.readthedocs.io/en/latest/index.html

Tutorial: https://github.com/ray-project/tutorial

Blog: https://ray-project.github.io

Mailing list: ray[email protected]
import ray.dataframe as pd
They've replaced many pandas functions with an identical API that runs operations in parallel on top of Ray, a task-parallel library: https://github.com/ray-project/ray
Unlike Dask, Ray can communicate between processes without serializing and copying data. It uses Plasma, a shared-memory object store within Apache Arrow:
http://arrow.apache.org/blog/2017/08/08/plasma-in-memory-obj...
Worker processes (scheduled by Ray's computation graph) simply map the required memory region into their address space.
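Ray does this via Plasma, but the zero-copy idea (one region of memory mapped into several address spaces) can be sketched with just the Python standard library. This is an analogy, not Ray's actual code:

```python
from multiprocessing import shared_memory

# "Producer": create a named shared-memory segment and write into it.
producer = shared_memory.SharedMemory(create=True, size=16)
producer.buf[:5] = b"hello"

# "Consumer": attach to the same segment by name. No bytes are copied;
# the OS maps the same physical pages into the attaching process.
consumer = shared_memory.SharedMemory(name=producer.name)
data = bytes(consumer.buf[:5])
print(data)  # b'hello'

consumer.close()
producer.close()
producer.unlink()  # free the segment
```

In Ray, Arrow's columnar format plays the role of the byte layout, so workers can read objects directly out of the mapped region without a deserialization step.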