What does Hacker News think of Ray?

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Language: Python

#1 in Deployment
#10 in Java
#45 in Python
#3 in TensorFlow
[Author] OpenAI uses Ray, the open-source version of the Anyscale Platform, to train their models (video evidence: https://www.youtube.com/watch?v=CqiL5QQnN64).

Ray (https://github.com/ray-project/ray) is available for anyone to use. You can also use Aviary (https://github.com/ray-project/aviary) to serve any of those models yourself.

I've wondered whether it's easier to add the data analysis tooling to Elixir that Python seems to have, or to add the features to Python that Erlang (and by extension Elixir) provides out of the box.

From what I can see, if you want easier multiprocessing in Python (say, running things async), you have to use something like Ray Core[0], and then if you want multiple machines you need Redis(?). Elixir/Erlang supports this out of the box.
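
For a sense of what that looks like in practice, here's a minimal Ray Core sketch (the function and numbers are just illustrative; on a single machine `ray.init()` starts a local cluster with no extra setup):

  import ray

  ray.init()  # starts a local "cluster" of worker processes on this machine

  @ray.remote
  def slow_square(x):
      # any ordinary Python function can be turned into a remote task
      return x * x

  # .remote() returns futures immediately; the calls run in parallel workers
  futures = [slow_square.remote(i) for i in range(8)]
  print(ray.get(futures))  # [0, 1, 4, 9, 16, 25, 36, 49]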

Explorer[1] is an interesting approach: it uses Rust via Rustler (an Elixir library for calling Rust code) and uses Polars as its dataframe library. I think Rustler needs to be reworked for this use case, as it can be slow to return data. I made initial improvements that drastically improve encoding (https://github.com/elixir-nx/explorer/pull/282 and https://github.com/elixir-nx/explorer/pull/286; tl;dr 20+ seconds down to 3).

[0] https://github.com/ray-project/ray
[1] https://github.com/elixir-nx/explorer

Python has actually had async concurrency built in for a while now (asyncio landed in the standard library in Python 3.4, with async/await syntax in 3.5): https://docs.python.org/3/library/asyncio.html. Having used it a few times, it seems fairly sane, but tbf my experience with concurrency in other languages is fairly limited.

edit: Ray (https://github.com/ray-project/ray) is also pretty easy to use and powerful for actual parallelism
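
For contrast, a tiny asyncio sketch (the coroutine names are made up): this is concurrency, i.e. interleaved waiting inside one process, rather than the parallelism across cores that Ray provides.

  import asyncio

  async def fetch(n):
      # pretend to wait on I/O; other coroutines run while this one sleeps
      await asyncio.sleep(1)
      return n * n

  async def main():
      # all three "fetches" overlap, so this takes ~1 second, not ~3
      results = await asyncio.gather(fetch(1), fetch(2), fetch(3))
      print(results)  # [1, 4, 9]

  asyncio.run(main())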

Interesting use of ray (https://github.com/ray-project/ray). I think a lot of people are sleeping on that package, as it solves many of the difficulties in parallelizing ML/DL models.
IMO ray[1] is the greatest thing to happen to Python parallelism since the invention of sliced bread.

Also includes the best currently available hyperparameter tuning framework, Ray Tune (rough sketch below)!

[1] https://github.com/ray-project/ray
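
A rough sketch of what Ray Tune usage looks like (API details vary between Ray versions; the objective and search space here are made up):

  from ray import tune

  def objective(config):
      # toy objective: pretend a smaller (lr - 0.1)^2 means a better model
      return {"score": (config["lr"] - 0.1) ** 2}

  tuner = tune.Tuner(
      objective,
      param_space={"lr": tune.uniform(0.001, 0.3)},
      tune_config=tune.TuneConfig(num_samples=10, metric="score", mode="min"),
  )
  results = tuner.fit()
  print(results.get_best_result().config)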

As a person who LOVES pandas, numpy, scikit, and all things SciPy, I really wish these kinds of posts would take a few seconds to include a link, or maybe just a quick paragraph, to answer one question:

What the %$#@ is Ray?

I make a habit of doing this myself whenever I do a post like this. Sure, I was able to look up Ray from the RISELab and figure this out myself, but I wish I didn't have to.

From the Ray homepage:

Ray is a high-performance distributed execution framework targeted at large-scale machine learning and reinforcement learning applications. It achieves scalability and fault tolerance by abstracting the control state of the system in a global control store and keeping all other components stateless. It uses a shared-memory distributed object store to efficiently handle large data through shared memory, and it uses a bottom-up hierarchical scheduling architecture to achieve low-latency and high-throughput scheduling. It uses a lightweight API based on dynamic task graphs and actors to express a wide range of applications in a flexible manner.
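
To make "dynamic task graphs and actors" concrete, here's a minimal actor sketch (the Counter class is purely illustrative):

  import ray

  ray.init()

  @ray.remote
  class Counter:
      # an actor: a stateful worker process whose methods run as remote tasks
      def __init__(self):
          self.value = 0

      def increment(self):
          self.value += 1
          return self.value

  counter = Counter.remote()                # starts the actor process
  refs = [counter.increment.remote() for _ in range(3)]
  print(ray.get(refs))                      # [1, 2, 3]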

Check out the following links!

Codebase: https://github.com/ray-project/ray
Documentation: http://ray.readthedocs.io/en/latest/index.html
Tutorial: https://github.com/ray-project/tutorial
Blog: https://ray-project.github.io
Mailing list: ray[email protected]

The one line of code is

  import ray.dataframe as pd
They've replaced many pandas functions with an identical API that runs actions in parallel on top of Ray, a task-parallel library:

https://github.com/ray-project/ray
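
In other words, something like the following keeps working unchanged, assuming the pandas functions you call are among the ones they've ported (the file name here is made up):

  import ray.dataframe as pd   # instead of: import pandas as pd

  df = pd.read_csv("big_file.csv")   # partitions are built in parallel as Ray tasks
  print(df.head())                   # same pandas API, backed by distributed partitions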

Unlike Dask, Ray can communicate between processes without serializing and copying data. It uses Plasma, the shared-memory object store that ships with Apache Arrow:

http://arrow.apache.org/blog/2017/08/08/plasma-in-memory-obj...

Worker processes (scheduled by Ray's computation graph) simply map the required memory region into their address space.
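
You can see this with `ray.put`: a large NumPy array goes into the shared-memory store once, and tasks on the same machine read it without copying (the array and function here are just an example):

  import numpy as np
  import ray

  ray.init()

  # stored once in the shared-memory object store (Plasma / Arrow)
  big = np.zeros((10_000, 1_000))
  ref = ray.put(big)

  @ray.remote
  def column_sums(arr):
      # on the same node this is a zero-copy read from shared memory
      return arr.sum(axis=0)

  print(ray.get(column_sums.remote(ref)).shape)  # (1000,)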