What does HackerNews think of ray?
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Ray (https://github.com/ray-project/ray) is available for anyone to use. You can also use Aviary (https://github.com/ray-project/aviary) to serve any of those models yourself.
From what I can see, if you want easier multiprocessing in Python (say, running things async), you have to use something like Ray Core[0]; then if you want multiple machines, you need Redis(?). Elixir/Erlang supports this out of the box.
Explorer[1] is an interesting approach: it uses Rust via Rustler (an Elixir library for calling Rust code) and uses Polars as its dataframe library. I think Rustler needs to be reworked for this use case, as it can be slow to return data. I made initial improvements that drastically improve encoding (https://github.com/elixir-nx/explorer/pull/282 and https://github.com/elixir-nx/explorer/pull/286; tl;dr 20+ seconds down to 3).
[0] https://github.com/ray-project/ray

[1] https://github.com/elixir-nx/explorer
edit: Ray (https://github.com/ray-project/ray) is also pretty easy to use and powerful for actual parallelism.
Also includes the best currently available hyperparameter tuning framework (Ray Tune)!
What the %$#@ is Ray?
I make a habit of doing this myself whenever I do a post like this. Sure, I was able to look up Ray from RISELab and figure this out myself, but I wish I didn't have to.
From the Ray homepage:
Ray is a high-performance distributed execution framework targeted at large-scale machine learning and reinforcement learning applications. It achieves scalability and fault tolerance by abstracting the control state of the system into a global control store and keeping all other components stateless. It uses a shared-memory distributed object store to handle large data efficiently, and a bottom-up hierarchical scheduling architecture to achieve low-latency, high-throughput scheduling. Its lightweight API, based on dynamic task graphs and actors, expresses a wide range of applications in a flexible manner.
Check out the following links!
Codebase: https://github.com/ray-project/ray

Documentation: http://ray.readthedocs.io/en/latest/index.html

Tutorial: https://github.com/ray-project/tutorial

Blog: https://ray-project.github.io

Mailing list: ray[email protected]
import ray.dataframe as pd
They've replaced many pandas functions with an identical API that runs operations in parallel on top of Ray, a task-parallel library: https://github.com/ray-project/ray
Unlike Dask, Ray can communicate between processes without serializing and copying data. It uses Plasma, a shared-memory object store within Apache Arrow:
http://arrow.apache.org/blog/2017/08/08/plasma-in-memory-obj...
Worker processes (scheduled by Ray's computation graph) simply map the required memory region into their address space.
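Ray does this via Plasma, but the zero-copy idea (one region of memory mapped into several address spaces) can be sketched with just the Python standard library. This is an analogy, not Ray's actual code:

```python
from multiprocessing import shared_memory

# "Producer": create a named shared-memory segment and write into it.
producer = shared_memory.SharedMemory(create=True, size=16)
producer.buf[:5] = b"hello"

# "Consumer": attach to the same segment by name. No bytes are copied;
# the OS maps the same physical pages into the attaching process.
consumer = shared_memory.SharedMemory(name=producer.name)
data = bytes(consumer.buf[:5])
print(data)  # b'hello'

consumer.close()
producer.close()
producer.unlink()  # free the segment
```

In Ray, Arrow's columnar format plays the role of the byte layout, so workers can read objects directly out of the mapped region without a deserialization step.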