What does HackerNews think of gym?

A toolkit for developing and comparing reinforcement learning algorithms.

Language: Python

A co-founder announced they disbanded their robotics team a couple years ago: https://venturebeat.com/business/openai-disbands-its-robotic...

That was the same time they deprecated OpenAI Gym: https://github.com/openai/gym

A lot depends on what you're interested in.

Some papers whose methods are runnable on a laptop CPU (so long as you stick to small image sizes/tasks):

1) Generative Adversarial Networks (https://arxiv.org/abs/1406.2661). Good practice for writing custom training loops, using different optimisers and networks, etc.

2) Neural Style Transfer (https://arxiv.org/abs/1508.06576). Nice to be able to manipulate pretrained networks and intercept intermediate layers.

3) Deep Image Prior (https://arxiv.org/abs/1711.10925). Nice low-data exercise in building out an autoencoder.

4) Physics Informed Neural Networks (https://arxiv.org/abs/1711.10561). If you're interested in scientific applications, this might be fun. It's a good exercise in calculating higher-order derivatives of neural networks and using them in loss functions (a tiny autograd sketch follows the list).

5) Vanilla Policy Gradient (https://arxiv.org/abs/1604.06778) is the easiest reinforcement learning algorithm to implement and can be used as a black-box optimiser in a lot of settings.

6) Deep Q Learning (https://arxiv.org/abs/1312.5602) is also not too hard to implement and was the first time I had heard about DeepMind, as well as being a foundational deep reinforcement learning paper.

OpenAI Gym (https://github.com/openai/gym) would help get started with the latter two; a minimal policy-gradient sketch with it follows below.
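Not from any of these papers directly, but to make item 5 concrete: a minimal REINFORCE-style sketch on CartPole with gym and PyTorch. The network size, learning rate, and episode count are made up, and the reset/step API shown is the newer gym 0.26 style (older versions return just obs from reset() and a 4-tuple from step()).

```python
# Minimal REINFORCE (vanilla policy gradient) sketch on CartPole.
# Hypothetical hyperparameters; gym>=0.26 API assumed.
import gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        obs, reward, terminated, truncated, _ = env.step(action.item())
        done = terminated or truncated
        log_probs.append(dist.log_prob(action))
        rewards.append(reward)

    # Discounted reward-to-go returns, then the policy-gradient loss.
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + 0.99 * G
        returns.append(G)
    returns = torch.tensor(list(reversed(returns)))
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    loss = -(torch.stack(log_probs) * returns).sum()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The same loop works as a black-box optimiser for anything you can wrap in the gym Env interface, which is why it's a nice first RL exercise.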
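And for item 4, the core machinery is repeated autograd with create_graph=True. Here's a tiny sketch on the toy ODE u'' + u = 0, not the paper's PDEs; the network and collocation points are arbitrary.

```python
# Sketch of the higher-order-derivative machinery a PINN needs,
# for the toy equation u''(x) + u(x) = 0. Not the paper's setup.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
x = torch.linspace(0, 1, 100).reshape(-1, 1).requires_grad_(True)

u = net(x)
# First derivative du/dx; create_graph=True so we can differentiate again.
du = torch.autograd.grad(u, x, grad_outputs=torch.ones_like(u), create_graph=True)[0]
# Second derivative d2u/dx2, used directly in the physics residual loss.
d2u = torch.autograd.grad(du, x, grad_outputs=torch.ones_like(du), create_graph=True)[0]

residual = d2u + u                      # enforce u'' + u = 0 at the sample points
loss = (residual ** 2).mean()           # a real PINN adds boundary-condition terms
loss.backward()
```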

AlphaFold 2 solved the CASP protein folding problem that, AFAIU, e.g. Folding@home et al. have been churning at for a while. From November 2020: https://deepmind.com/blog/article/alphafold-a-solution-to-a-...

https://en.wikipedia.org/wiki/AlphaFold#SARS-CoV-2 :

> AlphaFold has been used to predict structures of proteins of SARS-CoV-2, the causative agent of COVID-19 [...] The team acknowledged that though these protein structures might not be the subject of ongoing therapeutical research efforts, they will add to the community's understanding of the SARS-CoV-2 virus.[74] Specifically, AlphaFold 2's prediction of the structure of the ORF3a protein was very similar to the structure determined by researchers at University of California, Berkeley using cryo-electron microscopy. This specific protein is believed to assist the virus in breaking out of the host cell once it replicates. This protein is also believed to play a role in triggering the inflammatory response to the infection (... Berkeley ALS and SLAC beamlines ... S309 & Sotrovimab: https://scitechdaily.com/inescapable-covid-19-antibody-disco... )

Is there yet an open implementation of AlphaFold 2? edit: https://github.com/search?q=alphafold ... https://github.com/deepmind/alphafold

How do I reframe this problem in terms of fundamental algorithmic complexity classes (and thus find an entry in the Quantum Algorithm Zoo that might speed up the computationally hard part of the hot loop that drives the cost in this implementation)?

To cite in full from the MuZero blog post from December 2020: https://deepmind.com/blog/article/muzero-mastering-go-chess-... :

> Researchers have tried to tackle this major challenge in AI by using two main approaches: lookahead search or model-based planning.

> Systems that use lookahead search, such as AlphaZero, have achieved remarkable success in classic games such as checkers, chess and poker, but rely on being given knowledge of their environment’s dynamics, such as the rules of the game or an accurate simulator. This makes it difficult to apply them to messy real world problems, which are typically complex and hard to distill into simple rules.

> Model-based systems aim to address this issue by learning an accurate model of an environment’s dynamics, and then using it to plan. However, the complexity of modelling every aspect of an environment has meant these algorithms are unable to compete in visually rich domains, such as Atari. Until now, the best results on Atari are from model-free systems, such as DQN, R2D2 and Agent57. As the name suggests, model-free algorithms do not use a learned model and instead estimate what is the best action to take next.

> MuZero uses a different approach to overcome the limitations of previous approaches. Instead of trying to model the entire environment, MuZero just models aspects that are important to the agent’s decision-making process. After all, knowing an umbrella will keep you dry is more useful to know than modelling the pattern of raindrops in the air.

> Specifically, MuZero models three elements of the environment that are critical to planning:

> * The value: how good is the current position?

> * The policy: which action is the best to take?

> * The reward: how good was the last action?

> These are all learned using a deep neural network and are all that is needed for MuZero to understand what happens when it takes a certain action and to plan accordingly.

> Illustration of how Monte Carlo Tree Search can be used to plan with the MuZero neural networks. Starting at the current position in the game (schematic Go board at the top of the animation), MuZero uses the representation function (h) to map from the observation to an embedding used by the neural network (s0). Using the dynamics function (g) and the prediction function (f), MuZero can then consider possible future sequences of actions (a), and choose the best action.

> MuZero uses the experience it collects when interacting with the environment to train its neural network. This experience includes both observations and rewards from the environment, as well as the results of searches performed when deciding on the best action.

> During training, the model is unrolled alongside the collected experience, at each step predicting the previously saved information: the value function v predicts the sum of observed rewards (u), the policy estimate (p) predicts the previous search outcome (π), the reward estimate r predicts the last observed reward (u). This approach comes with another major benefit: MuZero can repeatedly use its learned model to improve its planning, rather than collecting new data from the environment. For example, in tests on the Atari suite, this variant - known as MuZero Reanalyze - used the learned model 90% of the time to re-plan what should have been done in past episodes.
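Not DeepMind's code, but a schematic of the three functions the post describes (representation h, dynamics g, prediction f), assuming small MLPs, a flat observation vector, and a discrete action space; all sizes are made up.

```python
# Schematic of MuZero's learned functions as described above.
# Not DeepMind's implementation; architectures and dimensions are invented.
import torch
import torch.nn as nn

class MuZeroNets(nn.Module):
    def __init__(self, obs_dim=8, latent_dim=32, num_actions=4):
        super().__init__()
        # h: representation function, observation -> latent state s0
        self.representation = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
        # g: dynamics function, (latent state, action) -> next latent state and reward r
        self.dynamics = nn.Sequential(
            nn.Linear(latent_dim + num_actions, 64), nn.ReLU(),
            nn.Linear(64, latent_dim + 1))
        # f: prediction function, latent state -> policy logits p and value v
        self.prediction = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, num_actions + 1))
        self.num_actions = num_actions
        self.latent_dim = latent_dim

    def initial_inference(self, obs):
        s = self.representation(obs)                 # s0 = h(observation)
        out = self.prediction(s)                     # p, v = f(s0)
        return s, out[..., :self.num_actions], out[..., -1]

    def recurrent_inference(self, s, action):
        # action is a LongTensor of action indices
        a = nn.functional.one_hot(action, self.num_actions).float()
        out = self.dynamics(torch.cat([s, a], dim=-1))
        next_s, r = out[..., :self.latent_dim], out[..., -1]    # s', r = g(s, a)
        pred = self.prediction(next_s)                           # p, v = f(s')
        return next_s, r, pred[..., :self.num_actions], pred[..., -1]
```

During search, initial_inference is called once at the root and recurrent_inference inside the tree; during training, the dynamics and prediction functions are unrolled a few steps and trained against the saved targets (u, π) the post mentions.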

FWIU, the progression over there has been:

AlphaGo {Fan, Lee, Master} => AlphaGo Zero => AlphaZero => MuZero

AlphaGo Zero: https://en.wikipedia.org/wiki/AlphaGo_Zero

AlphaZero: https://en.wikipedia.org/wiki/AlphaZero

MuZero: https://en.wikipedia.org/wiki/MuZero

AlphaFold {1,2}: https://en.wikipedia.org/wiki/AlphaFold

IIRC, there is no official implementation of e.g. AlphaZero or MuZero that works with openai/gym (or openai/retro) for comparing reinforcement learning algorithms? https://github.com/openai/gym

What are the benchmarks for Applied RL?

From https://news.ycombinator.com/item?id=28499001 :

> AFAIU, while there are DLTs that cost CPU, RAM, and Data storage between points in spacetime, none yet incentivize energy efficiency by varying costs depending upon whether the instructions execute on a FPGA, ASIC, CPU, GPU, TPU, or QPU? [...]

> To be 200% green - to put a 200% green footer with search-discoverable RDFa on your site - I think you need PPAs and all directly sourced clean energy.

> (Energy efficiency is very relevant to ML/AI/AGI, because while it may be the case that the dumb universal function approximator will eventually find a better solution, "just leave it on all night/month/K12+postdoc" in parallel is a very expensive proposition with no apparent oracle; and then to ethically filter solutions still costs at least one human)

To be fair, all of their useful tools have been deprecated or put in 'maintenance mode'. See https://github.com/openai/gym

The hard part is probably writing a program to encode the rules of the game. Then you can figure out how to represent the state (i.e. the board and the player's cards/resources etc.) as a matrix or array. I like how this is done in NLP: they tend to represent each character/word/token as a 1-hot vector and then reduce the dimensionality of these (normally really sparse) inputs.

OpenAI's gym is probably a good place to start, as you can crib how they do it for a whole bunch of games. https://github.com/openai/gym
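Very roughly, something like the sketch below: a custom gym Env for a made-up 3x3 board game, with the rules living in step() and the state encoded as per-cell one-hot planes. The game, names, and sizes are all hypothetical, and the reset/step signatures shown are the gym 0.26-style API (older versions return obs from reset() and a 4-tuple from step()).

```python
# Minimal sketch of a custom gym environment for a made-up turn-based game.
# Hypothetical game and encoding; gym>=0.26 API assumed.
import gym
import numpy as np
from gym import spaces

class ToyBoardGameEnv(gym.Env):
    """3x3 board; each cell is one-hot encoded over {empty, mine, theirs}."""

    def __init__(self):
        super().__init__()
        self.action_space = spaces.Discrete(9)  # place a piece on one of 9 cells
        self.observation_space = spaces.Box(0, 1, shape=(3, 3, 3), dtype=np.float32)
        self.board = np.zeros((3, 3), dtype=np.int64)

    def _encode(self):
        # One-hot encode each cell, the same trick as token one-hots in NLP.
        obs = np.zeros((3, 3, 3), dtype=np.float32)
        for idx in range(3):
            obs[..., idx] = (self.board == idx)
        return obs

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.board[:] = 0
        return self._encode(), {}

    def step(self, action):
        r, c = divmod(action, 3)
        reward, terminated = 0.0, False
        if self.board[r, c] != 0:
            reward, terminated = -1.0, True             # illegal move ends the episode
        else:
            self.board[r, c] = 1
            terminated = bool((self.board != 0).all())  # the real game's rules go here
        return self._encode(), reward, terminated, False, {}
```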