I'm curious what's next for this series of AI at DeepMind. Are there other, more challenging problems they are tackling at the moment? I read they have already mastered StarCraft 2 even.
Is it the case that they have stopped this series of AI and are going all in on protein folding at the moment?
This blog post mentions testing on Atari and applications to chemistry and quantum physics:
https://deepmind.com/blog/article/muzero-mastering-go-chess-...
https://en.wikipedia.org/wiki/AlphaFold#SARS-CoV-2 :
> AlphaFold has been used to predict structures of proteins of SARS-CoV-2, the causative agent of COVID-19 [...] The team acknowledged that though these protein structures might not be the subject of ongoing therapeutical research efforts, they will add to the community's understanding of the SARS-CoV-2 virus.[74] Specifically, AlphaFold 2's prediction of the structure of the ORF3a protein was very similar to the structure determined by researchers at University of California, Berkeley using cryo-electron microscopy. This specific protein is believed to assist the virus in breaking out of the host cell once it replicates. This protein is also believed to play a role in triggering the inflammatory response to the infection (... Berkeley ALS and SLAC beamlines ... S309 & Sotrovimab: https://scitechdaily.com/inescapable-covid-19-antibody-disco... )
Is there yet an open implementation of AlphaFold 2? edit: https://github.com/search?q=alphafold ... https://github.com/deepmind/alphafold
How would one reframe this problem in terms of fundamental algorithmic complexity classes (and thus find an entry in the Quantum Algorithm Zoo that might optimize the computationally hard part of the hot loop that is the cost driver in this implementation)?
To cite at length from the December 2020 MuZero blog post, https://deepmind.com/blog/article/muzero-mastering-go-chess-... :
> Researchers have tried to tackle this major challenge in AI by using two main approaches: lookahead search or model-based planning.
> Systems that use lookahead search, such as AlphaZero, have achieved remarkable success in classic games such as checkers, chess and Go, but rely on being given knowledge of their environment’s dynamics, such as the rules of the game or an accurate simulator. This makes it difficult to apply them to messy real world problems, which are typically complex and hard to distill into simple rules.
> Model-based systems aim to address this issue by learning an accurate model of an environment’s dynamics, and then using it to plan. However, the complexity of modelling every aspect of an environment has meant these algorithms are unable to compete in visually rich domains, such as Atari. Until now, the best results on Atari are from model-free systems, such as DQN, R2D2 and Agent57. As the name suggests, model-free algorithms do not use a learned model and instead estimate what is the best action to take next.
> MuZero uses a different approach to overcome the limitations of previous approaches. Instead of trying to model the entire environment, MuZero just models aspects that are important to the agent’s decision-making process. After all, knowing an umbrella will keep you dry is more useful to know than modelling the pattern of raindrops in the air.
> Specifically, MuZero models three elements of the environment that are critical to planning:
> * The value: how good is the current position?
> * The policy: which action is the best to take?
> * The reward: how good was the last action?
> These are all learned using a deep neural network and are all that is needed for MuZero to understand what happens when it takes a certain action and to plan accordingly.
> Illustration of how Monte Carlo Tree Search can be used to plan with the MuZero neural networks. Starting at the current position in the game (schematic Go board at the top of the animation), MuZero uses the representation function (h) to map from the observation to an embedding used by the neural network (s0). Using the dynamics function (g) and the prediction function (f), MuZero can then consider possible future sequences of actions (a), and choose the best action.
> MuZero uses the experience it collects when interacting with the environment to train its neural network. This experience includes both observations and rewards from the environment, as well as the results of searches performed when deciding on the best action.
> During training, the model is unrolled alongside the collected experience, at each step predicting the previously saved information: the value function v predicts the sum of observed rewards (u), the policy estimate (p) predicts the previous search outcome (π), the reward estimate r predicts the last observed reward (u). This approach comes with another major benefit: MuZero can repeatedly use its learned model to improve its planning, rather than collecting new data from the environment. For example, in tests on the Atari suite, this variant - known as MuZero Reanalyze - used the learned model 90% of the time to re-plan what should have been done in past episodes.
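To make the h/g/f description above concrete, here is a toy sketch (not DeepMind's code; all sizes, layer choices, and the greedy one-step "planner" standing in for MCTS are my own assumptions) of the three learned functions and of the unrolled training targets described in the last quoted paragraph:

  import torch
  import torch.nn as nn
  import torch.nn.functional as F

  OBS_DIM, LATENT_DIM, N_ACTIONS = 8, 32, 4   # made-up sizes

  class Representation(nn.Module):             # h: observation -> latent state s0
      def __init__(self):
          super().__init__()
          self.net = nn.Linear(OBS_DIM, LATENT_DIM)
      def forward(self, obs):
          return torch.relu(self.net(obs))

  class Dynamics(nn.Module):                   # g: (state, action) -> (next state, reward r)
      def __init__(self):
          super().__init__()
          self.state = nn.Linear(LATENT_DIM + N_ACTIONS, LATENT_DIM)
          self.reward = nn.Linear(LATENT_DIM, 1)
      def forward(self, state, action_onehot):
          nxt = torch.relu(self.state(torch.cat([state, action_onehot], dim=-1)))
          return nxt, self.reward(nxt)

  class Prediction(nn.Module):                 # f: state -> (policy logits p, value v)
      def __init__(self):
          super().__init__()
          self.policy = nn.Linear(LATENT_DIM, N_ACTIONS)
          self.value = nn.Linear(LATENT_DIM, 1)
      def forward(self, state):
          return self.policy(state), self.value(state)

  h, g, f = Representation(), Dynamics(), Prediction()

  def plan_one_step(obs):
      """Greedy one-step lookahead inside the learned model (a stand-in for MCTS):
      score each action by predicted reward plus the predicted value of the next state."""
      s0 = h(obs)
      scores = []
      for a in range(N_ACTIONS):
          a_onehot = F.one_hot(torch.tensor(a), N_ACTIONS).float()
          s1, r1 = g(s0, a_onehot)
          _, v1 = f(s1)
          scores.append((r1 + v1).item())
      return int(torch.tensor(scores).argmax())

  def unrolled_loss(obs, actions, search_pis, returns, rewards):
      """Unroll the model along stored experience, regressing:
      value v -> observed return u, policy p -> search policy pi, reward r -> observed reward u."""
      state = h(obs)
      loss = torch.tensor(0.0)
      for k, a in enumerate(actions):
          logits, value = f(state)
          loss = loss - (search_pis[k] * F.log_softmax(logits, dim=-1)).sum()   # policy term
          loss = loss + F.mse_loss(value.squeeze(), returns[k])                 # value term
          a_onehot = F.one_hot(torch.tensor(a), N_ACTIONS).float()
          state, pred_reward = g(state, a_onehot)
          loss = loss + F.mse_loss(pred_reward.squeeze(), rewards[k])           # reward term
      return loss

  # Example call with a length-3 stored trajectory (all values are dummies):
  loss = unrolled_loss(
      torch.randn(OBS_DIM),
      actions=[0, 2, 1],
      search_pis=torch.full((3, N_ACTIONS), 1.0 / N_ACTIONS),   # pi from past searches
      returns=torch.tensor([1.0, 0.5, 0.0]),                    # n-step returns u
      rewards=torch.tensor([0.0, 1.0, 0.0]),                    # observed rewards u
  )

The point being that both planning and the training targets only ever touch the learned latent state; the environment's real rules never appear anywhere in it.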
FWIU, the lineage of this series over there is:
AlphaGo {Fan, Lee, Master} => AlphaGo Zero => AlphaZero => MuZero
AlphaGo: https://en.wikipedia.org/wiki/AlphaGo
AlphaGo Zero: https://en.wikipedia.org/wiki/AlphaGo_Zero
AlphaZero: https://en.wikipedia.org/wiki/AlphaZero
MuZero: https://en.wikipedia.org/wiki/MuZero
AlphaFold {1,2}: https://en.wikipedia.org/wiki/AlphaFold
IIRC, there is no official implementation of e.g. AlphaZero or MuZero against openai/gym (or openai/retro) environments for comparing reinforcement learning algorithms? https://github.com/openai/gym
What are the benchmarks for Applied RL?
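For comparison purposes, such benchmarks usually bottom out in an episode loop over gym environments. A minimal sketch using the classic gym API (reset -> obs; step -> obs, reward, done, info); the environment name and the random policy are placeholders, not a benchmark recommendation:

  import gym

  env = gym.make("CartPole-v1")
  obs = env.reset()
  total_reward, done = 0.0, False
  while not done:
      action = env.action_space.sample()          # stand-in for a learned policy
      obs, reward, done, info = env.step(action)
      total_reward += reward
  print(f"episode return: {total_reward}")
  env.close()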
From https://news.ycombinator.com/item?id=28499001 :
> AFAIU, while there are DLTs that cost CPU, RAM, and Data storage between points in spacetime, none yet incentivize energy efficiency by varying costs depending upon whether the instructions execute on a FPGA, ASIC, CPU, GPU, TPU, or QPU? [...]
> To be 200% green - to put a 200% green footer with search-discoverable RDFa on your site - I think you need PPAs and all directly sourced clean energy.
> (Energy efficiency is very relevant to ML/AI/AGI, because while it may be the case that the dumb universal function approximator will eventually find a better solution, "just leave it on all night/month/K12+postdoc" in parallel is a very expensive proposition with no apparent oracle; and then to ethically filter solutions still costs at least one human)