A single GPU can run thousands of environments and simulate 8 million steps in just 3 seconds: Stanford builds a high-performance game engine

At this point, AI agents seem capable of almost anything, from playing games to imitating humans on a wide range of tasks, and these agents are typically trained in complex simulated environments. Moreover, as learning tasks grow more complex, so does the simulated environment, driving up the cost of simulation.

Even for companies and institutions with supercomputing-scale resources, training a usable agent can take days to complete.

This hinders progress in the field and reduces the practicality of training advanced AI agents. To address the high cost of environment simulation, recent research efforts have fundamentally redesigned simulators for greater efficiency when training agents. These works share the idea of batch simulation: executing many independent environments (training instances) simultaneously within a single simulator engine.

In this paper, researchers from Stanford University and other institutions propose **Madrona, a game engine for reinforcement learning that can run thousands of environments in parallel on a single GPU, cutting agent training time from hours to minutes**.

  • Paper address:

  • Paper homepage:

Specifically, Madrona is a research game engine designed for creating learning environments that can execute thousands of environment instances simultaneously on a single GPU, at very high throughput (millions of aggregate steps per second). Madrona's goal is to make it easier for researchers to create new high-performance environments for a variety of tasks, thereby speeding up AI agent training by orders of magnitude.

Madrona has the following features:

  • GPU batch simulation: thousands of environments can run on a single GPU;
  • Entity Component System (ECS) architecture;
  • Easily interoperable with PyTorch.
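The batch-simulation idea behind these features can be sketched in plain Python/NumPy. This is only a conceptual illustration (Madrona's actual engine implements the pattern in C++/CUDA on the GPU); the class and field names below are invented. The key point is that all environments advance in one vectorized update, and observations come back as a single batched array that PyTorch can wrap without copying (via `torch.from_numpy`):

```python
import numpy as np

class BatchedGridEnv:
    """Toy batched environment: N independent 1-D worlds stepped together.

    Illustrates the batch-simulation idea only; Madrona's real engine
    implements this pattern in C++/CUDA.
    """

    def __init__(self, num_envs: int, size: int = 10):
        self.num_envs = num_envs
        self.size = size
        self.pos = np.zeros(num_envs, dtype=np.int64)

    def step(self, actions: np.ndarray) -> np.ndarray:
        # One vectorized update advances every environment at once,
        # instead of a Python loop over individual env objects.
        self.pos = np.clip(self.pos + actions, 0, self.size - 1)
        # Observations return as a single [num_envs, obs_dim] array,
        # which a framework like PyTorch can wrap as a tensor
        # without copying the underlying buffer.
        return self.pos[:, None].astype(np.float32)

envs = BatchedGridEnv(num_envs=1000)
obs = envs.step(np.ones(1000, dtype=np.int64))
print(obs.shape)  # (1000, 1)
```

The same `step` call costs roughly the same whether it advances one environment or a thousand, which is where the throughput gain comes from.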

Example Madrona environment:

As mentioned above, the study applies ECS design principles.
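The ECS pattern can be sketched as a toy example in Python. All names here are invented for illustration; in an ECS, components live in flat arrays indexed by entity ID, and a "system" is a function that updates a whole component column in one pass:

```python
import numpy as np

# Minimal illustration of the Entity Component System (ECS) idea:
# component data lives in flat arrays indexed by entity ID, and a
# "system" updates every entity's data in one vectorized pass.
# Madrona applies this layout per environment on the GPU.

NUM_ENTITIES = 4

# Component arrays (structure-of-arrays layout, cache/GPU friendly).
position = np.zeros((NUM_ENTITIES, 2), dtype=np.float32)
velocity = np.array([[1, 0], [0, 1], [1, 1], [0, 0]], dtype=np.float32)

def movement_system(dt: float) -> None:
    # Operates on the whole component column at once; no per-object
    # method dispatch as in a classic object-oriented game loop.
    position[:] += velocity * dt

movement_system(dt=0.5)
print(position[2])  # entity 2 moved by half its velocity
```

The structure-of-arrays layout is what lets a GPU process many entities (and, in Madrona's case, many environments) with coalesced memory access.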

Using the Madrona framework, the researchers implemented multiple learning environments, demonstrating two to three orders of magnitude of speedup on a GPU over an open-source CPU baseline, and a 5–33x speedup over a strong baseline running on a 32-thread CPU. They also reimplemented OpenAI's 3D "hide and seek" environment in the framework; with rigid-body physics and ray tracing performed at every simulation step, it exceeds 1.9 million steps per second on a single GPU.

One of the authors, Kayvon Fatahalian, an associate professor of computer science at Stanford University, said that on Overcooked, a multi-agent cooking game, the Madrona game engine cut the time to simulate 8 million environment steps from one hour to three seconds.

Currently, Madrona requires game logic to be written in C++. Its visualization support is also limited: while the engine can simulate thousands of environments simultaneously, the visualizer can view only one environment at a time.

**What environment simulators are built on Madrona?**

Madrona itself is not an RL environment simulator but a game engine, or framework: it makes it easier for developers to implement new environment simulators of their own, achieving high performance by running batched simulations on the GPU and tightly coupling simulator output with the learning code.

Below are some environment simulators based on Madrona.

Madrona Escape Room

Madrona Escape Room is a simple 3D environment that uses Madrona's ECS API as well as physics and rendering capabilities. In this simple task, the agent must learn to press a red button and push boxes of other colors to move through a series of rooms.

Overcooked AI

The Overcooked AI environment, a multi-agent learning benchmark based on the collaborative cooking video game Overcooked, is reimplemented here as a high-throughput Madrona port.

Source:

Hide and Seek

In September 2019, OpenAI's agents staged a hide-and-seek battle of offense and defense, inventing their own strategies and counter-strategies. That "Hide and Seek" environment is reproduced here using Madrona.

Hanabi

Hanabi is a Madrona-based implementation of the cooperative card game Hanabi, a Dec-POMDP. The environment follows DeepMind's Hanabi environment and partially supports a MAPPO implementation.

Cartpole

Cartpole is a classic RL training environment; this version, built on the Madrona game engine, has the same dynamics as the Gym implementation.
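The cartpole dynamics it mirrors can be sketched as follows. The equations and default constants below follow Gym's classic CartPole implementation; this is an illustration in Python, not Madrona's C++ code:

```python
import math

# Classic cartpole dynamics (the equations used by Gym's CartPole),
# advanced with a simple Euler step. Constants follow Gym's defaults.
GRAVITY = 9.8
MASS_CART = 1.0
MASS_POLE = 0.1
TOTAL_MASS = MASS_CART + MASS_POLE
LENGTH = 0.5                      # half the pole's length
POLEMASS_LENGTH = MASS_POLE * LENGTH
FORCE_MAG = 10.0
DT = 0.02                         # seconds per simulation step

def step(state, action):
    """Advance one cartpole state (x, x_dot, theta, theta_dot) by DT."""
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    temp = (force + POLEMASS_LENGTH * theta_dot**2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t**2 / TOTAL_MASS))
    x_acc = temp - POLEMASS_LENGTH * theta_acc * cos_t / TOTAL_MASS

    # Euler integration, as in Gym's default implementation.
    return (x + DT * x_dot,
            x_dot + DT * x_acc,
            theta + DT * theta_dot,
            theta_dot + DT * theta_acc)

state = step((0.0, 0.0, 0.0, 0.0), action=1)  # push cart to the right
```

In a batched engine, this per-state update would be applied to thousands of states at once each step.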

GitHub address:

Overcooked cooking game: train the best agent in a minute

Overcooked in Thousands of Kitchens: Training Top Performing Agents in Under a Minute

Stanford undergraduate Bidipta Sarkar, one of the paper's authors, wrote a blog detailing the process of training an agent to play the Overcooked cooking game. Overcooked is a popular cooking game that also serves as a benchmark for collaborative multiagent research.

In Sarkar's RL research, the high cost of simulating virtual environments had long been a major obstacle to training agents.

In the case of the Overcooked cooking game, roughly 8 million game steps are needed to train a pair of agents to converge to a stable equilibrium strategy in Overcooked's narrow-room layout. The open-source implementation of Overcooked is written in Python and runs at about 2,000 steps per second on an 8-core AMD CPU, so generating the necessary agent experience takes over an hour.

In contrast, all other operations required for training, including policy inference over all 8 million simulation steps and backpropagation for policy updates, take less than a minute on an NVIDIA A40 GPU. Training Overcooked agents is clearly bottlenecked by the speed of the environment simulator.

Given that Overcooked is such a simple environment, it seemed absurd to be stuck on simulation speed, so Sarkar set out to see whether the Overcooked environment simulation could be made faster, which led him to the Madrona game engine.

Using the Madrona game engine, Sarkar built a plug-and-play, GPU-accelerated replacement for the original Python Overcooked implementation. Simulating 1,000 Overcooked environments in parallel, the GPU-accelerated version generates 3.5 million steps of experience per second on an A40 GPU.

As a result, the time to simulate 8 million environment steps was reduced from 1 hour to 3 seconds, enabling a policy to be trained in as little as 1 minute using an A40 GPU.
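A quick back-of-the-envelope check confirms these figures are consistent with each other:

```python
# Sanity check on the throughput numbers quoted above.
steps = 8_000_000         # environment steps needed to train the agents

cpu_rate = 2_000          # steps/sec, open-source Python implementation
gpu_rate = 3_500_000      # steps/sec, Madrona port on an A40 GPU

cpu_seconds = steps / cpu_rate
gpu_seconds = steps / gpu_rate

print(f"CPU: {cpu_seconds / 3600:.1f} h, GPU: {gpu_seconds:.1f} s")
# -> CPU: 1.1 h, GPU: 2.3 s
# which matches the "from 1 hour to 3 seconds" claim, a roughly
# 1750x speedup in experience generation.
```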

The simulator's speed opens up new possibilities for extensive hyperparameter sweeps in Overcooked, in particular training multiple policies in the time previously required to train a single one.

In the end, Sarkar found that porting Overcooked to Madrona was a much smoother process than existing alternatives for creating GPU-accelerated environments, such as PyTorch, Taichi Lang, or direct CUDA C++.

Blog details:

Reference link:
