We are holding a competition on sample-efficient reinforcement learning using human priors. Standard methods require months to years of game time to attain human performance in complex games such as Go and StarCraft. In our competition, participants develop a system to obtain a diamond in Minecraft using only four days of training time.

The MineRL competition offers a set of Gym environments paired with human demonstrations, giving participants the means to tackle the difficult Minecraft task sample-efficiently. This year we are introducing a new vectorized action and observation space that obscures the agent’s actions, preventing participants from using domain knowledge to solve the ObtainDiamond task!

Sample snippets of the dataset.





Top Submissions













Competition Overview

All submissions are made through AIcrowd, where you can find the detailed rules as well as the leaderboard. Additionally, Preferred Networks will soon release reference RL implementations. Previous baselines can be found on GitHub.

Round 1

Participants train their agents to play Minecraft. During the round, they submit trained models for evaluation to determine leaderboard ranks. At the end of the round, participants submit source code. The models at the top of the leaderboard are re-trained (from scratch) for four days to compute the final score used for ranking. Twenty participants move on to the second round: 15 from the main track and 5 from the data-only track.
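As a rough illustration of the Round 1 cutoff, the sketch below picks finalists per track from re-trained scores. The function name `select_finalists`, the track labels, and the tuple layout are hypothetical and chosen only for this example; the official rules on AIcrowd define the actual procedure, including how ties are handled.

```python
def select_finalists(entries, quotas=None):
    """Pick the top-scoring participants in each track.

    entries: list of (name, track, score) tuples (hypothetical layout).
    quotas:  mapping of track label -> number of finalists from that track.
    """
    if quotas is None:
        # Quotas taken from the Round 1 description; labels are illustrative.
        quotas = {"main": 15, "data_only": 5}
    finalists = {}
    for track, limit in quotas.items():
        in_track = [e for e in entries if e[1] == track]
        # Highest score first; ties keep their input order (an assumption).
        in_track.sort(key=lambda e: e[2], reverse=True)
        finalists[track] = [name for name, _, _ in in_track[:limit]]
    return finalists
```

Calling `select_finalists(entries)` with the default quotas returns up to 15 names for the main track and up to 5 for the data-only track.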

Round 2

Participants may submit code up to four times. Each submission is trained for four days to compute its score. The final ranking is based on each participant's best submission. The top participants will present their work at a workshop at NeurIPS 2020.
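The best-of-four rule can be sketched as follows. This is a minimal illustration, not the organizers' scoring code: the function name `final_ranking` and the input layout are assumptions, and tie-breaking here simply follows input order.

```python
def final_ranking(scores_by_participant, max_submissions=4):
    """Rank participants by their single best submission score.

    scores_by_participant: mapping of name -> list of submission scores
    (hypothetical layout). Participants with no submissions are skipped.
    """
    best = {}
    for name, scores in scores_by_participant.items():
        if len(scores) > max_submissions:
            raise ValueError(f"{name} has more than {max_submissions} submissions")
        if scores:
            best[name] = max(scores)
    # Highest best-score first; ties keep insertion order (an assumption).
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```

Under this rule, a weak early submission does not hurt a participant, since only the maximum counts toward the ranking.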

The Task: Obtain Diamond in Minecraft

Minecraft is a 3D, first-person, open-world game centered around the gathering of resources and creation of structures and items. These structures and items have prerequisite tools and materials required for their creation. As a result, many items require the completion of a series of natural subtasks.

The procedurally generated world is composed of discrete blocks that allow modification. Over the course of gameplay, players change their surroundings by gathering resources and constructing structures.

In this competition, the goal is to obtain a diamond. The agent begins in a random starting location without any items, and receives rewards for obtaining items which are prerequisites for diamond.

The stages of obtaining a diamond: Gather Wood, Create Wood Pickaxe, Mine Stone and Create Stone Pickaxe, Mine Iron Ore, Create Furnace, Smelt Iron and Create Iron Pickaxe, Search, Mine Diamond.
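To make the reward structure concrete, here is a small sketch of milestone rewards along the stages above. The item names and reward values are illustrative placeholders rather than the competition's official numbers, and the sketch assumes each prerequisite is rewarded only the first time it is obtained in an episode.

```python
# Illustrative values only; the official reward table may differ.
MILESTONE_REWARDS = {
    "wood": 1,
    "wood_pickaxe": 8,
    "stone": 16,
    "stone_pickaxe": 32,
    "iron_ore": 64,
    "furnace": 32,
    "iron": 128,
    "iron_pickaxe": 256,
    "diamond": 1024,
}


def episode_return(items_obtained):
    """Sum milestone rewards, paying for each item only on first pickup."""
    rewarded = set()
    total = 0
    for item in items_obtained:
        if item in MILESTONE_REWARDS and item not in rewarded:
            rewarded.add(item)
            total += MILESTONE_REWARDS[item]
    return total
```

For example, an episode that collects wood twice, crafts a wood pickaxe, and picks up an unrelated item would be credited only for the first wood and the pickaxe under these assumptions.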

Prizes

Top-ranking teams in round 2 will receive rewards from our sponsors. Details will be announced as we finalize agreements.

Team

The organizing team consists of:

William H. Guss (OpenAI and Carnegie Mellon University)

Brandon Houghton (OpenAI and Carnegie Mellon University)

Stephanie Milani (Carnegie Mellon University)

Nicholay Topin (Carnegie Mellon University)

Ruslan Salakhutdinov (Carnegie Mellon University)

John Schulman (OpenAI)

Mario Ynocente Castro (Preferred Networks)

Crissman Loomis (Preferred Networks)

Keisuke Nakata (Preferred Networks)

Shinya Shiroshita (Preferred Networks)

Avinash Ummadisingu (Preferred Networks)

Sharada Mohanty (AIcrowd)

Sam Devlin (Microsoft Research)

Noboru Sean Kuno (Microsoft Research)

Oriol Vinyals (DeepMind)

The advisory committee consists of:

Fei Fang (Carnegie Mellon University)

Zachary Chase Lipton (Carnegie Mellon University)

Manuela Veloso (Carnegie Mellon University and JPMorgan Chase)

David Ha (Google Brain)

Chelsea Finn (Google Brain and UC Berkeley)

Anca Dragan (UC Berkeley)

Sergey Levine (UC Berkeley)

Contact

If you have any questions, please feel free to contact us:

competition@minerl.io

Citation