Challenge Protocol
Overview
The challenge is divided into two stages:
- Pre-Stage: The pre-stage of the challenge serves as an open qualification round where everyone can participate. This stage is done completely in simulation, so no access to the robots is needed.
- Robot Stage: The best teams of the pre-stage are admitted to the real-robot stage, where the same tasks have to be solved on the real robots. For this, the teams will get remote access to our TriFinger robot cluster.
The goal of this year’s challenge is to solve dexterous manipulation tasks with offline reinforcement learning (RL) or imitation learning. The participants are provided with datasets containing dozens of hours of robotic data and can evaluate their policies locally in simulation (pre-stage) or remotely on a cluster of real TriFinger robots (robot stage).
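As a toy illustration of the offline setting described above, the sketch below fits a policy purely from a logged dataset, with no environment interaction. The dataset layout (a dict of NumPy arrays) and the observation/action dimensions are assumptions for illustration only, and a linear least-squares fit stands in for a learned neural-network policy; the actual challenge datasets and baselines may differ.

```python
# Minimal behavior-cloning sketch on a hypothetical offline dataset.
# Dataset layout and dimensions are illustrative assumptions, not the
# real challenge format.
import numpy as np

rng = np.random.default_rng(0)
dataset = {
    "observations": rng.normal(size=(1000, 8)),  # hypothetical obs dim
    "actions": rng.normal(size=(1000, 2)),       # hypothetical action dim
}

# Fit a linear policy a = obs @ W by least squares -- a stand-in for
# training a neural-network policy with gradient descent.
W, *_ = np.linalg.lstsq(
    dataset["observations"], dataset["actions"], rcond=None
)

def policy(obs):
    """Predict an action for a single observation."""
    return obs @ W

action = policy(dataset["observations"][0])
```

The key point is that the policy is derived entirely from the provided data, which is exactly what the rules below require.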
There are two tasks of different difficulty:
- Push a cube to a target location on the ground.
- Lift a cube to match a target pose (position and orientation) in the air.
In the pre-stage, participants only get access to datasets with data collected in simulation and evaluate their policies only in simulation. In the real-robot stage, participants get access to datasets containing data collected on six real robots.
For more detailed information on how to participate, please see the software documentation.
Join our GitHub Discussions
We use GitHub Discussions to make challenge-related announcements and to let participants connect and ask questions. So if anything is unclear, please go there and let us know.
We also recommend that you “watch” the discussions (via the “watch”-button of the corresponding repository) to make sure you don’t miss important announcements!
Prizes
The top three teams of the real-robot stage will receive the following prizes:
- 1st place: 2500 USD
- 2nd place: 1500 USD
- 3rd place: 1000 USD
The ranking is determined by two factors:
(i) The score in the final evaluation round after the end of the real-robot stage (October 7th, 14:00 UTC).
(ii) The report (submission deadline October 14th, AoE).
Important: In order to receive prize money, winning teams are required to publish their report and source code under an open source license (see rules below).
Rules
- Any algorithmic approach may be applied that learns the behavior only from the provided data and does not make use of any hard-coded/engineered behavior. Two prominent algorithmic approaches meeting this criterion are offline reinforcement learning and imitation learning.
- It is not permitted to use data collected during evaluation rollouts or obtained from other sources.
- It is not permitted to use data provided for one task to train a policy for another task (e.g. using the simulation data for the real robot, or the "expert" dataset for the "mixed" task).
- It is not permitted to filter the datasets based on the position of a sample in the dataset. However, you may filter based on the properties of a transition or an episode.
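To make the filtering rule above concrete, here is a small sketch. The episode format (a list of dicts with a "return" field) is a hypothetical example, not the real dataset structure:

```python
# Hypothetical episode records, for illustration only.
episodes = [
    {"id": 0, "return": 12.0},
    {"id": 1, "return": 3.5},
    {"id": 2, "return": 25.0},
]

# Allowed: filter by a property of the episode (here, its return).
good_episodes = [ep for ep in episodes if ep["return"] > 10.0]

# NOT allowed: filter by position in the dataset, e.g.
#   first_half = episodes[: len(episodes) // 2]
```

The distinction is that the second variant selects samples by where they happen to sit in the file rather than by what they contain.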
- Participants may participate alone or in teams.
- Individuals are not allowed to participate in multiple teams.
- Each team needs to nominate a contact person and provide an email address through which they can be reached.
- Cash prizes will be paid out to an account specified by the contact person of each team. It is the responsibility of the team's contact person to distribute the prize money according to their team-internal agreements.
- To be eligible to win prizes, participants agree to release their code under an OSI-approved license and to publish a report describing their method in a publicly accessible way (e.g. on arXiv).
- Participants may not alter parameters of the simulation (e.g. the robot model) for the evaluation of the pre-stage.
- The organizers reserve the right to change the rules if doing so is absolutely necessary to resolve unforeseen problems.
- The organizers reserve the right to disqualify participants who violate the rules or engage in scientific misconduct.
Data Publication
When submitting jobs to the real robots, information such as actions and observations, as well as the output of the user's application, is recorded and stored. During the challenge, participants can only access the data of their own submissions.
After the challenge, we may publish and use this data in anonymised form (i.e. excluding user-specific information/output) for research purposes.
By submitting jobs to the robots, users agree to this use of the generated data.