OCRTOC is a cloud-based benchmark for robotic grasping and manipulation. The benchmark focuses on the object rearrangement problem, specifically on table organization tasks, which require essential capabilities for service robots: understanding their environment, making long-horizon decisions, and performing robust interactions with the physical world. With the OCRTOC benchmark, we aim to lower the barrier to conducting reproducible research and to accelerate progress in embodied AI. With the rise of LLMs and foundation models, we hope participants can leverage these recent technologies to make breakthroughs in this challenge.
Challenge structure
This year, we are organizing OCRTOC as a two-phase challenge: Qualification and Real-world Evaluation. During the Qualification Phase, we provide a simulation environment implemented in MuJoCo for participants to develop their solutions. After testing against the provided training tasks, participants submit their code to a Docker registry, and we evaluate each solution by running the submitted code on a set of held-out evaluation tasks on a remote server that participants cannot access. The top-ranked teams qualify for the Real-world Evaluation Phase, in which we use objects from the IPA-3D1K dataset to conduct real-world evaluations. We will provide the qualified teams with a dataset collected from the real world to enable Sim2Real transfer. The top teams will be awarded prizes (see the Prizes section below).
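For orientation, a minimal MuJoCo control loop might look like the sketch below. The scene file name, the 10-second horizon, and the zero-action placeholder policy are illustrative assumptions, not the actual OCRTOC interface; the real task scenes and robot API come with the starter kit.

```python
import mujoco

# Load a task scene; "ocrtoc_task_scene.xml" is a hypothetical file name.
model = mujoco.MjModel.from_xml_path("ocrtoc_task_scene.xml")
data = mujoco.MjData(model)

def policy(qpos):
    # Placeholder controller: participants substitute their own solution.
    return [0.0] * model.nu

while data.time < 10.0:               # run 10 simulated seconds
    data.ctrl[:] = policy(data.qpos)  # write actuator commands
    mujoco.mj_step(model, data)       # advance the physics
```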
Challenge Timeline
Call for participation: 15/05/2024
Qualification phase: 15/05/2024 - 15/07/2024
Preparation for real world: 15/07/2024 - 30/07/2024
Evaluation in real world: 01/08/2024 - 15/08/2024
Award ceremony: to be announced
Prizes
Gold: 1 team, 1000 euro
Silver: 3 teams, 500 euro per team
Bronze: 5 teams, 200 euro per team
Language-conditioned rearrangement tasks
In the language-conditioned rearrangement task, the robot is given a natural-language instruction that specifies the goal of the manipulation (e.g., “Pick one screwdriver and place it in the top right basket”). We provide a demonstration dataset collected by an expert policy, including robot observations, actions, and language annotations, which participants can use to train their models (a hypothetical data-loading sketch follows the camera views below).
Third-person view camera
Eye-in-hand camera
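To give a concrete picture of how the demonstrations might be consumed, here is a hypothetical data-loading sketch. The file name and the field names ("observations", "actions", "language") are assumptions for illustration; the actual format is defined by the released dataset.

```python
import numpy as np

# Hypothetical demonstration file; the real layout is defined by the
# OCRTOC demonstration dataset.
demo = np.load("demo_0001.npz", allow_pickle=True)
observations = demo["observations"]  # e.g. third-person-view / eye-in-hand images
actions = demo["actions"]            # expert robot actions
instruction = str(demo["language"])  # e.g. "Pick one screwdriver and place it ..."

# A simple imitation-learning setup regresses the expert action from
# each (observation, instruction) pair with any language-conditioned policy.
for obs, act in zip(observations, actions):
    pass  # feed (obs, instruction) into a model and supervise with act
```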
Pose-conditioned rearrangement tasks
The robot is required to move the objects present in the scene from their initial pose configurations to given target pose configurations. The overlap between the desired and actual object placements is used to compute the task’s success rate; an illustrative version of such an overlap metric is sketched after the task list below. No demonstration dataset is provided for this task. In the subsequent figures, the transparent objects indicate the target pose configurations.
Pick objects into the basket
Pick an object into a basket and move the basket to a desired position
Organize YCB tools to desired poses
Prepare dining table
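The exact scoring code is not reproduced here; the sketch below shows one plausible reading of the overlap metric as an intersection-over-union on occupancy masks, purely for illustration.

```python
import numpy as np

def overlap_ratio(actual: np.ndarray, target: np.ndarray) -> float:
    """Intersection-over-union between two boolean occupancy masks.

    `actual` marks cells occupied by the object in its final pose and
    `target` marks cells of the desired target pose. This is an
    illustrative stand-in, not the official OCRTOC scoring code.
    """
    intersection = np.logical_and(actual, target).sum()
    union = np.logical_or(actual, target).sum()
    return float(intersection / union) if union else 0.0
```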
Citation
Ziyuan Liu, Wei Liu, Yuzhe Qin, Fanbo Xiang, Songyan Xin, Maximo A. Roa, Berk Calli, Hao Su, Yu Sun, and Ping Tan, “OCRTOC: A Cloud-Based Competition and Benchmark for Robotic Grasping and Manipulation”.
The paper is available at https://arxiv.org/pdf/2104.11446.pdf
Organizers and Contributors (alphabetical order)
Abdalla Swikir
TUM
Dong Chen
Huawei German Research Center
Florian Jordan
Fraunhofer IPA
Jochen Lindermayr
Fraunhofer IPA
Peter So
TUM
Ping Tan
HKUST
Sami Haddadin
TUM
Yu Sun
University of South Florida
Zhen Chen
TUM
Ziyuan Liu
Huawei German Research Center