EPFL-Smart-Kitchen-30 Collected data
Understanding behavior requires datasets that capture humans carrying out complex tasks. The kitchen is an excellent environment for assessing human motor and cognitive function, since many complex actions, from chopping to cleaning, occur there naturally. Here, we introduce the EPFL-Smart-Kitchen-30 dataset, collected with a noninvasive motion-capture platform inside a kitchen environment. Nine static RGB-D cameras, inertial measurement units (IMUs), and one head-mounted HoloLens 2 headset were used to capture 3D hand, body, and eye movements. The EPFL-Smart-Kitchen-30 dataset is a multi-view action dataset with synchronized exocentric and egocentric video, depth, IMU, eye-gaze, and body and hand kinematics data, spanning 29.7 hours of 16 subjects cooking four different recipes. Action sequences were densely annotated, with an average of 33.78 action segments per minute. Leveraging this multi-modal dataset, we propose four benchmarks to advance behavior understanding and modeling through:
- a vision-language benchmark,
- a semantic text-to-motion generation benchmark,
- a multi-modal action recognition benchmark,
- a pose-based action segmentation benchmark.
> ⚠️ 3D pose and action annotations can be found at https://zenodo.org/records/15551913
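The record linked above can also be browsed programmatically through Zenodo's public REST API. Below is a minimal Python sketch (assuming the `requests` package is available and that the response follows Zenodo's current API schema; no assumptions are made about file names or archive layout) that lists the files attached to record 15551913:

```python
import requests

RECORD_ID = "15551913"  # EPFL-Smart-Kitchen-30 poses and annotations record

# Query Zenodo's public REST API for the record metadata.
resp = requests.get(f"https://zenodo.org/api/records/{RECORD_ID}", timeout=30)
resp.raise_for_status()
record = resp.json()

# Print every file attached to the record with its size and download link.
for f in record.get("files", []):
    size_mb = f["size"] / 1e6
    print(f'{f["key"]:60s} {size_mb:10.1f} MB  {f["links"]["self"]}')
```

Each listed link can then be fetched directly (e.g. with `requests` or `wget`) to download the corresponding archive.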
Affiliations: EPFL; Microsoft (Switzerland); ETH Zurich

Publication date: 2025-05-28

Version: 1

License: CC BY
| Funder | Funding | Grant no. | Grant URL |
| --- | --- | --- | --- |
| Swiss National Science Foundation | Joint behavior and neural data modeling for naturalistic behavior | 10000950 | |
| Relation | Related work | URL/DOI |
| --- | --- | --- |
| IsSupplementTo | EPFL-Smart-Kitchen-30: Densely annotated cooking dataset with 3D | |
| IsContinuedBy | EPFL-Smart-Kitchen-30 Annotations and Poses | |
| IsVersionOf | | |