Musculoskeletal motor control with reinforcement learning
Animals, including humans, interact with the external environment primarily through motion. Replicating their motor control skills in artificial embodied agents is a major objective of artificial intelligence research. This thesis presents a collection of studies on different aspects of artificial embodied intelligence, linked by a common research question: the learnability of human-level motor control policies. Using biologically realistic computational models of the human musculoskeletal system, we can study motor skill learning and adaptation in simulation with unprecedented detail and efficiency. Advanced biomechanical simulators allow us to train policies that face the same hurdles as animals and humans, namely, controlling a complex, high-dimensional system such as the human body.

Throughout this thesis, we addressed the problems of adaptation and skill acquisition. In particular, we considered which inductive biases enable policy networks to handle variable body shapes and adapt a locomotion strategy in real time. To this end, we devised DMAP, a policy network implementing principles of biological motor control, which facilitates extracting a representation of the agent's body from sensory input.

In the second part of the thesis, we shifted our focus to motor control via muscle actuators. Through curriculum learning, a framework in which an agent faces progressively more difficult tasks, we trained policies to control realistic models of the human arm and perform dexterous object manipulation. These policies won the first two NeurIPS MyoChallenges, demonstrating for the first time that artificial neural networks can control a model of the human arm to rotate, grasp, and throw objects.

Solving these complex problems required innovating beyond existing learning algorithms. We devised Lattice, an exploration strategy designed for the complexity of musculoskeletal environments with tens of degrees of freedom and actuators. Pairing Lattice with reinforcement and imitation learning, we obtained policies achieving human-level object manipulation and locomotion. We used these policies to compare the number of control dimensions necessary to perform different tasks, finding that it exceeds previous estimates and that low-dimensional control spaces transfer poorly across tasks.

Finally, we distilled all the policies trained in these works into Arnold, a single generalist motor control policy flexible enough to control multiple parts of the human body. Overall, our work introduced new training and analysis techniques that will facilitate future studies of the human motor control system, where data from human subjects can be complemented by analysis in simulation.
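To make the curriculum learning framework mentioned above concrete, the sketch below shows the core control flow: train on one task stage until a success threshold is met, then advance to a harder stage. This is a minimal illustration, not the thesis's training pipeline; the stage names, thresholds, and the stubbed training and evaluation functions are all hypothetical.

```python
import random

# Hypothetical curriculum stages of increasing difficulty; names and
# difficulty values are illustrative, not taken from the thesis.
STAGES = [
    {"name": "hold object",   "difficulty": 0.2},
    {"name": "rotate object", "difficulty": 0.5},
    {"name": "throw object",  "difficulty": 0.9},
]
SUCCESS_THRESHOLD = 0.8   # advance once 80% of evaluation rollouts succeed
EPISODES_PER_EVAL = 100

def train_one_iteration(policy_skill: float, difficulty: float) -> float:
    """Stand-in for an RL update: skill climbs toward the level the
    current stage demands. A real setup would update network weights."""
    return policy_skill + 0.05 * (difficulty + 0.1 - policy_skill)

def success_rate(policy_skill: float, difficulty: float) -> float:
    """Stand-in evaluation: success is likelier when skill exceeds difficulty."""
    wins = sum(random.random() < min(1.0, policy_skill / difficulty)
               for _ in range(EPISODES_PER_EVAL))
    return wins / EPISODES_PER_EVAL

skill = 0.0
for stage in STAGES:
    # Keep training on the current stage until the policy masters it,
    # then move to the harder one: the essence of a task curriculum.
    while success_rate(skill, stage["difficulty"]) < SUCCESS_THRESHOLD:
        skill = train_one_iteration(skill, stage["difficulty"])
    print(f"mastered stage: {stage['name']} (skill={skill:.2f})")
```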
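The key idea behind Lattice-style exploration, as we read it, is to inject noise into the policy network's latent state rather than independently into each actuator, so that muscles driven by shared latent features receive correlated perturbations. The sketch below illustrates this with a toy linear output layer; all dimensions and scales are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes for a muscle-driven arm (not the thesis's exact numbers).
N_LATENT, N_MUSCLES = 64, 39
SIGMA = 0.3  # exploration noise scale

# Toy output layer of a policy network: latent features -> muscle activations.
W_out = rng.normal(scale=0.1, size=(N_MUSCLES, N_LATENT))

def act_independent(latent: np.ndarray) -> np.ndarray:
    """Baseline: independent Gaussian noise per actuator (uncorrelated)."""
    return W_out @ latent + SIGMA * rng.normal(size=N_MUSCLES)

def act_latent_noise(latent: np.ndarray) -> np.ndarray:
    """Latent exploration: perturb the latent state instead. The noise is
    pushed through W_out, so actuators that depend on the same latent
    features are perturbed together."""
    noisy_latent = latent + SIGMA * rng.normal(size=N_LATENT)
    return W_out @ noisy_latent

# The induced action-noise covariance is SIGMA^2 * W_out @ W_out.T, which
# is generally non-diagonal, i.e., correlated across muscles.
latent = rng.normal(size=N_LATENT)
cov = SIGMA**2 * W_out @ W_out.T
off_diag = cov[~np.eye(N_MUSCLES, dtype=bool)]
print("mean |off-diagonal covariance|:", np.abs(off_diag).mean())
print("first muscle activations:", act_latent_noise(latent)[:5])
```

Correlated exploration of this kind matters in overactuated systems, where many muscles must co-contract coherently for any perturbation to produce useful movement.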
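Finally, one common way to distill several specialist policies into a single generalist, which may approximate the idea behind Arnold, is supervised regression of the student's actions onto frozen teachers' actions, with a task label telling the student which skill is required. The sketch below assumes this behavioral-cloning formulation; the network shapes, task encoding, and data source are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative dimensions; a real setup would use the environments'
# observation and muscle-activation spaces.
OBS_DIM, ACT_DIM, N_TASKS = 32, 16, 2

def make_net(in_dim: int) -> nn.Module:
    return nn.Sequential(nn.Linear(in_dim, 64), nn.Tanh(), nn.Linear(64, ACT_DIM))

# Frozen task-specific teachers (e.g., manipulation and locomotion specialists).
teachers = [make_net(OBS_DIM).eval() for _ in range(N_TASKS)]
# The student also receives a task one-hot, so one network can serve all tasks.
student = make_net(OBS_DIM + N_TASKS)
optim = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(500):
    # Pick a task and sample observations; in practice these would come
    # from rollouts in the corresponding musculoskeletal environment.
    task = torch.randint(N_TASKS, ()).item()
    obs = torch.randn(256, OBS_DIM)
    onehot = torch.zeros(256, N_TASKS)
    onehot[:, task] = 1.0
    with torch.no_grad():
        target_actions = teachers[task](obs)  # teacher supervision
    loss = nn.functional.mse_loss(student(torch.cat([obs, onehot], dim=1)),
                                  target_actions)
    optim.zero_grad()
    loss.backward()
    optim.step()

print("final distillation loss:", loss.item())
```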