EPFL-SCR No 8 Nov.96

The Complexity of Simulating Virtual Humans

by Daniel Thalmann - EPFL - Computer Graphics Lab

This paper discusses the use of geometrical, physical and behavioural models for animating virtual humans. It shows how the process is complex and computationally expensive. Various situations of a virtual scene are described, including the simulation of the motion of a single virtual human in an environment, the interaction between two virtual humans and the interaction between a virtual human and a real human. The use of parallelism techniques is also presented.



Geometry, physics and behaviour

An important part of current animation consists of simulating the real world. To achieve a simulation, the animator has two principal techniques available. The first is to use a model that creates the desired effect; a good example is the growth of a green plant. The second is used when no model is available: the animator reproduces by hand the real-world motion to be simulated. Until recently, most computer-generated films have been produced using the second approach, with traditional computer animation techniques like keyframe animation, spline interpolation, etc. Automatic motion control techniques have been proposed, but they are strongly related to mechanics-based animation and do not take into account the behaviour of characters.

The long-term objective of our research is the visualization of the simulation of the behaviour of virtual humans in a given environment, interactively decided by the user. The ultimate reason for developing realistic-looking synthetic actors is to be able to use them in virtually any scene that re-creates the real world. However, a virtual scene, beautiful though it may be, is not complete without people... virtual people, that is. Scenes involving synthetic actors imply many complex problems we have been solving for several years [1]. With the new developments of digital and interactive television [2] and multimedia products, there is also a need for systems that provide designers with the capability of embedding real-time simulated humans in games, multimedia titles and film animations.

The classification of approaches to computer animation can help us impose conceptual order on an area characterized by rapid and piecemeal innovation in many directions simultaneously, systematically analyze the differences and similarities among these approaches, and better understand the way in which the field has been evolving.

In this paper, we show the evolution over time of research into our animation models. The first computerized models to be defined in Computer Animation were mainly geometric. Since computer animation derives from traditional animation, the first trend was to imitate how traditional animators produce their films; the accent was put on the graphic result rather than on the models, for example in the creation of a synthetic actor. To make the movement more realistic, physics-based models were then introduced. The problem with these models is that all actors behave the same way. Because humans do not act solely according to physical laws, behavioural models have been introduced more recently to take into account the individuality of a character. Besides physical laws, another kind of control is necessary for simulating human motions. This behavioural approach allows the creation of truly autonomous virtual humans, able to live by themselves, corresponding to the new trend of Artificial Life.

Concurrently with the evolution of motion control models, there have been major developments in the relationship between virtual humans and their environment. The emergence of techniques like Artificial Intelligence and object-oriented programming, increases in computer speed, and new interactive devices have made it possible to take into account the interactions of a virtual human with his environment, interactions between virtual humans, and interactions between virtual humans and real ones, a typical Virtual Reality situation. These kinds of interaction and our three categories of models (geometric, physical and behavioural) give rise to a classificatory array with four rows and three columns (see Table 1).

Table 1 - A classification of animation techniques according to motion control method and interaction of virtual humans


Geometric methods

Before moving a virtual human, it is essential to model his shape. Our approach is based on a multi-layered model [3]. It contains a skeleton layer, intermediate layers which simulate the physical behaviour of muscle, bone, fat tissue, etc., and a skin layer. Implicit surfaces are employed to simulate the gross behaviour of bone, muscle, and fat tissue. They are attached to the proximal joints of the skeleton, arranged in an anatomically-based approximation. The skin surfaces are automatically constructed using cross-sectional sampling. The method combines the advantages of implicit, parametric and polygonal surface representation, producing realistic body deformations. Figure 1 shows a body built using this method.
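As a rough illustration of how implicit primitives (metaballs) can approximate the volumes of muscle, bone and fat tissue, the following Python sketch sums a simple field function over a few primitives and tests whether a point lies inside the implied surface. The falloff function, radii and threshold are illustrative assumptions, not the formulation actually used in our multi-layered model.

```python
import numpy as np

def metaball_field(point, centers, radii):
    """Sum a simple polynomial falloff over all metaball primitives.

    Each primitive contributes 1 at its centre and 0 beyond its radius;
    the actual falloff used for body modelling may differ.
    """
    total = 0.0
    for c, r in zip(centers, radii):
        d2 = np.sum((point - c) ** 2) / (r * r)
        if d2 < 1.0:
            total += (1.0 - d2) ** 2   # soft quadratic falloff (assumed)
    return total

# Two overlapping primitives standing in for a muscle belly and a fat pad.
centers = [np.array([0.0, 0.0, 0.0]), np.array([0.8, 0.0, 0.0])]
radii = [1.0, 0.7]

# A point is "inside" the body part when the field exceeds a threshold.
print(metaball_field(np.array([0.4, 0.0, 0.0]), centers, radii) > 0.5)
```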

Figure 1 - Body built using metaballs and B-splines

Geometric motion control of a single virtual human

Body motion control is normally performed by animating a skeleton. Using geometric techniques, the skeleton is locally controlled and defined in terms of coordinates, angles, velocities, or accelerations. The simplest approach is motion capture; keyframe animation is another popular technique, in which the animator explicitly specifies the kinematics by supplying key values whose in-between frames are interpolated by the computer. Inverse kinematics [4] is a technique coming from robotics, where the motion of the links of a chain is computed from the end-link trajectory. Geometric methods are efficient and may easily be performed in real time. However, even if keyframe animation runs in real time, the keys have to be designed in advance.
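As a minimal illustration of geometric motion control, the sketch below applies cyclic coordinate descent (CCD), one common inverse kinematics scheme, to a planar chain so that the end link reaches a target position. It is not the specific solver of [4]; the chain layout, iteration count and tolerance are arbitrary.

```python
import math

def ccd_ik(lengths, angles, target, iterations=50, tol=1e-3):
    """Cyclic coordinate descent IK for a planar chain.

    lengths: link lengths; angles: joint angles in radians (modified in place).
    Returns the final end-effector position.
    """
    def forward(angles):
        # Accumulate joint positions from the root placed at the origin.
        pts = [(0.0, 0.0)]
        a = 0.0
        for L, q in zip(lengths, angles):
            a += q
            x, y = pts[-1]
            pts.append((x + L * math.cos(a), y + L * math.sin(a)))
        return pts

    for _ in range(iterations):
        for j in reversed(range(len(angles))):
            pts = forward(angles)
            end, joint = pts[-1], pts[j]
            # Rotate joint j so the end effector swings toward the target.
            a_end = math.atan2(end[1] - joint[1], end[0] - joint[0])
            a_tgt = math.atan2(target[1] - joint[1], target[0] - joint[0])
            angles[j] += a_tgt - a_end
        end = forward(angles)[-1]
        if math.hypot(end[0] - target[0], end[1] - target[1]) < tol:
            break
    return forward(angles)[-1]

print(ccd_ik([1.0, 1.0, 0.5], [0.1, 0.1, 0.1], (1.2, 1.0)))
```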

Geometric interaction with the environment

In geometric motion control systems, motion may be determined based on the environment using geometric operations like the intersection of the virtual human with the decor. Consider, for example, the problem of walking without collision among obstacles. One strategy is based on the Lozano-Perez algorithm [5]. The first step consists of forming a visibility graph. The vertices of this graph are the vertices of the obstacles, the start point S and the goal point G. An edge is included if a straight line can be drawn joining two vertices without intersecting any obstacle. The shortest collision-free path from S to G is then the shortest path in the graph from S to G. The geometric interaction between virtual humans essentially corresponds to a geometric detection of collision. Geometric interaction with the user is of limited interest.
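A simplified sketch of this visibility-graph strategy follows: obstacle edges are given as segments, two nodes are connected when the straight line between them crosses no obstacle edge, and the shortest path is found with Dijkstra's algorithm. This is only an illustration of the idea behind the Lozano-Perez approach, not the algorithm of [5] in full (degenerate cases such as grazing contacts are ignored), and the obstacle data is invented.

```python
import itertools, heapq

def segments_intersect(p, q, a, b):
    """True if segments pq and ab properly cross each other."""
    def cross(o, u, v):
        return (u[0]-o[0])*(v[1]-o[1]) - (u[1]-o[1])*(v[0]-o[0])
    d1, d2 = cross(a, b, p), cross(a, b, q)
    d3, d4 = cross(p, q, a), cross(p, q, b)
    return (d1 * d2 < 0) and (d3 * d4 < 0)

def shortest_collision_free_path(start, goal, obstacle_edges, vertices):
    """Visibility graph + Dijkstra, in the spirit of the Lozano-Perez strategy."""
    nodes = [start, goal] + vertices
    dist = lambda u, v: ((u[0]-v[0])**2 + (u[1]-v[1])**2) ** 0.5
    # Keep an edge only if the straight line crosses no obstacle edge.
    graph = {n: [] for n in nodes}
    for u, v in itertools.combinations(nodes, 2):
        if not any(segments_intersect(u, v, a, b) for a, b in obstacle_edges):
            graph[u].append((v, dist(u, v)))
            graph[v].append((u, dist(u, v)))
    # Dijkstra from start to goal.
    best = {start: 0.0}
    heap = [(0.0, start, [start])]
    while heap:
        d, n, path = heapq.heappop(heap)
        if n == goal:
            return path
        for m, w in graph[n]:
            if d + w < best.get(m, float('inf')):
                best[m] = d + w
                heapq.heappush(heap, (d + w, m, path + [m]))
    return None

# One rectangular obstacle between S=(0,0) and G=(3,3); its corners become vertices.
corners = [(1.0, 0.5), (1.0, 2.0), (2.0, 2.0), (2.0, 0.5)]
edges = [(corners[i], corners[(i + 1) % 4]) for i in range(4)]
print(shortest_collision_free_path((0.0, 0.0), (3.0, 3.0), edges, corners))
```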


Physics-based methods

Physics-based Motion Control of a Single Virtual Human

Kinematic-based systems are generally intuitive but lack dynamic integrity: the animation does not seem to respond to basic physical facts like gravity or inertia. Only the modelling of objects as they move under the influence of forces and torques can be realistic. The motion is obtained from the dynamic equations of motion relating the forces, torques, constraints and the mass distribution of objects. We use the Armstrong-Green algorithm, based on Newton-Euler formulations (see [6] for details). To obtain a simplified form that avoids the inversion of matrices larger than three by three, two hypotheses are made. The first assumes a linear relationship between the linear acceleration of a link and the angular acceleration it undergoes. The second assumes a linear relationship between the linear acceleration and the reactive force on its parent link. The algorithm can be divided into two opposite processes, inbound and outbound. The inbound process calculates matrices and vectors from the geometric and physical structure of the system and propagates forces and torques along each link, from the leaves to the root of the hierarchy. The outbound process calculates the kinematic quantities, namely the linear and angular acceleration of each link; the linear and angular velocities are then obtained by numerical integration to update the whole structure. The algorithm is applied recursively over the whole duration of the dynamic simulation: the kinematic results of one time step are used as the initial values for the next step.

The Euler method is used for numerical integration at each step in [6]. In our work, we found that this is not sufficient when simulating motion over a long time period: the errors due to numerical integration accumulate from step to step and eventually produce unrealistic motion. To obtain higher precision, we use the fourth-order Runge-Kutta method for numerical integration. To obtain the angular velocity at each step, we apply the inbound-outbound process four times to calculate the rate of change of the angular velocity. By doing so, we found that the stability of the algorithm is improved.
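Purely as an illustration of why fourth-order Runge-Kutta improves on Euler integration over long simulations, the sketch below integrates a generic state derivative with both schemes. The derivative function is a placeholder standing in for one inbound-outbound pass of the dynamics algorithm; it is not our implementation.

```python
import numpy as np

def euler_step(f, state, t, dt):
    # One derivative evaluation per step; errors accumulate quickly.
    return state + dt * f(state, t)

def rk4_step(f, state, t, dt):
    """One fourth-order Runge-Kutta step.

    In the dynamics simulation, each call to f would correspond to one
    inbound-outbound pass computing the accelerations.
    """
    k1 = f(state, t)
    k2 = f(state + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = f(state + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = f(state + dt * k3, t + dt)
    return state + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# Placeholder dynamics: an undamped oscillator standing in for a joint.
f = lambda s, t: np.array([s[1], -s[0]])
state_euler = state_rk4 = np.array([1.0, 0.0])
for i in range(1000):
    state_euler = euler_step(f, state_euler, i * 0.01, 0.01)
    state_rk4 = rk4_step(f, state_rk4, i * 0.01, 0.01)
# The energy should stay at 0.5; Euler drifts upward, RK4 stays close.
print(0.5 * np.sum(state_euler**2), 0.5 * np.sum(state_rk4**2))
```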

The inverse dynamics problem is to find, at each joint, the force and torque that generate the desired motion of the structure. Various forms of robot arm motion equations are derived mainly from Lagrange-Euler and Newton-Euler formulations. These motion equations are equivalent to each other in the sense that they describe the dynamic behaviour of the same physical robot manipulator; however, their structure may differ, as they are obtained for various reasons and purposes. Among the many formulations, we select a recursive formulation based on the Newton-Euler equations for its computational efficiency: its main advantage is that the computation time is linearly proportional to the number of joints of the robot arm and independent of the robot arm configuration. Using inverse dynamics in closed form together with direct dynamic simulation is the key to obtaining the desired motion. Figure 2 describes this closed-form control; the desired motion is represented by the joint variables and their time derivatives, and time_step is the period at which inverse dynamics is applied.

Figure 2 - The closed-form control with inverse dynamics
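To make the control loop concrete, here is a deliberately reduced, single-joint sketch: inverse dynamics computes the torque that would produce the desired acceleration (plus a small corrective term), and that torque drives the forward simulation between inverse-dynamics updates. The inertia, gains and trajectory are invented for the example and the recursive Newton-Euler formulation is replaced by the trivial single-joint relation.

```python
import math

# Single rotational joint with inertia I; the desired motion is a sine trajectory.
I = 2.0
dt = 0.01
time_step = 0.05          # period at which inverse dynamics is re-applied
kp, kv = 50.0, 10.0       # corrective gains (illustrative values)

def desired(t):
    return math.sin(t), math.cos(t), -math.sin(t)   # angle, velocity, acceleration

q, qd = 0.0, 1.0          # simulated state
tau, next_update = 0.0, 0.0
for step in range(500):
    t = step * dt
    if t >= next_update:
        # Inverse dynamics: torque for the desired acceleration, plus a
        # feedback term that pulls the simulated state back on track.
        q_d, qd_d, qdd_d = desired(t)
        tau = I * (qdd_d + kp * (q_d - q) + kv * (qd_d - qd))
        next_update += time_step
    # Direct (forward) dynamics of the single joint, integrated with Euler.
    qdd = tau / I
    qd += qdd * dt
    q += qd * dt
print(q, math.sin(500 * dt))   # simulated vs desired angle
```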


A Physics-based approach to the interaction with the environment

The reaction of an actor to the environment may also be considered using dynamic simulation in the processing of interactions between bodies. The interaction is first identified and then a response is generated. The most common example of interaction with the environment is the collision. As a typical example, we will consider the use of a finite element method for simulating deformations of objects and the hand of a synthetic character during a grasping task [7]. Elements are parametric volumes and the boundary object surface is a parametric surface. To show how the physical modelling of deformable objects can contribute to human animation, we present in this section an example of a contact problem dealing with the pressing of a ball. Starting with the envelopes of ball and fingers obtained using a modeller, we meshed the objects to create full 3D bodies or shell bodies depending on the application. After 3D finite element solving, the deformed envelopes are extracted from the data base used in our calculations and restored to graphics structures for visualization. In this way, visual realism is always ensured by the image synthesis system. The ball can be modeled by a shell with internal pressure, or can be fully meshed in its volume. The finger tissue is meshed in a volume around the bones. Bones are connected to the link and joint skeleton and the degrees of freedom of nodal points situated on bones are prescribed.

We use the following equation:

K U = R    (1)

where K is the [NB x NB] stiffness matrix, a function of the material and flesh constitution, R is the [NB x 1] load vector including the effects of the body forces, surface tractions and initial stresses, and U is the [NB x 1] displacement vector from the unloaded configuration. Relation (1) is valid in static equilibrium and in pseudo-static equilibrium at instants ti, where the instants ti are considered as variables representing different load intensities. In this paper, we do not deal with dynamics, where loads are applied rapidly; in that case, true time, inertia and damping, displacement velocity and acceleration must be added to (1). Contact modelling is not easy because the equilibrium equations (1) are obtained on the assumption that the boundary conditions stay unchanged during each time ti. Two kinds of boundary conditions exist: geometric boundary conditions corresponding to prescribed displacements, and force boundary conditions corresponding to prescribed boundary tractions. We cannot control a single degree of freedom in both position and force, any more than one can specify both the voltage and the current across a resistor. So an unknown displacement corresponds to a known prescribed force and, conversely, a known prescribed displacement corresponds to an unknown force. Boundary conditions can change during grasping and pressing when the prescribed forces are significant enough to strongly deform the ball; this situation creates additional contact points, and the calculations become more complicated because the number of unknown displacements and reaction forces varies with the number of prescribed contact points. Figure 3 shows the surface contact appearance between a finger and a ball. Ball nodes are always maintained outside the finger. The overlap suppression creates reaction forces on the ball surface, which are applied to the skin. At equilibrium, these forces maintain compatible surface displacements between the two deformable bodies; the flesh of the finger is deformed by the contact forces acting on the ball-finger interface.
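As a toy illustration of relation (1), the sketch below assembles a tiny stiffness system and solves K U = R with a prescribed-displacement (geometric) boundary condition at one node and prescribed forces elsewhere, then recovers the unknown reaction force. The 1D bar elements and stiffness value are placeholders, far simpler than the volumetric and shell elements used for the finger and ball.

```python
import numpy as np

# Three 1D bar elements in series: 4 nodes, axial displacements only.
n_nodes, k_el = 4, 100.0           # element stiffness (placeholder value)
K = np.zeros((n_nodes, n_nodes))
for e in range(n_nodes - 1):
    # Assemble the 2x2 element stiffness into the global matrix.
    K[e:e+2, e:e+2] += k_el * np.array([[1.0, -1.0], [-1.0, 1.0]])

R = np.zeros(n_nodes)
R[3] = 5.0                          # prescribed force on the free end node

# Geometric boundary condition: node 0 is fixed (prescribed displacement 0),
# so its equation is removed and its reaction force is recovered afterwards.
free = [1, 2, 3]
U = np.zeros(n_nodes)
U[free] = np.linalg.solve(K[np.ix_(free, free)], R[free])
reaction = K[0] @ U                 # unknown force at the prescribed node
print(U, reaction)
```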

Figure 3 - Surface contact appearance between a finger and a ball


Behavioural motion control

Behaviour-based motion control of a single virtual human

The use of primitive methods like keyframe animation allows the animator to specify every detail of a motion. However, this is an extremely tedious task. Research in automatic motion control provides ways and tools for reducing this problem, such as task-level command languages. But these raise another problem: how can individual differences be introduced into the generic activities which are generated automatically? For example, in the task of walking, everybody walks more or less the same way, following more or less the same laws. It is the "more or less" which is difficult to model. Even the same person does not walk the same way every day. If he is tired, or happy, or has just received some good news, the way of walking will appear somewhat different. As in traditional animation, an animator can create a lot of keyframes to simulate a tired character walking, but this is a very costly and time-consuming task. To individualize human walking, we have developed [8] a model built from experimental data based on a wide range of normalized velocities. The model is structured on two levels. At the first level, global spatial and temporal characteristics (normalized step length and duration) are generated. At the second level, a set of parameterized trajectories produces both the position of the body in space and the internal body configuration, in particular the pelvis and the legs. This is performed for a standard structure and an average configuration of the human body. The experimental context corresponding to the model is extended by allowing continuous variation of the global spatial and temporal parameters, in order to alter the motion and try to achieve the effect desired by the animator. The model is based on a simple kinematic approach designed to preserve the intrinsic dynamic characteristics of the experimental model. What is important is that this approach allows individualization of the walking action in an interactive real-time context in most cases.
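The two-level structure can be sketched as follows: a first level maps the walking velocity (normalized by a body dimension) to global spatial and temporal characteristics, and a second level evaluates parameterized trajectories at a given phase of the gait cycle. The numeric coefficients and the trajectory shape below are placeholders, not the values of the published model [8].

```python
import math

def global_characteristics(velocity, leg_length):
    """Level 1: global spatial/temporal parameters from normalized velocity.

    The square-root relation and its coefficient are illustrative placeholders.
    """
    v_norm = velocity / leg_length              # normalized velocity
    step_length = leg_length * 0.8 * math.sqrt(v_norm)
    step_duration = step_length / velocity
    return step_length, step_duration

def pelvis_height(phase, leg_length, amplitude=0.02):
    """Level 2: one parameterized trajectory (vertical pelvis oscillation).

    phase is in [0, 1) over one step; the waveform is only indicative.
    """
    return leg_length * (1.0 - amplitude * (1.0 - math.cos(4 * math.pi * phase)) / 2)

length, duration = global_characteristics(velocity=1.3, leg_length=0.9)
print(length, duration, pelvis_height(0.25, 0.9))
```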

Control of Autonomous Actors based on Virtual Sensors

Autonomous actors are able to have a behaviour, which means they must have a manner of conducting themselves. Behaviour is not only reacting to the environment; it should also include the flow of information by which the environment acts on the living creature, as well as the way the creature codes and uses this information. If we consider the synthetic environment as made of 3D geometric shapes, one solution to this problem is to give the actor access to the exact position of each object in the complete environment database corresponding to the synthetic world. This solution could work for a very small world, but it becomes impracticable when the number of objects increases. Moreover, this approach does not correspond to reality, where people do not have knowledge of the complete environment. In a typical behavioural animation scene, the actor should perceive the objects and the other actors in the environment through visual, tactile (including haptic) and auditory sensors. These virtual sensors [9] should be used as a basis for implementing everyday human behaviour such as visually directed locomotion, handling objects, and responding to sounds and utterances. Based on the perceived information, the actor's behavioural mechanism will determine the actions he will perform. Actions may be at several degrees of complexity. An actor may simply evolve in his environment, or he may interact with this environment or even communicate with other actors. In this latter case, we will consider the actor as an interactive perceptive actor.

We first introduced the concept of synthetic vision [10] as a main information channel between the environment and the virtual actor. More recently, several authors have adopted this approach for simulating the behaviour of groups [11], fish [12] and a dog [13]. In [10], each pixel of the vision input carries semantic information giving the object projected onto this pixel, and numerical information giving the distance to this object. So it is easy to know, for example, that there is a table just in front, 3 metres away. The synthetic actor perceives his environment from a small window in which the environment is rendered from his point of view. As he can access the z-buffer values of the pixels, the colour of the pixels and his own position, he can locate visible objects in his 3D environment. Noser et al. [14] also propose the use of an octree as the internal representation of the environment seen by an actor, because it offers several interesting features. The octree has to represent the visual memory of an actor in a 3D environment with static and dynamic objects. Objects in this environment can grow, shrink, move or disappear. To illustrate the capabilities of the synthetic vision system, the authors have developed several examples: the actor going out of a maze, walking on sparse foot locations and playing tennis. In real life, the behaviour of people or animals is very often influenced by sounds. To recreate synthetic audition, we first had to model a sound environment in which the synthetic actor can directly access positional and semantic information about audible sound events. Now, our virtual actors are able to hear [15]. The sound renderer takes the real-time constraints into account: it is capable of rendering each time increment for each microphone in real time, taking into account the finite propagation speed of sound and the motion of sound sources and microphones.
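A much reduced sketch of the synthetic vision idea follows: each rendered pixel of the actor's small viewing window carries an object identifier and a depth value, from which a 3D point is recovered and stored in a voxel map acting as the actor's visual memory. The window resolution, voxel size and camera model are invented for the example, and a plain dictionary stands in for the octree of [14].

```python
import math

VOXEL = 0.5   # voxel edge length of the actor's visual memory (assumed)

def update_visual_memory(memory, pixels, eye, view_dir_fn):
    """Insert every visible object sample into the actor's voxel memory.

    pixels: iterable of (x, y, object_id, depth), as provided per pixel by
    the synthetic vision window (object id + z-buffer value).
    view_dir_fn(x, y): unit viewing direction through pixel (x, y).
    """
    for x, y, obj_id, depth in pixels:
        dx, dy, dz = view_dir_fn(x, y)
        # Recover the 3D point of the visible surface sample.
        px = eye[0] + dx * depth
        py = eye[1] + dy * depth
        pz = eye[2] + dz * depth
        voxel = (math.floor(px / VOXEL), math.floor(py / VOXEL), math.floor(pz / VOXEL))
        memory[voxel] = obj_id            # remember what occupies this voxel

memory = {}
# One fake pixel: a "table" seen 3 metres straight ahead of the actor.
update_visual_memory(memory, [(32, 24, "table", 3.0)],
                     eye=(0.0, 1.7, 0.0),
                     view_dir_fn=lambda x, y: (0.0, 0.0, 1.0))
print(memory)
```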

Figure 4 - Examples of grasping a hammer


Our aim is to build a behavioural model based on tactile sensory input received at the level of the skin from the environment. This sensory information can be used in tasks such as touching objects, pressing buttons or kicking objects. A very important, but special, case is the contact between the hands and objects during the grasping process. Our approach [16] is based on sphere multi-sensors. These sensors are considered as a group of objects attached to the articulated figure. A sensor is activated by any collision with other objects or sensors. Each sphere sensor is fitted to its associated joint shape with a different radius. This configuration is important in our method because when a sensor is activated in a finger, only the articulations above it stop moving, while the others can still move. In this way, the fingers finally position themselves naturally around the object (a simplified sketch of this sensor-driven finger closing is given after the list below). These multi-sensors have been integrated into a general methodology for automatic grasping of objects by synthetic actors. This methodology is based on the following steps:

  1. Based on virtual sensors, typically the vision, the actor decides which object to grasp.
  2. The actor may have to walk in order to get near the object.
  3. Inverse kinematics is used to find the final arm posture.
  4. Based on a grasp taxonomy [17], the system decides the way the actor grasps the object (see Figure 4). For example, it decides to use a pinch when the object is too small to be grasped by more than two fingers or to use two hands when the object is large.
  5. Using multi-sensors, the fingers are adjusted in order to have an accurate grasping and to give feedback to the actor.
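The sketch below illustrates the sensor-driven closing of step 5 on a toy planar finger: each joint carries a sphere sensor, the joints flex until a sensor touches the object, and an activated sensor freezes that joint and the ones proximal to it while more distal joints keep closing. The placeholder kinematics, sensor radii and ball geometry are invented for the example and are not the actual hand model of [16].

```python
import math

def sphere_hit(center, radius, obj_center, obj_radius):
    """Sensor activation: the sphere sensor overlaps the grasped object."""
    return math.dist(center, obj_center) < radius + obj_radius

def close_finger(joint_angles, sensor_of_joint, obj, step=0.02, max_angle=1.5):
    """Flex a finger until its sphere sensors touch the object.

    joint_angles: current flexion of each joint, proximal to distal.
    sensor_of_joint(angles, i): (centre, radius) of the sensor attached to
    joint i for the given posture (placeholder kinematics).
    When sensor i is activated, joints 0..i stop while more distal joints
    keep closing, so the finger wraps naturally around the object.
    """
    frozen = [False] * len(joint_angles)
    for _ in range(200):
        moved = False
        for i in range(len(joint_angles)):
            if frozen[i] or joint_angles[i] >= max_angle:
                continue
            joint_angles[i] += step
            moved = True
            center, radius = sensor_of_joint(joint_angles, i)
            if sphere_hit(center, radius, obj["center"], obj["radius"]):
                for j in range(i + 1):       # freeze this joint and proximal ones
                    frozen[j] = True
        if not moved:
            break
    return joint_angles

# Toy planar finger: each phalanx is 0.03 long; sensors sit at phalanx tips.
def sensor_of_joint(angles, i):
    x, y, a = 0.0, 0.0, 0.0
    for k in range(i + 1):
        a += angles[k]
        x, y = x + 0.03 * math.cos(a), y - 0.03 * math.sin(a)
    return (x, y), 0.008

ball = {"center": (0.05, -0.04), "radius": 0.03}
print(close_finger([0.0, 0.0, 0.0], sensor_of_joint, ball))
```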

Virtual vision is especially useful for navigation. The navigation can be local or global. With local navigation, the actor goes straight toward his goal, and it is possible that he cannot reach it. With global navigation, the actor first tries to find a path to his goal; if the path exists, the actor follows it until he reaches the goal position or until he detects a collision through his vision. During global navigation the actor memorizes his perceived environment by voxelizing it, based on his virtual vision. When the actor moves through his environment, a simple walking model is not sufficient: the actor has to adapt his trajectory to the variations of the terrain by bypassing, jumping over or climbing the obstacles he meets. Bypassing an obstacle consists of changing the direction and velocity of the actor's walking. Jumping and climbing correspond to more complex motions. These actions should generate parameterized motion depending on the height and length of the obstacle for a jump, and on the height and location of the feet for climbing the obstacle. These characteristics are determined by the actor from his perception.


Intercommunication between virtual humans

Behaviours may also depend on the emotional state of the actor. We have also developed a model of nonverbal communication [18]. The believability of virtual actors is improved by their capability to interpret and use a nonverbal language. Nonverbal communication is concerned with postures and what they indicate about what people are feeling. Postures are a means of communication and are defined by specific positions of the arms and legs and angles of the body. Usually, people do not use nonverbal communication consciously, but they instinctively understand it to a considerable extent and respond to it without any explicit reasoning. This nonverbal communication is essential for driving the interaction between people who are not in contact (Figure 5).


Figure 5 - Nonverbal intercommunication

Interaction between virtual and real humans

Virtual Environments are real-time by definition, and it makes no sense to consider frame-by-frame actors in this case. However, we should mention the limitation on the complexity of actor models for immersive environments. Although it is possible to use up to 8000 polygons for a model rendered on an Onyx, the use of most HMDs drastically limits the number of polygons. For example, the procedure for representing the deformations of virtual humans has been simplified (see Figure 6).

Figure 6 - Body surface with texturing

To create scenes involving virtual actors in the real world, we really have to take the real world into account during the generation of the images by the computer. For example, consider a virtual actor passing behind a real tree: for some images of the actor, part of the body should be hidden. For more realism, the shadow of the actor should be cast on the real floor. This means that the computer-generated images are dependent on the real world. One way of solving these problems is to create virtual objects similar to the real ones and a virtual camera corresponding to the real camera. However, this correspondence is generally hard to establish, and the process is tedious and requires composition stages [19]. We have also developed a system for real-time insertion of synthetic actors into our lab. The system runs on an Onyx with the Sirius video card. With this system, we are able to display a synthetic real-time Marilyn in one part of our lab. This Marilyn can appear together with people in the lab. She has a simplified shadow on the real floor and may be hidden by specific objects in the lab. Figure 7 shows an example.

Figure 7 - Marilyn visits our lab in real-time

The real people are of course easily aware of the actions of the Virtual Humans through VR tools like head-mounted displays, but one major problem to solve is to make the virtual actors conscious of the behaviour of the real people. Virtual actors should sense the participants through their virtual sensors. Autonomous and perceptive actors can have different degrees of autonomy and different sensing channels to the environment. For example, the participant places an object into the Virtual Space using a CyberGlove, and the autonomous virtual actor will try to grasp it and put it on a virtual table. The actor interacts with the environment by grasping the object and moving it. At the beginning of interactive grasping, only the hand centre sensor is active. The six palm values from the CyberGlove are used to move it towards the object, and inverse kinematics updates the arm posture from the hand centre movement. Once this sensor is activated, the hand is close enough to the object's final frame; the hand centre sensor is then deactivated and the multi-sensors on the hand are used to detect sensor-object collisions. As another example, consider a fight between a real person and an autonomous actor. The motion of the real person is captured using a Flock of Birds. The gestures are recognized by the system and the information is transmitted to the virtual actor, who is able to react to the gestures and decide which attitude to adopt. Figure 8 shows an example.

Figure 8 - Fight between an autonomous actor and an avatar

Parallelization issues

In our environment, parallelization may be used to speed up the animation tasks. This is particularly important when the performance of the graphics workstation becomes unsatisfactory as the computational requirements of the system increase, especially with complex models and environments. The parallel machine is seen by the front-end SGI workstation as a black-box accelerator for computation-intensive animation tasks. The following tasks are performed during each frame:

  1. Motion generation,
  2. Body deformation,
  3. Collision detection,
  4. Facial animation.

The system should be flexible enough to let the user optionally run the above tasks on a parallel machine (for example, the system should also work efficiently if the deformation module is used without the dynamics and collision detection modules). Therefore, we have designed the task distribution such that the processors are not dedicated to a special task; rather, each task is computed by all the processors. The second requirement is to handle multiple human models. Thus, for different virtual actors we have different groups of processors, defined at the beginning of the animation. For the integration of the modules, we try to achieve maximum locality for the different tasks of the same body part, in order to decrease the volume of communication between the processors for the different modules. For the forward dynamics module we use the Armstrong-Green algorithm [6], an efficient algorithm for articulated bodies with rotational joints. The parallelization of this algorithm is based on the fact that different branches of the tree-shaped limb hierarchy of the figure can be processed in parallel. Additionally, different actors can be processed in parallel by different processor subgroups.
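A very rough sketch of this distribution idea follows, using Python's multiprocessing pool as a stand-in for the parallel machine: independent limb branches of each actor (and different actors) are handed to a pool of non-dedicated workers. The branch data and the per-branch computation are placeholders, not the actual dynamics code.

```python
from multiprocessing import Pool

def process_branch(branch):
    """Placeholder for one inbound-outbound dynamics pass over a limb branch."""
    actor, name, links = branch
    # Stand-in computation: pretend each link costs some arithmetic.
    return actor, name, sum(i * i for i in range(links))

if __name__ == "__main__":
    # Two actors, each with independent limb branches (arms, legs, spine).
    branches = [(actor, limb, links)
                for actor in ("actor_0", "actor_1")
                for limb, links in [("left_arm", 7), ("right_arm", 7),
                                    ("left_leg", 6), ("right_leg", 6),
                                    ("spine_head", 9)]]
    with Pool(processes=4) as pool:      # processors are not task-dedicated
        results = pool.map(process_branch, branches)
    print(results[:3])
```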

For the deformation module on the parallel computer, we selected as the atomic task the generation and intersection of rays with the list of metaballs corresponding to a body part. The computation of deformations in some body parts depends on the deformations in other parts: for example, to compute the deformations of the left shoulder, the deformations of the left arm and the upper torso must already have been computed. For this reason, the generation of the B-spline net for each body part of the virtual human (e.g. the left arm) is performed concurrently by the processors, rather than deforming several body parts of a virtual human simultaneously.
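One way to respect these dependencies, sketched below, is to order the body parts topologically and generate each B-spline net only once its prerequisites are finished. The dependency table is illustrative and the deformation itself is replaced by a stub.

```python
from graphlib import TopologicalSorter

# Illustrative dependencies: a part can be deformed only after the parts it
# depends on (e.g. the left shoulder needs the left arm and upper torso first).
deps = {
    "upper_torso": set(),
    "left_arm": set(),
    "right_arm": set(),
    "left_shoulder": {"left_arm", "upper_torso"},
    "right_shoulder": {"right_arm", "upper_torso"},
    "neck": {"upper_torso"},
}

def deform(part):
    """Stub for generating the B-spline net of one body part."""
    return f"bspline_net({part})"

order = TopologicalSorter(deps)
for part in order.static_order():      # prerequisites always come first
    print(deform(part))
```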


The future of actors

The ultimate objective in creating realistic and believable synthetic actors is to build intelligent autonomous virtual humans with adaptation, perception and memory. These actors should be able to act freely and emotionally. Ideally, they should be conscious and unpredictable. But how far are we from such an ideal situation? Our interactive perceptive actors are able to perceive the virtual world, the people living in this world and people in the real world. They may act on the basis of their perception in an autonomous manner. Their intelligence is constrained and limited to the results obtained in the development of new methods of Artificial Intelligence. However, their representation in the form of virtual actors is a way of visually evaluating the progress. Intelligent actors are able to learn or understand only very simple situations. Memory is generally defined as the power or process of reproducing or recalling what has been learned and retained, especially through associative mechanisms. We have seen that emotional aspects may be important in nonverbal intercommunication. Emotions are also essential in facial animation. However, a real emotion should be considered as a state of feeling, a psychic and physical reaction subjectively experienced as strong feeling and physiologically involving changes that prepare the body for immediate vigorous action. In this sense, we are far from implementing truly emotional actors. Finally, actors in the future should be adaptive, conscious and free. An actor is adaptive as long as he can survive in more or less unpredictable and dangerous environments. According to Alexander [20], a conscious actor should be aware especially of something within himself, or be characterized by sensation, emotion, volition, and thought. An actor may be considered free if his future behaviour is unpredictable to somebody. From the above considerations, it is clear that future virtual actors will be extremely expensive in terms of CPU and will require more and more computer resources, as well as the development of powerful tools to exploit parallel computing. Moreover, we hope to see many actors in the same scene (see Figure 9).

Figure 9 - Typical scene in a virtual garden


Acknowledgments

The author is grateful to the people who contributed to this work, in particular Pascal Bécheiraz, Ronan Boulic, Tolga Çapin, Mireille Clavien, Zhyong Huang, Shen Jianhua, Alexis Kappauf, and Hansrudi Noser. The research was supported by the Swiss National Science Research Foundation, the Federal Office for Education and Science, and is part of the Esprit Project HUMANOID-2 and the ACTS project COVEN.


References

