Video surveillance is currently undergoing rapid growth. However, while thousands of cameras are being installed in public places all over the world, computer programs that could reliably detect and track people in order to analyze their behavior are not yet operational. The challenges are numerous: low image quality, suboptimal scene lighting, the changing appearance of pedestrians, occlusions by the environment and between people, and complex interacting trajectories in crowds. In this thesis, we propose a complete approach for detecting and tracking an unknown number of interacting people from multiple cameras located at eye level. Our system works reliably in spite of significant occlusions and delivers metrically accurate trajectories for each tracked individual. Furthermore, we develop a method for representing the most common types of motion in a specific environment and learning them automatically from image data. We demonstrate that a generative model for detection can effectively handle occlusions in each time frame independently, even when the only data available comes from the output of a simple background subtraction algorithm and when the number of individuals is unknown a priori. We then argue that multi-people tracking can be achieved by detecting people in individual frames and then linking detections across frames. We formulate the linking step as the problem of finding the most probable state of a hidden Markov process given the set of images and frame-independent detections. We first propose to solve this problem by optimizing trajectories independently with Dynamic Programming. In a second step, we reformulate the problem as a constrained flow optimization, resulting in a convex problem that can be solved using standard Linear Programming techniques and is far simpler formally and algorithmically than existing techniques.
We show that the particular structure of this framework lets us solve it equivalently using the k-shortest paths algorithm, which leads to a much faster optimization. Finally, we introduce a novel behavioral model to describe pedestrian motion, which is able to capture sophisticated motion patterns resulting from the mixture of different categories of random trajectories. Due to its simplicity, this model can be learned from video sequences in a fully unsupervised manner through an Expectation-Maximization procedure. We show that this behavioral model can be used to make tracking systems more robust in ambiguous situations. Moreover, we demonstrate its ability to characterize and detect atypical individual motions.
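As an illustration of linking frame-independent detections with Dynamic Programming, the following is a minimal sketch for a single trajectory and should not be read as the implementation used in this thesis. It assumes a hypothetical one-dimensional row of locations, an array `det_prob` of per-frame detection probabilities, and a deliberately crude motion model that only forbids moves larger than `max_step` cells per frame; the function name and all parameters are illustrative.

```python
import numpy as np

def most_probable_trajectory(det_prob, max_step=1):
    """Viterbi-style dynamic programming over a 1-D row of locations.

    det_prob[t, k] is an assumed detection probability that a person
    occupies location k at frame t. The motion model is a hypothetical
    simplification: moves of at most `max_step` cells per frame are
    allowed at no cost (unnormalized), larger moves are forbidden.
    """
    det_prob = np.asarray(det_prob, dtype=float)
    n_frames, n_locs = det_prob.shape
    # Work in log space; clip to avoid log(0).
    log_obs = np.log(np.clip(det_prob, 1e-9, 1.0))

    score = np.empty((n_frames, n_locs))         # best log-score ending at (t, k)
    back = np.zeros((n_frames, n_locs), dtype=int)  # best predecessor of (t, k)
    score[0] = log_obs[0]

    for t in range(1, n_frames):
        for k in range(n_locs):
            lo = max(0, k - max_step)
            hi = min(n_locs, k + max_step + 1)
            # Best reachable predecessor within the allowed window.
            prev = lo + int(np.argmax(score[t - 1, lo:hi]))
            back[t, k] = prev
            score[t, k] = score[t - 1, prev] + log_obs[t, k]

    # Backtrack from the best final location.
    path = [int(np.argmax(score[-1]))]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

Given a detection array whose per-frame maximum moves one cell per frame, the function returns that sequence of locations. Handling several interacting people would require additional constraints, which is what motivates the flow formulation described above.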