Over the past few years, there have been fundamental breakthroughs in core problems in machine learning, largely driven by advances in deep neural networks. The amount of annotated data has increased drastically, and supervised deep discriminative models have exceeded human-level performance in certain object detection tasks. The growing quantity and complexity of available unlabelled data also opens up exciting possibilities for the development of unsupervised learning methods. Among the family of unsupervised methods, deep generative models find numerous applications. Moreover, because real-world applications involve high-dimensional data, the ability of generative models to automatically learn semantically meaningful subspaces makes their advancement an essential step toward developing more efficient algorithms.

Generative Adversarial Networks (GANs) are a family of unsupervised generative algorithms that have demonstrated impressive performance for data synthesis and are now used in a wide range of computer vision tasks. Despite this success, they have gained a reputation for being difficult to train, which makes developing with them time-consuming and dependent on substantial human involvement. In the first part of this thesis, we focus on improving the stability and performance of GANs. First, we consider an alternative to the standard training process, named SGAN, in which several adversarial “local” pairs of networks are trained independently so that a “global” supervising pair of networks can be trained against them. Experimental results on both toy and real-world problems demonstrate that this approach outperforms standard training: it better mitigates mode collapse, converges more stably, and, surprisingly, also converges faster.
To further reduce the computational footprint while maintaining the stability and performance advantages of SGAN, we focus on training a single pair of adversarial networks using variance-reduced gradients. More precisely, we study the effect of stochastic gradient noise on the training of GANs and show that it can prevent the convergence of standard game optimization methods, whereas their batch versions converge. We address this issue with two optimization algorithms for GANs: SVRG-GAN, which uses stochastic variance-reduced gradients, and SVRE, which combines variance reduction with extragradient. We observe empirically that SVRE performs similarly to a batch method on the MNIST dataset while being computationally cheaper, and that SVRE yields more stable GAN training on standard datasets.

In the second part of the thesis, we present our work on people detection. People detection methods are highly sensitive to occlusions between pedestrians, and using joint visual information from multiple synchronized cameras offers an opportunity to improve detection performance. We address the problem of multi-view people occupancy map estimation using an end-to-end deep learning algorithm called DeepMCD that jointly utilizes the correlated streams of visual information. In our experiments, DeepMCD outperforms the classical approaches by a large margin. Finally, we present a new large-scale, high-resolution dataset named WILDTRACK. We provide an accurate joint calibration, as well as a series of benchmark results obtained with recently published baseline algorithms for multi-view detection with deep neural networks and for trajectory estimation using a non-Markovian model.