Generating an image of a pedestrian in a desired pose is useful in a wide range of applications, e.g., person re-identification and tracking, which are among the fundamental challenges for self-driving cars. The task is hard because the output should be invariant to nuisance factors such as body pose, illumination, and camera viewpoint. In this work, we study the task of synthesizing a latent canonical view of a pedestrian that is potentially robust to these nuisance factors. Our goal is to generate the unique frontalized view of a pedestrian observed in the wild; the generated image should look the same regardless of the observed body pose. We propose a new generative framework that goes beyond the commonly used one-to-one supervision: it jointly reasons on multiple inputs and outputs by means of a carefully chosen loss function that acts as a regularizer. Our experiments show the benefits of our framework on challenging low-resolution datasets.