This paper presents a generic architecture that addresses two problems in a programming by demonstration framework: extracting the relevant features of a given task, and generalizing the acquired knowledge to various contexts. We validate the architecture in a series of experiments in which a human demonstrator teaches a humanoid robot simple manipulation tasks. The relevant features of the task are extracted in a two-step dimensionality reduction process. First, the combined joint angles and hand path motions are projected into a generic latent space, composed of a Gaussian Mixture Model (GMM) spreading across the spatial dimensions of the motion. Second, the temporal variation of the latent representation of the motion is encoded in a Hidden Markov Model (HMM). This two-step probabilistic encoding provides a measure of the spatio-temporal correlations across the different modalities collected by the robot, which determines a metric of imitation performance. The demonstrated trajectories are then generalized using Gaussian Mixture Regression (GMR). Finally, to generalize skills across contexts, we formally compute the trajectory that optimizes the metric, given the new context and the robot's specific body constraints.
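
To make the regression step concrete, the following is a minimal sketch of Gaussian Mixture Regression in its standard textbook form, not the implementation used in the paper. It assumes a mixture that has already been fitted over a one-dimensional input (here called `t`, for example time) and a one-dimensional output (`x`, for example one joint angle). The names `Component`, `gaussian_pdf`, and `gmr_mean` are hypothetical and chosen for illustration only.

```python
import math
from typing import NamedTuple, Sequence


class Component(NamedTuple):
    """One Gaussian of a 2-D mixture over (t, x). Illustrative layout only."""
    prior: float  # mixing weight of this component
    mu_t: float   # mean of the input dimension
    mu_x: float   # mean of the output dimension
    s_tt: float   # variance of the input
    s_tx: float   # covariance between input and output
    s_xx: float   # variance of the output (unused for the mean estimate)


def gaussian_pdf(value: float, mean: float, var: float) -> float:
    """Density of a univariate Gaussian."""
    return math.exp(-0.5 * (value - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmr_mean(t: float, components: Sequence[Component]) -> float:
    """Expected output x given input t under the mixture.

    Each component contributes its own linear conditional mean,
    weighted by how responsible that component is for the query t.
    """
    weights = [c.prior * gaussian_pdf(t, c.mu_t, c.s_tt) for c in components]
    total = sum(weights)
    if total == 0.0:
        # All densities underflowed: the query is far from every component.
        raise ValueError("query lies too far from every mixture component")
    estimate = 0.0
    for w, c in zip(weights, components):
        conditional_mean = c.mu_x + (c.s_tx / c.s_tt) * (t - c.mu_t)
        estimate += (w / total) * conditional_mean
    return estimate
```

Evaluating `gmr_mean` over a sequence of `t` values yields a smooth generalized trajectory. A multivariate version would replace the scalar ratio `s_tx / s_tt` with the corresponding matrix product of covariance blocks.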