We present a Programming by Demonstration (PbD) framework for generically extracting the relevant features of a given task and for generalizing the acquired knowledge to different contexts. We validate the architecture through a series of experiments in which a human demonstrator teaches a humanoid robot simple manipulation tasks. We propose a probability-based estimate of feature relevance: the joint angles, hand paths, and object-hand trajectories are first projected onto a generic latent space using Principal Component Analysis (PCA), and the resulting signals are then encoded using a mixture of Gaussian/Bernoulli distributions (GMM/BMM). This encoding provides a measure of the spatio-temporal correlations across the different modalities collected from the robot, which can be used to define a metric of imitation performance. The trajectories are then generalized using Gaussian Mixture Regression (GMR). Finally, we analytically compute the trajectory that optimizes the imitation metric and use it to generalize the skill to different contexts and to the robot's specific bodily constraints.
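As an illustration of the GMR step, the following is a minimal sketch, not the authors' implementation. It assumes a mixture model has already been fitted to the joint density of time t and a single latent variable x, and it computes the conditional mean and variance of x given t. The function names (`gaussian_pdf`, `gmr`) and the parameter layout are hypothetical choices made for this example. The variance returned is the exact variance of the conditional mixture; the source does not state which variance estimate it uses.

```python
import math


def gaussian_pdf(t, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (t - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmr(t, priors, means, covs):
    """Gaussian Mixture Regression for a joint model over (t, x).

    priors: K mixing weights (assumed to sum to 1)
    means:  K pairs (mu_t, mu_x)
    covs:   K triples (var_t, cov_tx, var_x)
    Returns (mean, variance) of x conditioned on the query value t.
    """
    # Responsibility of each component for the query t,
    # computed from the marginal density over t only.
    weights = [p * gaussian_pdf(t, m[0], c[0])
               for p, m, c in zip(priors, means, covs)]
    total = sum(weights)
    h = [w / total for w in weights]

    # Conditional mean and variance of x given t, per component.
    cond_means = [m[1] + c[1] / c[0] * (t - m[0]) for m, c in zip(means, covs)]
    cond_vars = [c[2] - c[1] ** 2 / c[0] for c in covs]

    # Blend the components with their responsibilities.
    mean = sum(hk * mk for hk, mk in zip(h, cond_means))
    # Law of total variance for the conditional mixture.
    second_moment = sum(hk * (vk + mk ** 2)
                        for hk, vk, mk in zip(h, cond_vars, cond_means))
    return mean, second_moment - mean ** 2


if __name__ == "__main__":
    # Illustrative parameters only; these are not values from the experiments.
    priors = [0.5, 0.5]
    means = [(-1.0, 0.0), (1.0, 2.0)]
    covs = [(1.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(gmr(0.0, priors, means, covs))
```

Evaluating `gmr` over a sequence of time steps yields a smooth generalized trajectory together with a variance envelope. A small variance at a given time step suggests that the demonstrations agree there, which is one way the relevance of a variable can be read off the model.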