Effective information analysis generally boils down to properly identifying the structure or geometry of the data, which is often represented by a graph. In some applications, this structure may be partly determined by design constraints or pre-determined sensing arrangements, like in road transportation networks for example. In general though, the data structure is not readily available and becomes pretty difficult to define. In particular, the global smoothness assumptions, that most of the existing works adopt, are often too general and unable to properly capture localized properties of data. In this paper, we go beyond this classical data model and rather propose to represent information as a sparse combination of localized functions that live on a data structure represented by a graph. Based on this model, we focus on the problem of inferring the connectivity that best explains the data samples at different vertices of a graph that is a priori unknown. We concentrate on the case where the observed data is actually the sum of heat diffusion processes, which is a quite common model for data on networks or other irregular structures. We cast a new graph learning problem and solve it with an efficient nonconvex optimization algorithm. Experiments on both synthetic and real world data finally illustrate the benefits of the proposed graph learning framework and confirm that the data structure can be efficiently learned from data observations only. We believe that our algorithm will help solving key questions in diverse application domains such as social and biological network analysis where it is crucial to unveil proper geometry for data understanding and inference.