We study the problem of learning constitutive features for the effective representation of graph signals, which can be considered as observations collected on different graph topologies. We propose to learn graph atoms and build graph dictionaries that provide sparse representations for classes of signals, which share common spectral characteristics but reside on the vertices of different graphs. In particular, we concentrate on graph atoms that are constructed on polynomials of the graph Laplacian. Such a design permits to abstract from the precise graph topology and to design dictionaries that can be trained and eventually used on different graphs. We cast the dictionary learning problem as an alternating optimization problem where the dictionary and the sparse representations of training signals are updated iteratively. Experimental results on synthetic graph signals representing common processes on graphs show that our dictionaries are able to capture the important components in graph signals. Further experiments on traffic data confirm the benefits of our dictionaries in the sparse approximation of signals capturing traffic bottlenecks.