State-of-the-art data analysis tools have to deal with high-dimensional data. Fortunately, the inherent dimensionality of data is often much smaller, as it has an internal structure limiting its degrees of freedom. In most cases, this structure can be approximated using a graph, i.e, a set of nodes connected by edges. Based on this idea, graphs have been largely used in semi-supervised and unsupervised learning. The canonical assumption when incorporating a graph prior is to assume that the signal is smooth with respect to the graph, i.e, that connected pieces of data have similar values. Nevertheless, this hypothesis is insufficient to characterize more complex relations between the data and its structure, the graph. To this end, the field of Graph Signal Processing (GSP) extends the graph smoothness hypothesis to a more general assumption. Its key idea is to think of the signal as a sum of harmonic modes, which are obtained from the eigen-decomposition of the Laplacian, an operator representing the second order derivative. Thus, the essence of GSP resides in the Fourier transform defined by the projection on the Laplacian eigenvectors basis, which allows us to filter graph signals. However, GSP suffers from a counterintuitive irregularity. Contrarily to the classical Fourier basis, the graph Laplacian eigenvectors are not spreading uniformly on the vertex set and can be highly concentrated in a region of the graph. This non-uniformity results in a lack of any intuitive translation operator and consequently poses challenges to generalize some of the classical signal processing theory, e.g., stationarity, uncertainty principles, or sampling. In this thesis, we answer these difficulties by accepting that each node is unique and should be treated specially according to its surrounding structure. This is achieved using an operator called localization that shifts a kernel defined in the spectral domain around a specific node while adapting to the local graph structure. In the first part of the thesis, we focus on harmonic analysis in GSP. We start by making use of the norm of localized kernels to capture the spectral local features of the graph vertices. To illustrate the relevance of this analysis we then introduce structural clustering, an algorithm that groups nodes according to the role they play in the graph. Then, we bound the vertex-frequency concentration of graph filter banks atoms. Because of the irregular structure of graphs, we construct a local uncertainty principle that is, again, driven by the norm of a localized kernel. In the second part of the thesis, we tackle GSP problems from a machine learning perspective. We use the localization operator to extend the notion of stationarity to graph signals. It consists of a graph signal probabilistic framework allowing us to optimally solve inverse problems. Finally, we show how the graph total variation can be used in semi-supervised learning and prove its consistency with respect to the underlying graph manifold.