Many signal processing, machine learning, and computer vision applications must deal with large, high-dimensional datasets such as images, videos, and web content. The data can come in various forms, such as univariate or multivariate time series, matrices, or high-dimensional tensors. The goal of the data mining community is to reveal the hidden linear or non-linear structures in such datasets. Over the past couple of decades, matrix factorization, owing to its intrinsic association with dimensionality reduction, has been adopted as one of the key methods in this context. One can approximate the data either with a single linear subspace (the standard Principal Component Analysis (PCA) approach) or with a union of low-dimensional subspaces in which each data class belongs to a different subspace. In many cases, however, the low-dimensional data follows some additional structure. Knowledge of such structure is beneficial, as it can be used to enhance the representational power of our models through structured priors. Graphs are now a standard way to represent pairwise affinities between objects. The introduction of graph-based priors to enhance matrix factorization models has recently brought these models back to the attention of the data mining community. Representing a signal on a graph is well motivated by the emerging field of signal processing on graphs, which builds on notions from spectral graph theory. The underlying assumption is that high-dimensional data samples lie on or close to a smooth low-dimensional manifold, and that this manifold can be represented by its discrete proxy, i.e. a graph. A primary limitation of state-of-the-art low-rank approximation methods is that they do not generalize to non-linear low-rank structures. Furthermore, the standard low-rank extraction methods for many applications, such as low-rank and sparse decomposition, are computationally cumbersome.
We argue that for many machine learning and signal processing applications involving big data, an approximate low-rank recovery suffices. Thus, in this thesis, we address the above two limitations with a new framework for scalable but approximate low-rank extraction that exploits the hidden structure in the data using the notion of graphs. First, we present a novel signal model, called `Multilinear low-rank tensors on graphs (MLRTG)', which states that a tensor can be encoded as a multilinear combination of the low-frequency graph eigenvectors, where the graphs are constructed along the various modes of the tensor. Since the graph eigenvectors can be interpreted as a \textit{non-linear} embedding of a dataset on the low-dimensional manifold, we propose a method called `Graph Multilinear SVD (GMLSVD)' to recover PCA-based linear subspaces from these eigenvectors. Finally, we propose a range of highly scalable matrix- and tensor-based problems for low-rank extraction that implicitly or explicitly make use of the GMLSVD framework. The core idea is to replace expensive iterative SVD operations with low-cost operations that update the linear subspaces from the fixed non-linear ones. We evaluate all the proposed methods on applications in low-rank and sparse decomposition and in clustering of the low-rank features. Our theoretical analysis shows that the approximation error of the proposed framework depends on the spectral properties of the graph Laplacians.
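
As a sketch only, the MLRTG statement above can be written out for a 3-mode tensor. The notation here is assumed for illustration and is not taken from the text: $\mathcal{Y} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ is the data tensor, $L_\mu$ is the Laplacian of the graph built along mode $\mu$, $P_\mu \in \mathbb{R}^{n_\mu \times k_\mu}$ collects the $k_\mu$ eigenvectors of $L_\mu$ with the smallest eigenvalues (the low-frequency ones), $\mathcal{X}$ is a small core tensor, and $\times_\mu$ denotes the mode-$\mu$ tensor-matrix product. Under these assumptions, the model reads
\[
\mathcal{Y} \;=\; \mathcal{X} \times_1 P_1 \times_2 P_2 \times_3 P_3,
\qquad
\mathcal{X} \in \mathbb{R}^{k_1 \times k_2 \times k_3},
\quad k_\mu \ll n_\mu ,
\]
or, equivalently, in terms of the mode-1 unfolding (with the usual column ordering of the unfolding),
\[
Y_{(1)} \;=\; P_1 \, X_{(1)} \, \left( P_3 \otimes P_2 \right)^{\top}.
\]
Read this way, once the eigenvectors $P_\mu$ are fixed, only the $k_1 k_2 k_3$ entries of $\mathcal{X}$ remain to be estimated, which is one plausible reading of why the approach can avoid repeated large SVDs.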