This master's thesis explains in depth how deep learning and graph theory can be combined to perform pointwise classification of 3D point clouds obtained by combining geospatial images. This scene understanding problem arises in a number of practical scenarios, e.g. when governments survey deforestation from aerial images taken by drones. After introducing the problem statement and the main strengths of typical architectures used for images, the thesis presents the neural network architecture we developed and describes how to build its main elements together with the graphs. To assess its performance, our architecture, which is based on graph convolutions, was tested under two different scenarios and compared to traditional machine learning algorithms.