Uncertainty in deep learning has recently received considerable research attention. While state-of-the-art neural networks have broken many accuracy benchmarks, they have been shown to be easily fooled: minor perturbations of the input data can lead them to produce wrong predictions with unreasonably high confidence scores. Some research has gone into the design of new architectures that are probabilistic in nature, such as Bayesian Neural Networks, while other researchers have tried to model the uncertainty of standard architectures heuristically. This work presents a novel method to assess uncertainty in Convolutional Neural Networks, based on fitting a forest of randomized Decision Trees to the network activations before the final classification layer. Experimental results are provided for patch classification on the MNIST dataset and for semantic segmentation on satellite imagery used for land cover classification. The land cover dataset consists of overhead imagery of the city of Zurich, Switzerland, taken in 2002, with corresponding manually annotated ground truth. The Density Forest confidence estimation method is compared to a number of baselines based on softmax activations and pre-softmax activations. All methods are evaluated with respect to novelty detection. The study shows that using pre-softmax activations of the Fully Connected layer provides a better overall confidence estimate than just using the softmax activations. For the MNIST dataset, softmax measures outperform pre-softmax based novelty detection measures, while for the Zurich dataset, pre-softmax based methods not only perform better at detecting the left-out class, but also manage to identify particular objects for which no class exists in the ground truth.
The main explanations we find for the varying performance of pre-softmax measures are the curse of dimensionality when working with high-dimensional activation vectors and class separability issues due to partially trained networks. Future research should study the influence of activation vector dimensionality on novelty detection methods, apply these methods to more diverse datasets, and evaluate different novelty detection measures in practical applications such as Active Learning.
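
To make the evaluation setting concrete, the following is a minimal sketch of how a softmax-based confidence baseline could be scored for novelty detection. It is an illustration under stated assumptions, not the implementation used in this work: the two confidence measures (maximum softmax probability and negative entropy), the AUROC score, the function names, and the toy activation values are all hypothetical choices, since the specific baselines and metrics are not named here.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax of pre-softmax activations (numerically stabilized)."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def max_softmax_confidence(logits):
    """Assumed baseline: confidence is the largest softmax probability."""
    return softmax(logits).max(axis=1)

def negative_entropy_confidence(logits):
    """Assumed baseline: confidence is the negative entropy of the softmax output."""
    p = softmax(logits)
    return (p * np.log(p + 1e-12)).sum(axis=1)

def auroc(conf_known, conf_novel):
    """Probability that a known-class sample receives a higher confidence
    than a novel (left-out class) sample; ties count as one half."""
    diff = conf_known[:, None] - conf_novel[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

# Toy pre-softmax activations (made-up values, for illustration only).
logits_known = np.array([[8.0, 0.0, 0.0], [0.0, 6.0, 1.0]])  # seen classes
logits_novel = np.array([[1.0, 1.2, 0.9], [0.5, 0.4, 0.6]])  # left-out class

score_msr = auroc(max_softmax_confidence(logits_known),
                  max_softmax_confidence(logits_novel))
score_ent = auroc(negative_entropy_confidence(logits_known),
                  negative_entropy_confidence(logits_novel))
print(score_msr, score_ent)
```

A pre-softmax method would replace the confidence functions with a score computed directly on the activation vectors, for example a density estimate, and could be evaluated with the same `auroc` helper.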