We propose a data-driven artificial viscosity model for shock capturing in discontinuous Galerkin methods. The model trains a multi-layer feedforward network to map the element-wise solution to a smoothness indicator, from which the artificial viscosity is computed. The training data set is generated from canonical functions. Its compactness, which is critical to successful training, is ensured by normalization and by adjusting the range of the smoothness indicator. The network recovers the expected smoothness much more reliably than the original averaged modal decay model. Several smooth and non-smooth test cases are considered to assess the performance of the approach. Convergence tests show that the proposed model recovers the accuracy of the corresponding linear schemes in smooth regions. For non-smooth flows, the model is observed to suppress spurious oscillations well.
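
The pipeline summarized above (element-wise solution, normalization, feedforward network, smoothness indicator, viscosity) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the min-max normalization to [-1, 1], the layer sizes, the tanh activation, the randomly initialized (untrained) weights, and the linear ramp with thresholds `s_lo` and `s_hi` that turns the indicator into a viscosity. The function names `normalize`, `make_mlp`, `forward`, and `viscosity` are hypothetical.

```python
import math
import random


def normalize(u):
    """Rescale element-wise nodal values to [-1, 1] (assumed normalization).

    A constant element carries no shape information, so it maps to zeros.
    """
    lo, hi = min(u), max(u)
    if hi - lo < 1e-14:
        return [0.0 for _ in u]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in u]


def make_mlp(sizes, seed=0):
    """Build a feedforward network with random, UNTRAINED weights.

    sizes = [n_in, n_hidden, ..., 1]; a real model would load trained weights.
    """
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Forward pass: tanh on hidden layers, linear output (scalar indicator)."""
    last = len(layers) - 1
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * xj for w, xj in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        x = z if i == last else [math.tanh(v) for v in z]
    return x[0]


def viscosity(indicator, nu_max, s_lo=1.0, s_hi=3.0):
    """Hypothetical ramp from indicator to viscosity.

    Full viscosity for indicator <= s_lo (non-smooth), none for
    indicator >= s_hi (smooth), linear in between.
    """
    t = (indicator - s_lo) / (s_hi - s_lo)
    t = min(1.0, max(0.0, t))
    return nu_max * (1.0 - t)


# Usage: one element with five nodal values containing a jump.
u_elem = [1.0, 1.0, 1.0, 0.0, 0.0]
net = make_mlp([len(u_elem), 8, 8, 1])
s = forward(net, normalize(u_elem))
nu = viscosity(s, nu_max=0.1)
```

With untrained weights the indicator `s` is meaningless; the sketch only shows how the pieces connect. In practice the weights would come from training on labeled canonical functions, and the ramp would be replaced by whatever viscosity law the scheme uses.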