Learning-based algorithms have gained great popularity in communications because, by learning from training samples, they often outperform even carefully engineered solutions. In this paper, we show that the choice of training examples can strongly affect the performance of such learning-based algorithms. In particular, we consider non-linear 1-bit precoding for massive multi-user MIMO systems using the C2PO algorithm. While previous works have already shown the advantages of learning critical coefficients of this algorithm, we demonstrate that simply drawing training samples from the channel model distribution does not necessarily lead to the best result. Instead, we provide a strategy for generating training data based on the specific properties of the algorithm, which significantly improves its error-floor performance.