In this paper, we will investigate the performance of Raptor codes on arbitrary binary input memoryless symmetric channels (BIMSCs). In doing so, we generalize some of the results that were proved before for the erasure channel. We will generalize the stability condition to the class of Raptor codes. This generalization gives a lower bound on the fraction of output nodes of degree 2 of a Raptor code if the error probability of the belief- propagation decoder converges to zero. Using information-theoretic arguments, we will show that if a sequence of output degree distributions is to achieve the capacity of the underlying channel, then the fraction of nodes of degree 2 in these degree distributions has to converge to a certain quantity depending on the channel. For the class of erasure channels this quantity is independent of the erasure probability of the channel, but for many other classes of BIMSCs, this fraction depends on the particular channel chosen. This result has implications on the "universality" of Raptor codes for classes other than the class of erasure channels, in a sense that will be made more precise in the paper. We will also investigate the performance of specific Raptor codes which are optimized using a more exact version of the Gaussian approximation technique.