I present an introduction to the basic concepts of Bayesian networks to help a beginner become familiar with the field's theory. Bayesian networks combine two mathematical areas: graph theory and probability theory. I therefore begin with the basic definition of Bayesian networks, followed by an elaboration of the underlying graph theory, which concerns the arrangement of nodes and edges in a graph. Since Bayesian networks encode one's beliefs about a system of variables, I then discuss, in general terms, how to update these beliefs when one or more of the variables are observed (i.e., their values are no longer unknown). Next, I consider learning algorithms, which combine learning the probability distributions with learning the network topology. I conclude Part I by showing how Bayesian networks can be applied in various domains, such as the time-series problem of automatic speech recognition. In Part II I present in more detail some of the algorithms needed for working with Bayesian networks.
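To make the idea of belief updating concrete before the formal treatment, the following is a minimal sketch of a two-node network, Rain → WetGrass, in which observing the child variable revises the belief about its parent via Bayes' rule. The variable names and all probability values here are illustrative assumptions, not taken from the text.

```python
# A minimal sketch of belief updating in a two-node Bayesian network
# (Rain -> WetGrass). All probabilities below are illustrative assumptions.

# Prior belief about Rain, and the conditional distribution of
# WetGrass given Rain (the two pieces the network encodes).
p_rain = {True: 0.2, False: 0.8}
p_wet_given_rain = {
    True:  {True: 0.9, False: 0.1},   # P(WetGrass | Rain = True)
    False: {True: 0.3, False: 0.7},   # P(WetGrass | Rain = False)
}

def posterior_rain(wet_observed):
    """P(Rain | WetGrass = wet_observed), by direct application of Bayes' rule."""
    # Unnormalized joint P(Rain = r, WetGrass = wet_observed) for each r.
    joint = {r: p_rain[r] * p_wet_given_rain[r][wet_observed]
             for r in (True, False)}
    z = sum(joint.values())           # P(WetGrass = wet_observed)
    return {r: joint[r] / z for r in joint}

post = posterior_rain(True)
# Observing wet grass raises the belief that it rained above the 0.2 prior.
```

In larger networks this same computation is carried out by general-purpose inference algorithms rather than by enumerating the joint distribution directly, which is the subject taken up in Part II.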