In this work, we address the transport of high-quality voice over the Internet, with particular attention to delay. Interactive audio carried over IP networks often suffers from packet loss and from variations in network delay (jitter). Forward Error Correction (FEC) mitigates the impact of packet loss, at the expense of a higher end-to-end delay and a higher bit rate for the audio source. Adaptive playout buffer algorithms at the receiver compensate for jitter, but again this may come at the expense of additional delay. As a consequence, existing error control and playout adjustment schemes often produce end-to-end delays exceeding 150 ms, which significantly impairs perceived quality; in such cases it would be preferable to keep the delay low and accept a small amount of loss. We develop a joint playout buffer and FEC adjustment scheme for Internet telephony that incorporates the impact of end-to-end delay on perceived audio quality. To this end, we take a utility function approach: we represent the perceived audio quality as a function of both the end-to-end delay and the distortion of the voice signal. We develop a joint rate/error/playout delay control algorithm that optimizes this measure of quality and is TCP-friendly. It uses a channel model for both loss and delay. We validate our approach by simulation and show that (1) our scheme allows a source to increase its utility by not increasing the playout delay when this is not necessary, and (2) it provides better quality than previously published adjustment schemes for playout and FEC. We then use this scheme in the framework of non-elevated services, which allow applications to select a service class with reduced end-to-end delay at the expense of a higher loss rate. The tradeoff between delay and loss is not straightforward, since audio sources may be forced to compensate for the additional losses with more FEC, and hence more delay.
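To make the utility function idea concrete, the following is a minimal sketch of choosing a playout delay by maximizing a toy utility that penalizes both delay and late-packet loss. It is not the scheme developed in this work: the piecewise-linear penalty, the weights, and all function names are hypothetical, and FEC and rate control are left out. Only the 150 ms knee is taken from the text above.

```python
def delay_penalty(delay_ms, knee_ms=150.0, slope_low=0.02, slope_high=0.1):
    """Hypothetical penalty: gentle below the knee, steeper above it."""
    if delay_ms <= knee_ms:
        return slope_low * delay_ms
    return slope_low * knee_ms + slope_high * (delay_ms - knee_ms)


def late_loss_fraction(network_delays_ms, playout_delay_ms):
    """Fraction of packets that arrive after their playout deadline."""
    late = sum(1 for d in network_delays_ms if d > playout_delay_ms)
    return late / len(network_delays_ms)


def utility(network_delays_ms, playout_delay_ms, loss_weight=100.0):
    """Toy utility: higher is better; both delay and loss reduce it."""
    loss = late_loss_fraction(network_delays_ms, playout_delay_ms)
    return -(delay_penalty(playout_delay_ms) + loss_weight * loss)


def best_playout_delay(network_delays_ms, candidates_ms):
    """Pick the candidate playout delay with the highest toy utility."""
    return max(candidates_ms, key=lambda c: utility(network_delays_ms, c))


# Assumed sample of one-way network delays (ms) with a single large spike.
observed_delays = [40, 45, 50, 55, 60, 65, 70, 80, 90, 400]
candidates = [60, 100, 150, 400]
chosen = best_playout_delay(observed_delays, candidates)
```

With these assumed numbers, the sketch prefers a 100 ms playout delay and tolerates the loss of the single late packet, rather than stretching the buffer to 400 ms to capture it.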
We show that the use of non-elevated services can lead to quality improvements, but that the best choice of service depends on network conditions and on the importance that users attach to delay. Based on this observation, we propose an adaptive service selection algorithm that allows audio sources to choose, in real time, the service providing the highest audio quality. In addition, when used over the standard IP best-effort service, an audio source should also control its rate in order to react to network congestion and to share bandwidth fairly. Current congestion control mechanisms are packet-based, i.e., they reduce or increase the number of packets sent per time interval to adapt to the current level of congestion in the network. Voice traffic, however, is inelastic: packets are generated at regular intervals, and it is the packet size that varies with the codec used. Therefore, standard congestion control is not directly applicable to this type of traffic. We present three alternative modifications to equation-based congestion control protocols and evaluate them through mathematical analysis and network simulation.
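As background for the last point, the sketch below evaluates the TCP throughput equation used by TFRC (RFC 5348), a standard equation-based congestion control protocol. It illustrates why such control is tied to packet size; it does not implement any of the three modifications proposed in this work. The function name and the example packet sizes, round-trip time, and loss rate are assumptions chosen for illustration.

```python
import math


def tfrc_rate_bytes_per_s(s, rtt, p, b=1, t_rto=None):
    """TCP throughput equation from RFC 5348, Section 3.1.

    s: packet size in bytes, rtt: round-trip time in seconds,
    p: loss event rate (0 < p <= 1), b: packets acknowledged per ACK,
    t_rto: retransmission timeout in seconds (defaults to 4 * rtt).
    """
    if not 0 < p <= 1:
        raise ValueError("loss event rate p must be in (0, 1]")
    if t_rto is None:
        t_rto = 4 * rtt
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t_rto * (3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2))
    return s / denom


# Assumed example values: a small voice packet vs. a full-size data packet.
voice_rate = tfrc_rate_bytes_per_s(s=200, rtt=0.1, p=0.01)
data_rate = tfrc_rate_bytes_per_s(s=1460, rtt=0.1, p=0.01)

# The allowed packet rate (rate / s) is the same for both flows, so the
# small-packet flow is granted a proportionally smaller byte rate.
voice_pkts_per_s = voice_rate / 200
data_pkts_per_s = data_rate / 1460
```

Because the equation yields the same allowed packet rate regardless of packet size, a voice source that sends small packets at a fixed interval can only adapt by changing its packet size, which is the mismatch the text above points out.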