Highly data-centric AI workloads require the adoption of new computing paradigms because the performance of traditional CPU- and GPU-based systems is limited by data access and transfer. Training deep neural networks with millions of tunable parameters takes days or even weeks on relatively powerful heterogeneous systems and consumes hundreds of kilowatts of power. In-memory computing with memristive devices is a highly promising avenue for accelerating deep learning because computations take place within the memory itself, eliminating the need to move data between memory and processing units. The synaptic weights can be represented by the analog conductance states of memristive devices organized in crossbar arrays. The computationally expensive operations associated with deep learning training can be performed in place by exploiting the physical attributes and state dynamics of memristive devices together with circuit laws. Memristive cores are also of particular interest owing to the non-volatility, scalability, CMOS compatibility and fast access time of the constituent devices. In addition, the multi-level storage capability of certain memristive technologies is especially attractive for increasing the information storage capacity of such cores. Large-scale demonstrations that combine memristive synapses with digital or analog CMOS circuitry indicate the potential of in-memory computing to accelerate deep learning. However, these implementations are highly vulnerable to non-ideal memristive device behavior. In particular, the limited weight representation capability, intra-device and array-level variability, and temporal variations of conductance states pose significant challenges to achieving training accuracies comparable to those of conventional von Neumann implementations. Design solutions that address these non-idealities without introducing significant implementation complexity will be critical for future memristive systems.
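The in-place computation described above can be illustrated with a minimal software sketch of an idealized crossbar: weights are encoded as differential conductance pairs, and applying voltages on the rows yields per-column currents I_j = sum_i V_i * G_ij by Ohm's law and Kirchhoff's current law. The function name and values below are hypothetical, and device non-idealities are ignored.

```python
# Sketch of matrix-vector multiplication on an idealized memristive
# crossbar. Each signed weight w_ij is represented by a differential
# conductance pair (g_pos - g_neg); the output current of column j is
# I_j = sum_i V_i * (G+_ij - G-_ij), i.e. the dot product is computed
# "in place" by the physics of the array.

def crossbar_mvm(v, g_pos, g_neg):
    """Return the column currents for input voltages v (one per row)."""
    cols = len(g_pos[0])
    return [
        sum(v[i] * (g_pos[i][j] - g_neg[i][j]) for i in range(len(v)))
        for j in range(cols)
    ]

# Hypothetical 2x2 weight matrix [[1.0, -0.5], [0.25, 2.0]] encoded as
# two non-negative conductance arrays, read with voltages [1.0, 2.0].
g_pos = [[1.0, 0.0], [0.25, 2.0]]
g_neg = [[0.0, 0.5], [0.0, 0.0]]
print(crossbar_mvm([1.0, 2.0], g_pos, g_neg))  # → [1.5, 3.5]
```

In a physical array this multiply-accumulate happens in constant time for all columns simultaneously, which is the source of the acceleration; the differential pair is one common way to represent negative weights with non-negative conductances.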
This thesis proposes a novel synaptic architecture that can overcome a multitude of the aforementioned device non-idealities. In particular, it investigates the use of multiple memristive devices as a single computational primitive to represent a neural network weight, and examines experimentally how such a compute primitive can mitigate undesired memristive behavior. We propose a novel technique to arbitrate between the constituent devices of the synapse that can easily be implemented in hardware and adds only minimal energy overhead. We explore the proposed concept across various network types, including conventional (non-spiking) and spiking neural networks. The efficacy of this synaptic architecture is demonstrated for different training approaches, including fully memristive and mixed-precision in-memory training, by means of experiments using more than 1 million phase-change memory devices. Furthermore, we show that the proposed concept can be a key enabler for exploiting binary memristive devices in deep learning training.
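The multi-device synapse and its arbitration can be sketched in software as follows. This is a simplified illustration, not the thesis implementation: the round-robin counter, the Gaussian programming noise, and all numerical values are assumptions introduced for the example. The idea shown is that the weight is the combined conductance of N devices, while each update pulse is routed to only one device selected by a shared counter, so per-device variability is averaged out across the ensemble.

```python
import random


class MultiMemristiveSynapse:
    """Sketch: N devices jointly represent one weight (here, the sum of
    their conductances). A counter shared by all synapses selects which
    device receives each programming pulse, a cheap arbitration scheme
    that spreads updates evenly across the constituent devices."""

    counter = 0  # shared arbitration counter (hypothetical global scheme)

    def __init__(self, n_devices, g_init=0.0):
        self.g = [g_init] * n_devices

    def weight(self):
        return sum(self.g)

    def update(self, delta_g):
        # Program only the device picked by the shared counter; add
        # hypothetical Gaussian noise to mimic programming variability.
        i = MultiMemristiveSynapse.counter % len(self.g)
        MultiMemristiveSynapse.counter += 1
        noise = random.gauss(0.0, 0.1 * abs(delta_g))
        # Conductances are physically non-negative, so clamp at zero.
        self.g[i] = max(0.0, self.g[i] + delta_g + noise)


random.seed(0)  # deterministic demo
syn = MultiMemristiveSynapse(n_devices=3)
for _ in range(6):
    syn.update(0.5)  # each device ends up receiving two pulses
print(syn.weight())  # close to 6 * 0.5 = 3.0, up to programming noise
```

Because each pulse changes only one of the N conductances, the relative granularity of a weight update shrinks with N, which is one intuition for why the ensemble tolerates coarse or noisy individual devices better than a single device would.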