The increase in the number of cores in processors has been an important trend over the past decade. To use such architectures efficiently, modern software must be scalable: performance should increase proportionally to the number of allotted cores. While some software is inherently parallel, with threads seldom having to coordinate, a large fraction of software systems are based on shared state, to which access must be coordinated. This shared state generally comes in the form of a concurrent data structure. It is thus essential for these concurrent data structures to be correct, fast, and scalable, regardless of the scenario (i.e., different workloads, processors, memory units, programming abstractions). Nevertheless, few if any generic approaches exist that result in concurrent data structures which scale across a large spectrum of environments. This dissertation introduces a set of generic methods for building fast and scalable concurrent data structures, irrespective of the deployment environment. We start by identifying a set of sufficient conditions for concurrent search data structures to scale and perform well regardless of the workloads and processors they are running on. We introduce "asynchronized concurrency", a paradigm consisting of four complementary programming patterns, which calls for the design of concurrent search data structures to resemble that of their sequential counterparts. Next, we show that there is virtually no practical situation in which one should seek a "theoretically wait-free" algorithm at the expense of a state-of-the-art blocking algorithm in the case of search data structures: blocking algorithms are simple, fast, and can be made "practically wait-free". We then focus on the memory unit, and provide a method yielding fast concurrent data structures even when the memory is non-volatile and structures must be recoverable in case of a transient failure.
We first introduce a generic technique that avoids expensive writes to non-volatile memory by using a fast software cache. We also study memory management, and propose a solution tailored to concurrent data structures that uses coarse-grained memory management in order to avoid logging. Moreover, we argue for the use of lock-free algorithms in this non-volatile context, and show how optimizing them lets us avoid expensive logging operations. Together, the techniques we propose enable us to avoid any form of logging in the common case, thus significantly improving concurrent data structure performance when using non-volatile RAM. Finally, we go beyond basic interfaces, and look at scalable partitioned data structures implemented through a transactional interface. We present multiversion timestamp locking (MVTL), a new genre of multiversion concurrency control algorithms for serializable transactions. The key idea behind MVTL is simple and novel: lock individual time points instead of locking objects or versions. We provide several MVTL-based algorithms that address limitations of current concurrency-control schemes. In short, by spanning workloads, processors, storage abstractions, and system sizes, this dissertation takes a step towards concurrent data structures that are universally scalable.