Virtual memory (VM) is a crucial abstraction in modern computer systems at every scale, from handheld devices to datacenters. VM gives programmers the illusion of a linear memory that is always sufficiently large, which makes programming easier. Although the core components of VM have remained largely unchanged since the earliest designs, its design constraints and usage patterns have shifted radically since it was invented. Today, computer systems integrate hundreds of gigabytes to a few terabytes of memory, and tightly integrated heterogeneous computing platforms (e.g., CPUs, GPUs, FPGAs) are becoming increasingly ubiquitous. With a clear trend towards extending the CPU's VM to all computing elements in the system for an efficient and easy-to-use programming model, the continuous demand for faster memory accesses calls for fast translation over terabytes of memory for any computing element in the system. Unfortunately, conventional translation mechanisms fall short of providing fast translation because contemporary memory capacities exceed the reach of today's translation caches, such as TLBs. In this thesis, we provide fundamental insights into why address translation sits on the critical path of memory accesses. We observe that the traditional fully associative organization of VM, in which any virtual page can map to any page frame, precludes accessing memory before translation completes. We study associativity in VM across a variety of scenarios by classifying page faults using the 3C model developed for caches. Our study demonstrates that the full associativity of VM is unnecessary and that only modest associativity is required. We find that capacity and compulsory misses, which are unaffected by associativity, dominate, while conflict misses rapidly disappear as the associativity of VM increases. Building on these modest associativity requirements, we propose a distributed memory management unit placed close to where the data resides to reduce or eliminate the TLB miss penalty.
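To make the 3C classification of page faults concrete, the following is a minimal sketch and not the methodology used in the thesis. It assumes LRU replacement, a set index computed as the virtual page number modulo the number of sets, and a fully associative LRU memory of the same size as the reference point for separating capacity from conflict misses. The names `lru_access` and `classify_faults` are hypothetical.

```python
from collections import OrderedDict


def lru_access(lru, key, capacity):
    """Touch `key` in an LRU-ordered dict; return True on a hit."""
    if key in lru:
        lru.move_to_end(key)          # mark as most recently used
        return True
    if len(lru) >= capacity:
        lru.popitem(last=False)       # evict the least recently used entry
    lru[key] = None
    return False


def classify_faults(trace, num_frames, ways):
    """Classify page faults of a set-associative memory with the 3C model.

    Assumptions (illustrative only): LRU replacement and
    set index = virtual page number modulo number of sets.
    """
    assert num_frames % ways == 0, "frames must divide evenly into sets"
    num_sets = num_frames // ways
    sets = [OrderedDict() for _ in range(num_sets)]  # set-associative memory
    full = OrderedDict()                             # fully associative reference
    seen = set()                                     # pages referenced so far
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}

    for vpn in trace:
        hit_sa = lru_access(sets[vpn % num_sets], vpn, ways)
        hit_fa = lru_access(full, vpn, num_frames)
        first_touch = vpn not in seen
        seen.add(vpn)

        if hit_sa:
            continue
        if first_touch:
            counts["compulsory"] += 1   # first reference to the page
        elif not hit_fa:
            counts["capacity"] += 1     # would miss even with full associativity
        else:
            counts["conflict"] += 1     # caused only by restricted placement
    return counts


if __name__ == "__main__":
    trace = [0, 4, 0, 4]
    for ways in (1, 2, 4):
        print(ways, classify_faults(trace, num_frames=4, ways=ways))
```

In this toy trace, pages 0 and 4 collide in a direct-mapped memory of four frames, so the repeated references count as conflict misses; with two or more ways, only the two compulsory misses remain.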