Distributed transactions on modern RDMA clusters promise high throughput and low latency for scale-out workloads. As such, they can be particularly beneficial to large OLTP workloads, which require both. However, achieving good performance requires tuning the physical layout of the data store to the application and to the characteristics of the underlying hardware. Manually tuning the physical design is error-prone and time-consuming, and it must be repeated whenever the workload or the hardware changes. In this paper we present SPADE, a physical design tuner for OLTP workloads in FaRM, a main-memory distributed computing platform that leverages modern networks with RDMA capabilities. SPADE automatically decides on the partitioning of data, tunes the index and storage parameters, and selects the right mix of direct remote data accesses and function shipping to maximize performance. To achieve this, SPADE combines information derived from the workload and the schema with low-level hardware and network performance characteristics gathered through micro-benchmarks. On two widely used OLTP benchmarks, TATP and TPC-C, the physical design tuned by SPADE achieves significant throughput and latency improvements over a manual design, sometimes through counter-intuitive tuning decisions.