Resistive Random Access Memory (RRAM)-based routing multiplexers built with a one-level structure are significantly more delay-efficient than state-of-the-art SRAM-based implementations, thanks to their lower achievable on-state resistance. In addition, the delay of RRAM-based multiplexers scales better with input size than that of SRAM-based multiplexers. This property allows RRAM-based FPGA architectures to employ larger multiplexers than their SRAM-based counterparts without incurring any delay overhead. In this paper, we first evaluate, at the circuit level, the delay improvements of a state-of-the-art RRAM-based multiplexer. Then, to unlock the potential of RRAM-based multiplexers, we propose three related FPGA architecture optimizations: (a) routing tracks should be connected to Look-Up Table (LUT) inputs via a one-level crossbar, instead of through Connection Blocks and local routing; (b) the Switch Block (SB) should employ larger multiplexers; (c) length-2 wires should be used instead of length-4 wires. When a classical architecture is considered for both SRAM and RRAM technologies, RRAM-based FPGAs can reduce area by 17% and delay by 32% with zero power overhead in a 40 nm technology. The proposed architectural enhancements can further reduce area by 15%, delay by 10%, and channel width by 13%. Combining RRAM technology with these architectural enhancements, the proposed RRAM-based FPGA architecture improves Area-Delay Product by 57% and Delay-Power Product by 38% compared to an SRAM-based FPGA with a classical architecture.
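
As a back-of-the-envelope check on how the reported figures relate, the sketch below compounds the technology-level and architecture-level reductions. It rests on two assumptions that the abstract does not state: that the percentages compound multiplicatively, and that the architectural enhancements leave power unchanged. The helper `combine` and all variable names are illustrative only.

```python
def combine(*reductions):
    """Compound fractional reductions multiplicatively.

    Returns the fraction of the baseline that remains.
    """
    remaining = 1.0
    for r in reductions:
        remaining *= 1.0 - r
    return remaining


# Remaining fraction relative to the SRAM-based classical baseline.
area = combine(0.17, 0.15)   # RRAM technology, then architecture
delay = combine(0.32, 0.10)  # RRAM technology, then architecture
power = combine(0.0)         # assumed unchanged (zero power overhead)

# Improvement in each product, as a fraction of the baseline.
adp_gain = 1.0 - area * delay
dpp_gain = 1.0 - delay * power

print(f"Area-Delay Product improvement:  {adp_gain:.1%}")
print(f"Delay-Power Product improvement: {dpp_gain:.1%}")
```

Under these assumptions the estimate gives about 57% for Area-Delay Product, in line with the reported figure, and about 39% for Delay-Power Product, close to the reported 38%. The small gap may come from rounding of the individual percentages or from a power change that this sketch ignores.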