Stochastic optimization is a popular modeling paradigm for decision-making under uncertainty and has a wide spectrum of applications in management science, economics and engineering. However, the stochastic optimization models one faces in practice are intractable, and numerical solutions necessitate approximations. The mainstream approach for making a stochastic optimization model amenable to numerical solution is to discretize the probability distribution of the uncertain problem parameters. However, both the accuracy of the approximation as well as the computational burden of solving the approximate problem scale with the number of scenarios of the approximate distribution. An effective means to ease the computational burden is to use scenario reduction, which replaces an accurate initial distribution accommodating many scenarios with a simpler distribution supported on only few scenarios that is close to the initial distribution with respect to a probability metric. Using the Wasserstein distance as measure of proximity between distributions, we provide new insights into the fundamental limitations of scenario reduction, and we propose the first polynomial-time constant-factor approximations for a popular scenario reduction problem from the literature. As scenario reduction is equivalent to clustering, it suffers from two well-known shortcomings. Namely, it suffers from outlier sensitivity and may produce highly unbalanced clusters. To mitigate both shortcomings, we formulate a joint outlier detection and clustering problem, where the clusters must satisfy certain cardinality constraints. We cast this problem as a mixed-integer linear program (MILP) that admits tractable semidefinite and linear programming relaxations. We propose deterministic rounding schemes that transform the relaxed solutions to feasible solutions for the MILP. We also prove that these solutions are optimal in the MILP if a cluster separation condition holds. Finally, we develop a highly efficient scenario reduction method for a large-scale hydro scheduling problem. Specifically, we study the optimal operation of a fleet of interconnected hydropower plants that sell energy on both the spot and the reserve markets, and we propose a two-layer stochastic programming framework for its solution. The outer layer problem (the planner's problem) optimizes the end-of-day reservoir filling levels over one year, whereas the inner layer problem (the trader's problem) selects optimal hourly market bids within each day. We prove that the trader's problem admits a scenario reduction that dramatically reduces its complexity without loss of optimality, which in turn facilitates an efficient approximation of the planner's problem.