Improving the energy sustainability of our cities involves integrating multiple renewable energy technologies into existing energy infrastructure, which stretches the capabilities of traditional energy systems to the limit. To account for the transition taking place in the distributed energy system sector, complex cyber-physical interactions need to be adequately modelled, which is not possible with the white-box models currently in use. Furthermore, the volatility of climate conditions, frequent extreme climate events due to climate change, and climate phenomena at the urban scale make it essential to improve the robustness and resilience of energy systems. Here again, white-box models do not allow sufficient flexibility in the modelling approach. To address these limitations, this thesis seeks to optimize distributed energy system design with the help of grey-box and black-box techniques. A grey-box model based on fuzzy logic is introduced to consider the dispatch strategy when designing electrical hubs. The grey-box model shows improved performance when optimizing electrical hubs, and the method is shown to achieve a renewable energy integration level of up to 80%. However, the grey-box model fails to handle complex energy flows within the energy system. Therefore, a black-box method based on reinforcement learning is introduced to consider complex energy systems catering to multi-energy services. Reinforcement learning based on a fully connected neural network (FNN) outperforms the grey-box model, improving the objective function values by 60%. A convolutional neural network improves the objective function values further (by up to 20% compared to the FNN). The results reveal that black-box models are well suited to optimizing complex energy systems. Distributed optimization is introduced to move from a single energy system to an energy internet consisting of several interacting energy systems.
The energy internet is optimized considering fully cooperative and non-cooperative scenarios. The optimization algorithm shows a good capability to reach the epsilon-Nash equilibrium. Finally, supervised and transfer learning methods are introduced for energy system optimization at the regional and national scales, which reduced the computation time by 84%. Stochastic and robust programming methods are introduced to improve the climate flexibility of energy systems. A hybrid stochastic-robust optimization algorithm is developed by extending the approach to consider both climate uncertainty and extreme events. A regional climate model provides climate scenarios for the stochastic optimization. The results of the study show that renewable energy technologies such as solar PV and wind can meet 50% of the annual energy demand while guaranteeing robust operation of the energy system during extreme climate events. The model is then further extended by integrating computational models for urban energy simulation that take urban climate into account. Results of the analysis show that a performance gap of up to 40% can be observed when the influences of urban climate are neglected in the design of urban energy systems. In the final part of this thesis, a multi-criteria decision-making technique is introduced into the energy system optimization model. This helps decision makers to weigh a number of conflicting objectives and to consider impacts at the urban scale.