A total of 16 global chemistry transport models and general circulation models have participated in this study; 14 models have been evaluated with regard to their ability to reproduce the near-surface observed number concentration of aerosol particles and cloud condensation nuclei (CCN), as well as derived cloud droplet number concentration (CDNC). Model results for the period 2011–2015 are compared with aerosol measurements (aerosol particle number, CCN and aerosol particle composition in the submicron fraction) from nine surface stations located in Europe and Japan. The evaluation focuses on the ability of models to simulate the time-averaged state in diverse environments and on the seasonal and short-term variability in the aerosol properties. There is no single model that systematically performs best across all environments represented by the observations. Models tend to underestimate the observed aerosol particle and CCN number concentrations, with average normalized mean bias (NMB) of all models and for all stations, where data are available, of −24 % and −35 % for particles with dry diameters >50 and >120 nm, as well as −36 % and −34 % for CCN at supersaturations of 0.2 % and 1.0 %, respectively. However, the models seem to behave differently for particles activating at very low supersaturations (<0.1 %) than at higher ones. A total of 15 models have been used to produce ensemble annual median distributions of relevant parameters. The model diversity (defined as the ratio of standard deviation to mean) is up to about 3 for simulated N3 (number concentration of particles with dry diameters larger than 3 nm) and up to about 1 for simulated CCN in the extra-polar regions. A global mean reduction of a factor of about 2 is found in the model diversity for CCN at a supersaturation of 0.2 % (CCN0.2) compared to that for N3, maximizing over regions where new particle formation is important.
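As an illustration of the two summary statistics quoted above, the following minimal sketch computes a normalized mean bias and a model diversity. The NMB formula used here, sum(model − obs) / sum(obs), is the commonly used definition and is an assumption, since the text does not spell it out; the diversity follows the definition given above (standard deviation divided by mean across models). The function names and the array layout are hypothetical.

```python
import numpy as np

def normalized_mean_bias(model, obs):
    """NMB = sum(model - obs) / sum(obs), using only pairs where both are finite.

    Assumed (commonly used) definition; multiply by 100 to express in percent.
    """
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    valid = np.isfinite(model) & np.isfinite(obs)
    return np.sum(model[valid] - obs[valid]) / np.sum(obs[valid])

def model_diversity(ensemble):
    """Ratio of standard deviation to mean across models.

    `ensemble` is assumed to have shape (n_models, ...), e.g. one
    annual-median field per model; NaNs (missing models) are ignored.
    """
    ensemble = np.asarray(ensemble, dtype=float)
    return np.nanstd(ensemble, axis=0) / np.nanmean(ensemble, axis=0)
```

For example, a model that simulates 80 and 60 particles per cubic centimetre where 100 and 100 were observed has an NMB of −0.3, i.e. −30 %.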
An additional model has been used to investigate potential causes of model diversity in CCN and bias compared to the observations by performing a perturbed parameter ensemble (PPE) accounting for uncertainties in 26 aerosol-related model input parameters. This PPE suggests that biogenic secondary organic aerosol formation and the hygroscopic properties of the organic material are likely to be the major sources of CCN uncertainty in summer, with dry deposition and cloud processing being dominant in winter. Models capture the relative amplitude of the seasonal variability of the aerosol particle number concentration for all studied particle sizes with available observations (dry diameters larger than 50, 80 and 120 nm). The short-term persistence time (on the order of a few days) of CCN concentrations, which is a measure of aerosol dynamic behavior in the models, is underestimated by the models by an average of 40 % in winter and 20 % in summer. In contrast to the large spread in simulated aerosol particle and CCN number concentrations, the CDNC derived from simulated CCN spectra is less diverse and in better agreement with CDNC estimates consistently derived from the observations (average NMB −13 % and −22 % for updraft velocities 0.3 and 0.6 m s−1, respectively). In addition, simulated CDNC is in slightly better agreement with observationally derived values at lower than at higher updraft velocities (index of agreement 0.64 vs. 0.65). The reduced spread of CDNC compared to that of CCN is attributed to the sublinear response of CDNC to aerosol particle number variations and the negative correlation between the sensitivities of CDNC to aerosol particle number concentration (∂Nd/∂Na) and to updraft velocity (∂Nd/∂w). Overall, we find that while CCN is controlled by both aerosol particle number and composition, CDNC is sensitive to CCN at low and moderate CCN concentrations and to the updraft velocity when CCN levels are high.
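The index of agreement and the persistence time mentioned above can be sketched as follows. Both definitions are assumptions rather than the study's stated method: the index of agreement is taken to be Willmott's d, and the persistence time is taken as the first lag at which the autocorrelation of the time series drops below 1/e. The function names and the `dt_days` parameter are hypothetical.

```python
import numpy as np

def index_of_agreement(model, obs):
    """Willmott's index of agreement d (assumed definition); 1 means perfect agreement."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    obs_mean = obs.mean()
    numerator = np.sum((model - obs) ** 2)
    denominator = np.sum((np.abs(model - obs_mean) + np.abs(obs - obs_mean)) ** 2)
    return 1.0 - numerator / denominator

def efolding_persistence(series, dt_days=1.0):
    """First lag (in days) at which the autocorrelation falls below 1/e.

    Assumed definition of persistence time. Returns None if the series
    has no variance or the autocorrelation never drops below 1/e.
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    variance_sum = np.dot(x, x)
    if variance_sum == 0.0:
        return None
    for lag in range(1, len(x)):
        # Autocorrelation estimate at this lag, normalized by the lag-0 value.
        acf = np.dot(x[:-lag], x[lag:]) / variance_sum
        if acf < 1.0 / np.e:
            return lag * dt_days
    return None
```

Applying `efolding_persistence` separately to simulated and observed CCN time series, season by season, would give the kind of comparison described above, under these assumed definitions.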
Discrepancies are found in sensitivities ∂Nd/∂Na and ∂Nd/∂w; models may be predisposed to be too “aerosol sensitive” or “aerosol insensitive” in aerosol–cloud–climate interaction studies, even if they may capture average droplet numbers well. This is a subtle but profound finding that only the sensitivities can clearly reveal, and it may explain inter-model biases in the aerosol indirect effect.