Solar signals in CMIP‐5 simulations: the stratospheric pathway

The 11 year solar‐cycle component of climate variability is assessed in historical simulations of models taken from the Coupled Model Intercomparison Project, phase 5 (CMIP‐5). Multiple linear regression is applied to estimate the zonal temperature, wind and annular mode responses to a typical solar cycle, with a focus on both the stratosphere and the stratospheric influence on the surface over the period ∼1850–2005. The analysis is performed on all CMIP‐5 models but focuses on the 13 CMIP‐5 models that resolve the stratosphere (high‐top models) and compares the simulated solar cycle signature with reanalysis data. The 11 year solar cycle component of climate variability is found to be weaker in terms of magnitude and latitudinal gradient around the stratopause in the models than in the reanalysis. The peak in temperature in the lower equatorial stratosphere (∼70 hPa) reported in some studies is found in the models to depend on the length of the analysis period, with the last 30 years yielding the strongest response.


Introduction
In addition to direct solar heating of the Earth's surface, the stratosphere provides a key link for variations in solar forcing to interact with the tropospheric circulation (e.g. review by Gray et al., 2010). As well as the dominant annual cycle in solar variability, there is also an 11 year solar cycle with longer underlying modes of variability. Due to the presence of the ozone layer in the stratosphere, which absorbs the ultraviolet (UV) component of incoming solar irradiance, the thermal structure of the stratosphere can vary greatly with the solar forcing. The enhancement of the meridional temperature gradient through the direct solar effect in the upper stratosphere can alter the dynamics of the stratosphere-troposphere system and ultimately project on to surface climate variability. Figure 1 shows a schematic of a series of mechanisms whereby solar forcing is proposed to influence surface climate variability through the stratosphere-troposphere pathway. The initial perturbation to the stratosphere at solar maximum (1) occurs through increased heating in the summer hemisphere and equatorial region at the stratopause, due to more ozone absorption of UV radiation and increased ozone concentrations (Haigh, 1994; Gray et al., 2009). The ensuing enhancement of the latitudinal temperature gradient between the Equator and the winter pole leads to a strengthening of the stratospheric jet, which causes waves to be refracted more equatorward (Kodera and Kuroda, 2002). The resultant anomalous Eliassen-Palm (E-P) flux divergence in the midlatitude upper stratosphere leads to (2) a weakening of the Brewer-Dobson circulation. A positive feedback between planetary waves and the mean flow gives rise to (3a) the descent of the resulting polar wind anomaly to the tropopause. This can be thought of as a solar modulation of the polar-night jet oscillation (Kuroda and Kodera, 2001).
The anomalous circulation in this region can then (4) influence tropospheric and surface weather patterns over the next 1-2 months (e.g. Ambaum and Hoskins, 2002; Baldwin and Dunkerton, 2001). In addition, the weakened Brewer-Dobson circulation can also cause (3b) anomalous warming in the lower equatorial stratosphere, as well as increased ozone concentrations. The resultant change in the latitudinal temperature gradient of the lower stratosphere leads to a response in the synoptic-scale eddy momentum fluxes in the midlatitude tropopause region and consequently (4) to changes in the zonal wind and temperature throughout the troposphere (Simpson et al., 2009). A response to solar forcing has been identified in the Southern Annular Mode (SAM: Roscoe and Haigh, 2007; Gillett and Fyfe, 2013), the Northern Annular Mode (NAM: Shindell et al., 2001; Matthes et al., 2006) and the North Atlantic Oscillation (NAO: Kodera, 2003; Woollings et al., 2010; Gray et al., 2013).
The mechanism described above, which involves solar UV heating and ozone formation in the stratosphere, is referred to as the 'top-down' mechanism. It is well established and has been shown in a number of model studies. Additionally, a 'bottom-up' mechanism has been proposed, involving the direct heating of the subtropical (relatively cloud-free) sea surface from increased solar irradiance during solar maximum conditions. The small variations in sea-surface temperature (SST) over the 11 year solar cycle in this region are thought to be amplified by air-sea coupling in the tropical Pacific, which strengthens the Hadley and Walker circulations (Meehl et al., 2009). Meehl et al. (2009) showed for the first time in a comprehensive model study that both mechanisms, the top-down and the bottom-up, have to be taken into account in a climate model in order to obtain a surface solar signal that is comparable in magnitude with observations. Figure 1 provides a simplified view of a potential stratospheric-tropospheric pathway to solar forcing, but the situation is undoubtedly more complex. Some of the features presented in the schematic may be present for other reasons, such as oceanic variability (Misios and Schmidt, 2013), or through a modulation of the quasi-biennial oscillation (QBO: Gray et al., 2010). The tropical equatorial tropopause region may also be influenced by changes in tropospheric circulation in response to solar forcing (Shindell et al., 2006). There is also evidence for a nonlinear interaction between the 11 year solar cycle and the QBO: for example, Labitzke (1987) showed that the strength of the Arctic polar vortex, a key component of stratosphere-troposphere coupling (e.g. Baldwin and Dunkerton, 2001; Mitchell et al., 2013), could be influenced by the solar forcing, but that the response is modulated by the phase of the QBO (see also Gray et al., 2001, 2004; Matthes et al., 2004; Lu et al., 2008).
This relationship has been shown to hold, at least for the QBO-W phase, right up to recent years (Camp and Tung, 2007; Labitzke and Kunze, 2009; Matthes et al., 2010, 2013), although there are winters, such as that of 2008/2009, that do not agree so well. A further mechanism for solar influence on climate comes from the direct interaction of solar energetic particles with stratospheric constituents such as ozone and nitric oxide (see for instance Funke et al., 2011, for a model intercomparison). Such particles are formed during major solar magnetic events and enter the atmosphere in the polar regions. If the particles have sufficient energy, they can penetrate deep into the stratosphere and lead to destruction of stratospheric ozone and hence less absorption of short-wave radiation in these regions (Solomon et al., 1982).
Early solar model intercomparison studies with atmospheric general circulation models (GCMs) and prescribed solar-induced ozone effects under constant solar maximum and minimum conditions highlighted the importance of a realistic model climatology to reproduce dynamically enhanced solar signals, comparable to observations (Matthes et al., 2003). The first intercomparison study using chemistry-climate models (CCMs) and therefore interactive solar-induced ozone effects revealed good correspondence for the annual mean solar signal in temperature and ozone. The comparison of solar signals in the SPARC CCMVal-2 project (CCMVal, 2010) under realistic time-varying solar-cycle variability and interactive ozone concluded that a relatively finely resolved short-wave radiation scheme is needed in order to reproduce solar signals (see also Austin et al., 2008). Ermolli et al. (2013) investigated the effects of stronger solar UV forcing on the surface response in a number of different global models. With stronger solar UV forcing, the polar night jet in January is stronger throughout the stratosphere and down to the upper troposphere, where a significant positive Arctic-Oscillation-like signal occurs. These indirect dynamical solar-cycle-induced effects have not been compared and studied in detail in recent climate models, neither in CCMs nor in coupled atmosphere-ocean models, and are the subject of the current article.

Table 1. Details of the CMIP-5 models used in this study. Details include the high-top (H), low-top (L) or mid-top (M) classification, whether ozone is prescribed (P), calculated interactively (I) or semi-offline (S), the number of ensemble members in the all-forcing and the solar-only forcing simulations, the prescribed solar irradiance data, the number of bands that represent the short-wave component of irradiance in the model's radiation scheme and whether the model has a spontaneous (Sp), nudged (N) or absent (-) QBO. The information in this table was collected from model development articles, CMIP-5 metadata and by contacting the individual modelling groups. Data that were not available through these channels are marked with a cross (x).
Model | Institute | Top | Ozone | All-forcing members | Solar-only members | Solar irradiance | Short-wave bands | QBO
[...] | Wang et al. (2005) | x
CanESM2 | CCCMA (Canada) | M | P | 5 | 5 | Wang et al. (2005) | 4 | -
CCSM4 | NCAR (USA) | L | S | 5 | - | Wang et al. (2005) | 19 | -
CESM1-BGC | NSF-DOE-NCAR (USA) | L | S | 1 | - | Wang et al. (2005) | 19 | -
CESM1-CAM5 | NSF-DOE-NCAR (USA) | L | S | 3 | - | Wang et al. (2005) | x | -
CESM1-WACCM | NSF-DOE-NCAR (USA) | H | I | 1 | - | Wang et al. (2005) | 19 | N
CMCC-CM | CMCC (Italy) | L | P | 1 | - | Wang et al. (2005) | [...]
[...] | Wang et al. (2005) | 19 | -
NorESM1-ME | NCC (Norway) | L | S | 1 | - | Wang et al. (2005) | 19 | -
Table notes: Models that did not include a representation of the solar cycle in their ozone. Models that scaled the TSI by 0.9965 to agree with the Total Irradiance Monitor (TIM) measurements (Kopp et al., 2005).

This article is part 1 of an analysis undertaken as part of the High Energy Particle Precipitation in the Atmosphere - Solar Influence for Stratospheric Processes and their Role in Climate (SPARC) [SOLARIS-HEPPA] Solar Model Inter-comparison Project (SolarMIP). SolarMIP was set up with the aim of analyzing the solar contribution to climate in the Coupled Model Inter-comparison Project, fifth phase (CMIP-5) models. In this study, we characterize the effect of solar variability on climate in the CMIP-5 models, with a specific focus on the dynamical variability caused through the stratospheric pathway. Follow-on studies will deal explicitly with the 'bottom-up' mechanism and the impact of solar irradiance on the surface and subsurface. They will also focus, in more detail, on the stratospheric response for the six high-top CMIP-5 models that included interactive ozone chemistry. The purpose of these studies and the broader SolarMIP project is as follows:

1. to characterize the 11 year solar cycle signal in the CMIP-5 models;
2. to improve understanding of possible solar-related mechanisms for influencing climate; and
3. to assess the impacts of solar forcing on global and regional climate.
The layout of this study is as follows. In section 2, we describe the CMIP-5 data used and assess regression techniques. In section 3, we use multiple linear regression to extract the solar signal from climate variability. The stratosphere is initially assessed, followed by an examination of potential dynamical surface feedbacks. Finally, in section 4, we present discussions on where the latest global climate models seem to be deficient in terms of reproducing the solar variability effects on climate.

Data
This study makes use of data from GCMs and reanalyses. The model data are from CMIP-5 (Taylor et al., 2012). Up-to-date details of the models at the time of writing are given in Table 1. Any models where we were unable to determine the solar forcing are excluded from our analysis, because we do not know what the correct predictor is. The following models fall into this category: BCC-CSM1.1, BCC-CSM1.1-M, CESM1-FASTCHEM, FGOALS-g2 and FIO-ESM. In this study, every model is treated independently from every other. In reality, some of the models are based on the same dynamical and radiative code, or are from the same family of models, but making choices a priori regarding this introduces a human bias to our analysis that we would like to avoid.
We use all available ensemble members of the historical simulations that have the required data for this study (namely monthly mean surface temperature, mean sea-level pressure (MSLP), zonal mean air temperature and zonal mean zonal wind). The historical simulations cover ∼1850-2005, have coupled oceans and are externally forced by well-mixed greenhouse gases, ozone changes, aerosols and solar irradiance changes. Most modelling groups used the World Climate Research Programme (WCRP) SPARC SOLARIS-HEPPA recommendation for the solar irradiance (data available from http://solarisheppa.geomar.de/solarisheppa/cmip5), which follows the reconstruction from Wang et al. (2005). However, this was not always the case; more complete details are given in Table 1 and noted in the relevant analyses. The total solar irradiance (TSI) is available as annual means from 1850-1881 and as monthly means thereafter, with spectrally resolved data available for modelling groups with appropriate short-wave radiation codes. We choose not to analyze the future scenario data, as modelling groups have dealt with the solar cycle in a range of different ways, making the analysis non-trivial. For example, the SOLARIS-HEPPA recommendation for future TSI was to repeat the last observed solar cycle (cycle 23); however, some modelling groups have repeated the average of the last four solar cycles, due to concerns that cycle 23 was not representative of the TSI as a whole. Note that none of the GCMs includes solar effects from solar energetic particles, except for CESM1-WACCM.
The models have been subdivided into those that resolve the stratosphere well (high-top models) and those that do not (low-top models). We define a high-top model as one with a model lid height above the stratopause; all other models are defined as low-top. In reality, all models classed as high-top have a model lid height of at least 0.1 hPa and so can resolve signals associated with solar irradiance in this region. We further stipulate that high-top models must archive data above 10 hPa, so we can assess the stratospheric response in detail. This subdivision leads to 13 models classed as high-top and 17 as low-top. We present results principally from the high-top models for conciseness and under the assumption that they are more likely to simulate the stratospheric pathway, but also discuss low-top model results when appropriate. For validating the model simulations, we make use of multiple reanalysis datasets that have been identified as part of the SPARC Stratospheric Reanalysis Inter-comparison Project (S-RIP: Fujiwara and Jackson, 2013). We use the three most recent reanalysis datasets from the project: Modern-Era Retrospective analysis for Research and Applications (MERRA), ERA-Interim and the Japanese 55-year Reanalysis (JRA-55), which are compared in detail (along with older reanalyses) in Mitchell et al. (2014). All three reanalyses provide the required data variables as monthly mean gridded data, spanning at least the surface to the stratopause, and share a common period of 1979-2009.

Methods
The principal tool for analysis used in this study is multiple linear regression (MLR). MLR has been employed to extract the solar signal from many model, observational and reanalysis datasets (e.g. Haigh, 2003; Crooks and Gray, 2005; Soukharev and Hood, 2006; Austin et al., 2008; Frame and Gray, 2010; Hood et al., 2010). The MLR model takes the following form:

$$y_t = \sum_{i=1}^{n} \beta_i x_{i,t} + \epsilon_{0,t}, \qquad (1)$$

where $y_t$ is the observed variable at time $t$, $\beta_i$ is the $i$th regression coefficient, $x$ is a matrix comprising the $n$ predictors for the regression (with $x_{i,t}$ the value of the $i$th predictor at time $t$) and $\epsilon_{0,t}$ is the noise term associated with the observed climate, $y_t$. Note that we assume the predictors, $x$, are noise-free (i.e. the predictors are exact) in this model. In this study, we regress de-seasonalized, monthly-mean surface temperature, MSLP, zonal-mean air temperature and zonal-mean wind on to six separate predictors ($n = 6$ in Eq. (1)): the TSI (Wang et al., 2005); the global aerosol optical depth at 550 nm, which gives a measure of volcanic aerosol (available from http://data.giss.nasa.gov/modelforce/strataer/); two QBO terms, which are calculated from the first two empirical orthogonal functions (EOFs) of the equatorial zonal wind throughout the [...] of solar spectral variability, which differs somewhat in temporal behaviour from TSI. We adopt the TSI index for simplicity because the short-wave component of solar irradiance was not represented uniformly in the models (Table 1). Note that the long-term solar trend is removed from the TSI to leave only the 11 year solar cycle variations, since this is the solar signal of prime interest. The trend is removed from the TSI time series using a multi-taper spectral filter (Lees and Park, 1995). The regression analysis was tested with and without the long-term solar trend included in the TSI. Results involving surface and ocean variables were found to depend on whether or not it was included, but results involving atmospheric variables did not.
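As an illustration of the regression in Eq. (1), the following is a minimal sketch using ordinary least squares on synthetic data. It is not the code used in this study: the function name `fit_mlr`, the two stand-in predictors and all numerical values are hypothetical, and only two of the six predictors are mimicked.

```python
import numpy as np

def fit_mlr(y, predictors):
    """Ordinary least-squares fit of y_t = sum_i beta_i * x_{i,t} + noise.

    y          : (T,) de-seasonalized monthly series
    predictors : (T, n) matrix, one column per predictor
    Returns the n regression coefficients and the residual series.
    """
    # Centre both sides so that no explicit intercept column is needed.
    x = predictors - predictors.mean(axis=0)
    y_c = y - y.mean()
    beta, *_ = np.linalg.lstsq(x, y_c, rcond=None)
    residual = y_c - x @ beta
    return beta, residual

# Synthetic illustration; every number below is hypothetical.
rng = np.random.default_rng(0)
n_months = 12 * 150
t = np.arange(n_months)
tsi = 0.5 * np.sin(2 * np.pi * t / (11 * 12))   # idealized 11 year cycle (W m^-2)
aerosol = rng.exponential(0.01, n_months)       # stand-in for aerosol optical depth
true_beta = np.array([0.8, -5.0])               # assumed "true" coefficients
x = np.column_stack([tsi, aerosol])
y = x @ true_beta + rng.normal(0.0, 0.3, n_months)

beta, residual = fit_mlr(y, x)
solar_coefficient = beta[0]                     # response per W m^-2 of TSI
```

The solar coefficient is expressed per unit of the TSI predictor, so multiplying it by a chosen TSI change gives the response to a solar cycle of that amplitude.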
One common issue in regression analyses is autocorrelation in the time series, which can result in a poorly fitted statistical model. There are a number of methods to remove autocorrelation from a time series (known as prewhitening). To ensure that our regression technique is suitable, we compare the multiple linear regression without prewhitening with two commonly used methods that include prewhitening (Tiao et al., 1990; Box et al., 2013). The Tiao et al. (1990) method (see also Cochrane and Orcutt, 1949) does this by representing the residual term in Eq. (1) as an auto-regressive process (in this study we found that an AR1 process sufficed and higher order autoregressive processes did not change the significance of the results). The Box et al. (2013) method prewhitens the y and x components from Eq. (1) directly, by using the first-order autocorrelation coefficients of y (a full description of these two methods, as well as some implications for their use, is also given in Chiodo et al. (2014); readers are referred to their appendix for details). Figure 3 shows the stratospheric temperature response to solar forcing in the equatorial region (25°N-25°S) for the three separate methods using the HadGEM2-CC GCM. The values should be interpreted as the difference in temperature between the solar maximum and the solar minimum of a typical solar cycle. We define a typical solar cycle as a change of 1 W m⁻² in the TSI, which is equivalent to an increase of ∼130 sunspots or ∼130 units of 10.7 cm radio flux (for a full comparison of metrics of solar variability, see Lockwood and Fröhlich, 2007). Figure 3 shows very consistent results across each of the three regression methods, increasing confidence that the signals reported in this study and previous literature are robust. There are differences, however, in the estimates of statistical significance (filled circles denote that the signal is statistically significant at the 95% level). The method used by Tiao et al. (1990) and in this study yields the fewest statistically significant results. For the HadGEM2-CC model displayed in Figure 3, it is only the temperature response near the stratopause that is significant, whereas other methods show significant results at other levels. Applying this analysis to other models yielded similar results; in some cases the differences are larger, although the significant response at the stratopause is always present. This sensitivity test demonstrates that, of the three methods tested, the Tiao method gives the most conservative estimate and suggests that the other two methods may be overconfident.
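The idea of treating the regression residual as an AR1 process can be sketched with the iterative Cochrane and Orcutt (1949) procedure. The code below is an illustrative sketch on synthetic data, not the implementation used in this study; the function names `ols` and `cochrane_orcutt`, the iteration count and all numerical values are assumptions.

```python
import numpy as np

def ols(y, x):
    """Least-squares coefficients for y = x @ beta + noise."""
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    return beta

def cochrane_orcutt(y, x, n_iter=10):
    """Iteratively estimate beta and the lag-1 autocorrelation rho of the residual.

    x must contain a column of ones if an intercept is wanted.
    """
    beta = ols(y, x)
    rho = 0.0
    for _ in range(n_iter):
        resid = y - x @ beta
        # Lag-1 autoregression coefficient of the current residual.
        rho = (resid[1:] @ resid[:-1]) / (resid[:-1] @ resid[:-1])
        # Quasi-difference both sides so the transformed noise is close to white.
        y_star = y[1:] - rho * y[:-1]
        x_star = x[1:] - rho * x[:-1]
        beta = ols(y_star, x_star)
    return beta, rho

# Synthetic series with AR1 noise; every number below is hypothetical.
rng = np.random.default_rng(1)
n = 1800
t = np.arange(n)
solar = 0.5 * np.sin(2 * np.pi * t / 132)
eps = rng.normal(0.0, 0.3, n)
noise = np.zeros(n)
for k in range(1, n):
    noise[k] = 0.7 * noise[k - 1] + eps[k]
y = 0.8 * solar + noise
x = np.column_stack([np.ones(n), solar])

beta, rho = cochrane_orcutt(y, x)
```

Because the quasi-differenced noise is approximately uncorrelated, standard errors computed from the transformed regression are less prone to overstating significance than those from the untransformed fit.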
Another common issue in regression analysis is cross-correlation of predictors, i.e. if multiple predictors vary in a similar way, the regression method could attribute the variability incorrectly. One potential issue in this study arises from large volcanic eruptions occurring at the same stage of the 11 year solar cycle. For example, El Chichón and Mt Pinatubo both occurred just after a solar maximum (see Figure 2). Chiodo et al. (2014) suggested that the temperature and ozone response to solar forcing in the lower equatorial stratosphere was greatly overestimated in regression analyses that used short time series and that this was related to the aliasing between volcanic signals and TSI. However, Frame and Gray (2010) carried out a series of sensitivity tests to check this in a time series with an additional solar cycle that did not have a concurrent volcanic eruption and found that their results were insensitive to the presence of the eruption.
In this study, to test the sensitivity of the regression coefficients to cross-correlation in the predictors, we perform the regression analysis on two parallel sets of CMIP-5 simulations: (i) simulations forced only with solar irradiance and (ii) simulations forced with all the CMIP-5 recommended forcings (including solar irradiance), as detailed earlier in this section. The use of 150 years of model data compared with the much shorter period of data used in observational studies will also allow for better separation of solar and volcanic signals. Throughout the CMIP-5 archive, there are four such models that resolve the stratosphere and have the required simulations, while also having at least three ensemble members, which is important to obtain a more robust estimate of internal model variability (CanESM2, HadGEM2-CC, GISS-E2-H, GISS-E2-R;* see Table 1). A comparison of the solar regression coefficients of these models for both types of simulation (not shown) reveals very good agreement, especially in the equatorial region, and therefore adds confidence that cross-correlation between the solar predictor and other predictors is not an issue here. The agreement between the two types of simulation was less clear in the polar regions, although still well within the 95% error margins.
* These two versions of the GISS model differ in their ocean vertical coordinate system only.

Examination of the solar signal in the stratosphere
Figure 4 shows the annual-mean equatorial temperature response to a typical 11 year solar cycle (details of which are given in the methods) for (a) all high-top models, (b) all low-top models, (c) all models with prescribed ozone chemistry and (d) all models that calculate ozone chemistry interactively or semi-offline (see Eyring et al., 2013, for a more detailed discussion on the ozone in CMIP-5 models). The multi-reanalysis mean and range are also plotted in black. To assess the sensitivity of regression to the temporal length of data, the analysis is performed using model data for the shorter period of 1979-2005 (top) and the longer period of ∼1850-2005 (bottom).
Throughout the troposphere and lower stratosphere, both low-top and high-top models reveal a warming in the solar maximum and the spread in responses from both sets of models is large (columns (a) and (b)). The majority of reanalyses estimate a localized lower stratospheric (∼70 hPa) peak in temperature increase of around 0.25 K (e.g. Crooks and Gray, 2005; Mitchell et al., 2014) that may be caused by a slowing in the Brewer-Dobson circulation in the solar maximum from decreased planetary-wave driving near the stratopause (Kodera and Kuroda, 2002). Therefore, one might expect the high-top models, which resolve the stratosphere and hence the Brewer-Dobson circulation well, to reproduce the lower stratospheric warming better. However, when considering the shorter 1979-2005 period (Figure 4(a) and (b), top), this is not the case and not much difference is seen between the low- and high-top models. To understand this more clearly, we subdivide the models into those with prescribed ozone, interactive ozone and semi-offline ozone† (Figure 4(c) and (d), top). It is clear that, in the group with interactive ozone or semi-offline ozone (d), the peak in lower stratospheric temperatures is more distinct in around half of the models, suggesting that chemistry plays a role. The same analysis, but on the ozone fields for the interactive chemistry models over the period 1979-2005, shows that for half the models a significant change in ozone is observed in the lower stratosphere (∼50-70 hPa). However, this could also be due to a possible aliasing effect with volcanic eruptions during their regression (L. Hood, 2015; personal communication). Other factors, such as nonlinear interference with the QBO, SSTs and volcanic signals, might also contribute to the secondary maximum and warrant further investigation. Interestingly, when the regression is performed on the longer period of ∼1850-2005 (bottom panels), the temperature evolution in the lower stratosphere changes to be more continuous, rather than peaked. The temperature response in the upper stratosphere remains similar over both time periods considered. This points towards drawbacks when considering regression-based techniques over short time periods for the solar signal in the lower stratosphere, possibly due to aliasing between QBO, volcanic and the 11 year solar cycle signals (see Lee and Smith, 2003; Chiodo et al., 2014).
† Semi-offline ozone refers to the primary GCM being forced by ozone from a coupled chemistry model that is run separately; see Eyring et al. (2013) for more details.
In the upper stratosphere, where only high-top model data are available (Figure 4(a)), the models broadly show a maximum temperature response at ∼1 hPa, which is lower in altitude than the reanalysis peak at ∼0.5 hPa. The peak in temperature is expected to be around the stratopause, due to a combination of direct solar irradiance increase and increased photochemical production of ozone in this region (Gray et al., 2009). While the magnitude of the temperature response at 1 hPa between the models and reanalyses is consistent, suggesting a temperature difference over the solar cycle of around 1 K, the response at 0.5 hPa in the reanalyses extends to over double this (albeit with a large error bar). The reason for this discrepancy is not immediately apparent, but may come from a poor prescription of ozone above 1 hPa in the models (or another absorber of solar radiation, such as oxygen: see Sukhodolov et al., 2014), an underestimation of the UV component of TSI in the models‡ (see Ermolli et al., 2013) or deficiencies in the reanalyses above 1 hPa, where data assimilation is poorly constrained. The latter point is likely to account for at least some of the discrepancy, as the analysis of Mitchell et al. (2014) showed an artificial peak in two out of three of the reanalyses (MERRA and ERA-I), due to the introduction of new instruments into the reanalysis schemes, that maximizes at the same time as the 11 year solar cycle maximizes. Hence this jump may well be incorrectly attributed to solar variability and Mitchell et al. (2014) showed that for ERA-I, if the step change was accounted for, the solar regression coefficient near the stratopause was significantly reduced. A comparison of solar regression coefficients between the Stratospheric Sounding Unit (SSU) and ERA-40 also supports this conclusion (see figure 8.11 of CCMVal, 2010).
‡ Ermolli et al. (2013) showed that our current observations of the UV component were probably an underestimate. Therefore the prescribed ozone fields may also be an underestimate.
Figure 5 shows the annual-mean 11 year solar cycle temperature response for each of the high-top models individually, using the full period of data (∼1850-2005). First inspections show that all of the models have a distinctive maximum at around 1 hPa, with the exception of the two GISS models, which show a smoother transition that is smaller in magnitude. Regression analysis of the ozone fields for these models also shows a smoother transition than other models (L. Hood, 2015; personal communication). The GISS models do, however, show a significant response throughout the depth of the equatorial stratosphere that is not observed to be significant in the other models. The MIROC-ESM model shows the least significant response at the stratopause (despite being larger than the response from the GISS models) and this could be because MIROC-ESM has no solar cycle variations in prescribed ozone (Table 1), whereas the other models do.
CESM1-WACCM shows the largest response of all the models in the equatorial lower stratosphere (∼70 hPa), although this is not significant. When comparing the multi-model mean (MMM) with the multi-reanalysis mean (MRM: Figure 5 (bottom)),§ it is clear that the magnitudes at all latitudes around the stratopause are smaller in the models, as are the latitudinal temperature gradients. The temperature response at high latitudes is more complex, especially in the reanalyses, leading to large variations among them (Mitchell et al., 2014). This could come from the way in which the reanalyses assimilate the data, the shorter periods used by reanalyses or the higher variability in the polar regions (e.g. Andrews et al., 1987; Mitchell et al., 2011). The analysis is very similar when the shorter period (1979-2005) is used for the regression (see Figure S1). The magnitudes of the temperatures are a little larger around the stratopause and this is in agreement with Figure 4. There are generally no statistically significant regions at high latitudes, but, even so, the analyses of the two time periods agree closely.
§ Note that the MRM is from data that start in 1979, whereas all other panels use data that start from ∼1850. Note also that the hatching in the MRM and MMM corresponds to locations where all the reanalyses (or models) agree on the sign of the result, not where the regression coefficients are statistically significant at the 95% level, as shown in the other panels (also in Mitchell et al., 2014).
A similar analysis, but using zonal winds rather than temperature (Figure 6), reveals a far weaker signal in the models when compared with the reanalyses at all latitudes, with poor agreement in the location of the wind anomalies. This is most likely due to the weak latitudinal temperature gradients near the stratopause observed in Figure 5. The CMCC-CESM model shows the strongest response, which also corresponds to the largest latitudinal temperature gradient, although other models consistently show strengthened southern polar winds of around 3 m s⁻¹. To understand these changes in more detail, it is useful to consider the dynamical evolution on a month-by-month basis during the winter. To condense all the data, we show only the multi-model means of the high-top models for the regression; results for the individual models are given in Figures S3 and S4 for temperature and zonal wind, respectively. Kodera and Kuroda (2002) identified a subtropical wind anomaly associated with the solar cycle near the stratopause in early winter using National Centers for Environmental Prediction (NCEP) reanalysis data. They showed that this signal moves poleward and downward throughout winter. Subsequently, Mitchell et al. (2014) showed that the only part of that signal that is statistically significant across all the latest reanalysis datasets (JRA-55, ERA-I and MERRA) is a weakening of the NH polar vortex in February, which results in an increase in temperature of ∼20 K and a decrease in zonal winds of ∼10 m s⁻¹. In the analysis presented here (Figure 7), the zonal winds show a poleward and downward positive anomaly during December-February (DJF), which is in qualitative agreement with the evolution found in Kodera and Kuroda (2002).
However, the modelled zonal wind anomalies appear shifted by one month compared with reanalyses and other observation-based studies (Kodera and Kuroda, 2002; Ineson et al., 2011; Mitchell et al., 2014) and during March a significant negative wind anomaly with an amplitude of ∼2.5 m s⁻¹ is observed in the upper stratosphere when the 1979-2005 data are used (Figure 7, top). This anomaly is only ∼25% of the observed anomaly and even less than this when the longer period of data is used (Figure 7, bottom). The magnitudes of the modelled positive wind responses throughout this analysis are also weaker than in observations when the ∼1850-2005 period is used, but in reasonable agreement when 1979-2005 is used (Figure 7, top). The reason for the discrepancies between the models and observation-based studies may be biases in the winter stratospheric climatology or a probable underestimate of the solar UV component of TSI used in the current CMIP-5 models (Ermolli et al., 2013), which Kodera and Kuroda (2002) note is essential for the enhanced poleward and downward evolution of solar-related anomalies. The way in which different modelling groups partition the spectral irradiance is also of potential concern (see Table 1). For the individual models presented here, the poleward and downward wind progression of NH positive zonal winds throughout winter is observed in five out of the 13 high-top models considered here (Figure S4). This behaviour is observed to be statistically significant in around half of the models considered here, namely CMCC-CMS, MIROC-ESM-CHEM, MPI-ESM-MR, MRI-CGCM3, MRI-ESM1 and marginally in CESM1-WACCM. All of these models have high spectral resolution in the short-wave radiation band (see Table 1), which is known to be important in producing this dynamical evolution. However, GFDL-CM3 is an example of a different model with high spectral resolution, where this effect is not observed.
This solar influence on the polar vortex winds, particularly in February, has been known about for some time (Labitzke, 1987). There also appears to be an interaction between the solar and QBO signal at high latitudes, although none of the reanalyses used in this study shows a statistically significant correlation between solar activity and Arctic vortex strength over the common period of 1979-2009, when partitioned into different QBO phases (not shown). Studies have shown, however, that for a longer period the relationship is statistically significant in reanalyses (Camp and Tung, 2007).
As the variability of the polar vortex is important for stratosphere-troposphere coupling (Mitchell et al., 2013), we assess the Labitzke solar-vortex relationship in the CMIP-5 models with a QBO. Five of the models from this study have an internally generated QBO (CMCC-CMS,¶ HadGEM2-CC, MIROC-ESM, MIROC-ESM-CHEM and MPI-ESM-MR) and one has a nudged QBO (CESM1-WACCM). To maximize the significance of potential signals, we use all 150 years (∼1850-2005) of all available ensemble members and consider each of the winter months individually. No statistically significant relationship linking the Arctic vortex strength to the solar-cycle activity (and QBO phase) is found in any of the models in any month. Arguments could be made that the models with internally generated QBOs do not give rise to QBO phases that are the same as observations and this is why they do not reproduce the Labitzke (1987) result. However, the CESM1-WACCM model has a nudged QBO and still does not reproduce this relationship; therefore these arguments hold less weight.
The correlations of the solar-vortex relationship at 50 hPa range from −0.30 to 0.10 in QBO-W and −0.25 to 0.05 in QBO-E, compared with 0.49 and −0.30 in the QBO-W and -E phases, respectively, reported in Labitzke and Kunze (2009) using observations from 1948-2009. The relationship was also examined in 600 years of four ensemble members (each spanning 1860-2010) of HadGEM2-CC simulations that are forced only by changes in solar irradiance. This is particularly useful, because it means the only forcings on the polar vortex come from solar irradiance or internal climate variability such as the QBO and ENSO. Using the same length of data as in Labitzke and Kunze (2009), we randomly subsample periods of 68 years and correlate the vortex strength, but at no point do we find any statistically significant relationship. Kren et al. (2014) performed a similar analysis, but using the WACCM model and randomly subsampling 40 year periods (as opposed to the 68 year periods used here). In their study, they did find occasions where a statistically significant correlation existed in both QBO-E and W phases, although the correlations were of either sign. Repeating our analysis, but using the shorter period from Kren et al. (2014), we find similar results over a few randomly sampled periods, i.e. a strong solar-vortex correlation could appear by chance in a 40 year period.
However, we expand on their result by noting that the correlation reported in Labitzke and Kunze (2009) in a 68 year period could not appear by chance in HadGEM2-CC. It is important to note that, while the QBO and vortex variability are in good agreement with observations for HadGEM2-CC (Mitchell et al., 2012), this does not necessarily mean that the mechanisms behind the Labitzke (1987) relationship are well simulated by this model. For example, the Holton-Tan mechanism in this model is known to be weak (Watson and Gray, 2014).

¶ While the CMCC-CMS model is found to have a QBO, the evolution of the QBO is unrealistic in both the time spent in QBO-E with respect to QBO-W phases and the descent of the equatorial wind anomalies with time.

We use polar-cap (60-90°N) averaged temperature at 10 hPa to define the vortex strength. However, we experimented with different heights and area averages, as well as different metrics of vortex strength, such as geopotential height, vortex area and vortex circulation.
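The random subsampling test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: the windows are assumed to be contiguous, the function names (`pearson_r`, `subsample_correlations`) are hypothetical, the partitioning by QBO phase is omitted and the input series in the demonstration are synthetic (a pure 11 year sinusoid and white noise).

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def subsample_correlations(solar, vortex, window, n_draws, seed=0):
    """Correlate solar and vortex indices over randomly placed windows.

    Assumption: each draw is a contiguous block of `window` years taken
    from the same start year in both series.
    """
    rng = random.Random(seed)
    last_start = len(solar) - window
    correlations = []
    for _ in range(n_draws):
        start = rng.randint(0, last_start)  # inclusive on both ends
        stop = start + window
        correlations.append(pearson_r(solar[start:stop], vortex[start:stop]))
    return correlations


if __name__ == "__main__":
    # Synthetic stand-ins for 600 years of model output (not real data).
    noise = random.Random(1)
    years = 600
    solar = [math.sin(2 * math.pi * t / 11) for t in range(years)]
    vortex = [noise.gauss(0.0, 1.0) for _ in range(years)]
    for window in (40, 68):
        r = subsample_correlations(solar, vortex, window, n_draws=1000)
        print(window, "largest |r| found:", max(abs(v) for v in r))
```

Comparing the spread of correlations for the 40 and 68 year windows gives a rough sense of how strongly a chance correlation depends on the record length.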

Dynamically driven solar signals at the surface
One of the primary reasons for studying the stratospheric response to solar forcing is to understand better the surface response of anomalies that have come via the stratosphere, the mechanisms of which are described in section 1 (see also Figure 1). The regions we usually associate with stratosphere-troposphere dynamical coupling are the high northern latitudes in winter (Mitchell et al., 2013). The high northern latitudes are also regions where we often see features linked with changing solar forcing. For instance, many authors have reported a distinct pressure anomaly in the Aleutian low region, with higher pressure at solar maximum than solar minimum, and this often has a subtropical counterpart of opposite sign (e.g. van Loon et al., 2007; Meehl et al., 2008; Roy and Haigh, 2010). A weakening of the Aleutian low under solar maximum conditions with roughly the right amplitude has been found in at least two modelling studies that used relatively strong upper stratospheric solar forcing. Ineson et al. (2011) obtained a weakening of several hPa (see their figure 1(a)) by applying a very large change in solar UV. Hood et al. (2013) analyzed GCM simulations that were forced using a moderate solar UV change but with a series of prescribed ozone changes imposed in the upper stratosphere. They found that the simulation that applied the largest ozone changes in the upper stratosphere simulated a weakening of the Aleutian low, especially during a selected 100 yr period of the simulation. These two results also suggest that a high-top model is necessary to reproduce the North Pacific response. A solar signature in the NAO has also been reported in observational studies (e.g. Kodera and Kuroda, 2002; Woollings et al., 2010) and modelling studies (e.g. Shindell et al., 2001; Tourpali et al., 2005; Matthes et al., 2006, 2010; Chiodo et al., 2012; Matthes et al., 2013). For instance, Tourpali et al. (2005) showed, using a chemistry-climate model, that the change in UV radiation during solar maximum led to a change in the structure of the Arctic Oscillation. Gray et al. (2013) further showed that the solar-NAO link was maximized ∼3-4 years after the solar maximum in their study of SST and MSLP observations for the extended period 1870-2012; therefore, lagging the TSI with respect to other predictors was a necessary step in regression analysis.
We begin our analysis of the surface response by reproducing the results of Gillett and Fyfe (2013), who showed that in the CMIP-5 models there was no significant response to solar forcing in the NAO or NAM during any season. They did, however, show that there was a significant positive shift during the solar maximum in the SAM in all seasons apart from September-November (SON). We repeat their analysis using the same definitions for the NAO, NAM and SAM, but also look at the response for the high-top and low-top models separately. Figure 8 shows the response to the 11 year solar cycle for NAO (top), NAM (middle) and SAM (bottom), for each season individually. Crosses show the response for individual models (the left sets are high-top models, the right sets are low-top models) and the thick black lines show the uncertainty in the multi-model mean, following exactly the methodology of Gillett and Fyfe (2013) (see also the Figure 8 caption). Our results are in good agreement with those of Gillett and Fyfe (2013), especially given the differences in our analyses.** We show no significant response in the NAM, but a significant response in some seasons for the SAM, although we do not see a response in DJF, as reported in Gillett and Fyfe (2013), and we do see a response in SON, mainly driven by the high-top models. There is, however, more of a response in the NAO region, which always comes from the high-top models. This is observed in all seasons except DJF, which is contrary to what we know about the stratosphere-troposphere connection in terms of modulation of the PJO (Kuroda and Kodera, 2001), but might lend support to the mechanism proposed by Simpson et al. (2009). Their mechanism (see Figure 1 and the corresponding discussion) involves temperature perturbations in the lower equatorial stratosphere, which are evident all year round and could influence tropospheric annular modes at any time.

** Gillett and Fyfe (2013) choose to use the future RCP4.5 data for each of the models, as well as using some models that we do not (see the Data and Methods sections for our reasons for excluding these data). In addition, due to differences in the availability of data between their study and ours, ∼25% of the models are different, leading to many more high-top models in our study.

Figure 8. Crosses show individual model responses. Thick black lines show the 5-95% confidence ranges in the mean, as derived by dividing the sample standard deviation across the subset of models by the square root of the number of models in each subset and multiplying by the 5% cut-off value of a Student t-distribution with df degrees of freedom, following exactly the methodology of Gillett and Fyfe (2013), where df equals the number of models minus one in each subset.
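The confidence range described for Figure 8 (sample standard deviation across models, divided by the square root of the number of models and multiplied by a Student t cut-off with df = number of models minus one) can be written out as a short sketch. The assumptions here are ours: the "5% cut-off" is taken to be the one-sided 95th percentile, the quantile is obtained by simple numerical integration rather than a statistics library, and all function names are hypothetical.

```python
import math
import statistics


def t_pdf(x, df):
    """Probability density of a Student t-distribution."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF for x >= 0, using Simpson's rule on [0, x] plus the 0.5 below zero."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Upper quantile (p > 0.5) found by bisection on the CDF."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def confidence_half_width(responses, p=0.95):
    """Half-width of the range about the multi-model mean.

    Assumption: the '5% cut-off' is the one-sided quantile p = 0.95.
    """
    n = len(responses)
    std_error = statistics.stdev(responses) / math.sqrt(n)
    return t_quantile(p, n - 1) * std_error


if __name__ == "__main__":
    # Made-up responses (hPa per solar cycle) for a subset of five models.
    subset = [0.4, -0.1, 0.8, 0.3, 0.1]
    mean = statistics.mean(subset)
    half = confidence_half_width(subset)
    print("mean:", mean, "range:", mean - half, "to", mean + half)
```

Under this reading, the multi-model mean response would be called significant when the resulting range excludes zero.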
Note that some observational studies have reported a positive NAO during DJF at lag zero during the solar maximum (e.g. Kodera, 2003; Matthes et al., 2006; Ineson et al., 2011); however, they considered relatively short periods (∼50 years). Gray et al. (2013) used the longer period of 150 years, consistent with our study, and found a negative (although not significant) NAO response during the solar maximum. The NAO response and specifically the Azores component of it switches sign when the analysis fields are lagged with respect to the 11 year solar cycle and the maximum response is observed at lags of 3-4 years (see also Hood et al., 2013, for lags of up to 2 years). Figure 9 shows the same analysis as presented in the previous figure, but for the Azores region during DJF. The x-axis shows the response at different lags covering a whole solar cycle. In general, the responses are not significant; however, three significant positive responses are observed. At a lag of −1 year, an increase in the Azores pressure is evident from the low-top models, which is not seen in the observations. At a lag of 1 year, a significant increase in the Azores pressure is simulated in the high-top models, which is in agreement with the sign of the observations. Finally, at a lag of 3 years, a significant response is observed in the high-top models, which is in agreement with the largest signal in the North Atlantic in the observations. The modelled response is, however, weaker than the observed response and no single model (crosses) simulates the observed value of ∼2 hPa. However, an evolution at all lags similar to that seen in observations is found in one particular model, MIROC-ESM-CHEM (Figure S5). For this model, the strong negative AO and NAO are clearly significant at a lag of 0 years and the magnitude of the response is consistent with observations (although note that the observations are statistically insignificant here).
At lags of 3-4 years, the predominant statistically significant signal is the high-pressure anomaly in the Azores region, rather than the low-pressure anomaly over Iceland. This behaviour is also consistent with Gray et al. (2013).
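The idea of lagging the analysis field with respect to the solar cycle can be illustrated with a deliberately simplified sketch. The study uses multiple linear regression with several predictors; the version below keeps only a single solar predictor, so it is an assumption-laden illustration rather than the method used for Figure 9. The function names and the synthetic series are hypothetical.

```python
import math


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def lagged_response(solar, field, lag):
    """Slope of field(t) on solar(t - lag).

    A positive lag means the field follows the solar index by `lag` years;
    a negative lag means the field leads it.
    """
    if len(solar) != len(field):
        raise ValueError("series must have the same length")
    n = len(solar)
    if lag >= 0:
        x, y = solar[: n - lag], field[lag:]
    else:
        x, y = solar[-lag:], field[: n + lag]
    return ols_slope(x, y)


if __name__ == "__main__":
    # Synthetic 150 year example: the field echoes the solar index 3 years later.
    solar = [math.sin(2 * math.pi * t / 11) for t in range(150)]
    field = [0.0, 0.0, 0.0] + [2.0 * s for s in solar[:-3]]
    for lag in range(-1, 6):
        print(lag, lagged_response(solar, field, lag))
```

Scanning the slope over a range of lags, as in the loop above, mirrors the layout of a lag axis that covers a whole solar cycle.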
The evolution of the response in the North Atlantic region for MIROC-ESM-CHEM is similar enough to observations that it is unlikely to have been captured by chance, even when sub-selecting from all 31 models used here, because of the complexity of the spatial patterns. However, it is perhaps not surprising that this model reproduces the surface dynamical effects of solar variability well. The model characteristics for MIROC-ESM-CHEM seem particularly favourable for the simulation of solar signals. It has the highest spectral resolution of any of the models in CMIP-5 examined here, as well as having a well-resolved stratosphere with an internally generated QBO and interactive chemistry (see Table 1). These are all key features that are thought to be important for solar interactions with climate. Examples of models with similar characteristics (although lower spectral resolution), but where the surface response is not well captured, are also found (see for instance CESM1-WACCM and GFDL-CM3 in Figure S5). Therefore, it cannot be said that simply having the key model characteristics is enough to reproduce the desired surface effects. There are undoubtedly other important model characteristics, such as adequate air-ocean interactions, which Scaife et al. (2013) suggest are important for reproducing the 3 year NAO lag in response to solar forcing.
To understand the spatial structure of the response in more detail, Figure 10 shows composites of MSLP for the solar response from (left) high-top models and (right) low-top models during DJF at different lags. Stippled regions show where the high-top and low-top models are significantly different at the 95% level. The Azores region shows a significant response at lags 1-3 years and is consistent with the analyses presented in the previous figure but shows that the high-pressure anomaly over the western Atlantic is dominating the signal. However, the largest response is simulated in the high-top models in the North Pacific, consistent in sign but not magnitude with Roy and Haigh (2010) at lag 0 (but not so at later lags: Hood et al., 2013; Gray et al., 2013). The response is, however, not significantly different from the low-top models and this is because it is dominated by one particular model (CMCC-CESM; Figure S5). The high-top models show more of a response in the North Pacific at lags greater than 0, but this is not consistent with observations. Individual models (Figure S5) do simulate a high-pressure anomaly of 2-4.5 hPa in the North Pacific during the solar maximum which is statistically significant, i.e. from the NCC and CMCC family of models, as well as in the MPI-ESM-MR model. This particular subset of models does not have similar radiation schemes, spectral resolution, model resolution or chemistry implementation (see Table 1). As such, the specific reasons why they perform well in this region are not understood, probably because the mechanisms behind the weakened Aleutian Low are not well understood. Other models also show the correct response in this region, although they are not statistically significant (e.g. CESM1-WACCM, INMCM4, MRI-CGCM3 and MRI-ESM1), making this the most commonly reproduced observed surface feature from solar activity in models, at least from a dynamical perspective.

Figure 10. Composites of the lagged MSLP response in DJF for (left) all high-top models and (right) all low-top models. Stippled areas show where the two sets of models are statistically significantly different from each other at the 95% level using a Student t-test.
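The comparison behind the stippling in Figure 10 can be sketched for a single grid point as a two-sample Student t-test. This is a minimal sketch under our own assumptions: the equal-variance (pooled) form of the test is assumed, the function name is hypothetical, and the final comparison against a tabulated critical value is left to the reader. With 13 high-top and 18 low-top models, df would be 29, for which the two-sided 95% critical value is approximately 2.045.

```python
import math
import statistics


def pooled_t_statistic(group_a, group_b):
    """Two-sample Student t statistic with pooled variance.

    Returns the statistic and its degrees of freedom. Assumes the two
    groups share a common variance (the classic Student form).
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = (statistics.mean(group_a) - statistics.mean(group_b)) / std_error
    return t_stat, df


if __name__ == "__main__":
    # Made-up MSLP responses (hPa) at one grid point for two model subsets.
    high_top = [1.2, 0.4, 2.1, 0.9, 1.5]
    low_top = [0.3, -0.2, 0.6, 0.1, 0.4, 0.0]
    t_stat, df = pooled_t_statistic(high_top, low_top)
    print("t =", t_stat, "with", df, "degrees of freedom")
```

Repeating this at every grid point and marking those where the absolute statistic exceeds the critical value would give a stippling mask of the kind described in the caption.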

Conclusions
In this study we have examined the 11 year solar signal in both the stratosphere and the region of the surface that may be influenced by the stratospheric pathway, described in section 1. The principal analysis has been performed on data from ∼1850-2005 using 31 CMIP-5 models, with the 13 stratosphere-resolving models being the main focus. Data over the 1979-2005 period are also used for a comparison. The latest reanalysis datasets are used for validation purposes.
There is much variability in the modelled equatorial temperature response profiles during the solar maximum. The lower stratospheric response (∼70 hPa) appears to be partially influenced by the coupled-chemistry scheme and is sensitive to the length of the regression period. Near the stratopause, the modelled peak temperature response is slightly lower in altitude compared with reanalyses and far smaller in magnitude than the corresponding reanalysis response (∼1 K over a typical solar cycle, compared with ∼2 K in reanalyses). This could be due to misattribution of an artificial step change in most reanalyses to the solar cycle, as the peak in the step change and peak in the 11 year solar cycle are coincident (Mitchell et al., 2014).
A lack of latitudinal variations in the annual heating response at the stratopause in the models leads to a weak zonal wind response, which may hinder communication with the surface in winter for some models (see the schematic in Figure 1; Kuroda and Kodera, 2001). In general, a modulation of the PJO is simulated more accurately when modelled data from the 1979-2005 period are used, although there is a delayed response in the models. The models that do capture the wintertime zonal wind response all have high spectral resolution in the short-wave band. No statistically significant relationship between solar forcing and the Arctic polar vortex strength is observed when model data are partitioned into different phases of the QBO when considering the same period as Labitzke and Kunze (2009). This was probably due to the weakness of the upper stratospheric response to solar forcing in the models, a lack of the required physical mechanisms (such as the Holton-Tan mechanism) in the models or simply the fact that the Labitzke and Kunze (2009) relationship is due to a chance occurrence (e.g. Kren et al., 2014).
Finally, not all models were able to reproduce the dynamical surface impacts associated with the 11 year solar cycle, thought to come via the stratospheric pathway. A fraction of the models, including those with low tops as well as high tops, appeared to simulate the surface response in the North Pacific during northern winter (weakened Aleutian low) approximately. However, no common characteristic distinguished the successful models, so it is not clear what the mechanism of this response is and hence what the model requirements are to reproduce it. There was evidence that the high-top models simulated the observed lagged North Atlantic response better than the low-top models, but the magnitudes were much reduced. The significant pressure anomaly (over the Azores in observations) at a lag of 3 years was also more westward in the models as a whole. The MIROC-ESM-CHEM model, which happened to have the highest spectral resolution of all the models considered here, as well as having a well-resolved stratosphere with an internally generated QBO and interactive chemistry scheme, showed particular promise in this regard. These characteristics may be important for modelling groups to focus on for improvement in the modelling of the effect of solar variability on surface climate. In addition, many of the issues presented in this article are consistent with too-weak forcing from the UV component of TSI, although this was not explicitly shown. It is expected that models will improve in their representation of solar influences on climate if the underlying solar forcing in models is improved. This is an area that should also be examined from a model development framework.

Acknowledgements
[...] for helping with reanalysis data products, Laura Wilcox for advice about CMIP-5 ozone forcing data, James Anstey for help with CMIP-5 data classifications and Nathan Gillett and Gavin Schmidt for useful discussions. SolarMIP is part of WCRP-SPARC SOLARIS-HEPPA. DMM and LJG are funded by NERC.
SM is partially supported by the SOLID (FP7-SPACE-2012-313188) project. Some of this work was supported by STSM grants from COST Action ES1005 'TOSCA' (www.tosca-cost.eu) awarded to DMM and SM. ER has been partially supported by the Swiss National Science Foundation under grant CRSII2-147659 (FUPSOL II) and by State Secretariat for Education, Research and Innovation of Swiss Confederation under grant C11.01124 (SOVAC). Work at GEOMAR Helmholtz Centre for Ocean Research Kiel is partly supported within the Helmholtz University Young Investigators Group NATHAN funded by the Helmholtz Association through the President's Initiative and Networking Fund and the GEOMAR in Kiel. Work at the University of Arizona was supported by the US National Science Foundation under grant 1251092.

Supporting information
Figure S1. As Figure 5, but using data from 1979-2005.
Figure S2. As Figure 6, but using data from 1979-2005.
Figure S3. Temperature response to the 11 year solar cycle for monthly averaged data during NH winter (November-April) for all models plotted individually. Data used are from ∼1850-2005. Stippled areas show the response is significant at the 95% level. The black line shows the zero wind contour.
Figure S4. Zonal wind response to the 11 year solar cycle for monthly averaged data during NH winter (November-April) for all models plotted individually. Data used are from ∼1850-2005. Stippled areas show the response is significant at the 95% level. The black line shows the zero wind contour.
Figure S5. The MSLP response in winter (DJF) to the 11 year solar cycle for each model plotted individually. The different rows show the MSLP lagged for 0-3 years after solar maximum. Stippled areas show the response is significant at the 95% level.
Figure S6. The surface temperature response in winter (DJF) to the 11 year solar cycle for each model plotted individually. The different rows show the surface temperature lagged for 0-3 years after solar maximum. Stippled areas show the response is significant at the 95% level.