The biophysics, ecology, and biogeochemistry of functionally diverse, vertically and horizontally heterogeneous ecosystems: the Ecosystem Demography model, version 2.2 – Part 2: Model evaluation for tropical South America

The Ecosystem Demography model version 2.2 (ED-2.2) is a terrestrial biosphere model that simulates the biophysical, ecological, and biogeochemical dynamics of vertically and horizontally heterogeneous terrestrial ecosystems. In a companion paper (Longo et al., 2019a), we described how the model solves the energy, water, and carbon cycles, and verified the high degree of conservation of these properties in long-term simulations that include long-term (multi-decadal) vegetation dynamics. Here, we present a detailed assessment of the model’s ability to represent multiple processes associated with the biophysical and biogeochemical cycles in Amazon forests. We use multiple measurements from eddy covariance towers, forest inventory plots, and regional remote-sensing products to assess the model’s ability to represent biophysical, physiological, and ecological processes at multiple timescales, ranging from subdaily to century long. The ED-2.2 model accurately describes the vertical distribution of light, water fluxes, and the storage of water, energy, and carbon in the canopy air space, the regional distribution of biomass in tropical South America, and the variability of biomass as a function of environmental drivers. In addition, ED-2.2 qualitatively captures several emergent properties of the ecosystem found in observations, specifically observed relationships between aboveground biomass, mortality rates, and wood density; however, the slopes of these relationships were not accurately captured. We also identified several limitations, including the model’s tendency to overestimate the magnitude and seasonality of heterotrophic respiration and to overestimate growth rates in a nutrient-poor tropical site.
The evaluation presented here highlights the potential of incorporating structural and functional heterogeneity within biomes in Earth system models (ESMs) to realistically represent their impacts on energy, water, and carbon cycles. We also identify several priorities for further model development.

Published by Copernicus Publications on behalf of the European Geosciences Union.

Instead, we focused on thoroughly evaluating the model in the Amazon, using two data-rich sites with decade-long series of measurements that quantify several water, energy, and carbon component fluxes and storage terms predicted by ED-2.2. We also appraised the model's ability to represent both the regional distribution of biomass and forest structure and the mechanisms that drive the variability in carbon stocks and structure, by comparing the model results with independent field measurements and remote-sensing estimates. Together, these analyses aim to verify the model's consistency and its potential to be applied in both short-term and long-term studies.

Assessment of short-term fluxes
For most evaluations of biophysical and biogeochemical cycles, we ran ED-2.2 for two sites in the Amazon where both eddy flux towers and forest inventories were available for a long period: the Guyaflux tower (5°17′ N; 52°55′ W) at Paracou, French Guiana (GYF; Bonal et al., 2008), and the Tapajos National Forest site (2°51′ S; 54°58′ W), located in the central Amazon (TNF; Hutyra et al., 2008; Pyle et al., 2008). Both data sets underwent multiple-stage quality control; in addition, variables used as input for ED-2.2 were gap filled, following Longo (2014). Net ecosystem productivity (NEE) was processed using the approach by Hayek et al. (2018), which corrects flux bias due to lack of turbulence and turbulent-independent divergence of CO2.
To ensure that model results and observations at or near eddy covariance flux towers could be directly compared, and that the observed signal was strongly related to actual environmental conditions, we aggregated the model results to polygon-level hourly averages and used the model output only for the periods when each variable of interest was measured. Gross primary productivity (GPP) and ecosystem respiration (Ṙ_Eco) are not measured but statistically modeled; therefore, we compared all times in which the net ecosystem productivity (NEE) could be estimated from tower observations. We also required that the 24-hour period preceding any given time had fewer than 24 gap-filled values among all seven driver variables.
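The gap-filling screen in the last sentence can be sketched in a few lines of Python. This is an illustration only, not the authors' processing code: the function name `usable_hours`, the boolean input layout (hours by driver variables), the choice to exclude the current hour from the "preceding" window, and the decision to reject hours that lack a full 24-hour history are all assumptions.

```python
import numpy as np


def usable_hours(gap_filled, window=24, max_filled=24):
    """Flag hours whose preceding `window` hours contain fewer than
    `max_filled` gap-filled driver values.

    gap_filled : boolean array, shape (n_hours, n_drivers); True marks a
                 gap-filled value of a driver variable (hypothetical layout).
    Returns a boolean array of length n_hours.
    """
    gap_filled = np.asarray(gap_filled, dtype=bool)
    per_hour = gap_filled.sum(axis=1)              # filled drivers in each hour
    csum = np.concatenate(([0], np.cumsum(per_hour)))
    n_hours = per_hour.size
    ok = np.zeros(n_hours, dtype=bool)
    # Assumption: hours without a complete preceding window are rejected.
    for t in range(window, n_hours):
        filled = csum[t] - csum[t - window]        # hours t-window .. t-1
        ok[t] = filled < max_filled
    return ok
```

With seven driver variables, four consecutive fully gap-filled hours (28 values) are enough to exclude every hour whose preceding day contains all four of them.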
To evaluate the in-canopy radiation profile, we compared model results against measured profiles of photosynthetically active radiation at two sites in the Brazilian Amazon: Jaru Biological Reserve (RJA: 10°05′ S; 61°56′ W) near Ji-Paraná, and Adolpho […] meteorological variables needed by ED-2.2 during the period when the radiation profile data were collected. Also, because the diurnal cycle of any point measurement within the canopy depends on local heterogeneities and can be dramatically affected by the Sun's azimuth and zenith angles (e.g. sun flecks when the sensor is aligned with an opening in the canopy), we only used the average daily radiation relative to the top of canopy to compare with the model results. The tree area index was estimated from published data at RJA (Simon et al., 2005) and near MDK (McWilliam et al., 1993).

Evaluation of long-term dynamics
To evaluate the model's ability to represent long-term dynamics, we carried out multiple simulations intended to test its ability to describe regional variability as well as the structural and functional diversity of ecosystems in tropical South America. First, to assess the model's ability to represent the biome distributions at regional scale, we ran ED-2.2 starting from near-bare ground conditions and carried out a 500-year simulation across the Amazon ecoregion. We then resumed the simulation in 1900, applying anthropogenic disturbance, and ran the model until 2002, using a combination of land use transition matrices from Hurtt et al. (2006), nudged to match the initial conditions from Soares-Filho et al. (2006) in the Amazon. We initialized soils with texture obtained from Quesada et al. (2011) for the Amazon, RADAMBRASIL (de Negreiros et al., 2009) for non-Amazonian areas of Brazil, and IGBP (Tempel et al., 1996) for non-Amazonian areas elsewhere, and used the meteorological forcing from the Princeton University Global Meteorological Forcing Dataset (PGMF, Sheffield et al., 2006) for 1969 to 2008, which was recycled multiple times to simulate a period equivalent to 1500 through 2002.
To estimate the model sensitivity to light and water availability across the region, we used the annual average downwelling shortwave irradiance from the Clouds and the Earth's Radiant Energy System's Energy Balanced And Filled product (CERES-EBAF; Kato et al., 2013) between 1998 and 2017. For maximum cumulative water deficit (MCWD, mm), we assumed a constant monthly evapotranspiration (ET_0 = 100 mm month−1) and monthly precipitation (P, mm month−1) from TMPA-3B43, following Malhi et al. (2009a). For any month t, we defined:

$$\mathrm{WD}_t = \max\left\{0,\ \min\left[1200,\ \mathrm{WD}_{t-\Delta t} + \left(\mathrm{ET}_0 - P_t\right)\Delta t\right]\right\}, \qquad \mathrm{MCWD} = \max_t\, \mathrm{WD}_t,$$

where ∆t = 1 month. The maximum of 1200 mm was imposed to avoid run-away water deficit in the most arid regions, where precipitation is never sufficient to bring water deficit back to zero because of the high baseline evapotranspiration. We reprojected the estimates of shortwave irradiance, precipitation, and water deficit to the same grid as ED-2.2 using spatial averaging. For each environmental variable, we divided the grid cells into 20 quantile-based groups (0–0.05; 0.05–0.10; . . . ; 0.95–1), and obtained the average and the 90% quantile range within each bin for the ED-2.2 model and the three remote-sensing estimates of aboveground biomass. To evaluate the model's ability to predict emergent properties, we used published values of biomass and mortality obtained from the RAINFOR field inventory network in the Amazon (Phillips et al., 2004; Baker et al., 2004a, b) and the results from long-term simulations near the field inventory sites (Levine et al., 2016).

[…] and 19% lower than field estimates, respectively. As a result, the total autotrophic respiration at GYF is within one standard error from the expected rate based on the bottom-up analysis. In TNF, the reference leaf and root respiration are nearly half the magnitude for GYF, and as a result, ED-2.2 autotrophic respiration is 57% higher than the estimates by Malhi et al. (2009b).
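The water deficit bookkeeping and the quantile binning described earlier in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, the deficit is assumed to start at zero, to grow by ET_0 − P each month, to be floored at zero and capped at 1200 mm, and the "90% quantile range" is assumed to mean the 5th to 95th percentile interval.

```python
import numpy as np

ET0 = 100.0      # mm month^-1, constant evapotranspiration given in the text
WD_CAP = 1200.0  # mm, upper bound on the deficit given in the text


def mcwd(precip_mm):
    """Maximum cumulative water deficit (mm) from monthly rainfall (mm month^-1).

    Assumes the deficit is zero before the first month of the series.
    """
    deficit = 0.0
    largest = 0.0
    for p in np.asarray(precip_mm, dtype=float):
        # One-month step: add ET0 - P, floor at zero, cap at WD_CAP.
        deficit = min(WD_CAP, max(0.0, deficit + (ET0 - p)))
        largest = max(largest, deficit)
    return largest


def quantile_bin_summary(driver, biomass, n_bins=20):
    """Mean and 5th-95th percentile range of biomass in quantile bins of driver.

    Returns an array of shape (n_bins, 3): mean, lower bound, upper bound.
    Empty bins are left as NaN.
    """
    driver = np.asarray(driver, dtype=float)
    biomass = np.asarray(biomass, dtype=float)
    edges = np.quantile(driver, np.linspace(0.0, 1.0, n_bins + 1))
    # Interior edges split the grid cells into bins numbered 0 .. n_bins - 1.
    index = np.digitize(driver, edges[1:-1], right=True)
    rows = np.full((n_bins, 3), np.nan)
    for b in range(n_bins):
        values = biomass[index == b]
        if values.size:
            rows[b] = (values.mean(), *np.quantile(values, [0.05, 0.95]))
    return rows
```

Applying `quantile_bin_summary` with the same driver field to the model output and to each remote-sensing map gives directly comparable curves, because every data set is averaged over the same groups of grid cells.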

Regional patterns of biomass
The model correctly predicts the extent of the Amazon forest (Fig. 7a), and it also represents the regional distribution of aboveground biomass within the Amazon biome compared to regional biomass maps from Saatchi et al. (2011), Baccini et al. (2012) and Avitabile et al. (2016), the latter being based on the other two maps. For example, ED-2.2 predicts higher aboveground biomass in the Guiana Shield, similar to estimates from Saatchi et al. (2011) and Avitabile et al. (2016) (Fig. 7b,d), the higher biomass near the border between Brazil, Peru, and Colombia, similar to Baccini et al. (2012) and Avitabile et al. (2016) (Fig. 7c,d), and the low-biomass, open savanna area near the Brazil-Guyana-Venezuela border. The model generally predicts higher biomass than the three remote-sensing maps for most of the Amazon south of the Guiana Shield, particularly in the Western part of the Amazon (Fig. S3b-d), resulting in a peak in the distribution of biomass at 16.5 kgC m−2, whereas the high-biomass peaks ranged between 11.5 and 14.0 kgC m−2 for the remotely sensed estimates of aboveground biomass (Fig. 8a). The ED-2.2 model and the remote-sensing estimates consistently predict relatively low probability density for intermediate values of biomass, and a pronounced peak of low biomass, even though the low-biomass peak predicted by ED-2.2 (0.65 kgC m−2) is lower than the remote-sensing estimates (1.0–3.4 kgC m−2, Fig. 8a). The shift in the low-biomass peak is mostly driven by ED-2.2 predictions of biomass in the savannas and xeric shrublands of Eastern Brazil, which were consistently lower than the remote-sensing estimates (Fig. S3). We also compared the results of leaf area index (LAI) with estimates from the Moderate Resolution Imaging Spectroradiometer (MODIS, product MCD15A2H, Collection 6) (Yan et al., 2016) and found that ED-2.2 predicted a similar extent of LAI over the Amazon region (Fig. S4). However, ED-2.2 predicted lower LAI than MODIS for most of the Amazon, in particular along the arc of deforestation, and higher LAI in northwestern Colombia and Central Brazil (Fig. S4c).

(a) Observed values for GYF are summarized in Supplement S1. Estimates were based on the approach described by Malhi et al. (2009b, c).
(b) Observed values for TNF are from Malhi et al. (2009b, c) and references therein.
(c) ED-2.2 does not have a separate coarse woody debris pool, therefore we compared the sum of both.
The predicted spatial variability of total carbon stocks in the region emerged from variation in the environmental conditions, such as variability in available light and water (Fig. 8). Both ED-2.2 and the three remote-sensing biomass estimates consistently showed the highest average biomass (11.7–15.1 kgC m−2) at 195 W m−2, and the sharpest decline as a function of increased irradiance (0.53–0.77 kgC W−1) near 225 W m−2 (Fig. 8b). Similarly, the relationship between annual precipitation and aboveground biomass was consistent between model and remote-sensing estimates, with the largest changes in average biomass per unit increase in annual rainfall occurring between 1500 and 2200 mm yr−1, and relatively stable values of aboveground biomass above 2500 mm (Fig. 8c). Increasing dry-season severity, summarized by MCWD, has a strong association with decreasing average aboveground biomass in both the model and the remote-sensing estimates when the annual MCWD is less than 500 mm. The strongest declines in average biomass occurred at mean annual MCWD between 300 and 350 mm in both ED-2.2 and the remote-sensing estimates (Fig. 8d). It must be noted, however, that the transition between high biomass and low biomass when MCWD is 300–350 mm is substantially more pronounced in ED-2.2 (−0.075 kgC m−2 mm−1) than in the remote-sensing estimates (between −0.041 and −0.050 kgC m−2 mm−1) (Fig. 8d).
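A rate of change such as the kgC m−2 mm−1 values quoted above can be estimated from binned averages with simple finite differences. The sketch below is one possible way to do this and is not necessarily how the reported slopes were obtained; the function name is hypothetical and the bin centres are assumed to be strictly increasing.

```python
import numpy as np


def steepest_decline(x_centres, y_means):
    """Locate the steepest decline of a binned relationship.

    x_centres : strictly increasing bin centres of the driver (e.g. MCWD, mm).
    y_means   : bin-averaged response (e.g. biomass, kgC m^-2).
    Returns (driver value midway between the two bins, slope between them).
    """
    x = np.asarray(x_centres, dtype=float)
    y = np.asarray(y_means, dtype=float)
    slopes = np.diff(y) / np.diff(x)      # slope between adjacent bins
    k = int(np.argmin(slopes))            # most negative slope
    midpoint = 0.5 * (x[k] + x[k + 1])
    return midpoint, slopes[k]
```

Running the same function on the model curve and on each remote-sensing curve returns both the location and the magnitude of the transition, which are the two quantities compared in the text.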

Assessment of forest function and structure
In addition to the total carbon stocks, the regional variability in forest function and composition at steady state is generally well characterized by ED-2.2. First, the range and the variability of biomass across the network is generally well characterized, with the exception of the driest sites located in Bolivia (red dots in Fig. 9a), where the model predicts less biomass than observations because of frequent fires predicted in the model. Also, both the model and the field measurements show a similar negative correlation between biomass and mortality rates (Fig. 9b), and a similar positive correlation between biomass and the mean wood density (Fig. 9c), although significant differences exist in biomass for any given value of mortality or wood density (Fig. 9b,c). Both wood density and mortality rates are related to the abundance of pioneers or late-successional individuals both in the model (Moorcroft et al., 2001) and in observations (e.g. Chave et al., 2009; Kraft et al., 2010), suggesting that the model characterizes the variability in forest composition within the Amazon region.
The model captured the general slope of the distribution of stem demographic density (abundance) at GYF (Fig. 10a), and both the total basal area and the typical distribution of basal area for individuals with DBH between 20 and 50 cm (Fig. 10b), although the model predicted a lower contribution from trees with DBH < 20 cm to both abundance and basal area, and a higher contribution of individuals with DBH > 50 cm to basal area (Fig. 10a,b). Model comparison with TNF data also showed that the model reproduced the main characteristics of the forest structure, although the slope of abundance as a function of size was steeper in the model compared to observations (Fig. 10c). Basal area structure, on the other hand, showed good agreement with field inventory data and total basal area was within 5% of the observed basal area. Total mortality rates were generally higher in ED-2.2 simulations than the observed rates at both GYF and TNF (Fig. 11a,b), and the model shows little variability in mortality between different census intervals. Most of the modeled mortality was due to background mortality, which is a combination of density-independent factors such as aging and treefall mortality, both assumed time-invariant in the model (Moorcroft et al., 2001). While the mortality due to environmental constraints (density-dependent) showed interannual variability, its magnitude was small, never exceeding 0.5% AGB yr−1. The high mortality rates in the model are mostly attributable to early-successional trees, for which the modeled mortality rate was near 7.5% AGB yr−1, or 5-fold higher than for late-successional trees, whereas observed rates were typically 3.0% AGB yr−1, or 3 times higher than for late-successional trees. Likewise, growth rates were also higher than in observations, particularly at GYF (Fig. 11c), whereas the values were closer to observations for most of the simulated period at TNF (Fig. 11d). Growth rates at the two sites are significantly different, which could be related to the particularly nutrient-poor soils at GYF (e.g. Baraloto et al., 2005) and to the fact that ED-2.2 does not account for nutrient limitation. Alternatively, the higher growth rates may be due to tree allometry and low allocation to living tissues, which contribute to high accumulation rates in structural tissues.

Water and energy fluxes
The comparisons with eddy covariance towers demonstrated the ability of ED-2.2 to simulate the magnitude and seasonality of both the evapotranspiration fluxes and the water storage in the canopy air space (Fig. 4; S1). The good agreement of evapotranspiration in the tropical sites using ED-2.2 contrasts with previous assessments using ED-2.1 for temperate sites, which found significant negative biases in simulated evapotranspiration and attributed the bias to the model overestimating the impacts of water stress on stomatal conductance (Matheny et al., 2014; Walker et al., 2014). The average ratio between canopy evaporation and total rainfall ranged from 7% to 11% at the two forest sites tested here (Guyaflux and Tapajos), which is at the low end of the range of values found in previous studies (9–20%; Tobón Marin et al., 2000, and references therein).
The hydrological cycle, on the other hand, showed some important deviations in absolute value, particularly near the surface, despite being consistent with observations in relative terms (Fig. S6). Large biases in absolute soil moisture were caused by mismatches in the soil hydraulic properties that ultimately define the residual moisture and wilting point, field capacity, and porosity, and hence the range of possible values of soil moisture. Soil hydraulic properties were derived from texture characteristics previously published for all sites. In ED-2.2 these properties were simplified to a single fraction assumed constant at every patch and throughout the profile, whereas in reality soil properties are known to vary significantly within the same area and with depth (e.g. Epron et al., 2006). Moreover, soils in ED-2.2 are assumed to be mineral, whereas in reality macropores and soil organic content can substantially affect such properties (Saxton and Rawls, 2006; Fisher et al., 2008).

Figure 11. Comparison of (a,b) mortality rates and (c,d) growth rates obtained from simulations and forest inventories for the (a,c) GYF and (b,d) TNF sites. Vertical lines are the approximate times of forest inventory surveys; bands associated with observations correspond to the 95% confidence interval, obtained from bootstrap (see Longo, 2014, for further details), and bands associated with model results are the range of simulations with different soil texture, leaf phenology, and initial time. To be consistent with the field plot protocol, only those cohorts with DBH ≥ 10 cm (diameter at breast height) were included in the model estimates.
The model also realistically represented both the net absorption of visible irradiance and the vertical distribution throughout the canopy. The model showed similar magnitude and seasonality of the outgoing shortwave radiation, including photosynthetically active radiation, at the two long-term sites (Fig. 1), and the average light level profiles within the canopy. Agreement was even higher during cloudy conditions (Fig. 2), when the spatial distribution of individual trees has less of an impact on local variability in the light profile (Mercado et al., 2009). Differences were more significant for photosynthetically active radiation than for total shortwave radiation, particularly during the dry season (Fig. 1c,d). These differences may result from two factors. First, leaf optical properties in ED-2.2 are assumed constant for any given PFT, whereas observations indicate that leaf reflectivity depends on leaf age (Toomey et al., 2009; Chavana-Bryant et al., 2017). Second, ED-2.2 represents canopy structure in only one dimension for each patch, so the effect of neighboring trees (or their absence) is not represented. A full three-dimensional approach similar to Morton et al. (2016) may not be feasible within ED-2.2 because of the intensive computational burden and because the model does not represent the actual position of individual trees. Alternatively, the perfect plasticity approach (Purves et al., 2008; Farrior et al., 2013), in which finite-crown individual trees are arranged to maximize light access, has been recently adapted to another cohort-based model, the Functionally-Assembled Terrestrial Ecosystem Simulator (FATES, Fisher et al., 2018). The perfect plasticity approach has the advantage of allowing trees of similar size to experience the same light levels (as opposed to the sequential light interception in ED-2.2), which could help improve the light extinction profile in closed-canopy forests. Importantly, while the current representation of vertical light distribution in ED-2.2 may affect light availability of individual cohorts, it does not imply that the simulated understory in ED-2.2 is excessively dark. In fact, when we compared the modeled and observed vertical structure of diffuse light (which is less sensitive to the observed position of trees than direct light) we found that the model slightly overestimates understory light levels (Fig. 2).
The model predictions of canopy air space temperature generally match observations well, except for a slight overestimation during the dry season at TNF (Fig. S1). However, the sensible heat flux also tends to be higher than tower estimates, particularly at the drier TNF site (Fig. 3). Better agreement with observations for simulated water fluxes than for sensible heat flux is very common among land surface models (Best et al., 2015). One possible cause of this disagreement is that observations from eddy flux towers typically contain significant sources and sinks from lateral advection and air drainage that may result in departures from energy closure by as much as 30% (Tóta et al., 2008; da Rocha et al., 2009; Leuning et al., 2012; Stoy et al., 2013). Energy conservation is a requirement in ED-2.2 and is checked at every model time step. However, ED-2.2 does not represent lateral advection or air drainage, and may compensate for such losses through an increase in eddy fluxes. Moreover, Haughton et al. (2016) suggested that parameterization problems, and not energy conservation, are the most likely cause for biases in land surface models. In ED-2.2, one possible issue is that the heat capacity of branches and leaves could be biased, allowing greater variability of temperature and higher sensible heat fluxes at the expense of reduced storage. To our knowledge, no long-term measurements of leaf or branch temperature exist for the Amazon sites, but differences in outgoing thermal-infrared irradiance (Fig. S2) suggest that the observed vegetation temperature during the afternoon may be lower and less variable than the model predictions. Additional measurements of leaf and wood heat capacity for tropical forests could improve the accuracy of leaf and branch temperatures in the model.
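The departure from energy closure mentioned above is commonly summarized as the ratio of turbulent fluxes to available energy. The sketch below illustrates that diagnostic for tower data; it is not part of ED-2.2 or of the authors' workflow. The function name, the argument names, and the treatment of a missing ground heat flux as zero are assumptions.

```python
import numpy as np


def closure_ratio(h, le, rn, g=None):
    """Energy balance closure ratio, sum(H + LE) / sum(Rn - G).

    h, le : sensible and latent heat fluxes (W m^-2)
    rn    : net radiation (W m^-2)
    g     : ground heat flux (W m^-2); treated as zero when not given
            (an assumption that is often poor at sub-daily scales).
    Records with any non-finite value are ignored.
    """
    h, le, rn = (np.asarray(a, dtype=float) for a in (h, le, rn))
    g = np.zeros_like(rn) if g is None else np.asarray(g, dtype=float)
    valid = np.isfinite(h) & np.isfinite(le) & np.isfinite(rn) & np.isfinite(g)
    turbulent = np.sum(h[valid] + le[valid])
    available = np.sum(rn[valid] - g[valid])
    return turbulent / available
```

A ratio of 0.7 corresponds to the 30% departure quoted in the text. A model that conserves energy by construction has a ratio of one once storage is included, so part of a model-tower mismatch in sensible heat flux may reflect the tower's unclosed budget rather than a model error.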

Carbon fluxes and carbon storage
Comparisons of GPP between the ED-2.2 model and tower-based estimates show that the model captures both the magnitude of GPP and the typical GPP response to light (Fig. 5). In ED-2.2, the seasonality at tropical forest sites is mostly driven by light, thus the maximum productivity often occurs during the dry season (Fig. 5a,b). Estimates of GPP based on eddy covariance towers, however, suggest a more complex pattern, with the minimum occurring either during the wet season (GYF) or the transition from wet to dry season (TNF), and modestly increasing productivity during the dry season (Fig. 5a,b; see also Bonal et al., 2008; Restrepo-Coupe et al., 2013). GPP depends on multiple processes, and thus differences between ED-2.2 and tower estimates may be due to several factors, including biases in the seasonality of leaf area and photosynthetic properties of leaves. In particular, empirical studies have shown that the seasonality of GPP in Amazon forests is linked to seasonal variation of photosynthetic capacity and leaf phenology (Wu et al., 2016; Restrepo-Coupe et al., 2017), whereas in the current implementation of ED-2.2 both the leaf turnover rate and the maximum photosynthetic capacity are assumed, for simplicity, to be constant. Several modeling studies (Kim et al., 2012; De Weirdt et al., 2012; Medvigy et al., 2013) and a leaf-level carbon optimization model (Xu et al., 2017) have shown that incorporating such seasonality into the dynamics of leaf longevity and photosynthetic capacity significantly improves predictions of the seasonality of carbon fluxes in tropical forests. However, data from multiple sites may be needed to develop a generalizable phenology model for evergreen forests.
The model estimates of ecosystem respiration were generally higher and more seasonal than the expected values either from eddy covariance tower estimates or from a bottom-up assessment, especially at GYF (Fig. 6; Table 1). In the model, the seasonality of ecosystem respiration was nearly exclusively driven by the seasonality in heterotrophic respiration, with a significant decline in the dry season due to lower soil moisture (Fig. 6c,d), whereas eddy covariance tower estimates suggest a dry-season reduction in respiration only at TNF (Fig. 6a, Restrepo-Coupe et al., 2017; Aguilos et al., 2018). The ED-2.2 assumption that heterotrophic respiration declines at lower soil moisture is consistent with observations for total respiration at GYF (e.g. Fig. 3 of Aguilos et al., 2018); however, the magnitude of the heterotrophic response to moisture is likely overestimated in the ED-2.2 model. Total autotrophic respiration estimates from ED-2.2 were within the range of independent bottom-up estimates for both sites, albeit only marginally within the 95% confidence level (2 standard errors) of the bottom-up estimates at TNF (Table 1). At both sites, the simulated autotrophic respiration was driven by leaf and stem respiration, which contributed roughly the same proportion of autotrophic respiration. In contrast, the bottom-up estimates at both sites suggest leaf respiration was 2-3 fold higher than stem respiration (Table 1). One possibility is that the allocation of carbon gains to tissue growth was overestimated in ED-2.2, which is consistent with the overestimated growth rates compared to forest inventory plots (Fig. 11). It must be noted, however, that several terms estimated from observations also carry large uncertainties and assumptions. For example, stem respiration is typically measured near the surface (e.g. Chambers et al., 2004; Stahl, 2010), which may introduce biases given that branches may have significantly higher respiration rates (Cavaleri et al., 2006). Furthermore, observed differences of expected values between sites are generally much larger than the differences obtained by the model, and such differences may reflect true differences of plant community functioning between sites, or differences in sampling and techniques (Malhi et al., 2009b). Improved measurements of the different terms of ecosystem respiration would allow for improved constraints on the individual processes driving the total respiration.
In addition to tropical sites, the model's ability to represent productivity and respiration has been previously assessed for temperate ecosystems. The model showed excellent agreement in magnitude and seasonality of both net ecosystem productivity (NEP) and gross primary productivity (GPP) at three flux tower sites installed at Harvard Forest, in particular when the model was initialized with ground-based or remote-sensing-based data (Antonarakis et al., 2014). In contrast, the model showed significant biases in net primary productivity (NPP) under ambient CO2 at two Free-air CO2 Enrichment (FACE) sites in the Southeastern United States, overestimating NPP at Duke (evergreen forest) and underestimating it at Oak Ridge (deciduous forest) (Walker et al., 2014). A previous model intercomparison study for eleven North American tower sites also revealed significant negative biases in ED-2.1, although the model inter-annual variability of GPP and ecosystem respiration was within the observed range for both deciduous and evergreen sites (Keenan et al., 2012). In addition, a wavelet analysis of the normalized error across 9 tower sites in North America suggested that the errors in net ecosystem exchange (NEE) are dominated by sub-annual (but longer than daily) time scales (Dietze et al., 2011). Together, these results indicate the need to quantify which processes and parameters contribute the most to model uncertainties in order to improve the model predictions using ED-2.2, an effort that is currently being pursued (Fer et al., 2018; Raczka et al., 2018).
Finally, because ED-2.2 solves the carbon dioxide cycles at sub-hourly scale, it also accounts for changes in storage in the canopy air space. As described in the companion paper (Longo et al., 2019a), ED-2.2 accounts for canopy air space storage of energy, water, and carbon dioxide. While changes in storage are generally small at the seasonal or multi-annual scale (Fig. S5a,b), they are not negligible at the sub-daily scale. The relevance of the storage of CO2 in the canopy air space has long been recognized by the eddy covariance community (e.g. Goulden et al., 1996; Bonal et al., 2008; Hayek et al., 2018), but it is only rarely included in biosphere or land surface models. For example, the strong release of carbon dioxide in the early morning hours, resulting from the nighttime accumulation of respired CO2, is well characterized by the model at both test sites (Fig. S5c,d). Accounting for this time lag between biologically-driven emission or uptake and the emissions to the free atmosphere is particularly important for benchmarking the model with the upcoming column-integrated measurements of CO2

Long-term, large-scale ecosystem dynamics
A key feature of the ED-2.2 model is the emergence of long-term, large-scale ecosystem composition, structure and function from spatially-localized, height-structured competition between individuals within the plant canopy. As seen in Fig. 8, the model's large-scale predictions are consistent with remote-sensing estimates of how AGB varies along key climatological gradients of incoming solar irradiance, rainfall, and dry-season severity. In addition, the regional pattern of AGB predicted by ED-2.2 reproduces several notable features present in the remote-sensing-based estimates of AGB (Fig. 7). In particular, the model captures the spatial extent of the Amazon forest, and reproduces two characteristic patterns of spatial variability in forest biomass, namely (i) the high-biomass forests found in the Guiana Shield (near GYF) and in the area south of the TNF flux tower, and (ii) the area of lower-biomass forest that runs east-to-west spanning the TNF and MDK sites.
The model's predictions of regional AGB also reveal important discrepancies. First, the model estimates are generally lower than the remote-sensing estimates of AGB in the drier savannas and xeric shrublands of central and Northeastern Brazil ( Fig. 7-8). The low biomass estimates in drier regions is likely related to the simplified fire model used in the regional simulations.
Following Moorcroft et al. (2001), fire occurrence within each climatological grid cell is controlled by a simple fixed soil 30 moisture threshold, the area burned per year increases linearly as a function of the mean AGB within each grid cell, and no plants survive burn events. In reality, as previous work has shown (e.g. Cardoso et al., 2003;Cochrane, 2003;Andela et al., 2016), fire frequency, burn area, and fire severity are also strongly influenced by environmental factors in addition to soil moisture, such as proximity to roads and deforested areas. Moreover, the model does not account for size-related differences in fire survivorship and plant-functional diversity-related differences in fire survivorship arising from variation in plant traits such as bark-thickness-and re-sprouting ability (Brando et al., 2012;Trugman et al., 2018). Furthermore, the model simulations do not include plant functional types with adaptations for the semi-arid conditions typically observed in Northeastern Brazil.
Such adaptations include smaller leaf size; internal water storage; modular, independent, and redundant vascular systems; germination synchronized with rainfall; and the Crassulacean Acid Metabolism (CAM) photosynthetic pathway (Cushman, 2001; Schenk et al., 2008; De Micco and Aronne, 2012). Incorporating these mechanisms, which drive ecosystem dynamics in drier areas, could significantly improve the model predictions outside tropical forests.
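The simplified fire scheme described above (a fixed soil moisture threshold, burned area that increases linearly with mean AGB, and no post-fire survivorship) can be illustrated with a minimal sketch. This is not the ED-2.2 source code: the function names, argument names, and units are hypothetical, and capping the burned fraction at 1 is an assumption made here only to keep the result bounded.

```python
def burned_fraction(soil_moisture, mean_agb, moisture_threshold, fire_coefficient):
    """Fraction of a grid cell burned per year under the simplified scheme.

    All argument names are hypothetical; the values are placeholders, not
    ED-2.2 parameters.
    """
    # Fire occurs only when soil moisture is below the fixed threshold.
    if soil_moisture >= moisture_threshold:
        return 0.0
    # Burned area increases linearly with the mean AGB of the grid cell.
    # The cap at 1.0 is an assumption of this sketch.
    return min(1.0, fire_coefficient * mean_agb)


def post_fire_agb(patch_agb, burned):
    """AGB remaining in a patch: no plants survive a burn event."""
    return 0.0 if burned else patch_agb
```

Because the burned area scales with AGB alone and survivorship is zero, a scheme of this form cannot represent the dependence of fire on roads and deforested areas, or the size- and trait-dependent survivorship discussed above.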
In the western Amazon, the model's predictions of AGB are generally higher than all three remote-sensing estimates, implying that the model is over-predicting AGB in this region (Fig. 7). ED-2.2 also tends to overestimate the high-biomass peak of the regional distribution of biomass (Fig. 8a). One potential reason for the AGB overestimation in the Western Amazon is the model's predicted dominance of late-successional plants over most of the Amazon region, whereas field observations indicate that forests in the Western Amazon have higher stem turnover rates and lower wood densities than those in the Eastern Amazon (Phillips et al., 2004; ter Steege et al., 2006). This has been linked to the fact that soils in the Western Amazon have higher nutrient availability (Quesada et al., 2012), which was not accounted for in ED-2.2. Finally, the regional simulation did not account for all types of anthropogenic disturbances from tropical forest degradation, which could also explain part of the overestimation by ED-2.2 compared to the remote-sensing estimates. Tropical forest degradation through selective logging, mining, and low-intensity understory fires in the Amazon is pervasive along the arc of deforestation (e.g. Morton et al., 2013; Asner et al., 2013; Tyukavina et al., 2017) and is known to significantly deplete aboveground biomass (Berenguer et al., 2014; Longo et al., 2016; Rappaport et al., 2018).
The regional model simulation also qualitatively captures two disturbance-mediated relationships between canopy AGB and other tropical forest attributes observed in plot measurements (Fig. 9): the negative correlation between AGB and average stem mortality rates (Phillips et al., 2004; Johnson et al., 2016), and the positive correlation between AGB and average wood density found by Baker et al. (2004b). The fact that the model qualitatively captures the direction of both relationships is encouraging and suggests that the model's predictions of regional biomass trends arise from mechanisms similar to those observed in nature. However, the magnitudes of the predicted relationships differ from the observations: for a given value of AGB, the model predicts a higher mean stem mortality rate and a higher mean wood density value than is observed in the plot measurements. The reasons for the differences in the magnitudes of these relationships are, at present, unclear. However, in the case of the mean mortality-AGB relationship, the mismatch is likely related to the over-prediction of mortality rates seen in ED-2.2 (Fig. 9).
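The distinction drawn above between capturing the direction of a relationship and capturing its magnitude can be made concrete with a small sketch. It assumes an ordinary least-squares fit, which is not necessarily the method used in this study, and the helper names are hypothetical.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def same_direction(slope_model, slope_obs):
    """True when the modeled and observed trends have the same sign."""
    return slope_model * slope_obs > 0
```

Applied to modeled and observed pairs of AGB and mortality rate (or AGB and wood density), `same_direction` would test the qualitative agreement, while the ratio or difference of the two slopes would quantify the mismatch in magnitude.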
In this manuscript, we focused on assessing the model's ability to represent the dynamics of tropical forests, but previous changes in gross and net primary productivity (NPP) as functions of stand age. This result was consistent with a previous model-intercomparison study, which found good agreement between the observed and ED-2.1-modeled CO2 fertilization effect at Duke, whereas the model-predicted response to elevated CO2 at Oak Ridge (broadleaf-dominated) was overestimated (De Kauwe et al., 2013). In contrast, in a millennium-long model intercomparison study for the Northeastern United States, the model overestimated both the magnitude of NPP and its variability as a function of rainfall and CO2 when compared to tree-ring estimates (Rollinson et al., 2017), which indicates the need to constrain the model response to environmental conditions outside the current range.

Conclusions
Results from both observational and experimental studies have shown that plant diversity is an important determinant of terrestrial ecosystem function and of how terrestrial ecosystems respond to environmental perturbation (Tilman, 1996; Gunderson, 2000; Cadotte et al., 2011; Mori et al., 2013; Hautier et al., 2015; Falster et al., 2017). Terrestrial ecosystem models have advanced significantly towards representing functional diversity over large regions over the past twenty years (Moorcroft et al., 2001; Medvigy and Moorcroft, 2012; Fisher et al., 2015, 2018); however, their ability to represent complex, heterogeneous communities also depends on their ability to represent the heterogeneity of the environments where plants live and compete for resources. The ED-2.2 model accounts for this fine-scale heterogeneity by solving the energy, water, and carbon cycles for the different micro-environments in the plant community. The ED-2.2 model integrates biophysical, ecological, and biogeochemical terrestrial ecosystem processes of heterogeneous landscapes on timescales ranging from minutes to centuries and on spatial scales ranging from individual plants to continents. As we have shown in the companion paper (Longo et al., 2019a), the model shows excellent conservation of energy, water, and carbon dioxide, which is a necessary condition for applying the model to longer timescales and for accurate coupling with atmospheric models.
The thorough evaluation of the model, including an assessment of the separate components of the energy, water, and carbon cycles, demonstrated the model's ability to represent multiple biophysical and biogeochemical mechanisms. The model dynamics are consistent with observations for most short-term fluxes of shortwave radiation (Figs. 1-2) and water (Fig. 4; S1), and although it significantly overestimated sensible heat flux, the model captured the mean and the seasonality of canopy air space temperature (S1). The model produced a realistic magnitude and light-response curve of gross primary productivity (GPP), although it did not capture the GPP seasonality at TNF (Fig. 5). Respiration showed the largest disagreement, both in terms of magnitude and seasonality, reflecting the uncertainties in representing respiration processes (Fig. 6; Table 1).
In addition to the short-term comparisons with eddy covariance towers, ED-2.2 also showed good agreement with independent estimates of aboveground biomass distribution at the regional level in the tropics (Figs. 7-9) and the size distribution of the aggregated properties (Fig. 10), and reasonable magnitudes of mortality rates and growth rates for TNF, an inland tropical forest site, although it significantly overestimated growth rates at GYF, a particularly nutrient-poor site (Fig. 11).
As pointed out in the companion model description manuscript, the ED-2.2 model continues to be developed. The ED-2.2 model evaluation presented here highlighted some short- and long-term processes that should be regarded as priorities for future developments. Better constraints on vegetation heat capacity could improve the quantification of energy storage and reduce biases in outgoing longwave radiation and sensible heat flux. Likewise, ecosystem respiration showed significant departures in magnitude and seasonality from site-level estimates. Formal optimization of the parameters that control the response of respiration to temperature and moisture, along with a better description of the range of decomposition timescales, may be required. Finally, the excessive tree growth rates identified in nutrient-poor sites could be addressed by expanding the representation of biogeochemical cycles in tropical forests to include nitrogen and phosphorus dynamics, which could significantly improve the characterization of the carbon cycle.
Code and data availability. The ED-2.2 software and further developments are publicly available. The most up-to-date source code, post-