Abstract Cloud cover can influence numerous important ecological processes, including reproduction, growth, survival, and behavior, yet our assessment of its importance at the appropriate spatial scales has remained remarkably limited. If captured over a large extent yet at sufficiently fine spatial grain, cloud cover dynamics may provide key information for delineating a variety of habitat types and predicting species distributions. Here, we develop new near-global, fine-grain (≈1 km) monthly cloud frequencies from 15 y of twice-daily Moderate Resolution Imaging Spectroradiometer (MODIS) satellite images that expose spatiotemporal cloud cover dynamics of previously undocumented global complexity. We demonstrate that cloud cover varies strongly in its geographic heterogeneity and that the direct, observation-based nature of cloud-derived metrics can improve predictions of habitats, ecosystem, and species distributions with reduced spatial autocorrelation compared to commonly used interpolated climate data. These findings support the fundamental role of remote sensing as an effective lens through which to understand and globally monitor the fine-grain spatial variability of key biodiversity and ecosystem properties.

Author Summary Cloud cover affects many important ecological processes, including reproduction, growth, survival, and behavior. When quantified globally at high spatial resolution, cloud cover dynamics can provide key information for delineating a variety of habitat types and predicting species distributions. In this study, we develop a new near-global, fine-grain (≈1 km) dataset of monthly cloud frequencies from 15 y of twice-daily satellite images. The new data reveal cloud cover dynamics at unprecedented spatial resolution. We show that the direct, observation-based nature of cloud-derived metrics can improve predictions of habitats, ecosystem, and species distributions with reduced spatial autocorrelation compared to commonly used interpolated climate data. These findings support the fundamental role of remote sensing as an effective lens through which to understand and globally monitor the fine-grain spatial variability of key biodiversity and ecosystem properties. Applications of these new data extend beyond ecology to validation of global climate models, economic applications in solar energy, tourism, and resource planning. With climate and land-use changes expected to perturb the geography of these conditions and ecological connections, standardized satellite-based observation of cloud cover may represent a key avenue for monitoring the health of biodiversity and ecosystems into the future.

Citation: Wilson AM, Jetz W (2016) Remotely Sensed High-Resolution Global Cloud Dynamics for Predicting Ecosystem and Biodiversity Distributions. PLoS Biol 14(3): e1002415. https://doi.org/10.1371/journal.pbio.1002415 Academic Editor: Michel Loreau, Centre National de la Recherche Scientifique, FRANCE Received: January 13, 2016; Accepted: February 24, 2016; Published: March 31, 2016 Copyright: © 2016 Wilson, Jetz. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Data Availability: Cloud climatologies are available at http://www.earthenv.org/cloud. Additional data and related files are available at http://dx.doi.org/10.6084/m9.figshare.1531955. Code associated with cloud data processing, distribution modeling, and figures is available at https://github.com/adammwilson/Cloud. Funding: National Science Foundation (NSF.gov) to WJ: DBI 1262600, DEB 1026764, and DEB 1441737. National Aeronautics and Space Administration (nasa.gov) to WJ: NASA NNX11AP72G. Yale Climate and Energy Institute postdoctoral fellowship to AW. Publication costs were covered in part by the Julian Park Fund at the University at Buffalo College of Arts of Sciences. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing interests: The authors have declared that no competing interests exist. Abbreviations: AUC, area under the receiver operating characteristic curve; DIC, Deviance Information Criterion; MODIS, Moderate Resolution Imaging Spectroradiometer

Introduction Advanced spatial assessment and monitoring of biodiversity in today’s rapidly changing world is vital for managing future biological resources and a key element of several 2020 targets of the Convention on Biological Diversity [1,2] and the Intergovernmental Platform on Biodiversity and Ecosystem Services [3]. Growing evidence highlights the importance of fine-grain (≤1 km) climatic and environmental variability in driving the spatial distribution and abundance of organisms [4] and the need to correctly capture this variation globally [5,6]. Ecological research at regional to global extents remains reliant on environmental information that lacks important detail and is often interpolated between ground stations over vast distances of highly variable terrain [5,7]. Cloud cover and precipitation are prime examples of important environmental factors that can have significant spatial variability at grains lower than 2 km [8] and are particularly difficult to interpolate [9]. Cloud cover influences processes ranging from reproductive success in reptiles [10] to leaf wetness [11], CO 2 uptake [12], and the geographic distribution of plants [13]. Especially in the tropics, seasonal variability of cloud cover is typically more important than day length and solar angle in reducing available solar irradiance, with multi-fold ecological consequences [14]. These effects are difficult to observe in other remotely sensed products including vegetation indices, which for many parts of the world do not show much change throughout the year [15]. For example, [16] reported that persistent cloud cover on Santa Cruz Island (California, United States) reduced annual drought stress in bishop pine (Pinus muricata) by 22%–44% compared to less cloudy areas further inland. In other work, [10] experimentally altered available radiation to simulate increased cloud cover and found it lowered maternal pregnancy success and slowed growth rates of female McCann’s skinks (Oligosoma maccanni). Furthermore, cloud frequency can be a better predictor than interpolated precipitation for species distributions [13]. However, most studies incorporating cloud observations have very limited spatial extents and required either local cloud observations or extensive processing of satellite observations. Fortunately, several decades of satellite data now offer new opportunities to characterize our planet by providing data globally with consistent methodology and, critically, spatially contiguous observations at high spatial resolution. To date, the complex processing paths required for extracting fine-grain data from existing cloud products and the coarse spatial resolution of existing climatologies have made it difficult to access and account for cloud dynamics in ecological and biodiversity models. For example, recent work on the ecology of cloud forests [11] required processing over 14,000 daily ungridded satellite images simply to compare cloud frequency at two locations that were 2 km apart. The current alternative to this approach is to rely on available cloud climatologies, which typically have very coarse spatial resolution. A recent systematic review of satellite-derived cloud climatologies [17] and all Moderate Resolution Imaging Spectroradiometer (MODIS) level three atmosphere products are summarized at 1° (≈110 km) resolution (S1 Table). While this grain may be appropriate for study of global cloud dynamics (and necessary for cross-platform comparison), it is far too coarse to capture fine-grain variability that is important in many ecological questions [11]. Furthermore, cloud dynamics are particularly difficult to adequately parameterize in climate models [18] and, thus, the quality of modeled fine-grain cloud products are questionable. There are a few examples of finer-grain climatologies based on other sensors, such as HIRS (≈20 km) [19], AVHRR PATMOS-x [20] (≈11 km), and GridSAT [21] (≈8 km), but these are eight to 20 times coarser than possible with MODIS observations (S1 Table). There have been several regional–national climatologies assembled at finer (≤1 km) resolution from MODIS data, and these generally perform well in comparison with station observations and other meteorological satellites (e.g., [22]). To date, there have been two efforts to produce large-domain, high-resolution (≤1 km) cloud climatologies from the MODIS archive. One is based on the MOD35 250 m visible cloud mask [23], but this is spatially bounded to the tropics and incorporates only seven years of data (2000–2006). Additionally, these data were derived from the problematic collection 5 MODIS (MOD35) cloud mask and, thus, contain significant land-cover and processing-path biases in cloud frequency [24]. The other MODIS-derived 1 km cloud climatology [25,26] avoids the problematic MOD35 algorithm through a simple cloud masking procedure based on scaled visible wavelength (RGB) images from the MODIS “Rapid Response” system [27]. Douglas et. al. developed an algorithm that applies a user-defined threshold to convert RGB “brightness” to “cloudiness.” However, the product is based on a derivative of surface reflectance data rescaled for visual appeal [27], is strongly dependent on the brightness threshold, and is problematic over high-albedo surfaces (such as urban areas or snow). Furthermore, this approach does not exploit more sophisticated tests used in most cloud detection algorithms such as cloud-top infrared temperature [17] and is only available for select regions around the globe. In this paper, we develop new fine-grain (≈1 km resolution) global cloud climatologies from the 15-year MODIS archive of twice-daily observations. We then validate the new layers using cloud observations collected at a global network of 5,388 weather stations since 1971. To illustrate the utility of the new global fine-grain cloud cover observations, we explore four biodiversity science applications: (1) Biome boundaries are typically characterized by multiple environmental attributes, but cloud-associated variables tend to be either statistically interpolated or not well captured in existing global datasets. We illustrate how the substantial fine-grain variation in cloud cover dynamics can help delineate sharp ecological transitions. (2) Climatic stability is associated with elevated biological endemism [28–30] but is difficult to map globally at high spatial resolution. We suggest that cloud cover is an important yet often overlooked variable and put forward a first map of cloud cover stability. (3) Some ecosystems and the species specialized on them are particularly affected by local cloud conditions. We demonstrate how the newly developed remote sensing products allow a delineation of cloud forest habitats worldwide in unprecedented detail and with the potential for continued monitoring. (4) Species distribution modeling is an important biogeographical tool, but is often reliant on interpolated climatic data. We show that incorporating remotely sensed cloud climatologies in species distribution models can improve predictive accuracy without inflating autocorrelation inherent in commonly used interpolated data.

Conclusion Remotely sensed information has the potential to revolutionize our understanding of spatial ecoclimatological patterns and processes through direct capture of environmental variation at fine spatial grain and global extent [50]. Here, we have shown how global cloud dynamics can be quantified in unprecedented spatial detail and that cloud-associated factors are significantly associated with the distribution of various aspects of biodiversity habitats over large spatial scales. In this study, we did not set out to test specific mechanistic connections between vegetation or organisms and clouds, which are reliant on information about other factors (e.g., solar radiation, relative humidity, and soil moisture) not yet attainable at this scale, detail, and reliability. Instead, we focused on tests and demonstrations of the key model-based applications in basic and applied ecology. But we note that the new dataset provides a novel perspective into the fine-grain spatial variability of an environmental factor known to be mechanistically important in several ecological processes [10–13,16,50]. This type of spatially consistent environmental dataset is critical for rigorous hypothesis testing in geographical ecology and biogeography and best-possible prediction and monitoring of habitats and species distributions [51]. Our study also demonstrated key shortcomings of typical interpolated climate surfaces, which can misrepresent biological spatial autocorrelation, leading to false estimates of spatial biodiversity patterns and overestimates of species range size. This illustrates the important role of standardized, long-term, and globally contiguous remote sensing to help close key knowledge gaps and facilitate monitoring of biodiversity [52]. Continued global capture of spatiotemporal cloud cover dynamics has the potential to track the associated often-sharp ecological transitions and the ecological constraints on species ranges they impose, including and beyond those due to precipitation. Applications of this type of information extend beyond ecology to validation of global climate models [18], economic applications in solar energy [53,54], tourism [55], and resource planning. With climate and land-use changes expected to perturb the geography of these conditions and ecological connections, standardized satellite-based observation of cloud cover may represent a key avenue for monitoring the health of biodiversity and ecosystems into the future.

Materials and Methods Calculation of Cloud Frequencies The MODIS MOD09 atmospherically corrected surface reflectance product includes an internal cloud mask in the PGE11 program that relies on two reflective and one thermal test [56–58]. The reflective tests include the shortwave and middle infrared data combined in the “middle infrared anomaly” index (MIRA = ρ 20,21 − 0.82ρ 7 + 0.32ρ 6 , where ρ indicates MODIS band number). The second test uses reflectance at 1.38 microns (1.38 mic = ρ 26 ). The MIRA and the 1.38 mic reflectance are designed to be complementary, with MIRA efficiently detecting low or high reflective clouds [56], while 1.38 mic effectively detects high (and potentially not very reflective) clouds. Additionally, a thermal test is used to identify pixels with high infrared reflectance anomalies (e.g., fires, sun-glint, and high albedo surfaces) with respect to near-surface (2 m) air temperature computed by the NCEP reanalysis model [59]. The MOD09 cloud algorithm was designed to minimize confusion over snow and ice by taking the surface air temperature into account. Like many cloud masks, the MOD09 detection algorithm has a binary response (cloudy/not cloudy) and does not retain an estimate of confidence in cloud state (i.e., probability that the pixel was actually cloudy given the tests). The other MODIS cloud mask (MOD35) converts the continuous probabilities into four bins (certainly clear, probably clear, probably cloudy, and confidently cloudy) and is available at the satellite swath level (which would avoid any sampling problems introduced by the orbital parameters and the MODLAND selection criteria). However, due to spatially heterogeneous application of cloud tests (including recently reprocessed Collection 6), the MOD35 mask is unsuitable for generating global and spatially consistent maps of cloud frequencies at 1 km resolution [24]. Liu and Liu introduced an interesting alternative method of estimating cloud cover based on multi-year time series of MOD09 surface reflectance, which is promising but currently based on the frequency of clouds between 8 d MODIS Land Team (MODLAND) composites and, thus, cannot estimate the true daily cloud frequency (e.g., a cloudy observation in a single 8 d MODLAND window indicates eight cloudy days, but a clear observation could indicate one to seven clear days) [60]. Other approaches have been developed to estimate continuous probabilities rather than binned classifications [61], but these have not yet been applied to MODIS data. We extracted the daily cloud flags from bit 10 of the daily daytime surface reflectance product “state 1 km” Scientific Data Set (SDS) from both the Terra (MOD09GA, collected at approximately 10:30 AM local time) and Aqua (MYD09GA, approximately 1:30 PM) satellites from February 2000 to March 2014 archives (approximately 260 TB of data, n days = 1 x 104, n pixels = 7.2 x 108, n observations = 7.4 x 1012). The time series of monthly cloud frequencies (proportion of days with a positive cloud flag) was calculated separately for the daily MOD09GA and MYD09GA data using the Google Earth Engine application programming interface (http://earthengine.google.org/). Removal of Albedo and Orbital Artifacts The MODIS orbit results in systematic gaps in the daily global coverage near the equator [62] that results in nearly longitudinal orbital artifacts (15° for Terra and 345° for Aqua) in the long-term cloud frequencies. To remove these features, we used the Variational Stationary Noise Removal approach (VSNR [63]), a Bayesian image restoration technique. The VSNR is well suited to remove these artifacts because it allows specification of the shape and scale of known artifacts. We used a gabor filter with y = 200, x = 5, and θ = 15 for Terra and θ = -15 for Aqua to minimize the orbital artifacts (S8 Fig). Inspection of the resulting monthly climatologies revealed anomalously high cloud frequencies over some areas with high maximum and intra-annual variability of albedo, such as the Kati Thanda–Lake Eyre (28.34°S, 137.26°E) in Australia and the Salar de Uyuni salt flats (20.27°S, 67.40°W) of Bolivia. Mean monthly albedo from the 1 km MODIS MCD43B3 product was used to identify and gap-fill in these problematic regions. Calculation of Seasonal Metrics The bias-corrected monthly climatologies for both sensors were then averaged by month and transformed to geographic coordinates at 30-arc-sec spatial resolution (≈1 km). Averaging cloud observations from both products was necessary to minimize scan line artifacts due to gaps in the satellite orbits. Let m index months (m∈1: 12) and y index years (y∈ 2000: 2014). The combined product represents monthly mean midday cloud frequencies (proportion of days within each month with a positive cloud flag including both Terra and Aqua observations). These monthly time series were summarized to “climatological” mean cloud frequencies and standard deviations: μ m = mean(CF m,y ) and σ m = SD(CF m,y ). The inter-annual variability was calculated as mean(σ m ) and intra-annual variability (seasonality) as SD(μ m ). For example, the monthly mean for January, μ 1 , represents the proportion of twice-daily observations (%) with a positive cloud flag out of all observations in January from both Terra and Aqua across all years 2000–2014. The inter-annual standard deviation for January, σ 1 , is the standard deviation of the monthly cloud frequencies observed in each of the 15 Januaries (2000–2014). Due to the algorithm’s use of tests based on reflectance data, the flag is only available for daytime scenes and, thus, high latitudes have no data during winter months (May–September in the Southern Hemisphere and November–February in the Northern Hemisphere). We also quantified the seasonality of cloud frequencies following [36] by considering mean monthly cloud frequencies as polar vectors with magnitude μ m and direction (month of year expressed in polar coordinates). The sum of the twelve vectors encapsulates both the seasonal concentration (magnitude) and direction (season of maximum cloudiness) for each pixel. Dividing the magnitude by the mean annual cloud frequency results in an index ranging from 0 (equal cloud cover throughout the year) to 100 (all observed clouds occurring in a single month). Validation The monthly CF were validated using a global observational dataset of synoptic weather reports collected at 5,388 stations [34] from 1971 to 2009 (S1 Fig). We extracted the mean “total cloud” amount for each month, which represents the mean proportion of the sky that was covered by all types of cloud during the observations in that month. Comparison of these observations to satellite data must take into account that the sampling radius of these observations (the visible sky) depends on cloud height, cloud thickness, the curvature of the earth, and other factors, but is typically much larger than a single 1 km MODIS pixel. To account for these factors, we took the mean monthly MODCF for a circle with a 16 km radius around each station location, effectively converting the temporal MODCF to mean cloud amount within the sample radius to make it comparable to the station observations [64]. The monthly MODCF (including MODIS data from 2000 to 2014) were compared to station observations using linear models over two time periods: (1) the MODIS era (2000–2009) and (2) the full station record (1970–2009). The MODIS era comparison evaluates the ability of the satellite observations to estimate cloud frequency at station locations during approximately the same time period, while the full station record assesses accuracy and relevance of the 14-year, satellite-derived data for estimating multi-decadal monthly climatologies. For the MODIS era comparison, we included only stations with at least 20 observations per month for the full ten-year period (2000–2009), so the number of stations available was reduced to 1,558 (S2 Table and S1 Fig). For the full record comparison, the station dataset was filtered to include only stations with at least 20 observations per month for at least 20 years, which retained 4,679 stations. Several countries (notably the US, Canada, and New Zealand) converted from human cloud observations to automated laser ceilometers over the past decade, leading to a decline in the number of observations from 1997 to 2009 [34]. The MODCF captures 78% of the variability (n = 17,021, RMSE = 7.99% cloud frequency, p < 0.001) in monthly mean cloud frequencies observed from the weather stations available with data between 2000 and 2009, with monthly relationships varying from R2 = 0.66 (n = 1,450, RMSE = 8.30%) in May to R2 = 0.84 (n = 1,391, RMSE = 7.33%) in November (S2 Fig and S2 Table). The MODCF data is nearly as accurate (R2 = 0.74, n = 53,678, p < 0.001) over the full station record as for the MODIS-era (2000–2009, the period with available validation data) alone. The spatial distribution of residuals from a linear model between the new MODCF and all station observations (S3 Fig) was used to explore spatial variability of potential biases in the cloud product. While it is tempting to think of the station data as “truth” with which to evaluate the satellite product, the station observations are themselves visual estimates of cloud fraction and prone to regional methodological variability [65,66]. It is also important to note that cloud cover can be significantly affected by land cover [46], which further reinforces the need for high-resolution monitoring of cloud dynamics. The spatial distribution of residuals confirms a known positive bias [67] at high latitudes (>60°), which could be due to the use of modeled surface air temperature in determining the likelihood of snow and ice within the cloud detection algorithm (S3 Fig). There is also a negative bias over the monsoonal regions of India in the Northern Hemisphere summer (JJA, S3 Fig), which is likely due to the diurnal cycle of clouds associated with the monsoon [68] and the timing of MODIS overpasses. We also summarized the residuals over water and different land cover types (S4 Fig). Cloud detection over open water (which tends to be dark and warm in contrast to clouds, which tend to be bright and cold) is relatively straightforward [67], and the role of large water bodies in cloud formation is well understood. In colder months, the cold air above relatively warm water generates instability, increasing cloudiness downwind, while in warmer months the water temperatures are typically cooler than surrounding land, which tends to suppress some cloud types [69]. Biome Summaries To illustrate and contrast the spatial variability in cloud frequency within and between Earths biomes, we summarized the mean and standard deviation of the MODCF within each of the up to 14 biomes in each geographic “realm” delineated by the “Terrestrial Ecoregions of the World” dataset [35]. Data are available in S3 Table. Cloud Forest Modeling Locations of known cloud forests were obtained from the Tropical Montane Cloud Forest Sites database maintained by the United Nations Environment Program–—World Conservation Monitoring Centre and published in A Global Directory of Tropical Montane Cloud Forests [48]. This dataset contains 529 locations compiled from literature and correspondence with regional experts. Unfortunately, only the central location for each record is known, and there is no accounting for the large variability in size, which can range from 50 hectares to hundreds of square kilometers [48]. We used these locations as presence points with 10,000 randomly selected background points in an infinitely weighted logistic regression (identical to the MaxEnt modeling framework and inhomogeneous point process model [47,70]). We compared models built with elevation, interpolated mean annual temperature, mean annual precipitation, and precipitation seasonality [7] with models built with elevation and the new cloud metrics (including mean annual, inter-, and intra-annual variability) for the global tropical landmass (±23.44° latitude). See S4 Table for more details. Regions were defined with longitudinal breaks at 29.5°W and 63.4°E and included as a factor in the model. Models were compared using AIC, BIC, and the area under the receiver operating characteristic (AUC, S4 Table). Distribution Modeling Occurrence data for L. lacrymiger were extracted from the eBird Basic Dataset v1.3 [71]. All eBird observations for any species in which the observer recorded looking for “all species” with survey durations of less than 4 h, survey distances of no more than 5 km, and/or a survey area of no more than 500 ha (to reduce spatial uncertainty) were considered as observation “trials,” and observations of L. lacrymiger were considered “presences.” Occurrence data for P. cynaroides were extracted from the Protea Atlas Dataset [72]. Survey locations with no observed P. cynaroides were considered “trials.” The number of point-level trials and presences for each species were then counted in each 1 km grid cell. To account for imperfect observation (false negatives), the probabilities of occurrence given the various environmental data (see below) were modeled in a Bayesian framework as a zero-inflated binomial using the hSDM.ZIB function in the hSDM R package [73]. We compared two candidate models for each species: the first used spatially interpolated mean annual temperature and three spatially interpolated precipitation measures (mean January precipitation, mean July precipitation, and precipitation seasonality), all from the 30 arc-sec (≈1 km) WorldClim dataset [7]. The second model used the same spatially interpolated mean annual temperature layer with three remotely sensed cloud cover measures (mean January cloud frequency, mean July cloud frequency, and the standard deviation of the intra-annual cloud frequency) (S7 Fig). Models were evaluated by calculating the area under the receiver operator curve (AUC), deviance information criterion (DIC), correlation (cor), and two measures of spatial autocorrelation, Moran’s I (MoransI) and Geary’s C (GearyC). Three chains of each model were burned-in for 10,000 iterations and then run for an additional 50,000 iterations and thinned by 50 to reduce the autocorrelation of the posterior samples. Model convergence was assessed using the multivariate potential scale reduction factor [74]. Range size estimates were calculated by summing the mean posterior pixel-level occurrence probabilities across the modeling domain. This technique avoids the need to apply a threshold the estimated probabilities and accounts for the uncertainty in the model predictions. For example, the estimated range from two cells with 50% probability would be one cell.

Acknowledgments The authors would also like to thank the National Center for Ecological Analysis and Synthesis (NCEAS) Environment and Organisms working group (especially Benoit Parmentier) for constructive feedback and Google Earth Engine for providing computational resources.

Author Contributions Conceived and designed the experiments: AMW WJ. Performed the experiments: AMW. Analyzed the data: AMW. Wrote the paper: AMW WJ.