Study region

Our analysis is set within the Mimika Regency of the Papua province of Indonesia (Fig. 1), downstream from Grasberg, a large-scale, open-pit copper-gold mine. This region includes forested and mangrove areas along the Ajkwa River as well as coastal waters of the Arafura Sea approximately 60 km south of Grasberg. Our study’s first component introduces the Noise Insensitive Trajectory Algorithm (NITA) to examine short- and long-term vegetation disturbance along the river; the second component addresses the spatial diffusion of suspended sediments into Arafura coastal waters. All relevant geographic features fit within a single Landsat tile (WRS-2, Path 103, Row 63).

Vegetation disturbance using Landsat time series

A remote sensing time series-based approach that was robust to consistent cloud cover and sensitive to spatially diffuse and temporally protracted changes was required to examine the relationship between Grasberg tailings production and regional environmental disturbance. NITA was inspired by significant recent work dedicated to constructing per-pixel land-cover histories using Landsat time series46,47,48,49,50,51,52,53,54. The strengths of these methods range from parsimonious characterization of annual trends46,47,48, to statistical determination of disturbance dates using data driven approaches49,54, to simultaneous monitoring of seasonal and trend change50,52, to flexible parameterization based on user needs51. However, with some exceptions52,55, most of the aforementioned algorithms rely on consistent data availability due to relatively low cloud cover. For our study, we sought a disturbance detection algorithm that has ease of implementation, flexible parameterization, the ability to overcome limitations imposed by missing image dates and remnant atmospheric contamination, and which did not require information on sub-annual phenology. Lacking an exhaustive comparison between NITA and existing disturbance detection algorithms, we do not make the assertion that other algorithms could not perform equally well.

We used 199 Landsat 4/5 TM and Landsat 7 ETM + scenes to form a 28-year time series from 1987–2014. Surface reflectance images generated using the Landsat Ecosystem Disturbance Adaptive Processing System (LEDAPS)56 were downloaded directly, along with the fmask cloud mask product57 from the United States Geological Survey’s EarthExplorer website (earthexplorer.usgs.gov). Given the 55% mean scene cloud cover, we developed a vegetation disturbance detection algorithm suitable for high frequency observations that is insensitive to missing data due to atmospheric contamination or Landsat 7’s Scan-Line Corrector data gaps. We performed standard preprocessing including co-registration, spatial subsetting, and cloud masking, and then calculated NDVI58 using a single automated process. Though NDVI saturates at high values, it is sensitive to change in photosynthetic biomass below 0.70, typical of our study area59.

The NITA algorithm is implemented on each image pixel and uses all image dates containing valid pixel-level spectral information. The inputs into the algorithm are the set of valid image dates (“x”), the accompanying set of spectral index values (“y”), and the user-defined parameters, max_segment, bail_thresh, prctile, filt_dist, and penalty each described in Table 1.

Table 1 User defined input parameters for NITA. Full size table

An overview of the NITA algorithm as implemented on a single pixel time series follows (also see Fig. 8):

1 Calculate the noise variable for the entire times series using the median forward finite difference of the set of spectral values. This variable internalizes atmospheric contamination, geometric errors, and phenology. 2 Establish a single-segment, linear fit in accordance with the prctile parameter. 3 Compare the value of bail_thresh with the ratio of error of the initial linear fit to noise. If bail_thresh is not exceeded (e.g., the case of low error and/or high noise), adopt a single-segment model; otherwise, continue. 4 Initiate NITA build subroutine and add successive breakpoints until the number of breakpoints reaches max_segments. Breakpoints are added at the date (x value) of maximum orthogonal error of the previous fit, after filtering error by filt_dist. The y-value of the breakpoint is calculated from the set of spectral values within filt_dist image dates of the date of maximum error. If, for example, prctile is 75, then the y-value is the approximate 75th percentile value of the filt_dist subset of points (Fig. 8). 5 Run the NITA subtract subroutine to iteratively remove superfluous breakpoints. The final set of breakpoints minimizes the Bayesian Information Criterion51 based on log-likelihood of a lognormal distribution (given all orthogonal distances >0) and the penalty parameter. The BIC equation is formulated as:

where logL is the log-likelihood function value, penalty is the user-defined multiplier, segs is the number of model segments, and N is the number of observations.

The generation of trajectories is based on iterative fits of the piecewise function while the ultimate number of segments is determined with a BIC criterion. Thus, there is no explicit judgement as to what does or does not constitute a “disturbance”. As in Landtrendr, the user can decide on the magnitude of change in spectral index values over a given duration should be considered a disturbance48. In our study, based on typical forested NDVI values of above 0.8, we consider a disturbance as an event that changes NDVI from above 0.70 to below 0.40 at a rate of >0.10 per year. This disturbance rate reflects rapid NDVI changes in the ADA due to inundation while excluding more protracted NDVI decline associated with degradation in the urbanizing region surrounding Timika. As in Paull et al.13, we separate regions of urbanization from inundation using the ADA’s western levee13 as a physically meaningful dividing line.

Parameter sensitivity and accuracy assessment through simulation

In order to select the most effective input parameter set, we tested the sensitivity of results against simulated data representing the key pixel trajectories in our Papua study area. Idealized trajectories were manually delineated for disturbances in 1991, 1997, and 2003 as well as braided river dynamics, stable forest, and urbanization processes. For each idealized trajectory, noise was added in two ways: (1) Real sets of valid dates (i.e., cloud- and SLC error-free dates) were generated from 100 pixels randomly sampled throughout the image. For each pixel history, 20 valid date sets were randomly selected from the 100 pixel sample. (2) For each trajectory and each date set, 20 sets of random noise were added according to a non-parametric noise model which, itself, was based on the spectral index distributions of 100 forested pixels. This effectively created 400 example pixels for each of the manually-delineated trajectories in question.

For each trajectory type, bail_thresh, filt_dist, and penalty were varied to determine their impact on estimates of model complexity (number of segments), fit RMSE compared to the originating trajectory, and, when applicable, date-before-disturbance breakpoint and date-of-nadir breakpoint. We hypothesized that, in particular, changes in filt_dist and penalty would simultaneously improve accuracy of model complexity while reducing accuracy of disturbance dating and vice versa. The parameters max_segment and prctile were not tested because their sensitivities can be understood logically; they are provided to the user for fine tuning.

Following selection of a reasonable parameter set (prctile = 90, filt_dist = 3, bail_thresh = 2, penalty = 4), we conducted an accuracy assessment using 100 manually-generated trajectories based on real data from 100 pixels with 1 to 5 “true” segments. In the same manner as the parameter testing above, these 100 idealized trajectories were used to simulate missing dates and noise. We simulated 10 sets of missing dates and 10 sets of random noise for each trajectory yielding 3900 “pixels” with one segment, 1100 with 2, 3500 with 3, 1100 with 4, and 400 with 5. These trajectories represented the spectrum of disturbance dates and types present in our study area. Noise standard deviation was set to 0.2 NDVI units, a level commonly found in this study area.

Spatio-temporal dynamics of coastal suspended particulate matter (SPM)

While larger particulates tend to settle upstream, closer to the mine, smaller-sized particulates may settle along the length of the ADA and even infiltrate coastal waters9,11. We hypothesized that the transport of tailings through the Ajkwa River and the ADA would increase after 1998 and yield a greater SPM concentration in the Arafura Sea at the river’s outlet. To test this hypothesis, we calculated changes in Landsat-derived SPM concentration (g/m3) using the following equation34:

where is the water-leaving red band reflectance and A (327.84 g/m3) and C (0.1708) are empirical coefficients specific to the 660 nm center of Landsat TM and ETM+’s red band; variation in can be attributed to optical characteristics of water rather than atmospheric or glint effect60.

To assess changes in SPM concentration before and after 1998, we generated a 199-date SPM time series in the same manner as the NDVI time series discussed above. We sampled SPM concentrations for all image dates along three-pronged, trident-shaped transects, ranging from 1 to 15 km in length, originating from the Ajkwa Estuary directly south of the ADA, which receives the vast majority of Grasberg tailings. In addition, we constructed similar transects (Supplementary Figure S1) at 20 river outlets outside of the ADA to capture baseline changes in SPM that are not likely to be related to mining activity. At the ADA outlet, the SPM concentration for each image date was measured as the median SPM at all sample points along a transect of a given length. The SPM at non-ADA outlets is characterized by the median SPM across all 20 non-ADA transects of a given length for each image date. Spatial averaging using median accounts for the lognormal frequency distribution of SPM values and minimizes the impact of outlier values (frequently from unmasked cloud edges). All temporal aggregation was completed using 90th percentile values based on well-established theory that in mountain rivers, half of the annual suspended sediment flux will be transported in <5% of the days25. For the purposes of this study, 90th percentile SPM values at the ADA outlet were compared to 90th percentile values at non-ADA outlets for pre-1998 and post-1998 periods. This comparison supports identifying changes unique to the ADA outlet transect by decoupling spatial and temporal variability in SPM from changes in river channel morphology, runoff potential, and variability in rainfall or other mesoscale phenomena that may affect SPM transport, deposition, or dilution.

To further explore the effects of high spatial and temporal variability in this system, we use 2 km transects as a case study (Fig. 6) to show the interquartile range of SPM values over space (i.e., across sample transects) and time (i.e., image dates within a single year). This variation in Landsat-derived SPM within a given year results from the high temporal variability in sediment transport and highlights the need to augment single-date point sampling methods through spatially and temporally extensive sampling. It is possible that Landsat’s nominal 16-day revisit period is too infrequent for representative sampling in locations where the majority of the river’s sediment transport occurs over a small minority of days (e.g., rivers predominately fed by rainfall or snow and glacier melt rather than groundwater or lakes25).