Global features

Global features are computed over all the pixels of an entire image.

Color: The HSL (hue, saturation, lightness) and HSV (hue, saturation, value) color spaces are the two most common cylindrical-coordinate representations of points in an RGB color model. The HSV and HSL color spaces define a pixel's color by its hue, saturation, and value or lightness, respectively (Joblove & Greenberg, 1978). This provides a color definition similar to human visual perception. The first step of each picture analysis was therefore to calculate the average hue, saturation, and value or lightness for both color spaces. At constant hue, the definitions of saturation and of value and lightness differ considerably between the two spaces. Hue, saturation, and value of a pixel in the HSV space will therefore be denoted as $I_H(m,n)$, $I_S(m,n)$ and $I_V(m,n)$, and hue, saturation and lightness in the HSL space as $I_{H\_}(m,n)$, $I_{S\_}(m,n)$ and $I_{L\_}(m,n)$ from here on, where $(m,n)$ indexes a pixel and $M$ and $N$ are the number of rows and columns in each image.

(1) $f_1 = \frac{1}{MN}\sum_n\sum_m I_H(m,n)$

(2) $f_2 = \frac{1}{MN}\sum_n\sum_m I_S(m,n)$

(3) $f_3 = \frac{1}{MN}\sum_n\sum_m I_V(m,n)$

(4) $f_4 = \frac{1}{MN}\sum_n\sum_m I_{S\_}(m,n)$

(5) $f_5 = \frac{1}{MN}\sum_n\sum_m I_{L\_}(m,n)$

To assess colorfulness, the RGB color space was separated into 64 cubes of identical volume by dividing each axis into four equal parts. Each cube was then considered an individual sample point, and the color distribution $D_1$ of each image was defined as the frequency of color occurrence within each of the 64 cubes. Additionally, a reference distribution $D_0$ was generated so that each sample point had a frequency of 1/64. The colorfulness of an image was then defined as the distance between these two distributions, using the quadratic-form distance (Ke, Tang & Jing, 2006) and the Earth Mover's Distance (EMD). Both features take the pair-wise Euclidean distances between the sample points into account. With $c_i$ the center position of the $i$-th cube, we get $d_{ij} = \lVert \mathrm{rgb2luv}(c_i) - \mathrm{rgb2luv}(c_j) \rVert_2$ after a conversion to the LUV (Adams chromatic valence space; Adams, 1943) color space. This leads to

(6) $f_6 = (h - h_0)^T A (h - h_0)$ and $f_7 = \mathrm{emd}(D_1, D_0, \{d_{ij} \mid 1 \le i,j \le 64\})$

in which $h$ and $h_0$ are vectors listing the frequencies of color occurrence in $D_1$ and $D_0$. $A = (a_{ij})$ is a similarity matrix with $a_{ij} = 1 - d_{ij}/d_{\max}$ and $d_{\max} = \max(d_{ij})$; 'emd' denotes the Earth Mover's Distance, which we implemented using the algorithm described by Rubner, Tomasi & Guibas (2000).
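The averages $f_1$–$f_5$ require only per-pixel HSV/HSL conversions. Below is a minimal numpy sketch, assuming the input is an RGB array of shape (M, N, 3) scaled to [0, 1]; all function and variable names are illustrative, not from the original implementation.

```python
import numpy as np

def hsv_hsl_channels(img):
    """Per-pixel hue, HSV saturation, value, HSL saturation, lightness.

    img: float RGB array of shape (M, N, 3) with values in [0, 1].
    """
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    cmax, cmin = img.max(axis=-1), img.min(axis=-1)
    c = cmax - cmin                                   # chroma
    hue = np.zeros_like(cmax)                         # hue is shared by HSV and HSL
    m = c > 0
    i = m & (cmax == r)
    hue[i] = ((g - b)[i] / c[i]) % 6
    i = m & (cmax == g) & (cmax != r)
    hue[i] = (b - r)[i] / c[i] + 2
    i = m & (cmax == b) & (cmax != r) & (cmax != g)
    hue[i] = (r - g)[i] / c[i] + 4
    hue /= 6                                          # hue in [0, 1)
    v = cmax                                          # HSV value
    s_v = np.divide(c, v, out=np.zeros_like(c), where=v > 0)
    l = (cmax + cmin) / 2                             # HSL lightness
    d = 1 - np.abs(2 * l - 1)
    s_l = np.divide(c, d, out=np.zeros_like(c), where=d > 0)
    return hue, s_v, v, s_l, l

def f1_to_f5(img):
    # means over all pixels, in the order f1 (hue), f2 (S_HSV), f3 (V),
    # f4 (S_HSL), f5 (L)
    return [ch.mean() for ch in hsv_hsl_channels(img)]
```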

For color analysis, only pixels with a saturation $I_{S\_}(m,n) > 0.2$ and a lightness $I_{L\_}(m,n) \in [0.15, 0.95]$ were used, as outside this range the human eye is unable to distinguish hues and only sees shades of grey. With $P_H = \{(m',n') \mid I_{S\_} > 0.2 \text{ and } 0.15 < I_{L\_} < 0.95\}$ representing the set of pixels whose hues can be perceived by humans, $f_8$ was defined as the most frequent hue in each image and $f_9$ as the standard deviation of the column-wise hue variance.

(7) $f_8 = \min h_{\max}$, where for every hue $h$: $\#\{(m',n') \in P_H \mid I_{H\_} = h_{\max}\} \ge \#\{(m',n') \in P_H \mid I_{H\_} = h\}$. If several hues had identical cardinality, the smallest one was chosen.

(8) $f_9 = \mathrm{std}(\mathrm{var}(I'_{H\_}))$, where $I'_{H\_}(m,n) = I_{H\_}(m,n)$ if $(m,n) \in P_H$, and $I'_{H\_}(m,n) = 0$ otherwise. $\mathrm{var}(I'_{H\_})$ is the vector containing the variance of each column of $I'_{H\_}$, and std returns its standard deviation.
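A short sketch of $f_8$ and $f_9$ under these thresholds, assuming hue is available in integer degrees in [0, 360) and HSL saturation and lightness in [0, 1]; numpy's argmax returns the smallest index on ties, which matches the tie-breaking rule above.

```python
import numpy as np

def f8_f9(hue_deg, sat_l, light_l):
    # P_H: pixels whose hue a human can perceive
    p_h = (sat_l > 0.2) & (light_l > 0.15) & (light_l < 0.95)
    counts = np.bincount(hue_deg[p_h].astype(int) % 360, minlength=360)
    f8 = int(counts.argmax())                  # smallest hue wins on ties
    hue_masked = np.where(p_h, hue_deg, 0.0)   # I'_H: zero outside P_H
    f9 = np.std(np.var(hue_masked, axis=0))    # std of per-column hue variances
    return f8, f9
```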

The hue interval [0, 360] was then uniformly divided into 20 bins of identical size to compute a hue histogram of the image. $Q$ denotes the maximum value of this histogram. The hue count was defined as the number of bins containing values greater than $C \cdot Q$, and the number of missing hues as the number of bins with values smaller than $c \cdot Q$; $C$ and $c$ were set to 0.1 and 0.01, respectively.

(9) $f_{10} = \#\{i \mid h_i > C \cdot Q\}$

(10) $f_{11} = \#\{i \mid h_i < c \cdot Q\}$

Hue contrast and missing-hues contrast were computed as:

(11) $f_{12} = \max \lVert c_h(i) - c_h(j) \rVert_{al}$ with $i,j \in \{i \mid h_i > C \cdot Q\}$

(12) $f_{13} = \max \lVert c_h(i) - c_h(j) \rVert_{al}$ with $i,j \in \{i \mid h_i < c \cdot Q\}$

where $c_h(i)$ is the center hue of the $i$-th bin of the histogram and $\lVert \cdot \rVert_{al}$ refers to the arc-length distance on the hue wheel. $f_{14}$ denotes the percentage of pixels belonging to the most frequent hue:

(13) $f_{14} = Q/N$, where $N = \#P_H$ is the number of perceivable-hue pixels

(14) $f_{15} = 20 - \#\{i \mid h_i > C_2 \cdot Q\}$, with $C_2 = 0.05$
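The histogram features $f_{10}$–$f_{15}$ can be expressed compactly. The sketch below assumes the perceivable hues are supplied in degrees and uses the thresholds C = 0.1, c = 0.01 and C2 = 0.05 from the text; with 20 bins over [0, 360], each bin is 18° wide.

```python
import numpy as np

def hue_histogram_features(hues_deg, C=0.1, c=0.01, C2=0.05):
    h, _ = np.histogram(hues_deg, bins=20, range=(0, 360))
    centers = np.arange(9, 360, 18)            # center hue of each 18-degree bin
    Q = h.max()
    present = np.where(h > C * Q)[0]
    missing = np.where(h < c * Q)[0]

    def max_arc(idx):
        """Largest pairwise arc-length distance between bin centers."""
        if len(idx) < 2:
            return 0.0
        d = np.abs(centers[idx][:, None] - centers[idx][None, :])
        return float(np.minimum(d, 360 - d).max())

    f10 = len(present)                         # hue count
    f11 = len(missing)                         # number of missing hues
    f12 = max_arc(present)                     # hue contrast
    f13 = max_arc(missing)                     # missing-hues contrast
    f14 = Q / len(hues_deg)                    # share of the most frequent hue
    f15 = 20 - int(np.sum(h > C2 * Q))
    return f10, f11, f12, f13, f14, f15
```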

Color models: As some color combinations are more pleasing to the human eye than others (Li & Chen, 2009), each image was fitted against one of 9 color models (Fig. S2K). As the models can rotate, the $k$-th model rotated by an angle $\alpha$ is denoted $M_k(\alpha)$, and $G_k(\alpha)$ is the grey part of the respective model. $E_{M_k(\alpha)}(m,n)$ was defined as the hue of $G_k(\alpha)$ closest to $I_{H\_}(m,n)$:

(15) $E_{M_k(\alpha)}(m,n) = \begin{cases} I_{H\_}(m,n) & \text{if } I_{H\_}(m,n) \in G_k(\alpha) \\ H_{\text{nearest border}} & \text{if } I_{H\_}(m,n) \notin G_k(\alpha) \end{cases}$

where $H_{\text{nearest border}}$ is the hue of the sector border in $M_k(\alpha)$ closest to the hue of pixel $(m,n)$. The distance between the image and the model $M_k(\alpha)$ can then be computed as

(16) $F(k, \alpha) = \frac{1}{\sum_m \sum_n I_{S\_}(m,n)} \sum_n \sum_m \lVert E_{M_k(\alpha)}(m,n) - I_{H\_}(m,n) \rVert_{al} \cdot I_{S\_}(m,n)$

with the weighting by $I_{S\_}(m,n)$ making color differences count less at lower saturation. This definition of the distance to a model was inspired by Datta et al. (2006), with the addition of the normalization $\frac{1}{\sum_m \sum_n I_{S\_}(m,n)}$, which allows a comparison of images of different sizes. As the distances of an image to all models yield more information than the identity of the single model the image fits best, all distances were calculated, and features $f_{16}$–$f_{24}$ are therefore defined as the smallest distance to each model:

(17) $f_{15+k} = \min_\alpha F(k, \alpha), \quad k \in \{1, \ldots, 9\}$.

Theoretically, the best-fitting hue model could be defined as $M_{k_0}(\alpha_0)$ with

(18) $\alpha_k = \arg\min_\alpha F(k, \alpha)$, $\quad k_0 = \arg\min_{k \in \{1,\ldots,9\}} F(k, \alpha_k)$ and $\alpha_0 = \alpha_{k_0}$.

Such a best fit is, however, very difficult to determine reliably. We therefore set a threshold TH, assuming that if $F(k, \alpha_k) < TH$ the picture fits the $k$-th color model. If $F(k, \alpha_k) \ge TH$ for all $k$, the picture was assigned to the closest model. In case several models could be assigned to an image, not the closest but the most restrictive one was chosen. As the color models are already ordered according to their restrictiveness, we characterize the fit to a color model as:

(19) $f_{25} = \begin{cases} \max\{k \in \{j \mid F(j, \alpha_j) < TH\}\} & \text{if } \exists k \in \{1, \ldots, 9\}: F(k, \alpha_k) < TH \\ k_0 & \text{if } F(k, \alpha_k) \ge TH \text{ for all } k \end{cases}$

Normalizing the distances to the models enabled us to set a unique threshold (TH = 10) for all images, independently of their size.
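Because the 9 model definitions (Fig. S2K) are not repeated here, the sketch below assumes each model is supplied as a list of (start, width) hue sectors in degrees, and it scans the rotation angle α over a coarse 10° grid purely for illustration (the feature itself minimizes over α). Under those assumptions it implements the saturation-weighted, size-normalized distance of Eq. (16).

```python
import numpy as np

def arc_dist(a, b):
    """Arc-length distance between hues on the 360-degree wheel."""
    d = np.abs(a - b) % 360
    return np.minimum(d, 360 - d)

def model_distance(hue, sat, sectors, alpha):
    """F(k, alpha): mean hue deviation from the rotated model, weighted by saturation."""
    dist = np.full(hue.shape, np.inf)
    for start, width in sectors:
        lo = (start + alpha) % 360
        center = (lo + width / 2) % 360
        inside = arc_dist(hue, center) <= width / 2        # hue lies in the sector
        border = np.minimum(arc_dist(hue, lo),             # else: distance to the
                            arc_dist(hue, (lo + width) % 360))  # nearest border
        dist = np.minimum(dist, np.where(inside, 0.0, border))
    return (dist * sat).sum() / sat.sum()   # normalization allows size-independent comparison

def best_model_fit(hue, sat, sectors, step=10):
    """min_alpha F(k, alpha), scanned on a coarse grid for illustration."""
    return min(model_distance(hue, sat, sectors, a) for a in range(0, 360, step))
```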

Brightness: The light conditions captured in a given picture are among the most noticeable features involved in human aesthetic perception. Some information about the light conditions is already captured by the color analysis described above; analyzing brightness directly, however, provides a more targeted evaluation of the light conditions of a given image. There are several ways to measure the brightness of an image. For this study, we implemented analyses that target slightly different aspects of brightness and brightness contrast.

(20) $f_{26} = \frac{1}{MN}\sum_m\sum_n L(m,n)$

(21) $f_{27} = 255 \exp\left(\frac{1}{MN}\sum_m\sum_n \log\left(\varepsilon + \frac{L(m,n)}{255}\right)\right)$

where $L(m,n) = (I_r(m,n) + I_g(m,n) + I_b(m,n))/3$. $f_{26}$ represents the arithmetic and $f_{27}$ the logarithmic average brightness; the latter takes the dynamic range of the brightness into account. Two images can therefore be equal in one value but differ in the other. The contrast of brightness was assessed by defining $h_1$ as a histogram with 100 equally sized bins for brightness $L(m,n)$, with $d$ as the index of the bin with the maximum energy, $h_1(d) = \max(h_1)$. Two indices $a$ and $b$ were chosen such that the interval $[a, b]$ contains 98% of the energy of $h_1$; the histogram was analyzed step by step towards both sides, starting from the $d$-th bin, to identify $a$ and $b$. The first measure of brightness contrast is then

(22) $f_{28} = b - a + 1$.

For the second contrast quality feature, a brightness histogram $h_2$ with 256 bins was computed as the sum of the gray-level histograms $h_r$, $h_g$ and $h_b$ generated from the red, green and blue channels:

(23) $h_2(i) = h_r(i) + h_g(i) + h_b(i), \quad \forall i \in \{0, \ldots, 255\}$.

The contrast quality $f_{29}$ is then the width of the smallest interval $[a_2, b_2]$ where $\sum_{i=a_2}^{b_2} h_2(i) > 0.98 \sum_{i=0}^{255} h_2(i)$:

(24) $f_{29} = b_2 - a_2$.
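A sketch of $f_{26}$–$f_{29}$ for an RGB image with values in [0, 255]. Two points are our reading rather than an explicit specification in the text: the 255 in Eq. (21) is applied as an outer scale factor so that $f_{27}$ is comparable to $f_{26}$, and the interval $[a, b]$ is grown greedily towards the side with the larger neighboring bin; the value of ε is likewise not specified, so a small constant is used.

```python
import numpy as np

def brightness_features(img, eps=1e-4):
    """img: RGB array with values in [0, 255]; eps guards the logarithm."""
    L = img.astype(float).mean(axis=2)                    # L = (Ir + Ig + Ib) / 3
    f26 = L.mean()                                        # arithmetic average
    f27 = 255 * np.exp(np.mean(np.log(eps + L / 255)))    # logarithmic average

    # f28: grow [a, b] outwards from the peak bin d until it holds 98% of h1.
    h1, _ = np.histogram(L, bins=100, range=(0, 256))
    a = b = int(h1.argmax())
    while h1[a:b + 1].sum() < 0.98 * h1.sum():
        if a > 0 and (b == 99 or h1[a - 1] >= h1[b + 1]):
            a -= 1
        else:
            b += 1
    f28 = b - a + 1

    # f29: width of the smallest interval of the summed channel histograms h2
    # holding more than 98% of the total energy.
    h2 = sum(np.histogram(img[..., c], bins=256, range=(0, 256))[0]
             for c in range(3))
    csum = np.concatenate(([0], np.cumsum(h2)))
    f29 = 255
    for a2 in range(256):
        i = np.searchsorted(csum, csum[a2] + 0.98 * csum[-1], side='right')
        if i <= 256:                                      # interval [a2, i-1] qualifies
            f29 = min(f29, i - 1 - a2)
    return f26, f27, f28, f29
```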

Edge features: Edge repartition was assessed by looking for the smallest bounding box that contains a chosen percentage of the energy of the edges, and comparing its area to the area of the entire picture. Although Li & Chen (2009) and Ke, Tang & Jing (2006) offer two different versions of this feature, both use the absolute value of the output of a 3 × 3 Laplacian filter with α = 0.2. For color images the R, G and B channels are filtered separately and the mean of the absolute values is used. At the boundaries, values outside the bounds of the matrix were considered equal to the nearest value on the matrix border. Following Li & Chen (2009), the area of the smallest bounding box containing 81% of the edge energy of the 'Laplacian image' (90% in each direction) was divided by the area of the entire image (Figs. S2E–S2H).

(25) $f_{30} = \frac{H_{90} W_{90}}{HW}$

$H_{90}$ and $W_{90}$ represent the height and width of the bounding box, with $H$ and $W$ the height and width of the image.
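The sketch below builds the Laplacian image with the 3 × 3 kernel of MATLAB's fspecial('laplacian', 0.2), which both cited papers use, and replicate padding at the borders; scipy is assumed to be available.

```python
import numpy as np
from scipy.ndimage import convolve

def laplacian_image(img):
    """Mean absolute response of the 3x3 Laplacian (alpha = 0.2) over R, G, B."""
    a = 0.2
    k = (4 / (a + 1)) * np.array([[a / 4, (1 - a) / 4, a / 4],
                                  [(1 - a) / 4, -1.0, (1 - a) / 4],
                                  [a / 4, (1 - a) / 4, a / 4]])
    return np.mean([np.abs(convolve(img[..., c].astype(float), k, mode='nearest'))
                    for c in range(3)], axis=0)

def energy_interval(profile, frac=0.9):
    """Indices enclosing `frac` of the profile's energy, trimming both tails."""
    csum = np.cumsum(profile) / profile.sum()
    lo = int(np.searchsorted(csum, (1 - frac) / 2))
    hi = int(np.searchsorted(csum, 1 - (1 - frac) / 2))
    return lo, hi

def f30(img):
    lap = laplacian_image(img)
    r0, r1 = energy_interval(lap.sum(axis=1))   # 90% of the energy per direction
    c0, c1 = energy_interval(lap.sum(axis=0))
    return ((r1 - r0 + 1) * (c1 - c0 + 1)) / (lap.shape[0] * lap.shape[1])
```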

Ke, Tang & Jing (2006) instead resized each Laplacian image to 100 × 100 and normalized its sum to 1. Subsequently, the area of the bounding box containing 96.04% of the edge energy (98% in each direction) was established, and the quality of the image was defined as $1 - H_{98}W_{98}$, where $H_{98}$ and $W_{98}$ are the height and width of the bounding box:

(26) $f_{31} = 1 - H_{98}W_{98}; \quad H_{98}, W_{98} \in [0, 1]$.

Resizing and normalizing the Laplacian images further allows for an easy comparison of different Laplacian images. Analogous to Ke, Tang & Jing (2006), who compared one group of professional-quality photos and one group of photos of inferior quality, we can consider two groups of images: one with pictures of pristine and one with pictures of degraded reefs. $M_p$ and $M_s$ represent the mean Laplacian images of the pictures in the respective groups. This allows a comparison of a Laplacian image $L$ with $M_p$ and $M_s$ using the L1 distance:

(27) $f_{32} = d_s - d_p$, where

(28) $d_s = \sum_{m,n} |L(m,n) - M_s(m,n)|$

(29) $d_p = \sum_{m,n} |L(m,n) - M_p(m,n)|$.

The sum of edges $f_{33}$ was added as a feature not implemented in any of the above-mentioned studies. The Sobel image $S$ of a picture was defined as a binary image of identical size, with 1s assigned where edges are present according to the Sobel method and 0s where no edges are present. For a color image, Sobel images $S_r$, $S_g$ and $S_b$ were constructed for each of the red, green and blue channels, and the sum of edges was defined as

(30) $f_{33} = (|S_r|_{L1} + |S_g|_{L1} + |S_b|_{L1})/3$.
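A sketch of $f_{31}$–$f_{33}$ under the same assumptions, reusing the Laplacian image from the previous sketch; $f_{32}$ additionally needs the mean Laplacian images $M_s$ and $M_p$ of the two reference groups as inputs, and the Sobel threshold is a free parameter here (MATLAB's edge function selects one automatically from the gradient image).

```python
import numpy as np
from scipy.ndimage import sobel, zoom

def f31(lap):
    """lap: a Laplacian edge-energy image, e.g., from laplacian_image() above."""
    small = zoom(lap, (100 / lap.shape[0], 100 / lap.shape[1]))  # ~100 x 100
    small = np.clip(small, 0, None)
    small /= small.sum()                       # normalize total energy to 1

    def norm_width(profile, frac=0.98):
        csum = np.cumsum(profile)
        lo = int(np.searchsorted(csum, (1 - frac) / 2))
        hi = int(np.searchsorted(csum, 1 - (1 - frac) / 2))
        return (hi - lo + 1) / len(profile)    # H_98 or W_98, in [0, 1]

    return 1 - norm_width(small.sum(axis=1)) * norm_width(small.sum(axis=0))

def f32(lap_small, M_s, M_p):
    # L1 distances to the mean Laplacian images of the degraded (s)
    # and pristine (p) groups
    return np.abs(lap_small - M_s).sum() - np.abs(lap_small - M_p).sum()

def f33(img, thresh=0.5):
    """Average number of Sobel edge pixels over the three channels."""
    per_channel = []
    for c in range(3):
        ch = img[..., c].astype(float)
        grad = np.hypot(sobel(ch, axis=0), sobel(ch, axis=1))
        per_channel.append((grad > thresh).sum())   # |S|_L1 of a binary image
    return sum(per_channel) / 3
```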

Texture analysis: To analyze the texture of the pictures more thoroughly, we implemented features not discussed in Ke, Tang & Jing (2006), Datta et al. (2006), or Li & Chen (2009). We considered $R_H$ to be a matrix of the same size as $I_H$, where each pixel $(m,n)$ contains the range value (maximum value minus minimum value) of the 3-by-3 neighborhood surrounding the corresponding pixel in $I_H$. $R_S$ and $R_V$ were computed in the same way for $I_S$ and $I_V$, and the range of texture was defined as

(31) $f_{34} = \frac{1}{MN}\sum_m\sum_n \frac{R_H(m,n) + R_S(m,n) + R_V(m,n)}{3}$.

Additionally, $D_H$, $D_S$ and $D_V$ were set as matrices identical in size to $I_H$, $I_S$ and $I_V$, where each pixel $(m,n)$ contains the standard deviation of the 3-by-3 neighborhood around the corresponding pixel in $I_H$, $I_S$ or $I_V$. The average standard deviation of texture was defined as:

(32) $f_{35} = \frac{1}{MN}\sum_m\sum_n \frac{D_H(m,n) + D_S(m,n) + D_V(m,n)}{3}$.

The entropy of an image is a statistical measure of its randomness and can also be used to characterize its texture. For a gray-level image it is defined as $-\sum_{i=0}^{255} p_i \log_2 p_i$, where $p$ is a vector containing the normalized 256-bin gray-level histogram of the image. Thus, we define features $f_{36}$, $f_{37}$ and $f_{38}$ as the entropy of $I_r$, $I_g$ and $I_b$, respectively.

(33) $f_{36} = \mathrm{entropy}(I_r)$

(34) $f_{37} = \mathrm{entropy}(I_g)$

(35) $f_{38} = \mathrm{entropy}(I_b)$
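The 3-by-3 range and standard-deviation maps correspond to sliding-window filters, and the entropy to a normalized 256-bin histogram. A sketch, assuming $I_H$, $I_S$, $I_V$ are 2D float arrays and the color channels hold values in [0, 255]:

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter, uniform_filter

def local_range(x):
    """3x3 neighborhood range (max - min) at every pixel."""
    return maximum_filter(x, size=3) - minimum_filter(x, size=3)

def local_std(x):
    """3x3 neighborhood standard deviation via E[x^2] - E[x]^2."""
    m = uniform_filter(x, size=3)
    m2 = uniform_filter(x * x, size=3)
    return np.sqrt(np.maximum(m2 - m * m, 0))

def entropy(channel):
    """Shannon entropy of the normalized 256-bin gray-level histogram."""
    p, _ = np.histogram(channel, bins=256, range=(0, 256))
    p = p / p.sum()
    p = p[p > 0]                                # 0 * log2(0) is taken as 0
    return -(p * np.log2(p)).sum()

def texture_features(I_H, I_S, I_V, I_r, I_g, I_b):
    f34 = np.mean((local_range(I_H) + local_range(I_S) + local_range(I_V)) / 3)
    f35 = np.mean((local_std(I_H) + local_std(I_S) + local_std(I_V)) / 3)
    return f34, f35, entropy(I_r), entropy(I_g), entropy(I_b)
```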

Wavelet-based texture: Texture feature analysis based on wavelets was conducted according to Datta et al. (2006). However, concrete information on some of the implemented steps (e.g., the norm or the exact Daubechies wavelet used) was not always available, which may result in slight deviations of our calculation from theirs. First, a three-level wavelet transformation of $I_H$ was performed using the Haar wavelet (see Figs. S2I and S2J). A 2D wavelet transformation of an image yields four matrices: the approximation coefficient matrix $C_A$ and the three detail coefficient matrices $C_H$, $C_V$ and $C_D$. Height and width of the resulting matrices are 50% of those of the input image, and $C_H$, $C_V$ and $C_D$ capture horizontal, vertical and diagonal details of the image. For a three-level wavelet transformation, a 2D wavelet transformation is performed and then repeated on the approximation coefficient matrix $C^1_A$, and repeated again on the new approximation coefficient matrix $C^2_A$, resulting in three sets of coefficient matrices. The $i$-th-level detail coefficient matrices for the hue image $I_H$ are denoted $C^i_H$, $C^i_V$ and $C^i_D$, $i \in \{1, 2, 3\}$. Features $f_{39}$–$f_{41}$ are then defined as follows:

(36) $f_{38+i} = \frac{1}{S_i}\sum_m\sum_n \left( C^i_H(m,n) + C^i_V(m,n) + C^i_D(m,n) \right), \quad i \in \{1, 2, 3\}$

where, for all $i \in \{1, 2, 3\}$, $S_i = |C^i_H|_{L1} + |C^i_V|_{L1} + |C^i_D|_{L1}$. Features $f_{42}$–$f_{44}$ and $f_{45}$–$f_{47}$ were computed accordingly for $I_S$ and $I_V$. Features $f_{48}$–$f_{50}$ are defined as the sums of the three wavelet features for H, S and V, respectively:

(37) $f_{48} = \sum_{i=39}^{41} f_i, \quad f_{49} = \sum_{i=42}^{44} f_i, \quad f_{50} = \sum_{i=45}^{47} f_i$.
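A sketch of $f_{39}$–$f_{41}$ for a single channel using the PyWavelets package; as noted above, the exact norm in Datta et al. (2006) is unclear, so this follows Eq. (36) literally (raw coefficient sums normalized by the L1-norm sums $S_i$). Applying the same routine to $I_S$ and $I_V$ gives $f_{42}$–$f_{44}$ and $f_{45}$–$f_{47}$; $f_{48}$–$f_{50}$ are then the sums of the returned triples.

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_features(channel):
    """Per-level wavelet texture features of Eq. (36) for one 2D channel."""
    # wavedec2 returns [cA3, (cH3, cV3, cD3), (cH2, cV2, cD2), (cH1, cV1, cD1)]
    coeffs = pywt.wavedec2(channel.astype(float), 'haar', level=3)
    feats = []
    for cH, cV, cD in reversed(coeffs[1:]):             # level 1 (finest) first
        S_i = sum(np.abs(c).sum() for c in (cH, cV, cD))   # sum of L1 norms
        feats.append(sum(c.sum() for c in (cH, cV, cD)) / S_i)
    return feats                                        # [f39, f40, f41] for I_H
```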