In 2008, Schnall investigated how participants rate moral dilemmas after they have been presented with words related to the topic of cleanliness, as opposed to neutral words [1]. The study reported that participants who were primed with the concept of cleanliness found moral transgressions less bad than participants who were not primed. [2] conducted a replication study using the same methods and materials. In contrast to [1], [2] found that the mean ratings of the two groups in their study did not differ. [3] pointed out that participants in [2] provided overall higher ratings than in the original study. [3] argued that the failure of the replication study to establish a difference between the two groups was due to a ceiling effect: since substantial proportions of both groups provided maximum ratings, it was not possible to detect a difference in ratings between the two groups. [4] provided their own analyses of ceiling effects in both the original and the replication study and concluded that a ceiling effect cannot account for the failure to replicate the original finding. Other researchers shared their analyses of the ceiling effects in these two studies on their websites and on social media. A quick overview of the variety of suggested analyses: [3] showed that the mean ratings in the replication study were significantly higher than those in the original study. Furthermore, she showed that the proportion of the most extreme ratings on the 10-point scale was significantly higher in the replication study than in the original study. [4] argued that the rank-based Mann-Whitney test provides results that are identical to an Analysis of Variance (ANOVA). Furthermore, analyses without extreme values failed to reach significance as well. [5] did not find the above-mentioned analyses satisfactory and suggested an analysis with a Tobit model, which showed a non-significant effect. [6] argued that Schnall’s analyses do not support her conclusions. [7] investigated how ceiling effects would affect the power of a t-test. He used a graded response model to simulate data affected by ceiling, similar to those obtained in the replication study. The effect size was set to the value obtained in the original study. He found that, depending on the model parametrization, the power of a t-test in the simulated replication study ranges from 70 to 84%, which should be sufficient to detect the effect. [8] performed a Bayes factor analysis and compared the quantiles. Both analyses suggested an absence of an effect in the replication study.

In our opinion, this discussion about the presence and impact of ceiling effects illustrates how relevant, yet elusive, the concept of a ceiling effect is. Apparently, the only point regarding ceiling effects on which all parties agreed is that the application of parametric analyses such as ANOVA or the t-test is problematic in the presence of a ceiling effect. Yet the authors disagreed on how to demonstrate and measure the impact of a ceiling effect, which makes the default application of ANOVA problematic per se. Motivated by these concerns, the current work presents a computer simulation study that investigates how various methods of statistical inference perform when the measurements are affected by a ceiling and/or floor effect (CFE). The main focus is on the performance of the textbook methods: Welch’s version of the t-test, ANOVA and rank-based tests. In addition, the performance of potential candidate methods, some of which were already encountered in the discussion of the study by [1], is investigated. The hallmark of the current work is the theoretical elaboration of the concept of CFE with the help of formal measurement theory [9]. This theoretical embedding provides a backbone for the simulations and, as we further point out, the lack of such an embedding may be one of the reasons why the number and scope of simulation studies of CFE has so far been limited.

Measurement-theoretic definitions of CFE are discussed in section 1.1. Section 1.2 reviews the literature on the robustness of the textbook methods. Only a few robustness studies explicitly consider CFE. However, numerous studies investigate other factors which may combine to create CFE. These studies, in particular, are considered in section 1.2. Taking stock of the material presented in the preceding sections, sections 1.3.1 and 1.3.2 justify the choice of statistical methods and the choice of the data-generating mechanism utilized by the simulations. While section 2 provides additional details on the methods and procedures, the description provided in sections 1.3.1 and 1.3.2 should be sufficient to follow the presentation of the results, and a reader interested in the results may skip directly to sections 3 and 4.

1.1 Formal definition of CFE

Consider first some informal notions of CFE. The dictionary of statistical terms by [10] provides the following entry on CFE. “Ceiling effect: occurs when scores on a variable are approaching the maximum they can be. Thus, there may be bunching of values close to the upper point. The introduction of a new variable cannot do a great deal to elevate the scores any further since they are virtually as high as they can go.” The dictionary entry in [11] (see also [12]) says: “Ceiling effect: A term used to describe what happens when many subjects in a study have scores on a variable that are at or near the possible upper limit (‘ceiling’). Such an effect may cause problems for some types of analysis because it reduces the possible amount of variation in the variable.”

We identify two crucial aspects of CFE in these quotes. First, CFE causes a “bunching” of measured values, such that the measure becomes insensitive to changes in the latent variable that it is supposed to measure. Second, CFE affects not only the expected change in the measured variable but also its other distributional properties, which may in turn affect the performance of some statistical methods. [11] mentions the variability, which one may interpret as the variance of the measured variable. [13] hypothesized that skew is the crucial property that characterizes CFE. Importantly, the informal descriptions lack a precise rationale and risk excluding less obvious and intuitive phenomena from the definition of CFE. In section 1.1.1 we show that formal measurement theory allows us to make these (and many other) informal notions precise. Historically, research in measurement theory has been concerned with deterministic variables, and despite multiple attempts a principled extension to random variables has not been achieved. In section 1.1.3 we review the derivation of maximum entropy distributions, which provides an extension of measurement theory to random variables and in particular allows us to derive distributions that can be used to simulate CFE and to manipulate its magnitude.
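Both aspects are easy to see in a minimal simulation (our own sketch, not an analysis from the cited studies) in which a latent normal variable is recorded on a scale with a hard maximum of 10, so that any latent score above the maximum is recorded as 10; all numeric values below are illustrative:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
ceiling = 10.0

# Latent scores of two groups that differ by a constant shift.
control = rng.normal(loc=8.0, scale=2.0, size=100_000)
treated = rng.normal(loc=9.0, scale=2.0, size=100_000)

# A hard ceiling: any score above the maximum is recorded as the maximum.
obs_control = np.minimum(control, ceiling)
obs_treated = np.minimum(treated, ceiling)

# The observed group difference shrinks ("bunching" at the ceiling) ...
print(np.mean(treated) - np.mean(control))          # ~1.0 (latent)
print(np.mean(obs_treated) - np.mean(obs_control))  # < 1.0 (observed)

# ... and the distribution of the observed scores is distorted as well.
print(np.var(obs_control), np.var(control))  # variance is reduced
print(skew(obs_control))                     # negative skew appears
```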

1.1.1 Measurement theory. A function, say ϕ, from A to a subset of ℝ describes the assignment of numbers to empirical objects or events. In the context of measurement theory, ϕ is referred to as a scale. It is crucial that ϕ is chosen such that the numerical values retain the relations and properties of the empirical objects in A (i.e. ϕ is a homomorphism, see section 1.2.2 in [9]). For instance, if the empirical objects are ordered, such that a ⪯ b for some a, b ∈ A, then it is desirable that ϕ satisfies a ⪯ b if and only if (iff) ϕ(a) ≤ ϕ(b). Measurement theory describes various scale types and the properties of the empirical events that are necessary and sufficient to construct the respective scale type. In addition, given a set of properties of empirical objects, multiple choices of ϕ may be possible, and measurement theory delineates the set of such permissible functions. A scale that preserves the order, i.e. a ⪯ b iff ϕ(a) ≤ ϕ(b), is referred to as an ordinal scale. Given that ϕ is an ordinal scale, ϕ′(a) = f(ϕ(a)) is also an ordinal scale for all a ∈ A and for any strictly increasing f (ibid. p. 15). Note that the set of possible scales is described as a set of possible transformations f of some valid scale ϕ. Other notable instances are the ratio scale and the interval scale. In addition to order, a ratio scale preserves a concatenation operation ∘ such that ϕ(a ∘ b) = ϕ(a) + ϕ(b). A ratio scale is specified up to a choice of unit, i.e. ϕ′(a) = αϕ(a), with α > 0. The required structure of empirical events is called an extensive structure (ibid. chapter 3). In some situations it is not possible to take direct measurements of the empirical objects of interest; however, one may measure pairwise differences or intervals between the empirical objects, say a ⊖ b or c ⊖ d. Then one may construct an interval scale given that a ⊖ b ⪯ c ⊖ d iff ϕ(a ⊖ b) ≤ ϕ(c ⊖ d) and ϕ(a ⊖ c) = ϕ(a ⊖ b) + ϕ(b ⊖ c) for all a, b, c, d ∈ A (ibid. p. 147). The set of permissible transformations is given by ϕ′(a) = αϕ(a) + β with β ∈ ℝ and α > 0. The corresponding structure is labelled a difference structure (ibid. chapter 4). Consider concatenation again. The concatenation operation in length measurement can be performed by placing two rods sequentially. In weight measurement the concatenation may be performed by placing two objects on the pan of a balance scale. As [9] (chap. 3.14) point out, finding and justifying a concatenation operation in the social sciences often poses difficulties. Furthermore, it is not necessary to map concatenation to addition. For instance, taking ψ = exp(ϕ), with an interval scale ϕ, will translate addition on ℝ into multiplication on ℝ⁺. More generally, any strictly monotonic function f may be used to obtain a valid numerical representation ψ(a) = f⁻¹(ϕ(a)) with a (possibly non-additive) concatenation formula ψ(a ∘ b) = f⁻¹(f(ψ(a)) + f(ψ(b))). [9] (chap. 3.7.1) make use of this fact when considering the measurement of relativistic velocity (also referred to as rapidity). Relativistic velocity is of interest in the present work because it is bounded: it can’t exceed the velocity of light. The upper bound poses difficulties for the additive numerical representation since an extensive structure assumes positivity of addition, i.e. a ≺ a ∘ b for all a, b ∈ A (axiom 5 in definition 1 on p. 73 ibid.). However, if z is the velocity of light, we have z ∼ z ∘ a, which violates positivity.
[9] resolve this issue by mapping velocity from a bounded range to an unbounded range, performing addition there and mapping the result back to the bounded range. Formally, concatenation is given by ([9] chapter 3, theorem 6)

ϕ(a ∘ b) = ϕ(z_u) f_u⁻¹[f_u(ϕ(a)/ϕ(z_u)) + f_u(ϕ(b)/ϕ(z_u))]   (1)

where f_u is a strictly increasing function from [0, 1] to ℝ⁺ that is unique up to a positive multiplicative constant. As [9] point out, taking the transformation f_u = tanh⁻¹ results in the velocity-addition formula of relativistic physics. However, this choice is arbitrary and Eq 1 provides us with the general result, which we will use in the current work. [9] call an element which satisfies z_u ∼ z_u ∘ a (for all a ∈ A) an essential maximum. [9] further show that given an extensive structure with an essential maximum, there always exists a strictly increasing function f_u such that Eq 1 is satisfied.

Next, we consider several straightforward extensions of the result by [9]. First, we wish to introduce extensive structures with an essential minimum z_l. Note that even though velocity has a lower bound at zero, this lower bound is not an essential minimum, because the concatenation is positive and repeated concatenation results in increasing numerical values. As a consequence, an essential maximum or an essential minimum is a property related to a concatenation operation rather than a property of the numerical range of a scale. If we distinguish between z_u and z_l we need to distinguish between ∘_u and ∘_l and in turn between f_u and f_l. Of course, this does not preclude the possibility that in some particular application it may be true that f_u = f_l. Consider a modification of Eq 1 to describe an essential minimum. Qualitatively, an essential minimum manifests, similar to an essential maximum, the property z_l ∼ z_l ∘_l a. In this case, however, ∘_l is a negative operation in the sense that a ≻ a ∘_l b for all a, b ∈ A. The results are then analogous to those of the velocity derivation. The main difference is that the scale ϕ_l maps to [ϕ_l(z_l), 0] rather than to [0, ϕ_u(z_u)]. However, both expressions ϕ_l(a)/ϕ_l(z_l) and ϕ_u(a)/ϕ_u(z_u) translate into the range [0, 1] and hence the only modification of Eq 1 is to add subscripts:

ϕ_l(a ∘_l b) = ϕ_l(z_l) f_l⁻¹[f_l(ϕ_l(a)/ϕ_l(z_l)) + f_l(ϕ_l(b)/ϕ_l(z_l))]   (2)

Thus, if a structure has an essential minimum, then a strictly increasing function f_l exists such that Eq 2 is satisfied. Second, we wish to extend Eq 1 to situations in which the minimal element z_l (irrespective of whether it is an essential minimum or not) is non-zero. We do so by first translating the measured values from the range [ϕ(z_l), ϕ(z_u)] to [0, ϕ(z_u) − ϕ(z_l)] and then to the domain of f_u, i.e. [0, 1]:

ϕ_u(a ∘_u b) = ϕ_u(z_l) + (ϕ_u(z_u) − ϕ_u(z_l)) f_u⁻¹[f_u((ϕ_u(a) − ϕ_u(z_l))/(ϕ_u(z_u) − ϕ_u(z_l))) + f_u((ϕ_u(b) − ϕ_u(z_l))/(ϕ_u(z_u) − ϕ_u(z_l)))]   (3)

Third, similar to the second step, we modify Eq 2 to apply in situations with a non-zero maximal element z_u:

ϕ_l(a ∘_l b) = ϕ_l(z_u) + (ϕ_l(z_l) − ϕ_l(z_u)) f_l⁻¹[f_l((ϕ_l(a) − ϕ_l(z_u))/(ϕ_l(z_l) − ϕ_l(z_u))) + f_l((ϕ_l(b) − ϕ_l(z_u))/(ϕ_l(z_l) − ϕ_l(z_u)))]   (4)

Above, we distinguished between a scale with an essential minimum ϕ_l and a scale with an essential maximum ϕ, which was in Eq 3 labelled more accurately as ϕ_u. This distinction was necessary because the two scales in Eqs 3 and 4 map to different number ranges. However, and this is the fourth extension, we need to consider a single scale which has both an essential minimum and an essential maximum. To do so, consider a scale ϕ with range [ϕ(z_l), ϕ(z_u)]. Then both Eqs 3 and 4 apply. We just change the labels: ϕ_l = ϕ_u = ϕ. Fifth, a further simplification can be achieved by assuming that essential minima and essential maxima affect concatenation in an identical manner, i.e. α_u f_u = α_l f_l = f for some positive constants α_l and α_u.
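To illustrate Eq 1, a short numeric sketch (our own, with the velocity of light normalized so that ϕ(z_u) = 1) showing that the choice f_u = tanh⁻¹ indeed turns the bounded concatenation into the velocity-addition formula of relativistic physics:

```python
import numpy as np

# Concatenation with an essential maximum (Eq 1), with phi(z_u) = 1,
# i.e. velocities expressed as fractions of the velocity of light.
def concat(a, b, f, f_inv):
    return f_inv(f(a) + f(b))

f, f_inv = np.arctanh, np.tanh

u, v = 0.6, 0.7
print(concat(u, v, f, f_inv))   # 0.9155...
print((u + v) / (1 + u * v))    # the relativistic formula gives the same value

# The essential maximum is absorbing: results stay below 1, however
# close to the bound one of the concatenated elements already is.
print(concat(0.999999, 0.5, f, f_inv))
```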
Eqs 3 and 4 then simplify respectively to:

g[ϕ(a ∘_u b)] = f⁻¹[f(g[ϕ(a)]) + f(g[ϕ(b)])]   (5)

1 − g[ϕ(a ∘_l b)] = f⁻¹[f(1 − g[ϕ(a)]) + f(1 − g[ϕ(b)])]   (6)

for all a, b ∈ A. To simplify the notation, we introduced the function g[x] = (x − ϕ(z_l))/(ϕ(z_u) − ϕ(z_l)). This notation highlights that in terms of the measurement ϕ, the operations ∘_l and ∘_u are symmetric around the line g[x] = 0.5. To illustrate this with an example, consider the popular choice f_l(x) = f_u(x) = f(x) = −log(1 − x) with x restricted to [0, 1]. This is a strictly increasing function to ℝ⁺ and hence provides a valid choice. The left panel in Fig 1 shows f(1 − g[ϕ(a)]) = −log(g[ϕ(a)]) and f(g[ϕ(a)]) = −log(1 − g[ϕ(a)]) as a function of g[ϕ(a)] ∈ [0, 1]. As noted, the two curves manifest symmetry around g[ϕ(a)] = 0.5.


Fig 1. Examples of f_u and f_l. The panels show the functions f_l(g[ϕ]) and f_u(1 − g[ϕ]). In the left panel f_l(g[ϕ]) = f_u(g[ϕ]) = −log(1 − g[ϕ]), while in the right panel f_l(g[ϕ]) = f_u(g[ϕ]) = log(g[ϕ]/(1 − g[ϕ])). https://doi.org/10.1371/journal.pone.0220889.g001

Above, we assumed that ϕ is a ratio scale. As a final modification we consider the case when ϕ is an interval scale. The result for an interval scale and the corresponding difference structure is provided in chapter 4.4.2 in [9]. The result is identical to Eq 1 except that f is a function to ℝ rather than to ℝ⁺ and that f is unique up to a linear transformation. Hence, if ϕ is an interval scale and ∘_l = ∘_u, then f = α_u f_u + β_u = α_l f_l + β_l (α > 0 and β ∈ ℝ). Again, to provide an example of an interval scale with an essential maximum and an essential minimum, consider the case with the logit function f_l(x) = f_u(x) = f(x) = log(x/(1 − x)). The right panel in Fig 1 shows f(1 − g[ϕ(a)]) and f(g[ϕ(a)]) as a function of g[ϕ(a)] ∈ [0, 1]. The logit function is a strictly increasing function from [0, 1] to ℝ and is hence a valid model for a difference structure with an essential maximum and an essential minimum. The set of permissible transformations of f is given by f(x) = α log(x/(1 − x)) + β.
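A short numeric sketch of Eqs 5 and 6, assuming the choice f(x) = −log(1 − x) from above and a hypothetical rating scale running from 1 to 10 (both are illustrative assumptions), shows the bounded concatenation and the symmetry of ∘_u and ∘_l around g[x] = 0.5:

```python
import numpy as np

# Bounded concatenation on [lo, hi] (Eqs 5 and 6) with f(x) = -log(1 - x).
lo, hi = 1.0, 10.0                      # an illustrative rating scale
g = lambda x: (x - lo) / (hi - lo)      # map measured value to [0, 1]
g_inv = lambda x: lo + x * (hi - lo)
f = lambda x: -np.log1p(-x)             # f(x) = -log(1 - x)
f_inv = lambda x: -np.expm1(-x)         # f^{-1}(x) = 1 - exp(-x)

def concat_u(a, b):  # concatenation towards the essential maximum (Eq 5)
    return g_inv(f_inv(f(g(a)) + f(g(b))))

def concat_l(a, b):  # concatenation towards the essential minimum (Eq 6)
    return g_inv(1 - f_inv(f(1 - g(a)) + f(1 - g(b))))

print(concat_u(7.0, 7.0))  # 9.0: approaches but never exceeds hi = 10
# Symmetry around g[x] = 0.5: reflecting the inputs reflects the result.
print(concat_l(4.0, 4.0))
print(lo + hi - concat_u(lo + hi - 4.0, lo + hi - 4.0))  # same value
```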

1.1.2 Structure with Tobit maximum. The measurement structures discussed so far have the notable property that it is not possible to obtain the maximal element by concatenating two non-maximal elements. [9] note that “we do not know of any empirical structure in which the concatenation of two such elements is an essential maximum”. [9] (theorem 7 on p. 95-96) nevertheless provide the results for such a case, which we refer to as an extensive structure with a Tobit maximum. This measurement structure is implicit in the popular Tobit model, which is sometimes discussed in connection with CFE, and hence we briefly present it. The scale ϕ must satisfy order monotonicity and monotonicity of concatenation when the concatenation result is not equal to the Tobit maximum. In the remaining case, i.e. when z_u ∼ a ∘ b, the concatenation is represented numerically as ϕ(z_u) = inf(ϕ(a) + ϕ(b)), where the infimum is taken over all a, b ∈ A which satisfy z_u ∼ a ∘ b. The scale ϕ is unique up to multiplication by a positive constant. Extensions to extensive structures with a Tobit minimum and to difference structures with a Tobit maximum and/or minimum are straightforward and follow the rationale presented in the previous section.
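In terms of data generation, a Tobit maximum corresponds to censoring: concatenation remains additive below the bound, and any value that would exceed the bound is recorded at the bound. A minimal sketch of this data-generating mechanism (an illustration of the generating side only, with invented parameter values, not an estimation of a Tobit model):

```python
import numpy as np

rng = np.random.default_rng(1)
ceiling = 10.0

# Additive latent scores; the Tobit maximum censors them at the bound.
latent = rng.normal(loc=9.0, scale=2.0, size=100_000)
observed = np.minimum(latent, ceiling)

# Below the bound the structure stays additive, so the latent mean is in
# principle recoverable (e.g. by a censored-likelihood fit), whereas the
# naive sample mean of the observed values is biased downwards.
print(latent.mean(), observed.mean())
print((observed == ceiling).mean())  # proportion of censored observations
```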

1.1.3 Random variables with CFE. The extensive and difference structures with an essential minimum/maximum introduce a crucial aspect of CFE that was exemplified by the dictionary entry on the ceiling effect by [10]. Formally, we may interpret the “introduction of a new variable” that elevates the previous level of the variable as a concatenation with some other element. Then we may look at the difference between the new level f⁻¹(f(x) + f(h)) and the old level x. Notably, we get lim_{x→1} |f⁻¹(f(x) + f(h)) − x| = 0 (for h ≠ 0), which may be seen as a formal notion of “bunching”. Crucially, measurement theory suggests that the concept of a boundary effect implicitly assumes the existence of a concatenation operation which has a non-additive numerical representation. The measurement-theoretic account fits well with the description of CFE by [10]. It misses, however, the other highlighted aspects of CFE: the distributional properties of the measured variable such as the variance reduction or the increased skew. To formally approach the concept of reduced variation, and the influence of CFE on the distribution of the measured values more generally, we need to introduce the concept of random variables (RVs). Recall that the scale ϕ maps from the set of empirical events to some interval. Instead, we consider ϕ to map from empirical events to a set of RVs over the same interval. We are not aware of work that explores such a probabilistic formulation of measurement theory, nor do we wish to explore such an approach in detail in the current work. Our plan is to point out that the above-listed results from measurement theory, along with a few straightforward assumptions about the probabilistic representation, allow us to derive the most widely used probability distributions. Crucially, the derivation determines which parameters, and under what transformation, represent the concatenation operation. This in turn allows us to specify and justify the choice of data generators used in the simulations and the choice of the metrics used to evaluate the performance on the simulated data. When the scale maps to a set of RVs, constraints such as ϕ(a ∘ b) = ϕ(a) + ϕ(b), or a ⪯ b iff ϕ(a) ≤ ϕ(b), are in general not sufficient to determine the distribution of ϕ. The first step is to formulate the constraints in terms of summaries of RVs. We choose the expected value E[X] as the data summary. The expected value is linear in the sense that E[aX + b] = aE[X] + b and also E[X] + E[Y] = E[X + Y] (where X, Y are independent RVs and a, b are constants). Due to the linearity property, we view the expected value as a data summary that is applicable to scales which represent concatenation by addition. See chapter 2 in [14] and chapter 22 in [15] for similar views on the role of expectation in additive numerical representations. As a consequence, we modify the constraint a ⪯ b iff ϕ(a) ≤ ϕ(b) to a ⪯ b iff E[ϕ(a)] ≤ E[ϕ(b)]. We modify the constraint ϕ(a ∘ b) = ϕ(a) + ϕ(b) to E[ϕ(a ∘ b)] = E[ϕ(a)] + E[ϕ(b)]. Effectively, the above constraints state that ϕ is a parametric distribution with a parameter equal to E[ϕ(a)] for each a ∈ A and in which the parameter satisfies monotonicity, additivity, or some additional property required by the structure. Above, we saw that in the presence of CFE concatenation can’t be represented by addition. Instead, we apply the expectation to the values transformed with f, which supports addition.
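The “bunching” limit can be checked directly. For the running example f(x) = −log(1 − x) one obtains f⁻¹(f(x) + f(h)) = 1 − (1 − x)(1 − h), so the increment caused by concatenating h equals (1 − x)h and vanishes as x → 1; a few numeric values:

```python
import numpy as np

f = lambda x: -np.log1p(-x)      # f(x) = -log(1 - x), increasing on [0, 1]
f_inv = lambda t: -np.expm1(-t)  # f^{-1}(t) = 1 - exp(-t)

h = 0.3
for x in [0.5, 0.9, 0.99, 0.999]:
    step = f_inv(f(x) + f(h)) - x  # change caused by concatenating h
    print(x, step)                 # step = (1 - x) * h -> 0 as x -> 1
```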
For instance, Eq 3 translates into

E[f_u(g[ϕ(a ∘_u b)])] = E[f_u(g[ϕ(a)])] + E[f_u(g[ϕ(b)])]   (7)

Thus, we require that the distribution is parametrized by c_u = E[f_u(g[ϕ(a)])] and/or c_l = E[f_l(1 − g[ϕ(a)])], depending on whether the structure has an essential maximum, an essential minimum or both. Consider again the dictionary descriptions of CFE. One may interpret these in terms of random variables as follows. With repeated concatenation, the expected value of the measured values approaches the boundary. Furthermore, as it approaches the boundary, a concatenation of an equivalent object/event results in an increasingly smaller adjustment to the expected value. Finally, [11] stated that the variability decreases as the values approach the boundary, which one may interpret as saying that the variance of the random variable approaches zero upon repeated concatenation. Consider a random variable Y(c_u) ∈ [y_l, y_u] with a ceiling effect at y_u and parameter c_u = E[f_u(g[Y])]. We may investigate whether the stated requirements are satisfied by checking whether the following formal conditions of a ceiling effect are true:

1. lim_{c_u→∞} E[Y(c_u)] = y_u
2. lim_{c_u→∞} |E[Y(c_u + h)] − E[Y(c_u)]| = 0 for all h > 0
3. lim_{c_u→∞} Var[Y(c_u)] = 0
4. lim_{c_u→∞} Skew[Y(c_u)] = −∞

Instead of conditions 3 and 4, one may alternatively consider whether Var[Y(c_u)] and Skew[Y(c_u)] are respectively increasing and decreasing functions of c_u. Eq 7 implies the existence of a series of random variables Y(c_u) that converge to ϕ(z_u) as c_u → ∞. By the dominated convergence theorem (chapter 9.2 in [16]) the second condition then holds, but instead of the first condition we obtain lim_{c_u→∞} E[Y(c_u)] = E[ϕ(z_u)]. By assuming E[ϕ(z_u)] = y_u one additionally obtains both the first and the third condition. Indeed, the third condition is a direct consequence of the first condition. Analogous results follow for a variable with a floor effect at y_l with the limiting process c_u → −∞. To conclude, the second condition follows immediately from the measurement-theoretic considerations; however, to determine the remaining conditions one has to consider specific distributions of Y, which we do next.
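These conditions can be probed by simulation. As an illustrative sketch (our own construction, not a distribution from the later Table 1), let f_u(y) = −log(1 − y) on [0, 1] and set Y = 1 − exp(−X) with X exponential with mean c_u, so that E[−log(1 − Y)] = c_u holds by construction:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(2)

# f_u(y) = -log(1 - y); Y = 1 - exp(-X) with X ~ Exponential(mean c_u)
# satisfies c_u = E[f_u(Y)] by construction (illustrative choice).
for c_u in [1, 5, 25, 125]:
    x = rng.exponential(scale=c_u, size=200_000)
    y = -np.expm1(-x)  # 1 - exp(-x), a random variable on [0, 1]
    print(c_u, y.mean(), y.var(), skew(y))
    # mean -> 1 (condition 1), variance -> 0 (condition 3),
    # skew decreases without bound (condition 4) as c_u grows
```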

1.1.4 Maximum entropy distributions with CFE. In this section we present an approach that allows us to derive the probability distribution Y(c_l, c_u) given functions f_l and f_u. We adapt the principle of maximum entropy (POME, [17–19]) to obtain probability distributions with the desired parametrization. According to POME, if nothing else is known about the distribution of Y except a set of N constraints of the form c_i = E[g_i(Y)], that the domain of Y is [y_l, y_u], and that it is a probability distribution, i.e. ∫_{y_l}^{y_u} p(y) dy = 1, then one should select the distribution that maximizes the entropy subject to the stated constraints. Mathematically, this is achieved with the help of the calculus of variations ([20] chapter 12). The POME derivation results in a distribution with N parameters, and the derivation fails if the constraints are inconsistent. The procedure is similar to, but somewhat more general than, the alternative method of deriving a parametric distribution that is a member of the exponential family with the help of constraints (for applications of this method see for instance [21]). POME allows one to derive distributions that are not part of the exponential family. To mention some general examples: the uniform distribution is the maximum entropy distribution of an RV on a closed interval without any additional constraints; the normal distribution is the maximum entropy distribution of an RV Y on the real line with constraints c_m = E[Y] and c_v = E[(Y − c_m)²] ([19] section 3.1.1). Table 1 provides an overview of maximum entropy distributions found in the POME literature that are derived from constraints posed by structures with an essential minimum and/or an essential maximum and are thus relevant in the current context. For more details on the derivation of the listed maximum entropy distributions see [19] and [22]. We make the following observations. First, the popular choice f_u(x) = f_l(x) = −log(1 − x) translates into constraints c_u = E[log(1 − Y)] and c_l = E[log Y], which correspond to the logarithm of the geometric mean of 1 − Y and of Y, respectively. The sole exception in the table is the Logit-normal distribution, which uses f_u = f_l = log(Y/(1 − Y)) to model CFE.
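For instance, for the Beta distribution with shape parameters a and b the two constraint values have the closed forms E[log Y] = ψ(a) − ψ(a + b) and E[log(1 − Y)] = ψ(b) − ψ(a + b), where ψ denotes the digamma function; a quick numeric check (parameter values are illustrative):

```python
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(3)
a, b = 2.0, 5.0
y = rng.beta(a, b, size=500_000)

# Sample averages of the constraint functions ...
print(np.log(y).mean(), np.log1p(-y).mean())
# ... match the closed-form constraint values of the Beta distribution.
print(digamma(a) - digamma(a + b), digamma(b) - digamma(a + b))
```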


Table 1. Maximum entropy distributions derived from constraints posed by structures with essential minimum and/or essential maximum. https://doi.org/10.1371/journal.pone.0220889.t001

Second, the maximum entropy distributions include a large portion of the most popular distributions on [0, 1] and [0, ∞]. The Exponential, Weibull, Gamma, F, Log-logistic and Power function distributions are included in the Generalized Gamma family, in the Beta Prime distribution, or in both [23]. A generalized version of the Beta prime distribution can be obtained with a constraint c_v = E[log(1 + (Y/b)^c_n)]. [22] went further and showed that a maximum entropy distribution with constraints c_l = E[log(Y)] and c_v = E[(1/c_p) log(1 + c_p (Y/b)^c_n)] includes the Generalized Gamma family (for c_p → 0) and the generalized Beta Prime family (c_p = 1). While [22] don’t consider the case c_p = −1, it is straightforward to see that assuming c_v = −E[log(1 − (Y/b)^c_n)] results in a generalized form of the Beta distribution. The resulting distributions and the relations between them are discussed in more detail by [23]. Third, the parameters c_v may be interpreted as variance parameters, or more generally as constraints that introduce a distance metric. Notably, the variance is formulated over f(Y). Thus, while c_l can be interpreted as an expected log-odds or as the logarithm of a geometric mean, c_v can be interpreted as a log-odds variance or a geometric variance. The introduction of these parameters is consistent with the measurement-theoretic framework, except that the constraints are expressed in terms of the variance c_v = Var[f(Y)] rather than in terms of the expectation c_l = E[f(Y)]. Fourth, the interpretation of the c_m parameters is interesting as well. One possibility is to view c_m as an additional constraint unrelated to CFE. As [24] discuss in the case of the gamma distribution, the parameter c_l controls the generation of small values while c_m controls the generation of large values. This interpretation is similar to the interpretation of c_l and c_u of the beta distribution, even though there is no essential maximum in the former case as opposed to the latter case. Fifth, one may illustrate the similarity between c_u of the Beta and c_m of the Gamma distribution in one additional way. Consider a generalization of the Beta distribution with a scaling parameter b = ϕ(z_u) such that Y ∈ [0, b]. As detailed in [23], the Gamma and some other distributions can be obtained from a generalized Beta by constructing the limit b → ∞ (see the numeric sketch below). The parameter c_u of the Beta translates into c_m of the Gamma. Sixth, it is possible for two notationally distinct constraints to result in the same probability distribution, albeit with a different parametrization. For instance, the constraints c_l = E[log(Y^c_n)] and c_m = E[Y^c_n] imply a Generalized Gamma distribution with a somewhat different parametrization than that of the Generalized Gamma distribution listed in the table: c_l = c_n log b + ψ(a). Seventh, recall that f_l and f_u are defined up to a scaling constant (extensive structure) or up to a linear transformation (difference structure). As a consequence, c_l and c_u are known up to a scale or up to a linear transformation, i.e. c_l = β_l + α_l E[f_l(Y)]. In a similar manner, one may modify the constraints c_u or even c_m so that the set of permissible transformations of f is explicit. α and β are not identifiable in addition to c_l, and their introduction does not affect the derivation of the maximum entropy distribution. Nevertheless, it may be possible to parametrize the distribution with α and/or β instead of some nuisance parameter.
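The fifth observation rests on a limit that is easy to check numerically. A sketch in the usual shape parametrization (not the c-parametrization of Table 1): if X follows a Beta distribution with shape parameters a and b, then bX converges in distribution to Gamma(a) as b → ∞:

```python
import numpy as np
from scipy.stats import beta, gamma, kstest

# A standard limit underlying the fifth observation: if X ~ Beta(a, b),
# then b*X converges to Gamma(a) as b -> infinity.
a = 2.0
for b in [5.0, 50.0, 500.0]:
    x = beta.rvs(a, b, size=100_000, random_state=0)
    # Kolmogorov-Smirnov distance to the Gamma(a) limit shrinks with b.
    print(b, kstest(b * x, gamma(a).cdf).statistic)
```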
Returning to the seventh observation, consider the Generalized Gamma distribution with constraints c_l = E[log Y] and c_m = E[Y^c_n]. Note that we may introduce the transform c_l = c_n E[log(Y)] so that c_n can be interpreted as the scale/unit of c_l. Finally, one may sacrifice the parameter c_m and introduce a shift parameter, say c_β, such that c_l = c_β + c_n E[log(Y)]. From the formula for c_l in Table 1 it follows that c_β = c_n log b = log c_m − log a. Eighth, as illustrated by the case of the truncated gamma distribution, it is straightforward to introduce a maximum of the distribution while maintaining the floor effect. Note though that the maximum thus introduced is not an essential maximum, and the process of truncation can’t be used to introduce CFE. This is perhaps best seen in the case of the truncated normal distribution with range [0, ∞], which does not satisfy the CFE conditions listed in the previous section. Finally, it is straightforward to apply the above ideas to discrete measurement. While the values of Y are discrete, the values E[f(Y)], which are part of the constraints, are continuous. The maximum entropy derivation provides results analogous to the continuous distributions. Constraints c_l = E[log Y] and c_m = E[Y] with Y ∈ {0, 1, …} lead to the generalized geometric distribution ([19] section 2.1d). Constraints c_l = E[log Y] and c_u = E[log(1 − Y)] result in a discrete version of the Beta distribution with Y ∈ {0, 1/n, 2/n, …, (n − 1)/n, 1}. The discrete Beta distribution is seldom used in applied work, perhaps due to the fact that p(Y = 0) = p(Y = 1) = 0. As a more popular and more plausible alternative we included the Beta-binomial distribution, which can be obtained by sampling q from a Beta distribution parametrized by c_l and c_u; y ∈ {0, 1, …, n} is then sampled from a Binomial distribution with proportion parameter q and n trials. As described in [25], the binomial distribution can be seen as a maximum entropy distribution with E[Y] = q and with additional assumptions about the discretization process. Having presented the maximum entropy distributions, we now consider to what extent they satisfy the informal CFE conditions. As already mentioned, the second CFE condition is incorporated as an assumption in the derivation of the maximum entropy distributions, but the remaining three conditions must be checked separately. In principle, such a task is an exercise in looking up the formula for the expectation, variance or skew and then computing the limit. Unfortunately, the analytic formulas for these quantities are either not known or do not exist (Generalized Gamma, Log-logistic), or the available formulas do not use the current parametrization (Beta, Beta prime, Beta-binomial, Generalized geometric). As a consequence, only a few results are readily available; these are discussed next. The remaining results are obtained through simulation and presented in section 3.9. In the case of the Log-normal distribution, the first and the third conditions are satisfied, while the skew is independent of c_l. In the case of the Beta distribution we set c_l → ∞ while c_u is held constant (and vice versa). As a consequence, c_l − c_u = ψ(a) − ψ((1/E[Y] − 1)a) → ∞. Since ψ is increasing, c_l − c_u → ∞ when E[Y] → 0 or when both a → 0 and b → 0. In the latter case, however, c_u → ∞, which contradicts our assumptions, and hence only the former case is valid. Thus, the first condition is satisfied by the Beta distribution. Regarding the Generalized Gamma distribution, note that the result depends on the choice of nuisance parameters.
Trivially, if we hold c_m = E[Y] constant, then c_l → −∞ will not affect the expected value. Instead, we propose to hold c_n and c_n log b constant, where the latter term may be conceived as the offset of c_l. Then c_l → −∞ implies ψ(a) → −∞, hence a → 0, and as a consequence E[Y^c_n] = exp(c_n log b)·a → 0, so that E[Y] → 0 as well. The first condition is thus satisfied by the Generalized Gamma distribution.
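The Log-normal case from above is also easy to verify by simulation: with c_l = E[log Y] = μ, letting c_l → −∞ drives both the mean and the variance towards the floor at zero, while the skew depends only on σ and stays constant (a quick numeric check, with illustrative values for μ and σ):

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(4)
sigma = 0.5

# Log-normal floor effect at 0: c_l = E[log Y] = mu.
for mu in [0.0, -2.0, -4.0, -8.0]:
    y = rng.lognormal(mean=mu, sigma=sigma, size=500_000)
    print(mu, y.mean(), y.var(), skew(y))
    # mean -> 0 (first condition), variance -> 0 (third condition),
    # while the skew stays constant, independent of c_l = mu:
print((np.exp(sigma**2) + 2) * np.sqrt(np.exp(sigma**2) - 1))
```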