Consider the common case for which the null hypothesis is defined by a subset of lower dimensionality. In this case, we use a surface integral to normalize the values of the prior density on the null set so that their sum or integral equals unity. Figure 1 illustrates how this procedure is carried out. Recall that a prior density can be seen as a preference system on the parameter space. That preference system must be preserved even within the null hypothesis; coherence in access to prior distributions is crucial. Further details on this procedure can be found in [16,18]. Later, Dawid and Lauritzen [20] considered multiple ways of obtaining compatible priors under alternative models (hypotheses). The "conditioning" approach described by Dawid and Lauritzen is equivalent to the technique presented here. Dickey [21] used a similar approach previously, but in a more parameterization-dependent way.

In particular, one must define a density on the subset that has the smaller dimension. The choice of this density should be coherent with the original prior density over the global parameter space Θ.

With the statistical model defined, a partition of the parameter space is induced by the consideration of a null hypothesis that is to be compared with its alternative:

H : θ ∈ Θ_H versus A : θ ∈ Θ_A, with Θ_H ∪ Θ_A = Θ and Θ_H ∩ Θ_A = ∅. (1)

In the case of composite hypotheses with the partition elements having the same dimensionality, the model would then be complete. Such cases do not involve partition components of zero Lebesgue measure. In the case of precise or "sharp" hypotheses, that is, when the partition components have different dimensionalities, other elements must be added, as discussed above for the prior on the lower-dimensional subset.

As usual, let x and θ be random vectors (possibly scalars), x ∈ X ⊂ R^s, X being the sample space, and θ ∈ Θ ⊂ R^k, Θ being the parameter space, with s and k positive integers. To state the relation between the two random vectors, the statistician considers the following: a family of probability density functions indexed by the conditioning parameter θ, {f(x|θ) ; θ ∈ Θ}; a prior probability density function g(θ) on the entire parameter space Θ; and the posterior density function g(θ|x). In order to be appropriate, the family of likelihood functions indexed by x, {L(θ|x) = f(x|θ) ; x ∈ X}, must be measurable in the prior σ-algebra.
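As a concrete sketch of this setup (the Bernoulli model, the sample size, and the uniform prior below are illustrative assumptions, not taken from the text), the following computes a discretized posterior g(θ|x) ∝ f(x|θ) g(θ):

```python
import math

# Illustrative setup: x successes in n Bernoulli trials, theta in (0,1),
# with a Uniform(0,1) prior g(theta) discretized on a grid over Theta.
n = 10

def likelihood(x, theta):
    # f(x | theta): binomial probability of x successes in n trials
    return math.comb(n, x) * theta**x * (1 - theta)**(n - x)

grid = [(i + 0.5) / 1000 for i in range(1000)]   # grid over Theta = (0,1)
prior = [1.0 / 1000] * 1000                      # uniform prior masses

def posterior(x):
    # g(theta | x) proportional to f(x | theta) g(theta), normalized on the grid
    weights = [likelihood(x, t) * p for t, p in zip(grid, prior)]
    total = sum(weights)
    return [w / total for w in weights]

post = posterior(7)  # posterior after observing x = 7 successes
# with a uniform prior, the posterior mode is near the observed proportion 0.7
mode_theta = grid[max(range(len(post)), key=post.__getitem__)]
```

With a uniform prior, the posterior here is a (discretized) Beta(8, 4) density, so the grid point of highest posterior mass sits near θ = 0.7.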

2.2. Significance Index

By "significance index", we mean a real function over the sample space that is used as an evidence measure for decision-making with respect to accepting or rejecting the null hypothesis, H. We begin this section by stating a generalization of the Neyman–Pearson Lemma, as presented by DeGroot [19]. Cox [22,23] also considers the classical p-value as an evidence measure, and Evans [24] considers evidence measures in general, outlines the relative belief theory developed in the references of that paper, and suggests that the associated evidence measure could have advantages over other measures of evidence and be the basis of a complete approach to estimation and hypothesis-assessment problems. The classical p-value is the most widely used significance index across diverse fields of study, including almost all scientific areas. In the present work, we present a replacement for the classical p-value that has a number of advantages, which will be described here and in future work. The conceptual and operational similarity between classical hypothesis tests as currently used and the new tests could potentially help researchers accept and use the new tests.

Let f_H(x) and f_A(x) be probability density functions over the sample space X. The decision problem is to choose one of these densities as the true generator of the observed data. Consider now a binary function δ(x) used to define the decision procedure. Define a partition of the sample space with X_H ∪ X_A = X and X_H ∩ X_A = ∅, where X_H is the non-rejection region for H. The test function is

δ(x) = { 0, if x ∈ X_H; 1, if x ∈ X_A }. (2)

To choose between a hypothesis and its alternative, one should first choose two positive real numbers, say A and B, with A > B, A = B, and A < B meaning, respectively, preference for the null hypothesis, indifference, and preference for the alternative. The decision rule is then to reject the null hypothesis, H, whenever δ(x) = 1, and not to reject otherwise. The following theorem, a generalized version of the Neyman–Pearson Lemma presented in the textbook by DeGroot [19], provides a test that is optimal in the sense of minimizing a linear combination of the probabilities of the two types of errors: Type I, which is the rejection of a true hypothesis, and Type II, the non-rejection of a false hypothesis. The error probabilities are

α(δ) = Pr{rejecting H | H is true} = Pr{δ(x) = 1 | H} (3)

and

β(δ) = Pr{not rejecting H | H is false} = Pr{δ(x) = 0 | A}. (4)

Generalized Neyman–Pearson Lemma: Let δ* be a test that rejects H in favor of A if A f_H(x) < B f_A(x), does not reject H if A f_H(x) > B f_A(x), and is indifferent if A f_H(x) = B f_A(x). Then, for any other test δ,

A α(δ) + B β(δ) ≥ A α(δ*) + B β(δ*). (5)
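The lemma can be checked numerically on a simple-versus-simple problem. In this sketch the two densities, the discretization grid, and the weights A and B are arbitrary illustrative choices, not taken from the text:

```python
import math

# Numerical check of the generalized Neyman-Pearson Lemma:
# H: x ~ N(0,1) versus A: x ~ N(1,1), discretized on a fine grid.
A, B = 2.0, 1.0   # arbitrary positive weights on the two error probabilities

def norm_pdf(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

dx = 0.001
xs = [-8 + i * dx for i in range(16001)]

def risk(reject):
    # A*alpha + B*beta for a test rejecting H on the set where reject(x) is True
    alpha = sum(norm_pdf(x, 0) * dx for x in xs if reject(x))     # Pr(reject | H)
    beta = sum(norm_pdf(x, 1) * dx for x in xs if not reject(x))  # Pr(keep | A)
    return A * alpha + B * beta

def optimal(x):
    # the test delta* of the lemma: reject H iff A*f_H(x) < B*f_A(x)
    return A * norm_pdf(x, 0) < B * norm_pdf(x, 1)

def competitor(x):
    # any other test, e.g. reject whenever x > 0.3
    return x > 0.3

r_star, r_other = risk(optimal), risk(competitor)
# the lemma guarantees r_star <= r_other for every competing test
```

For these two normals, the optimal rejection region works out to x > 1/2 + ln 2, and any other cutoff yields a strictly larger weighted risk.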

In 1957, both Lindley [25] and Bartlett [26] recognized that fixing a significance level was a major cause of problems with hypothesis tests. In 1966, Cornfield [27] advocated hypothesis tests that minimize a linear combination of error probabilities, as in Equation (5), rather than fixing a canonical α and minimizing β as in the Neyman–Pearson approach [28].

To see that Bayesian hypothesis tests minimize a linear combination of error probabilities of the form A α(δ) + B β(δ), consider a loss function that is zero if the decision is correct and w_A (w_H) if the decision favors A (H) when H (A) is the true state of nature. In addition, if π is the prior probability of H and δ the test function, the risk function is

r(δ) = w_A π α(δ) + w_H (1 − π) β(δ). (6)

Consequently, simply identifying (π w_A) and ((1 − π) w_H) as A and B, respectively, and recalling that risk functions are to be minimized, Bayesian tests should minimize a linear combination of the form A α(δ) + B β(δ). Both the classical and the Bayesian applications of the theorem are stated in terms of the comparison of the ratio f_H / f_A to the constant K, given by

K = B / A = ((1 − π) w_H) / (π w_A). (7)
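A quick numeric sketch of this identification and of Equation (7); the prior probability π and the losses w_H and w_A below are arbitrary illustrative values:

```python
# Illustrative values (arbitrary): prior probability of H and the two losses
pi_H, w_A, w_H = 0.5, 1.0, 2.0

# Identify the weights of the risk (6) with the constants of the lemma
A = pi_H * w_A          # weight on alpha(delta), the Type-I error probability
B = (1 - pi_H) * w_H    # weight on beta(delta), the Type-II error probability

# Equation (7): the rejection threshold for the ratio f_H / f_A
K = B / A               # equals (1 - pi_H) * w_H / (pi_H * w_A)

# The Bayesian test rejects H at a sample point x whenever f_H(x)/f_A(x) < K;
# e.g., with f_H(x) = 0.10 and f_A(x) = 0.30 the ratio is 1/3 < K = 2: reject.
reject = (0.10 / 0.30) < K
```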

It is important to remember that, from the classical point of view, this generalized version of the Neyman–Pearson Lemma applies only to simple-versus-simple hypotheses. It is not common in classical inference to consider a density function under a composite hypothesis. However, some classical methods use optimization, considering the maximum of the likelihood function both under H and under A. Recall that the likelihood function can be represented as I_x = {L(θ|x) = f(x|θ) ; θ ∈ Θ}.

In the Bayesian paradigm, the likelihood function L plays an important role, which is not at all surprising, because it is the only mathematical object considered that defines an association between a sample x and a parameter θ. Rather than optimization, integration is the Bayesian tool applied here. With the prior densities defined, the following conditional expectations are calculated:

f_H(x) = E{L(θ|x) | x, θ ∈ Θ_H} and f_A(x) = E{L(θ|x) | x, θ ∈ Θ_A}. (8)

These functions are the Bayesian predictive densities under the respective hypotheses. Both are probability density functions over the sample space X. The ratio between the two is known as the Bayes factor,

BF(x) = f_H(x) / f_A(x). (9)
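A minimal sketch of Equations (8) and (9) for a case worked out here as an assumption (not taken from the text): a point null H: θ = 1/2 for a Bernoulli proportion against A with a Uniform(0,1) prior, for which the predictive under A is the uniform Beta-binomial:

```python
import math

n = 10  # illustrative sample size: x successes in n Bernoulli trials

def f_H(x):
    # Predictive under H: theta = 1/2 exactly, so f_H(x) = C(n,x) (1/2)^n
    return math.comb(n, x) * 0.5**n

def f_A(x):
    # Predictive under A: theta ~ Uniform(0,1); integrating the binomial
    # likelihood against the uniform prior gives f_A(x) = 1/(n+1)
    return 1.0 / (n + 1)

def bayes_factor(x):
    # Equation (9): BF(x) = f_H(x) / f_A(x)
    return f_H(x) / f_A(x)

# Both predictives are probability functions over X = {0, ..., n}
total_H = sum(f_H(x) for x in range(n + 1))
total_A = sum(f_A(x) for x in range(n + 1))
bf_mid = bayes_factor(5)  # BF peaks at the sample point most typical of H
```

Note that both predictives integrate (here, sum) to one over X, as required, and the Bayes factor is largest at x = n/2, the observation most compatible with H.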

To define a significance index, an alternative to the usual p-value, it is necessary to establish an ordering of all the points in the sample space. Montoya-Delgado et al. [17] suggest the use of the Bayes factor values of all sample points to induce the necessary order. García-Donato and Chen [29] use a similar ordering of the sample space on the way to calculating Type-I and Type-II error probabilities for Bayes factor tests like those of Jeffreys [30] under a specific symmetry condition on the sampling distribution of the Bayes factor. Gu, Hoijtink, and Mulder [31] apply a similar condition, essentially holding the probabilities of the two types of error to be equal via tuning of the Bayes factor for a "Bayesian t-test" using a specific kind of prior. Both of these approaches continue to use the comparison of a Bayes factor to fixed values, such as those in the table presented by Jeffreys [30] and the updated table presented by Kass and Raftery [32], to choose from competing hypotheses. The new hypothesis tests presented here adopt a criterion for choosing which hypothesis to reject that is more like the one used in familiar Neyman–Pearson testing, but with the advantage that the significance level is adaptive, that is, it depends on the sample size.

The steps to perform a hypothesis test are as follows:

1. Define a prior density g(θ) over the entire parameter space Θ. This function can be chosen either objectively or subjectively.

2. Clearly define the hypotheses to be tested, H and A.

3. Obtain the predictive functions under the two alternative hypotheses. In the case for which the parametric subspaces defined by the hypotheses are of different dimensionalities, a prior density under the subset of smaller dimension, say Θ_H, is obtained from the following expression, subject to the condition (on the parameter space as a whole and the hypotheses) that the integral in the denominator can be defined:

g(θ|H) = { 0, if θ ∉ Θ_H; g(θ) / ∮_{Θ_H} g(y) dy, if θ ∈ Θ_H }. (10)

Figure 1 illustrates how g(θ|H) is obtained from the prior g(θ) over the full parameter space Θ. The denominator is the surface integral over the subspace Θ_H. When Θ_H consists of a single point, there is no need to perform the integral. In the case of Θ_H and Θ_A of different dimensionalities, define an additional positive prior probability π that H is the true hypothesis.

4. Define the loss function, considering mainly the relative importance of the hypotheses and of the two types of error. Consider, for example, a governor who is concerned more with the budget than with public health and who will strongly prefer the hypothesis that the apparent wave of meningitis cases in his state does not represent an epidemic.

5. Use the Bayes factor to order the sample space: {BF(x) : x ∈ X} ⊂ R establishes the order of each x ∈ X. This ordering can be used independently of the dimensionalities of the spaces X and Θ.

6. Using the theorem above, compute the optimal averaged error probabilities and use the value of α(δ*) as the adaptive level of significance, which will depend on the loss function, the probability densities, the prior probability π, and especially on the sample size.

7. Calculate the significance index, the P-value, as follows: if x_0 is the observed value of a statistic and C_0 = {x : BF(x) ≤ BF(x_0)} is the observed tail under the new ordering, the P-value is calculated using the expression P_0 = ∫_{C_0} f_H(x) dx. Clearly, this may be a single or a multiple integral or sum.

8. Compare the value P_0 with the value of α(δ*). Reject (do not reject) H if P_0 < (>) α(δ*). In the case of equality, take either decision without prejudice to optimality.

9. Finally, if a value of α(δ*) is specified a priori, calculate the sample size needed to make this fixed value as close as possible to optimal according to the generalized Neyman–Pearson Lemma.
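The steps above can be sketched end-to-end for a small illustrative problem; all specific choices below (the Bernoulli model, n = 10, π = 1/2, unit losses, and the observed value) are assumptions made for this example, not taken from the text:

```python
import math

# Illustrative problem: x successes in n = 10 Bernoulli trials,
# H: theta = 1/2 versus A: theta ~ Uniform(0,1), pi = 1/2, w_H = w_A = 1.
n = 10
pi_H, w_A, w_H = 0.5, 1.0, 1.0
A = pi_H * w_A            # weight on alpha(delta)
B = (1 - pi_H) * w_H      # weight on beta(delta)

f_H = {x: math.comb(n, x) * 0.5**n for x in range(n + 1)}  # Binomial(n, 1/2)
f_A = {x: 1.0 / (n + 1) for x in range(n + 1)}             # uniform Beta-binomial
BF = {x: f_H[x] / f_A[x] for x in range(n + 1)}            # step 5: order X by BF

# Step 6: the optimal test rejects H where A*f_H(x) < B*f_A(x); its Type-I
# error probability alpha(delta*) is the adaptive significance level
reject_region = [x for x in range(n + 1) if A * f_H[x] < B * f_A[x]]
alpha_star = sum(f_H[x] for x in reject_region)

# Step 7: the P-value of an observation x0 is the f_H-probability of the tail
# C0 = {x : BF(x) <= BF(x0)} under the Bayes-factor ordering
def p_value(x0):
    return sum(f_H[x] for x in range(n + 1) if BF[x] <= BF[x0])

# Step 8: reject H iff P0 < alpha(delta*)
x0 = 9                    # an illustrative observed value
decision_reject = p_value(x0) < alpha_star
```

With these choices, the rejection region is the two tails {0, 1, 2, 8, 9, 10}, the adaptive level is α(δ*) = 112/1024 ≈ 0.109, and an observation of x₀ = 9 gives P₀ = 22/1024 ≈ 0.021, so H is rejected.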

We emphasize that, for what follows, it does not matter how the prior over the entire parameter space is chosen. The present work is concerned with how to perform the new hypothesis tests once an overall prior has been chosen.
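To make Equation (10) concrete, here is a sketch of obtaining g(θ|H) from an overall prior by a surface (here, line) integral; the bivariate standard normal prior and the diagonal null set used below are illustrative assumptions, not taken from the text:

```python
import math

# Illustrative sharp null: the prior over Theta = R^2 is a product of
# standard normals, and the null set is the line Theta_H = {(t, t)},
# a subset of lower dimension (zero Lebesgue measure in the plane).

def g(t1, t2):
    # prior density over the full parameter space
    return math.exp(-0.5 * (t1**2 + t2**2)) / (2 * math.pi)

# Surface (line) integral of g over Theta_H, parametrized by t with
# arc-length element sqrt(2) dt, computed by a simple Riemann sum
dt = 0.001
ts = [-8 + i * dt for i in range(16000)]
norm = sum(g(t, t) * math.sqrt(2) * dt for t in ts)

def g_null(t):
    # Equation (10): the prior restricted to Theta_H, renormalized so that
    # its integral over the null set (with respect to arc length) is one
    return g(t, t) / norm

# The renormalized density integrates to one along the line
check = sum(g_null(t) * math.sqrt(2) * dt for t in ts)
```

For this prior the normalizing constant is 1/√(2π) in closed form, which the Riemann sum reproduces; the renormalized g(θ|H) then carries the same preference pattern along the null line as the original prior, only rescaled.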