Bayesian model selection based on proper scoring rules

Personal, or subjective, probabilities are used as inputs to many inferential and decision-making models, and various procedures have been developed for the elicitation of such probabilities. Included among these elicitation procedures are scoring rules, which involve the computation of a score based on the assessor's stated probabilities and on the event that actually occurs.

The development of scoring rules has, in general, been restricted to the elicitation of discrete probability distributions. In this paper, families of scoring rules for the elicitation of continuous probability distributions are developed and discussed. Authors: James E. Matheson and Robert L. Winkler.

Scoring Rules for Continuous Probability Distributions

A statistical analysis plan (SAP) is a critical link between how a clinical trial is conducted and the clinical study report.

To secure objective study results, regulatory bodies expect that the SAP will meet requirements in pre-specifying inferential analyses and other important statistical techniques.

Writing a good SAP for model-based sensitivity and ancillary analyses involves non-trivial decisions on, and justification of, many aspects of the chosen model setting. In particular, trials with longitudinal count data as primary endpoints pose challenges for model choice and model validation. In the random-effects setting, frequentist strategies for model assessment and model diagnosis are complex, not easily implemented, and subject to several limitations. It is therefore of interest to explore Bayesian alternatives which provide the decision support needed to finalize a SAP.

We focus on generalized linear mixed models (GLMMs) for the analysis of longitudinal count data. A series of distributions with over- and under-dispersion is considered, and the structure of the variance components is modified. We perform a simulation study to investigate the discriminatory power of Bayesian tools for model criticism in different scenarios derived from the model setting. We apply the findings to the data from an open clinical trial on vertigo attacks; these data are seen as pilot data for an ongoing phase III trial.

The INLA (integrated nested Laplace approximation) methodology enables the direct computation of leave-one-out predictive distributions, which are crucial for Bayesian model assessment. The instruments under study provide excellent tools for preparing decisions within the SAP in a transparent way when structuring the primary analysis, sensitivity or ancillary analyses, and specific analyses for secondary endpoints. The mean logarithmic score and the DIC discriminate well between different model scenarios.
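To illustrate the kind of model ranking the mean logarithmic score supports, the sketch below compares a Poisson and a negative binomial model on simulated overdispersed counts by their mean log score (average negative log predictive density on held-out observations). It is a deliberately simplified, plug-in stand-in for the full Bayesian leave-one-out machinery described above; the data, the fitting shortcut, and all names are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated overdispersed counts: negative binomial, mean 4, dispersion r = 2
y = rng.negative_binomial(n=2, p=2 / (2 + 4), size=500)
train, test = y[:400], y[400:]

# Plug-in Poisson predictive: rate = training mean
lam = train.mean()
log_score_pois = -stats.poisson.logpmf(test, lam).mean()

# Plug-in negative binomial predictive via a method-of-moments fit
m, v = train.mean(), train.var()
r = m**2 / (v - m)          # dispersion parameter (requires v > m)
p = r / (r + m)
log_score_nb = -stats.nbinom.logpmf(test, r, p).mean()

# Lower mean log score = better predictive fit
print(f"Poisson: {log_score_pois:.3f}, NegBin: {log_score_nb:.3f}")
```

Because the data are overdispersed, the negative binomial model attains the lower (better) mean log score, which is exactly the discrimination between model scenarios described above.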

It becomes obvious that the naive choice of a conventional random effects Poisson model is often inappropriate for real-life count data.

The findings are used to specify an appropriate mixed model employed in the sensitivity analyses of an ongoing phase III trial. The proposed Bayesian methods are not only appealing for inference but also provide sophisticated insight into different aspects of model performance, such as forecast verification and calibration checks, and can be applied within the model selection process.

The mean of the logarithmic score is a robust tool for model ranking and is not sensitive to sample size. These Bayesian model selection techniques therefore offer helpful decision support for shaping sensitivity and ancillary analyses in the statistical analysis plan of a clinical trial with longitudinal count data as the primary endpoint. Writing a good SAP for a model-based sensitivity or ancillary analysis [12] involves non-trivial decisions on and justification of many aspects of the chosen model setting.

In particular, trials with longitudinal count data as the primary endpoint pose challenges for model choice and model validation. This paper explores tools for this decision process when sensitivity analyses are performed using generalized linear mixed models (GLMMs) for the analysis of longitudinal count data.

These tools can be used to build transparent strategies for shaping the final models reported in the SAP. The documentation of longitudinal profiles for the primary endpoint offers many advantages: treatment effects evaluated by comparing change over time in quantitative outcome variables between the treatment groups are of great interest [45], and the analysis of longitudinal profiles offers an effective way to handle composite endpoints.

We are interested in parametric modeling approaches for quantifying absolute effects, adjusting for baseline covariates, and handling stratification. There is a rich literature on nonparametric methods for longitudinal data (for example, Brunner et al.); these models do, in general, allow estimation of relative effects. Mixed-effects or random-effects models allow us to investigate the profiles of individual patients, estimate patient effects, and describe the heterogeneity of treatment effects over individual patients.


They account for different sources of variation (patient effects, center effects, measurement errors) and provide direct estimates of the variance components, which may be of interest in their own right.

Furthermore, they allow us to address various covariance structures and are useful for accommodating the overdispersion often observed among count response data [10-12]. Mixed models are also helpful for handling missing values.

Also, for non-ignorable missing-data mechanisms, newer model-based strategies for longitudinal analyses are increasingly available and offer the opportunity to account for dropout patterns. To be fully compatible with the intention-to-treat (ITT) principle, one has to explicitly consider incomplete individual profiles in order to correctly incorporate the information available for all randomized patients.

These points may explain why our interest focuses on GLMMs as a powerful tool for the sensitivity analysis of longitudinal count data.

In statistics, the use of Bayes factors is a Bayesian alternative to classical hypothesis testing. The models under consideration are statistical models.

The Bayes factor is a likelihood ratio of the marginal likelihood of two competing hypotheses, usually a null and an alternative. If instead of the Bayes factor integral, the likelihood corresponding to the maximum likelihood estimate of the parameter for each statistical model is used, then the test becomes a classical likelihood-ratio test.
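In symbols (using standard notation not present in the source), for data D and models M1 and M2 with parameters theta_i and priors pi_i, the Bayes factor is the ratio of marginal likelihoods:

```latex
K \;=\; \frac{p(D \mid M_1)}{p(D \mid M_2)}
  \;=\; \frac{\int p(D \mid \theta_1, M_1)\,\pi_1(\theta_1)\,d\theta_1}
             {\int p(D \mid \theta_2, M_2)\,\pi_2(\theta_2)\,d\theta_2}.
```

Replacing each integral by the likelihood evaluated at the maximum likelihood estimate of theta_i recovers the classical likelihood-ratio statistic.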

Unlike a likelihood-ratio test, this Bayesian model comparison does not depend on any single set of parameters, as it integrates over all parameters in each model with respect to the respective priors. An advantage of the use of Bayes factors is that it automatically, and quite naturally, includes a penalty for including too much model structure. For models where an explicit version of the likelihood is not available or is too costly to evaluate numerically, approximate Bayesian computation can be used for model selection in a Bayesian framework [7], with the caveat that approximate-Bayesian estimates of Bayes factors are often biased.

Note that classical hypothesis testing gives one hypothesis or model preferred status (the 'null hypothesis') and only considers evidence against it. Harold Jeffreys gave a scale for the interpretation of K [9], with the corresponding weights of evidence expressed in decihartleys (also known as decibans) and, for clarity, in bits.

According to I. J. Good, a change in weight of evidence of one deciban is about the smallest change that humans can reasonably perceive in their degree of belief.


An alternative table, widely cited, is provided by Kass and Raftery [6]. Suppose we have a random variable that produces either a success or a failure, and consider two models: M1, in which the probability of success is fixed at q = 1/2, and M2, in which q is unknown with a uniform prior distribution on [0, 1]. We take a sample of 200 and find 115 successes and 85 failures. The likelihood can be calculated according to the binomial distribution: P(115 successes | q) = C(200, 115) q^115 (1 - q)^85.

The ratio of marginal likelihoods is then about 1.2, which is 'barely worth mentioning' even though it points very slightly towards M1. A frequentist hypothesis test of M1 (here considered as a null hypothesis) would have produced a very different result: 115 is more than two standard deviations away from the expected value of 100, leading to rejection of M1 at conventional significance levels. Note, however, that a non-uniform prior (for example, one that reflects the fact that you expect the numbers of successes and failures to be of the same order of magnitude) could result in a Bayes factor more in agreement with the frequentist hypothesis test.

Using instead the maximum-likelihood estimate of q (115/200 = 0.575) gives a likelihood ratio of about 0.1 in favour of M2. M2 is a more complex model than M1 because it has a free parameter which allows it to model the data more closely. The ability of Bayes factors to take this into account is a reason why Bayesian inference has been put forward as a theoretical justification for, and generalisation of, Occam's razor, reducing Type I errors.

On the other hand, the modern method of relative likelihood takes into account the number of free parameters in the models, unlike the classical likelihood ratio. The relative likelihood method could be applied as follows.

The AIC of M1 is about 10.25 and that of M2 about 7.73; hence M1 is about exp((7.73 - 10.25)/2) = 0.28 times as probable as M2 at minimising the information loss. Thus M2 is slightly preferred, but M1 cannot be excluded.
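The quantities in this worked example can be checked directly. The sketch below, using only the standard library and assuming the classic setup of 200 trials with 115 successes (consistent with the 85 failures mentioned above), computes the marginal likelihoods, the Bayes factor, the maximum-likelihood ratio, and the AIC-based relative likelihood.

```python
import math

n, k = 200, 115          # 200 trials, 115 successes (85 failures)

# M1: success probability fixed at q = 1/2
lik_m1 = math.comb(n, k) * 0.5**n

# M2: q uniform on [0, 1]; the marginal likelihood integrates to 1/(n + 1)
lik_m2 = 1 / (n + 1)

bayes_factor = lik_m1 / lik_m2            # barely worth mentioning

# Maximum-likelihood comparison: plug in the MLE q = k/n for M2
q_hat = k / n
lik_m2_mle = math.comb(n, k) * q_hat**k * (1 - q_hat)**(n - k)
ml_ratio = lik_m1 / lik_m2_mle            # strongly favours M2

# AIC-based relative likelihood (M1 has 0 free parameters, M2 has 1)
aic_m1 = 2 * 0 - 2 * math.log(lik_m1)
aic_m2 = 2 * 1 - 2 * math.log(lik_m2_mle)
rel_lik = math.exp((aic_m2 - aic_m1) / 2)  # M2 slightly preferred

print(bayes_factor, ml_ratio, rel_lik)
```

The closed form 1/(n + 1) for the M2 marginal likelihood follows because the binomial coefficient cancels against the Beta integral of q^k (1 - q)^(n - k).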

Criteria for Bayesian model choice with application to variable selection

A scoring rule is called proper if honesty is your best policy, i.e., if your expected score is optimised by quoting your genuine beliefs. Proper scoring rules are closely connected with likelihood inference, with communication theory, and with minimum-description-length model selection. Moreover, every statistical decision problem induces a proper scoring rule, so there is a very wide variety of them; many have additional interesting structure and properties. At a theoretical level, any proper scoring rule can be used as a foundational basis for the theory of subjective probability.

At an applied level, a proper scoring rule can be used to compare and improve probability forecasts and, in a parametric setting, as an alternative tool for inference. In this article we give an overview of some uses of proper scoring rules in statistical inference, including frequentist estimation theory and Bayesian model selection with improper priors.
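To make 'honesty is your best policy' concrete, the sketch below checks numerically that for a binary event with true probability p = 0.7, the expected logarithmic score is maximised by reporting q = p, whereas an improper rule (here, the linear score, which simply pays out the stated probability of whichever outcome occurs) rewards exaggerating towards certainty. All names and numbers are illustrative.

```python
import math

def expected_log_score(p, q):
    # Expected log score when the event has true probability p
    # and the forecaster reports probability q.
    return p * math.log(q) + (1 - p) * math.log(1 - q)

def expected_linear_score(p, q):
    # An improper rule: score q if the event occurs, 1 - q otherwise.
    return p * q + (1 - p) * (1 - q)

p = 0.7
grid = [i / 100 for i in range(1, 100)]

best_log = max(grid, key=lambda q: expected_log_score(p, q))
best_lin = max(grid, key=lambda q: expected_linear_score(p, q))

print(best_log)  # honest report: 0.7
print(best_lin)  # exaggerated report: 0.99 (edge of the grid)
```

Under the log score the optimal report is the true probability; under the linear score the expected payoff is linear in q, so the forecaster is pushed to the most extreme report available. This is exactly the distinction between proper and improper rules.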



Bayesian model selection with improper priors is not well-defined because of the dependence of the marginal likelihood on the arbitrary scaling constants of the within-model prior densities. We show how this problem can be evaded by replacing marginal log-likelihood by a homogeneous proper scoring rule, which is insensitive to the scaling constants.
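The homogeneous proper scoring rule used in this line of work is the Hyvarinen score, S(x, q) = 2 * laplacian(log q(x)) + |grad log q(x)|^2, which depends on the density q only through derivatives of log q, so any multiplicative constant drops out; this is exactly why the arbitrary scaling of an improper prior causes no trouble. The sketch below, with illustrative data and models, verifies this invariance in the univariate Gaussian case and uses the cumulative score (lower is better) to pick between two models.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=2000)   # data from N(0, 1)

def hyvarinen_score_gaussian(x, mu, sigma2, c=1.0):
    """Hyvarinen score 2*laplacian(log q) + |grad log q|^2 for
    q(x) = c * N(x; mu, sigma2); the constant c cancels exactly,
    since log c has zero derivative."""
    grad = -(x - mu) / sigma2       # d/dx log q(x)
    lap = -1.0 / sigma2             # d^2/dx^2 log q(x)
    return 2 * lap + grad**2

# Scaling the density by any constant leaves the score unchanged,
# which is what makes the rule usable with improper priors.
s1 = hyvarinen_score_gaussian(x, 0.0, 1.0, c=1.0)
s2 = hyvarinen_score_gaussian(x, 0.0, 1.0, c=1e6)

# Model selection: lower cumulative score is better.
score_true = hyvarinen_score_gaussian(x, 0.0, 1.0).sum()
score_wrong = hyvarinen_score_gaussian(x, 2.0, 1.0).sum()
print(score_true < score_wrong)
```

Minimising the expected Hyvarinen score is equivalent to minimising the Fisher divergence from the true density, so the correctly specified model attains the lower cumulative score.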

Suitably applied, this will typically enable consistent selection of the true model. (A. Philip Dawid and Monica Musio, Bayesian Analysis.)


Authors: M. J. Bayarri, J. O. Berger, A. Forte, and G. García-Donato. In objective Bayesian model selection, no single criterion has emerged as dominant in defining objective prior distributions. Indeed, many criteria have been separately proposed and utilized to propose differing prior choices.

We first formalize the most general and compelling of the various criteria that have been suggested, together with a new criterion.

Authors include Fabrizio Leisen and Stephen G. Walker.

This article is in its final form and can be cited using the date of online publication and the DOI. Objective prior distributions represent an important tool that allows one to have the advantages of using a Bayesian framework even when information about the parameters of a model is not available.

The usual objective approaches work off the chosen statistical model, and in the majority of cases the resulting prior is improper, which can pose limitations for practical implementation even when the complexity of the model is moderate. In this paper we propose to take a novel look at the construction of objective prior distributions, in which the connection with a chosen sampling distribution (model) is removed.

We explore the notion of defining objective prior distributions which allow one to have some degree of flexibility, in particular in exhibiting desirable features such as being proper, log-concave, or convex.


The basic tools we use are proper scoring rules, and the main result is a class of objective prior distributions that can be employed in scenarios where the usual model-based priors fail, such as mixture models and model selection via Bayes factors. In addition, we show that the proposed class of priors is the result of minimising the information it contains, providing a solid interpretation of the method.

Keywords: calculus of variations, differential entropy, Euler–Lagrange equation, Fisher information, invariance, objective Bayes, proper scoring rules. Source: Bayesian Analysis.

Rights: Creative Commons Attribution 4.0.





