Zero-inflated negative binomial regression for differential abundance testing in microbiome studies

Methods: negative binomial mixed models (NBMMs) for microbiome studies. Typical microbiome data generated by 16S rRNA gene sequencing or shotgun metagenomic sequencing consist of the following components (see Table 1). Additionally, microbiome studies usually collect samples longitudinally, which introduces time-dependent and correlation structures among the samples and thus further complicates the analysis and interpretation of microbiome count data. Furthermore, theory suggests that the excess zeros are generated by a separate process from the count values. In particular, library sizes often vary over several orders of magnitude, and the data contain many zeros. Zero-inflated negative binomial regression fits two separate models and then combines them. The proposed method uses an expectation-maximization (EM) algorithm and incorporates a two-part mixture model consisting of (i) a negative binomial model to account for overdispersion and (ii) a logistic regression model to account for excessive zero counts. A marginalized two-part beta regression model for microbiome. However, current methods for integrating microbiome data and other covariates are severely lacking.

Urban midblock crashes are influenced mainly by traffic operation and roadway geometric features. Differential abundance analysis via zero-inflated beta regression. The zero-inflated negative binomial regression model: suppose that for each observation, there are two possible cases. In this paper, we propose a zero-inflated negative binomial (ZINB) regression for identifying differentially abundant taxa between two or more populations.

Recent work in this area [18] addresses the performance of parametric normalization and differential abundance testing approaches for microbial ecology. Although we are typically interested in comparing the relative abundance of taxa in the ecosystems of two or more groups, we can only measure the taxon relative. Differential abundance analysis is a crucial task in many microbiome studies, where the central goal is to identify microbiome taxa associated with certain biological or clinical conditions. My model studies the change from innovative to non-innovative firms and vice versa. There are two different modes of microbiome differential abundance analysis. The second set used modifications of a real gut microbiome dataset.

A multivariate zero-inflated logistic model for microbiome relative abundance data. Zhigang Li, Katherine Lee, Margaret R. Recently, a number of zero-inflated (ZI) models have been proposed to analyze zero-inflated microbiome count data, including the ZI Gaussian model (Paulson et al. The test is built on a zero-inflated negative binomial regression model and winsorized count data to account for zero inflation and outliers. An application of the Bayesian hierarchical negative binomial model. First, a logit model is generated for the certain-zero cases described above, predicting whether or not a student would be in this group.
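Winsorizing, as used by the test mentioned above, caps extreme counts at a chosen quantile so that a handful of outliers do not dominate the fit. The following is only a minimal sketch: the function name `winsorize_upper`, the nearest-rank quantile rule, and the default 97th percentile are illustrative assumptions, not the published procedure.

```python
def winsorize_upper(counts, upper_pct=97):
    """Cap counts above the upper_pct-th percentile (nearest-rank rule).

    upper_pct is an integer percentage; the default is an assumed value.
    """
    if not counts:
        return []
    ordered = sorted(counts)
    n = len(ordered)
    # Nearest-rank position ceil(upper_pct * n / 100), done in integer arithmetic.
    rank = -(-upper_pct * n // 100)
    cap = ordered[max(rank, 1) - 1]
    # Values at or below the cap (including all zeros) are left unchanged.
    return [min(y, cap) for y in counts]


# Hypothetical counts for one taxon across ten samples, with one outlier.
otu_counts = [0, 0, 1, 2, 3, 4, 5, 6, 7, 1000]
capped = winsorize_upper(otu_counts, upper_pct=90)
```

Only the upper tail is capped here, so zeros are untouched and the zero-inflation part of the model sees the same data before and after winsorizing.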

This paper therefore examines how various normalization and differential abundance testing procedures available in the literature are affected by the challenges inherent in microbiome data. To determine the specific OTUs driving compositional differences, tests of differential OTU abundance were performed using zero-inflated negative binomial regression. Fillon. Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Denver, Aurora, Colorado, USA.

A Bayesian zero-inflated negative binomial regression model. Estimation of claim count data using negative binomial. Hence, we present an integrative Bayesian zero-inflated negative binomial regression model that can both distinguish differentially abundant taxa with distinct phenotypes and quantify covariate-taxa effects. Data from 16S ribosomal RNA (rRNA) amplicon sequencing present challenges to ecological and statistical interpretation. The smallest AIC values among all fitted models are displayed in. In this paper, we propose a beta-binomial model for this task. The zero-inflated negative binomial regression model: suppose that for each observation, there are two possible cases.

However, if case 2 occurs, counts (including zeros) are generated according to the negative binomial model. Existing methods fail to pinpoint the degree of association. In the current paper, we consider an example from horticulture and use it to motivate adaptations of Lambert's (1992) zero-inflated Poisson (ZIP) regression models. The zero-inflated models include ZIP and ZINB and assume that for each observation there are two possible data-generating processes, with the result of a Bernoulli trial determining which process is used.
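The two-process description above translates directly into a probability mass function. The sketch below assumes the common NB2 parameterization (mean `mu`, dispersion `theta`, variance `mu + mu**2 / theta`) and a structural-zero probability `pi`; the function and parameter names are illustrative and not taken from any particular package.

```python
import math


def nb_pmf(y, mu, theta):
    """Negative binomial pmf with mean mu and dispersion (size) theta."""
    log_p = (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, pi, mu, theta):
    """Zero-inflated NB pmf: a Bernoulli(pi) trial picks the always-zero case."""
    nb = nb_pmf(y, mu, theta)
    if y == 0:
        # A zero can come from either process.
        return pi + (1.0 - pi) * nb
    # Positive counts can only come from the NB process.
    return (1.0 - pi) * nb


# Assumed example values: 30% structural zeros, NB mean 5, dispersion 2.
p_zero_nb = nb_pmf(0, mu=5.0, theta=2.0)
p_zero_zinb = zinb_pmf(0, pi=0.3, mu=5.0, theta=2.0)
```

With these values the zero probability rises from the plain NB value to a larger ZINB value, which is the "excess zeros" effect the mixture is meant to capture.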

ZINB regression is obtained by mixing a distribution degenerate at zero with an NB distribution, allowing the incorporation of explanatory variables in both the zero process and the count process. Normalization is the first critical step in microbiome sequencing data analysis and is used to account for variable library sizes. Fitting a zero-inflated negative binomial regression with R. Accounting for excess zeros and sample selection in Poisson and negative binomial regression models. Bayesian modeling of microbiome data for differential. It reports on the regression equation as well as the confidence limits and likelihood. Since the beta distribution takes a wide range of shapes depending on the values of its two parameters, beta regression models (Ferrari and Cribari-Neto, 2004) are very useful when the response variables are continuous and restricted to the interval (0, 1). That's why I am searching for a Stata command to do a zero-inflated negative binomial regression. Testing in microbiome-profiling studies with MiRKAT, the microbiome regression. However, reproducibility has been lacking due to the myriad of different experimental and computational approaches. Application of zero-inflated negative binomial mixed model. Proper testing of overdispersion and zero-inflated microbiome data is challenging.
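One common way to write the regression structure of a ZINB model is sketched below. The notation is assumed for illustration: $\pi_i$ is the structural-zero probability for sample $i$, $\mu_i$ and $\theta$ are the NB mean and dispersion, $z_i$ and $x_i$ are covariate vectors for the zero and count parts, and $N_i$ is the library size, entering as an offset as one possible way to handle variable sequencing depth.

```latex
\begin{aligned}
\operatorname{logit}(\pi_i) &= z_i^{\top}\gamma, \\
\log(\mu_i) &= \log(N_i) + x_i^{\top}\beta, \\
\mathrm{E}[Y_i] &= (1-\pi_i)\,\mu_i, \\
\mathrm{Var}[Y_i] &= (1-\pi_i)\,\mu_i\left[1 + \mu_i\left(\pi_i + \tfrac{1}{\theta}\right)\right].
\end{aligned}
```

The variance exceeds the mean whenever $\pi_i > 0$ or $\theta < \infty$, so the model accommodates zero inflation and overdispersion separately.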

Assume that x follows a beta distribution denoted as x. Negative binomial mixed models for analyzing microbiome count. Zero-inflated negative binomial regression: SAS data. The rapid growth of high-throughput sequencing-based microbiome profiling has yielded tremendous insights into human health and physiology. Multiple stressors interact primarily through antagonism. Zero-inflated beta regression for differential abundance analysis with metagenomics data. On the other hand, several zero-inflated models have also been proposed to correct for excess zero counts in microbiome measurements, including zero-inflated Gaussian, lognormal. Normalization and microbial differential abundance strategies. RPubs: models for excess zeros using pscl package hurdle.

Frontiers: an adaptive multivariate two-sample test with. First, it characterizes the overdispersion and zero inflation frequently observed in microbiome count data by introducing a zero-inflated negative binomial (ZINB) model. Conditional regression based on a multivariate zero-inflated logistic-normal model for microbiome relative abundance data. 10 July 2018, Statistics in Biosciences, vol. Zero-inflated negative binomial model for panel data. Previous zero-inflated models for differential abundance analysis focused on testing the change in the abundance (mean of the nonzero component) and/or prevalence (probability of the nonzero component), treating the dispersion as a nuisance parameter (Chen and Li, 2016). Fitting the zero-inflated binomial model to overdispersed binomial data: as with count models, such as Poisson and negative binomial models, overdispersion can also be seen in binomial models, such as logistic and probit models, meaning that the amount of variability in the data exceeds that of the binomial distribution.

Bayesian hierarchical negative binomial models for multivariable. Multivariate random parameters zero-inflated negative. Zero-inflated negative binomial-generalized exponential. Zero-inflated Poisson and binomial regression with random. X. Zhang, H. Mallick, N. Yi. Zero-inflated negative binomial regression for differential abundance testing in microbiome studies. Journal of Bioinformatics and Genomics 2 (2), 2016. In this paper, we compare the performance of different competing methods to model data with zero-inflated features through extensive simulations and application to a microbiome study. In this thread, I laid out a problem involving fitting a model that attempts to use minor league baseball statistics to predict success at the major league level (explained in full in the thread). After doing further research outside of the thread, I have come to the conclusion that a zero-inflated negative binomial model is likely the best fit, given that I believe there are two processes generating the data.

Working paper EC-94-10, Department of Economics, Stern School of Business, New York University. Fillon. Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Denver, Aurora, Colorado, USA; Department of Pediatrics, Division of Pulmonology, University of Colorado. In this paper, 10-year crash data from 1,506 directional urban midblock segments in Nebraska were analyzed using the multivariate random parameters zero-inflated negative binomial model to account for unobserved heterogeneity produced by correlations across segments, correlations across crash. Xinyan Zhang, Himel Mallick, and Nengjun Yi (2016). Zero-inflated negative binomial regression for differential abundance testing in microbiome studies. The AICs of different methods for data simulated under ZIP distribution with. Hypothesis testing and statistical analysis of microbiome. Current RNA-seq-based normalization methods that have been adapted for microbiome data fail to consider the unique characteristics of microbiome data, which contain a vast number of zeros due to the physical absence or undersampling of the microbes. We introduce a novel test for differential distribution analysis of microbiome sequencing data by jointly testing the abundance, prevalence and dispersion. In this paper, we propose a likelihood ratio test for testing the association between the relative abundance of bacteria and disease covariate for microbiome data while using a generalized zero.

In this article, we propose negative binomial mixed models (NBMMs) for longitudinal microbiome studies. Assessment and selection of competing models for zero. Frontiers: negative binomial mixed models for analyzing. The appropriateness of using a zero-inflated model in gut microbiome studies was assessed by extensive simulations and applied in a real human microbiome study. Zero-inflated negative binomial regression for differential abundance testing in microbiome studies. Article, December 2016. Models for excess zeros using pscl package: hurdle and zero-inflated regression models and their interpretations, by Kazuki Yoshida. Furthermore, theory suggests that the excess zeros are generated by a separate process from the count values and that the excess zeros can be modeled independently.

Data generated from high-throughput sequencing of 16S rRNA gene amplicons are often preprocessed into composition or relative abundance. It performs a comprehensive residual analysis including diagnostic residual reports and plots. As an alternative to ZIP regression, one may consider zero-inflated negative binomial (ZINB) regression if the count data continue to suggest additional overdispersion. An omnibus test for differential distribution analysis of. Typical data in a microbiome study consist of operational taxonomic unit (OTU) counts that have the characteristic of excess zeros, which are often ignored by investigators. Negative binomial regression, which is a standard statistical method for analyzing overdispersed count observations, has recently been applied to microbiome data.
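Before choosing among Poisson, NB, ZIP, and ZINB, it can help to look at two simple summaries of a taxon's counts: the variance-to-mean ratio (overdispersion) and the observed zero fraction compared with what a Poisson model of the same mean would predict. This is a rough screening sketch, not a formal test; the function name `count_diagnostics` and the example counts are hypothetical.

```python
import math
from statistics import mean, variance


def count_diagnostics(counts):
    """Descriptive checks for overdispersion and excess zeros in count data."""
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    zero_fraction = sum(1 for y in counts if y == 0) / len(counts)
    return {
        "mean": m,
        "variance": v,
        # A Poisson model implies a ratio near 1; much larger suggests overdispersion.
        "dispersion_index": v / m if m > 0 else float("nan"),
        "zero_fraction": zero_fraction,
        # Zero probability under a Poisson model with the same mean.
        "poisson_zero_fraction": math.exp(-m),
    }


# Hypothetical OTU counts across ten samples.
otu_counts = [0, 0, 0, 0, 0, 0, 1, 2, 10, 27]
diag = count_diagnostics(otu_counts)
```

A dispersion index well above 1 together with a zero fraction far above the Poisson expectation is the pattern that motivates zero-inflated NB models; a formal comparison would still rely on likelihood-based criteria such as AIC.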

A marginalized two-part model for semicontinuous data. I have overdispersion and I do have excess zeros (more than 40%). The numbers are the means of the AICs over replications. We begin with the conventional two-part model with a beta component in part II [10-12].

Zero-inflated negative binomial regression is for modeling count variables with excessive zeros, and it is usually used for overdispersed count outcome variables. Microbial longitudinal studies are powerful experimental designs utilized to classify diseases, determine prognosis, and analyze microbial systems dynamics. A Bayesian zero-inflated negative binomial regression model for the integrative analysis of microbiome data. These confounding variables need to be adjusted for to obtain more accurate differential abundance analysis. Second, it models the heterogeneity from different sequencing depths, covariate effects, and group effects via a log-linear regression framework on the ZINB mean components. For a given operational taxonomic unit (OTU), let y_i denote its semicontinuous relative abundance for subject i, where 0. Inflation model: this indicates that the inflation model is a logit model, predicting a latent binary outcome. The proposed method uses an expectation-maximization (EM) algorithm and incorporates a two-part mixture model consisting of (i) a negative binomial model to account for overdispersion and (ii) a logistic regression model to account for excessive zero counts.
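The EM idea behind such two-part mixtures can be illustrated on a deliberately simplified case. The sketch below swaps in an intercept-only zero-inflated Poisson model, because its M-step has a closed form; the method described above instead uses a negative binomial count model and a logistic regression for the zero part, both of which need numerical optimization in the M-step. The function name `em_zip` and the example data are assumptions for illustration.

```python
import math


def em_zip(counts, max_iter=500, tol=1e-12):
    """EM for an intercept-only zero-inflated Poisson model.

    Returns (pi, lam): the structural-zero probability and the Poisson mean.
    """
    n = len(counts)
    n_zero = sum(1 for y in counts if y == 0)
    total = sum(counts)
    if total == 0:
        raise ValueError("all counts are zero; the mixture is not identifiable")
    pi, lam = 0.5 * n_zero / n, total / n  # crude starting values
    for _ in range(max_iter):
        # E-step: posterior probability that an observed zero is structural.
        w = pi / (pi + (1.0 - pi) * math.exp(-lam))
        expected_structural = w * n_zero
        # M-step: closed-form updates given the expected memberships.
        new_pi = expected_structural / n
        new_lam = total / (n - expected_structural)
        converged = abs(new_pi - pi) < tol and abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if converged:
            break
    return pi, lam


# Hypothetical data: 50 zeros plus 50 positive counts.
counts = [0] * 50 + [1, 2, 3, 4, 5] * 10
pi_hat, lam_hat = em_zip(counts)
```

At convergence the fitted zero probability, `pi_hat + (1 - pi_hat) * exp(-lam_hat)`, matches the observed zero fraction, which is a convenient sanity check on any implementation of this simplified model.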

The current breadth of microbiome research is founded upon recent. In longitudinal studies, only identifying differential features between two phenotypes does not provide sufficient information to determine whether a change in the relative abundance is short-term or continuous. Hoen, Hongzhe Li. Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, 1 Medical Center Drive, Lebanon, NH 03756, USA; Children's Environmental Health and Disease. For the remaining four OTUs, a zero-inflated negative binomial regression was used to account for the excess of zeros in the data using glmmTMB v0. Zero-inflated beta regression for differential abundance analysis with metagenomics data. Metagenomics data have been growing rapidly due to advances in NGS technologies. This program computes ZIP regression on both numeric and categorical variables.

In this paper, we propose a new zero-inflated distribution, namely the zero-inflated negative binomial-generalized exponential (ZINB-GE) distribution. Zero-inflated negative binomial regression: Stata annotated. Converging statistical and biological evidence suggests that the count-based zero-inflated model is more appropriate for microbiome data. In the context of microbiome studies, this problem arises when researchers wish to use a sample from a population of microbes to estimate the population proportion of a particular taxon, known as the taxon's relative abundance. The new distribution is used for count data with extra zeros and is an alternative for the analysis of overdispersed count data. Zero-inflated negative binomial regression: number of obs = 316; nonzero obs = 254; zero obs = 62; inflation model = logit; LR chi2(3) = 18. Several zero-inflated models were proposed to correct for excess zero counts in microbiome measurements, including zero-inflated Gaussian, lognormal, negative binomial, and beta models (Paulson et.
