An Integrated Gaussian Graphical Model to evaluate the impact of exposures on metabolic networks

https://doi.org/10.1016/j.compbiomed.2019.103417

Highlights

  • We demonstrated that latent interactions impacted by external variables can be identified by an Integrated Gaussian Graphical Model (IGGM).

  • This paper integrates two heterogeneous data sets: placental trace elements and cord-blood metabolites.

  • We found associations that were corroborated in the experimental literature.

Abstract

Examining the effects of exogenous exposures on complex metabolic processes poses the unique challenge of identifying interactions among a large number of metabolites. Recent progress in quantifying the metabolome through mass spectrometry (MS) and nuclear magnetic resonance (NMR) has produced high-dimensional biomedical data on specific metabolites that can be leveraged to study their effects in humans. These metabolic interactions can be evaluated using probabilistic graphical models (PGMs), which define conditional dependence and independence among components within and between heterogeneous biomedical datasets. This approach can recover valuable latent information that is difficult to detect with other existing methods. Here, we develop a PGM method, referred to as an “Integrated Gaussian Graphical Model (IGGM)”, to incorporate exposure concentrations of seven trace elements, namely arsenic (As), lead (Pb), mercury (Hg), cadmium (Cd), zinc (Zn), selenium (Se) and copper (Cu), into metabolic networks. We first conducted a simulation study demonstrating that integrating trace elements with metabolomics data can improve the accuracy of detecting latent, exposure-affected interactions among metabolites in the network. We tested how parameters such as sample size and the number of neighboring metabolites of a chosen trace element affect the accuracy of detecting metabolite interactions. We then applied this method to measurements of cord blood plasma metabolites and placental trace elements collected from newborns in the New Hampshire Birth Cohort Study (NHBCS). We found that our approach can identify latent interactions among metabolites that are related to trace element concentrations. Application to similarly structured data may contribute to our understanding of the complex interplay of exposure-related metabolic interactions that are important for human health.

Introduction

Metabolomics is the study of all low-molecular-weight molecules present in biological fluids and tissues, and it may be the most promising of the “omics” technologies used in exposome research. Applying metabolic profiling to normal or complicated pregnancies has emerged as an innovative unsupervised approach for exploring potential biomarkers and biological mechanisms of reproductive outcomes. Pregnancy is a dynamic period consisting of a series of minute physiologic fetal adjustments over time that affect the metabolism of nutrients to facilitate fetal development [1]. Human pregnancy and development are also susceptible to the toxic effects of metals, which may stunt infant growth and cause preterm delivery [2]. Therefore, it is critical to include environmental exposures, such as essential nutrients or potentially toxic heavy metals, in the study of metabolic interaction networks.

Previous work has focused on developing graphical-model methods that integrate elements from heterogeneous datasets. Recently, a joint Gaussian Graphical Model (jGGM) was applied to identify significantly interacting genes in genomics data that share common features across studies of independent samples [3]. Although this method adjusts the penalization in the joint GGM using a priori pathway information validated in the existing biology literature, it does not consider impacts from external data. Likewise, while the Ising model with covariates considers the impact of subject-specific external variables, its applications are limited to multivariate binary genomic data [4]. There is therefore a need for a GGM that can handle continuous omics data beyond genomics and account for the impact of external variables.

In this paper, we present an Integrated Gaussian Graphical Model (IGGM) that integrates metabolomics and trace element data to infer a metabolic network outside the realm of genomics. The method enables an integrative analysis of how trace elements affect metabolites and how metabolites interact with each other. We first use a simulation to demonstrate that this integrated approach is more powerful than a GGM estimated from metabolomics data alone in recovering latent interactions of metabolites impacted by exposure variables. We then examine how parameters such as sample size and the number of strongly correlated neighbor metabolites of each trace element affect performance. Finally, we assess the estimated metabolic pathway consisting of the most statistically significant metabolic interactions detected by our method and discuss these newly detected interactions in the context of known associations in the literature.
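The integration idea can be illustrated in a deliberately simplified setting. The sketch below is not the IGGM estimator, whose details are not given in this excerpt: it assumes the sample size is much larger than the number of variables, so the empirical covariance can be inverted without any penalty, and the helper names `partial_correlations` and `joint_partial_correlations`, as well as the toy precision matrix, are hypothetical. It shows why appending a trace element as an extra node matters: when one element drives two metabolites, a metabolite-only GGM reports an edge between them, while the joint model attributes their dependence to the element.

```python
import numpy as np


def partial_correlations(omega):
    """Partial correlations from a precision matrix: rho_ij = -w_ij / sqrt(w_ii * w_jj)."""
    d = np.sqrt(np.diag(omega))
    rho = -omega / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho


def joint_partial_correlations(metabolites, elements):
    """Hypothetical helper: treat trace elements as extra nodes next to the metabolites.

    metabolites: (n, p) array; elements: (n, q) array. The empirical covariance is
    inverted directly (no sparsity penalty), which is only sensible when n >> p + q.
    Returns the metabolite-metabolite (p, p) and metabolite-element (p, q) blocks.
    """
    z = np.hstack([metabolites, elements])
    rho = partial_correlations(np.linalg.inv(np.cov(z, rowvar=False)))
    p = metabolites.shape[1]
    return rho[:p, :p], rho[:p, p:]


# Hypothetical toy model (assumption): one element (last node) drives metabolites 0 and 1;
# metabolite 2 is isolated, and metabolites 0 and 1 share no direct edge.
omega = np.array([[1.0, 0.0, 0.0, -0.4],
                  [0.0, 1.0, 0.0, -0.4],
                  [0.0, 0.0, 1.0, 0.0],
                  [-0.4, -0.4, 0.0, 1.0]])
rng = np.random.default_rng(0)
data = rng.multivariate_normal(np.zeros(4), np.linalg.inv(omega), size=100_000)
met, elem = data[:, :3], data[:, 3:]

mm, me = joint_partial_correlations(met, elem)
# Metabolite-only GGM: the element is marginalized out.
met_only = partial_correlations(np.linalg.inv(np.cov(met, rowvar=False)))

print("joint model, metabolites 0-1:", round(mm[0, 1], 3))
print("metabolite-only, metabolites 0-1:", round(met_only[0, 1], 3))
print("joint model, metabolite 0 - element:", round(me[0, 0], 3))
```

In this toy model the joint estimate for metabolites 0 and 1 should be close to zero, while the metabolite-only estimate should be near its population value of 0.16 / 0.84 (about 0.19): marginalizing a node out of a Gaussian graph connects its neighbors, so exposure-driven dependence can masquerade as a direct metabolic interaction.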


Gaussian Graphical Model (GGM) and least absolute shrinkage and selection operator (Lasso) implementation

We start with the GGM. We assume that the data are observations sampled at random from a multivariate normal distribution. Specifically, let $X$ be a $p$-dimensional normal random vector and let $X_1, X_2, \ldots, X_p$ denote the $p$ features. Let $X^{(k)}$ be the vector of feature values for the $k$th sample. We assume that $X \sim N_p(0, \Sigma)$, where $\Sigma$ is a positive definite covariance matrix. Let $\Omega = (\omega_{ij})$ be the precision matrix, defined as the inverse of the covariance matrix, $\Omega = \Sigma^{-1}$. Let $S$ be the empirical covariance
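The excerpt stops before specifying how the lasso enters the estimation. As a minimal sketch, assuming Meinshausen-Bühlmann neighborhood selection (a standard lasso-based GGM estimator, swapped in here and not necessarily the authors' formulation), each feature $X_j$ is regressed on all other features with an $\ell_1$ penalty. Because the population regression coefficient is $\beta_{jk} = -\omega_{jk}/\omega_{jj}$, a nonzero coefficient signals $\omega_{jk} \neq 0$, i.e. an edge. The plain numpy coordinate-descent solver stands in for a package such as glmnet; the names `lasso_cd` and `neighborhood_selection`, the penalty `lam = 0.2` and the chain-graph example are all hypothetical.

```python
import numpy as np


def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def lasso_cd(x, y, lam, n_iter=200, tol=1e-6):
    """Minimize (1/(2n)) * ||y - x b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = x.shape
    b = np.zeros(p)
    col_sq = (x ** 2).sum(axis=0) / n
    r = y.copy()  # residual y - x @ b, with b starting at zero
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = b[j]
            # Correlation of feature j with the partial residual that excludes feature j.
            z = x[:, j] @ r / n + col_sq[j] * old
            b[j] = soft_threshold(z, lam) / col_sq[j]
            if b[j] != old:
                r -= x[:, j] * (b[j] - old)
                max_change = max(max_change, abs(b[j] - old))
        if max_change < tol:
            break
    return b


def neighborhood_selection(x, lam, rule="and"):
    """Estimate a GGM edge set by lasso-regressing each standardized feature on all others."""
    x = (x - x.mean(axis=0)) / x.std(axis=0)
    p = x.shape[1]
    coef = np.zeros((p, p))
    for j in range(p):
        others = [k for k in range(p) if k != j]
        coef[j, others] = lasso_cd(x[:, others], x[:, j], lam)
    nz = coef != 0
    # "and": keep (i, j) only if both regressions select each other; otherwise "or".
    adj = (nz & nz.T) if rule == "and" else (nz | nz.T)
    np.fill_diagonal(adj, False)
    return adj


# Hypothetical example: chain graph 0-1-2-3-4 with omega_{i,i+1} = -0.4.
p = 5
omega = np.eye(p)
for i in range(p - 1):
    omega[i, i + 1] = omega[i + 1, i] = -0.4
rng = np.random.default_rng(1)
x = rng.multivariate_normal(np.zeros(p), np.linalg.inv(omega), size=10_000)
adj = neighborhood_selection(x, lam=0.2)
print(adj.astype(int))
```

The "and" rule is the more conservative choice; the "or" rule recovers more true edges at the cost of more false positives. In practice `lam` would be tuned (for example by cross-validation or a stability criterion) rather than fixed.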

Comparisons of methods over different parameters and conditions for simulated data

We evaluated the edge-recovery performance of the proposed IGGM and of the GGM-PE, GGM, Integrated Ising and Ising methods in simulation settings that varied sample size and the number of neighbors of the external variables, assuming strong correlations between each external variable and its selected features. We first assessed the accuracy of edge recovery across different sample sizes and numbers of neighbors of external variables. We then evaluated how the different numbers of
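The comparison above is framed in terms of edge-recovery accuracy, but the excerpt is cut off before naming the metrics. A minimal sketch, assuming true-positive rate, false-positive rate and precision computed over the upper triangle of undirected adjacency matrices, is shown below. The names `edge_recovery` and `hub_truth`, and the toy graph (one external variable linked to three of five features, echoing the "number of neighbors" setting), are hypothetical.

```python
import numpy as np


def edge_recovery(true_adj, est_adj):
    """Compare estimated and true undirected edge sets, counting each pair once."""
    true_adj = np.asarray(true_adj, dtype=bool)
    est_adj = np.asarray(est_adj, dtype=bool)
    iu = np.triu_indices_from(true_adj, k=1)  # pairs (i, j) with i < j
    t, e = true_adj[iu], est_adj[iu]
    tp = int(np.sum(t & e))
    fp = int(np.sum(~t & e))
    fn = int(np.sum(t & ~e))
    tn = int(np.sum(~t & ~e))
    nan = float("nan")
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "tpr": tp / (tp + fn) if tp + fn else nan,          # sensitivity
        "fpr": fp / (fp + tn) if fp + tn else nan,
        "precision": tp / (tp + fp) if tp + fp else nan,
    }


def hub_truth(n_features, n_neighbors):
    """Hypothetical simulation truth: the external node (last index) links to the first n_neighbors features."""
    adj = np.zeros((n_features + 1, n_features + 1), dtype=bool)
    adj[:n_neighbors, n_features] = True
    adj[n_features, :n_neighbors] = True
    return adj


# Toy check: five features plus one external variable with three true neighbors.
truth = hub_truth(5, 3)
est = np.zeros_like(truth)
for i, j in [(0, 5), (1, 5), (3, 4)]:  # misses (2, 5), adds a spurious (3, 4)
    est[i, j] = est[j, i] = True
scores = edge_recovery(truth, est)
print(scores)
```

Here two of the three true edges are found (TPR 2/3), one of the twelve absent pairs is wrongly selected (FPR 1/12), and two of the three selected edges are correct (precision 2/3). Averaging such scores over repeated simulated data sets for each sample size and neighbor count would give accuracy curves of the kind the comparison describes.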

Discussion

Investigating the complex structures occurring in biological systems remains an emerging area of methodologic research. Recent progress in high-dimensional biomedical data analysis has enabled broad-spectrum quantification of the metabolome by mass spectrometry (MS) [23,24] and nuclear magnetic resonance (NMR) [25,26,27]. Advances in statistical analysis [28], network analysis [29] and software development [30] have resulted in a better understanding of

Disclosures and ethics

As a requirement of publication, the author(s) have provided the publisher with signed confirmation of compliance with legal and ethical obligations, including but not limited to the following: authorship and contributorship, conflicts of interest, privacy and confidentiality and (where applicable) protection of human and animal research subjects. The authors have read and confirmed their agreement with the ICMJE authorship and conflict of interest criteria. The authors have also confirmed that


Acknowledgments

This study is funded in part by the following grants: R01LM012012 and R01LM012723 from the National Library of Medicine, P20GM104416 from the National Institute of General Medical Sciences, P42007373 and P01ES022832 from the National Institute of Environmental Health Sciences, RD8354201 from the Environmental Protection Agency and R25CA134286 from the National Cancer Institute.

References (42)

  • D.R. Mattison

    Environmental exposures and development

    Curr. Opin. Pediatr.

    (2010)
  • P. Danaher et al.

    The joint graphical lasso for inverse covariance estimation across multiple classes

    J. R. Stat. Soc. Ser. B Stat. Methodol.

    (2014)
  • J. Cheng et al.

    A sparse Ising model with covariates

    Biometrics

    (2014)
  • J. Friedman et al.

    Regularization paths for generalized linear models via coordinate descent

    J. Stat. Softw.

    (2010)
  • R. Tibshirani et al.

    Strong rules for discarding predictors in lasso-type problems

    J. R. Stat. Soc. Ser. B

    (2010)
  • Glmnet Vignette et al.
  • T. Punshon et al.

    Placental metal concentrations in relation to maternal and infant toenails in a U.S. Cohort

    Environ. Sci. Technol.

    (2016)
  • B.H. Walsh et al.

    The metabolomic profile of umbilical cord blood in neonatal hypoxic ischaemic encephalopathy

    PLoS One

    (2012)
  • B. Englich

    Maternal cytokine status may prime the metabolic profile and increase risk of obesity in children

    Int. J. Obes.

    (2017)
  • P. Ravikumar et al.

    High-dimensional Ising model selection using l1-regularized logistic regression

    Ann. Stat.

    (2010)
  • P. Theriappan et al.

    Accumulation of proline under salinity and heavy metal stress in cauliflower seedlings

    J. Appl. Sci. Environ. Manag.

    (2011)