An Integrated Gaussian Graphical Model to evaluate the impact of exposures on metabolic networks
Introduction
Metabolomics, the study of all low-molecular-weight molecules present in biological fluids and tissues, may be the most promising of the “omics” technologies used in exposome research. Applying metabolic profiling to normal or complicated pregnancies has emerged as an innovative unsupervised approach for exploring potential biomarkers and biological mechanisms of reproductive outcomes. Pregnancy is a dynamic period consisting of a series of minute physiologic adjustments that alter nutrient metabolism over time to facilitate fetal development [1]. Human pregnancy and development are also susceptible to the toxic effects of metals, which may stunt infant growth and cause preterm delivery [2]. It is therefore critical to include environmental exposures, such as essential nutrients or potentially toxic heavy metals, in the study of the metabolic interaction network.
Previous work has developed graphical-model methods that integrate elements from heterogeneous datasets. Recently, a joint Gaussian Graphical Model (jGGM) was applied to identify significantly interacting genes across independent studies that measure a common set of features [3]. Although this method adjusts the penalization in the joint GGM using a priori pathway information validated in the existing biology literature, it does not consider the impact of external data. The Ising model, in contrast, accounts for subject-specific external variables, but its applications are limited to multivariate binary genomic data [4]. There is therefore a need for a GGM that can handle continuous omics data beyond genomics and account for the impact of external variables.
In this paper, we present the Integrated Gaussian Graphical Model (IGGM), which integrates metabolomics and trace element data to infer a metabolic network, extending this class of models beyond genomics. The method enables an integrative analysis of how trace elements affect metabolites and how metabolites interact with each other. We first use simulations to demonstrate that this integrated approach is more powerful than GGM, which estimates the network from metabolomics data alone, in estimating latent interactions among metabolites impacted by exposure variables. We then examine the optimal set of parameters, such as sample size and the number of strongly correlated neighbor metabolites of each trace element. Finally, we assess the estimated metabolic pathway consisting of the most statistically significant metabolic interactions detected by our method and discuss these newly detected interactions in the context of known associations in the literature.
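One way to picture the integration described above is nodewise (neighborhood) lasso regression in which each metabolite is regressed on the other metabolites and on the trace-element exposures, with only the metabolite-metabolite coefficients read off as network edges. The sketch below is a hypothetical illustration under that assumption, not the authors' IGGM estimator: the function names `lasso_cd` and `integrated_neighborhoods`, the penalty `lam`, the AND rule used to symmetrize edges, and the simulated data are all illustrative assumptions.

```python
import numpy as np


def soft_threshold(z, t):
    # Soft-thresholding operator: sign(z) * max(|z| - t, 0)
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    # Cyclic coordinate descent for (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1
    n, d = X.shape
    b = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()  # residual y - Xb (b starts at zero)
    for _ in range(n_iter):
        for j in range(d):
            r += X[:, j] * b[j]      # add back feature j's contribution
            rho = X[:, j] @ r / n    # inner product with partial residual
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]      # refresh residual with the new b[j]
    return b


def integrated_neighborhoods(M, E, lam):
    # HYPOTHETICAL IGGM-style sketch: M is n x p metabolites, E is n x q exposures.
    n, p = M.shape
    Z = np.hstack([M, E])
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)  # standardize all columns
    coef = np.zeros((p, p))
    for j in range(p):
        others = [k for k in range(Z.shape[1]) if k != j]
        b = lasso_cd(Z[:, others], Z[:, j], lam)
        # The first p - 1 predictors are the other metabolites; the remaining
        # q predictors are exposures, used here only as adjusting covariates.
        met_idx = others[: p - 1]
        coef[j, met_idx] = b[: p - 1]
    # AND rule: keep edge (i, j) only if both nodewise regressions select it
    adj = (coef != 0) & (coef.T != 0)
    np.fill_diagonal(adj, False)
    return adj


# Illustrative simulated data (assumption): one exposure drives
# metabolites 0 and 1; metabolites 2 and 3 are directly linked.
rng = np.random.default_rng(0)
n = 1000
e = rng.normal(size=n)
m0 = 0.5 * e + rng.normal(size=n)
m1 = 0.5 * e + rng.normal(size=n)
m2 = rng.normal(size=n)
m3 = 0.8 * m2 + 0.6 * rng.normal(size=n)
M = np.column_stack([m0, m1, m2, m3])
E = e[:, None]

adj = integrated_neighborhoods(M, E, lam=0.1)
print(adj.astype(int))
```

In this sketch the exposures enter every regression as predictors but never become graph nodes, so variation that two metabolites share only through an exposure can be attributed to the exposure rather than to a metabolite-metabolite edge. Whether the paper penalizes exposure coefficients, or also reports exposure-metabolite edges, is not stated in this excerpt.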
Gaussian Graphical Model (GGM) and least absolute shrinkage and selection operator (Lasso) implementation
We start with GGM. We assume that the data are randomly sampled observational data from a multivariate normal distribution. Specifically, let $X = (X_1, \ldots, X_p)^\top$ be a $p$-dimensional normal random vector, where $X_1, \ldots, X_p$ denote the $p$ features. Let $x^{(k)} = (x^{(k)}_1, \ldots, x^{(k)}_p)^\top$ be the vector of feature values for the $k$th sample. We assume that $X \sim N_p(0, \Sigma)$, where $\Sigma$ is a positive definite covariance matrix. Let $\Theta = \Sigma^{-1}$ be the precision matrix, i.e., the inverse of the covariance matrix $\Sigma$. Let $S$ be the empirical covariance
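The definitions above can be made concrete numerically. This sketch assumes the truncated text continues toward the standard graphical lasso objective, $\log\det\Theta - \operatorname{tr}(S\Theta) - \lambda\sum_{i \neq j}|\theta_{ij}|$; that objective, the off-diagonal-only penalty, and the helper names `empirical_covariance`, `partial_correlations` and `penalized_loglik` are assumptions rather than the paper's stated formulation. It also illustrates the standard fact that a zero off-diagonal entry of $\Theta$ corresponds to a zero partial correlation, i.e., no edge between those two features in the GGM.

```python
import numpy as np


def empirical_covariance(X):
    # S = (1/n) * sum_k x^(k) x^(k)^T, assuming mean-zero data (rows = samples)
    return X.T @ X / X.shape[0]


def partial_correlations(Theta):
    # rho_ij = -theta_ij / sqrt(theta_ii * theta_jj); zero <=> no edge in the GGM
    d = np.sqrt(np.diag(Theta))
    P = -Theta / np.outer(d, d)
    np.fill_diagonal(P, 1.0)
    return P


def penalized_loglik(Theta, S, lam):
    # log det(Theta) - tr(S Theta) - lam * sum_{i != j} |theta_ij|
    if np.linalg.eigvalsh(Theta).min() <= 0:
        return -np.inf  # objective is only defined for positive definite Theta
    _, logdet = np.linalg.slogdet(Theta)
    off_l1 = np.abs(Theta).sum() - np.abs(np.diag(Theta)).sum()
    return logdet - np.trace(S @ Theta) - lam * off_l1


# Chain graph on p = 5 features: a tridiagonal, positive definite precision matrix
p = 5
Theta = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
Sigma = np.linalg.inv(Theta)

rng = np.random.default_rng(1)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=5000)
S = empirical_covariance(X)

print(np.round(partial_correlations(Theta), 2))
print(penalized_loglik(Theta, S, lam=0.1))
```

With $\lambda = 0$ and $S = \Sigma$, the objective is maximized at $\Theta = \Sigma^{-1}$; the $\ell_1$ term is what drives small off-diagonal entries, and hence edges, to zero when $\Theta$ is estimated from $S$.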
Comparisons of methods over different parameters and conditions for simulated data
We evaluated the edge-recovery performance of the proposed IGGM and of GGM-PE, GGM, Integrated Ising and Ising under simulation settings that varied the sample size and the number of neighbors of the external variables, assuming strong correlations between each external variable and its selected features. We first compared edge-recovery accuracy across sample sizes and numbers of neighbors of the external variables. We then evaluated how the different numbers of
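Edge recovery in simulations of this kind is commonly scored by comparing the estimated edge set with the true one. The sketch below computes the true-positive rate, false-positive rate and precision over the strict upper triangle of two adjacency matrices; which summary measures the paper reports is not visible in this excerpt, so the metric choice and the function names `edge_recovery` and `adjacency` are assumptions.

```python
import numpy as np


def edge_recovery(true_adj, est_adj):
    # Score an undirected edge estimate against the truth, counting each
    # node pair once (strict upper triangle, diagonal excluded).
    true_adj = np.asarray(true_adj, dtype=bool)
    est_adj = np.asarray(est_adj, dtype=bool)
    iu = np.triu_indices_from(true_adj, k=1)
    t, e = true_adj[iu], est_adj[iu]
    tp = int(np.sum(t & e))
    fp = int(np.sum(~t & e))
    fn = int(np.sum(t & ~e))
    tn = int(np.sum(~t & ~e))
    nan = float("nan")
    return {
        "TP": tp, "FP": fp, "FN": fn, "TN": tn,
        "TPR": tp / (tp + fn) if tp + fn else nan,  # sensitivity
        "FPR": fp / (fp + tn) if fp + tn else nan,
        "precision": tp / (tp + fp) if tp + fp else nan,
    }


def adjacency(p, edges):
    # Build a symmetric boolean adjacency matrix from a list of (i, j) pairs
    A = np.zeros((p, p), dtype=bool)
    for i, j in edges:
        A[i, j] = A[j, i] = True
    return A


truth = adjacency(4, [(0, 1), (1, 2), (2, 3)])     # chain graph
estimate = adjacency(4, [(0, 1), (1, 2), (0, 3)])  # misses (2, 3), adds (0, 3)
scores = edge_recovery(truth, estimate)
print(scores)
```

Averaging such scores over replicate simulated datasets at each sample size and neighbor count would be one way to summarize a comparison of this type across methods.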
Discussion
Investigating the complex structures of biological systems remains an emerging area of methodologic research. Recent progress in high-dimensional biomedical data analysis has enabled broad-spectrum quantification of metabolites by mass spectrometry (MS) [23,24] and nuclear magnetic resonance (NMR) [25,26,27]. Advances in statistical analysis [28], network analysis [29] and software development [30] have resulted in a better understanding of
Disclosures and ethics
As a requirement of publication, the author(s) have provided the publisher with signed confirmation of compliance with legal and ethical obligations, including but not limited to the following: authorship and contributorship, conflicts of interest, privacy and confidentiality and (where applicable) protection of human and animal research subjects. The authors have read and confirmed their agreement with the ICMJE authorship and conflict of interest criteria. The authors have also confirmed that
Acknowledgments
This study is funded in part by the following grants: R01LM012012 and R01LM012723 from the National Library of Medicine, P20GM104416 from the National Institute of General Medical Sciences, P42007373 and P01ES022832 from the National Institute of Environmental Health Sciences, RD8354201 from the Environmental Protection Agency and R25CA134286 from the National Cancer Institute.
References
Physiology of pregnancy and nutrient metabolism. Am. J. Clin. Nutr. (May 2000).
The effect of haemolysis on the metabolomic profile of umbilical cord blood. Clin. Biochem. (2015).
The effect of arsenic contamination on amino acids metabolism in Spinacia oleracea L. Ecotoxicol. Environ. Saf. (2010).
Metabolism of lysine in α-aminoadipic semialdehyde dehydrogenase-deficient fibroblasts: evidence for an alternative pathway of pipecolic acid formation. FEBS Lett. (2010).
Recent developments in inductively coupled plasma magnetic sector multiple collector mass spectrometry. Int. J. Mass Spectrom. Ion Process. (1995).
Recent advances and new strategies in the NMR-based identification of natural products. Curr. Opin. Biotechnol. (2014).
Disentangling interactions in the microbiome: a network perspective. Trends Microbiol. (2017).
Computational approaches for systems metabolomics. Curr. Opin. Biotechnol. (2016).
More is better: recent progress in multi-omics data integration methods. Front. Genet. (2017).
Designing and interpreting ‘multi-omic’ experiments that may change our understanding of biology. Curr. Opin. Struct. Biol. (2017).
Environmental exposures and development. Curr. Opin. Pediatr.
The joint graphical lasso for inverse covariance estimation across multiple classes. J. R. Stat. Soc. Ser. B Stat. Methodol.
A sparse Ising model with covariates. Biometrics.
Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw.
Strong rules for discarding predictors in lasso-type problems. J. R. Stat. Soc. Ser. B.
Placental metal concentrations in relation to maternal and infant toenails in a U.S. Cohort. Environ. Sci. Technol.
The metabolomic profile of umbilical cord blood in neonatal hypoxic ischaemic encephalopathy. PLoS One.
Maternal cytokine status may prime the metabolic profile and increase risk of obesity in children. Int. J. Obes.
High-dimensional Ising model selection using ℓ1-regularized logistic regression. Ann. Stat.
Accumulation of proline under salinity and heavy metal stress in cauliflower seedlings. J. Appl. Sci. Environ. Manag.