Qiwei Li, Xinlei Wang, Faming Liang, Faliu Yi, Yang Xie, Adi Gazdar, Guanghua Xiao
Digital pathology imaging of tumor tissues, which captures histological details in high resolution, is fast becoming a routine clinical procedure. Recent developments in deep-learning methods have enabled the identification, characterization, and classification of individual cells from pathology image analysis at a large scale. This creates new opportunities to study the spatial patterns of and interactions among different types of cells. Reliable statistical approaches to modeling such spatial patterns and interactions can provide insight into tumor progression and shed light on the biological mechanisms of cancer...
May 18, 2018: Biostatistics
Amy Ming-Fang Yen, Hsiu-Hsi Chen
Multistate Markov regression models used for quantifying the effect size of state-specific covariates pertaining to the dynamics of multistate outcomes have gained popularity. However, measurements of multistate outcomes are prone to classification errors, particularly when a population-based survey or study relies on proxy measurements of the outcome because of cost considerations. Such misclassification may affect the effect size of relevant covariates, such as the odds ratio used in the field of epidemiology...
May 21, 2018: Statistics in Medicine
Matthew Willetts, Sven Hollowell, Louis Aslett, Chris Holmes, Aiden Doherty
Current public health guidelines on physical activity and sleep duration are limited by a reliance on subjective self-reported evidence. Using data from simple wrist-worn activity monitors, we developed a tailored machine learning model, using balanced random forests with Hidden Markov Models, to reliably detect a number of activity modes. We show that physical activity and sleep behaviours can be classified with 87% accuracy in 159,504 minutes of recorded free-living behaviours from 132 adults. These trained models can be used to infer fine resolution activity patterns at the population scale in 96,220 participants...
May 21, 2018: Scientific Reports
Emma Pierson, Tim Althoff, Jure Leskovec
Cycles are fundamental to human health and behavior. Examples include mood cycles, circadian rhythms, and the menstrual cycle. However, modeling cycles in time series data is challenging because in most cases the cycles are not labeled or directly observed and need to be inferred from multidimensional measurements taken over time. Here, we present Cyclic Hidden Markov Models (CyHMMs) for detecting and modeling cycles in a collection of multidimensional heterogeneous time series data. In contrast to previous cycle modeling methods, CyHMMs deal with a number of challenges encountered in modeling real-world cycles: they can model multivariate data with both discrete and continuous dimensions; they explicitly model and are robust to missing data; and they can share information across individuals to accommodate variation both within and between individual time series...
April 2018: Proceedings of the International World-Wide Web Conference
Julian Lee
Time-reversal symmetry of the microscopic laws dictates that the equilibrium distribution of a stochastic process must obey the condition of detailed balance. However, cyclic Markov processes that do not admit equilibrium distributions with detailed balance are often used to model systems driven out of equilibrium by external agents. I show that for a Markov model without detailed balance, an extended Markov model can be constructed, which explicitly includes the degrees of freedom for the driving agent and satisfies the detailed balance condition...
March 2018: Physical Review. E
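The detailed balance condition named in the abstract above requires that the stationary distribution π and transition matrix P satisfy π_i P_ij = π_j P_ji for every pair of states. The following is a minimal sketch of a numerical check, not the construction from the paper: the two example matrices, the function names, and the use of power iteration are all assumptions chosen for illustration.

```python
def stationary_distribution(P, iterations=10_000):
    """Approximate the stationary distribution of a row-stochastic matrix P
    by repeatedly applying pi <- pi P (power iteration)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def satisfies_detailed_balance(P, pi, tol=1e-9):
    """Check pi_i * P_ij == pi_j * P_ji for every pair of states (i, j)."""
    n = len(P)
    return all(
        abs(pi[i] * P[i][j] - pi[j] * P[j][i]) <= tol
        for i in range(n)
        for j in range(n)
    )


# Hypothetical cyclic chain: strong preference for 0 -> 1 -> 2 -> 0.
cyclic = [
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.8, 0.1, 0.1],
]

# Hypothetical symmetric chain: no preferred direction of circulation.
symmetric = [
    [0.50, 0.25, 0.25],
    [0.25, 0.50, 0.25],
    [0.25, 0.25, 0.50],
]

pi_cyclic = stationary_distribution(cyclic)
pi_symmetric = stationary_distribution(symmetric)

cyclic_ok = satisfies_detailed_balance(cyclic, pi_cyclic)
symmetric_ok = satisfies_detailed_balance(symmetric, pi_symmetric)
print("cyclic chain satisfies detailed balance:", cyclic_ok)
print("symmetric chain satisfies detailed balance:", symmetric_ok)
```

Both example chains have a uniform stationary distribution, so the difference between them lies only in the net probability current around the cycle, which is what the check detects.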
Han Zhang, Tanner Yohe, Le Huang, Sarah Entwistle, Peizhi Wu, Zhenglu Yang, Peter K Busk, Ying Xu, Yanbin Yin
Complex carbohydrates of plants are the main food sources of animals and microbes, and serve as promising renewable feedstock for biofuel and biomaterial production. Carbohydrate active enzymes (CAZymes) are the most important enzymes for complex carbohydrate metabolism. With an increasing number of plant and plant-associated microbial genomes and metagenomes being sequenced, there is an urgent need for automatic tools for genomic data mining of CAZymes. We developed the dbCAN web server in 2012 to provide a public service for automated CAZyme annotation for newly sequenced genomes...
May 16, 2018: Nucleic Acids Research
Roey Angel, Maximilian Nepel, Christopher Panhölzl, Hannes Schmidt, Craig W Herbold, Stephanie A Eichorst, Dagmar Woebken
Diazotrophic microorganisms introduce biologically available nitrogen (N) to the global N cycle through the activity of the nitrogenase enzyme. The genetically conserved dinitrogenase reductase (nifH) gene is phylogenetically distributed across four clusters (I-IV) and is widely used as a marker gene for N2 fixation, permitting investigators to study the genetic diversity of diazotrophs in nature and target potential participants in N2 fixation. To date, there have been few standardized pipelines for analyzing the nifH functional gene, which is in stark contrast to the 16S rRNA gene...
2018: Frontiers in Microbiology
James S Crampton, Stephen R Meyers, Roger A Cooper, Peter M Sadler, Michael Foote, David Harte
Periodic fluctuations in past biodiversity, speciation, and extinction have been proposed, with extremely long periods ranging from 26 to 62 million years, although forcing mechanisms remain speculative. In contrast, well-understood periodic Milankovitch climate forcing represents a viable driver for macroevolutionary fluctuations, although little evidence for such fluctuation exists except during the Late Cenozoic. The reality, magnitude, and drivers of periodic fluctuations in macroevolutionary rates are of interest given long-standing debate surrounding the relative roles of intrinsic biotic interactions vs...
May 14, 2018: Proceedings of the National Academy of Sciences of the United States of America
André S Santos, Rommel T Ramos, Artur Silva, Raphael Hirata, Ana L Mattos-Guaraldi, Roberto Meyer, Vasco Azevedo, Liza Felicori, Luis G C Pacheco
Biochemical tests are traditionally used for bacterial identification at the species level in clinical microbiology laboratories. While biochemical profiles are generally efficient for the identification of the most important corynebacterial pathogen Corynebacterium diphtheriae, their ability to differentiate between biovars of this bacterium is still controversial. Besides, the unambiguous identification of emerging human pathogenic species of the genus Corynebacterium may be hampered by highly variable biochemical profiles commonly reported for these species, including Corynebacterium striatum, Corynebacterium amycolatum, Corynebacterium minutissimum, and Corynebacterium xerosis...
May 11, 2018: Functional & Integrative Genomics
Pia R Neubauer, Christiane Widmann, Daniel Wibberg, Lea Schröder, Marcel Frese, Tilman Kottke, Jörn Kalinowski, Hartmut H Niemann, Norbert Sewald
Flavin-dependent halogenases catalyse halogenation of aromatic compounds. In most cases, this reaction proceeds with high regioselectivity and requires only the presence of FADH2, oxygen, and halide salts. Since marine habitats contain high concentrations of halides, organisms populating the oceans might be valuable sources of yet undiscovered halogenases. A new Hidden-Markov-Model (HMM) based on the PFAM tryptophan halogenase model was used for the analysis of marine metagenomes. Eleven metagenomes were screened leading to the identification of 254 complete or partial putative flavin-dependent halogenase genes...
2018: PloS One
Xiang Li, Yong Liu, Pengbin Chen, Jiewei Wu, Han Zhang
Sleep status is an important indicator of human health. In this paper, we propose a novel under-pillow system for undisturbed sleep monitoring that identifies pattern changes in heart rate variability (HRV) from the acquired RR interval signal and calculates the corresponding sleep stages with a hidden Markov model (HMM), without the subject perceiving the measurement. To solve the existing problems of HMM-based sleep staging, ensemble empirical mode decomposition (EEMD) was proposed to eliminate the error caused by individual differences in HRV and then to calculate the corresponding sleep stages...
April 1, 2018: Sheng Wu Yi Xue Gong Cheng Xue Za Zhi (Journal of Biomedical Engineering)
Gleb Filatov, Bruno Bauwens, Attila Kertész-Farkas
Motivation: Bioinformatics studies often rely on similarity measures between sequence pairs, which often pose a bottleneck in large-scale sequence analysis. Results: Here, we present a new convolutional kernel function for protein sequences called the LZW-Kernel. It is based on code words identified with the Lempel-Ziv-Welch (LZW) universal text compressor. The LZW-Kernel is an alignment-free method; it is always symmetric and positive, always provides 1.0 for self-similarity, and can be used directly with Support Vector Machines (SVMs) in classification problems, contrary to normalized compression distance (NCD), which often violates the distance metric properties in practice and requires further techniques to be used with SVMs...
May 7, 2018: Bioinformatics
Amalia Luque, Javier Romero-Lemos, Alejandro Carrasco, Luis Gonzalez-Abril
Several authors have shown that the sounds of anurans can be used as an indicator of climate change. Hence, the recording, storage and further processing of a huge number of anuran sounds, distributed over time and space, are required in order to obtain this indicator. Furthermore, it is desirable to have algorithms and tools for the automatic classification of the different classes of sounds. In this paper, six classification methods are proposed, all based on the data-mining domain, which strive to take advantage of the temporal character of the sounds...
2018: PeerJ
Ushnish Sengupta, Birgit Strodel
Allosteric regulation refers to the process where the effect of binding of a ligand at one site of a protein is transmitted to another, often distant, functional site. In recent years, it has been demonstrated that allosteric mechanisms can be understood by the conformational ensembles of a protein. Molecular dynamics (MD) simulations are often used for the study of protein allostery as they provide an atomistic view of the dynamics of a protein. However, given the wealth of detailed information hidden in MD data, one has to apply a method that allows extraction of the conformational ensembles underlying allosteric regulation from these data...
June 19, 2018: Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences
Yasuhiro Matsunaga, Yuji Sugita
Single-molecule experiments and molecular dynamics (MD) simulations are indispensable tools for investigating protein conformational dynamics. The former provide time-series data, such as donor-acceptor distances, whereas the latter give atomistic information, although this information is often biased by model parameters. Here, we devise a machine-learning method to combine the complementary information from the two approaches and construct a consistent model of conformational dynamics. It is applied to the folding dynamics of the formin-binding protein WW domain...
May 3, 2018: eLife
Antonio Ciampi, Chun Bai, Alina Dyachenko, Jane McCusker, Martin G Cole, Eric Belzile
BACKGROUND: A few studies examine the time evolution of delirium in long-term care (LTC) settings. In this work, we analyze the multivariate Delirium Index (DI) time evolution in LTC settings. METHODS: The multivariate DI was measured weekly for six months in seven LTC facilities, located in Montreal and Quebec City. Data were analyzed using a hidden Markov chain/latent class model (HMC/LC). RESULTS: The analysis sample included 276 LTC residents. Four ordered latent classes were identified: fairly healthy (low "disorientation" and "memory impairment," negligible other DI symptoms), moderately ill (low "inattention" and "disorientation," medium "memory impairment"), clearly sick (low "disorganized thinking" and "altered level of consciousness," medium "inattention," "disorientation," "memory impairment" and "hypoactivity"), and very sick (low "hypoactivity," medium "altered level of consciousness," high "inattention," "disorganized thinking," "disorientation" and "memory impairment")...
May 3, 2018: International Psychogeriatrics
Zhanbin Liang, Di Liu, Xinyao Lu, Hong Zong, Jian Song, Bin Zhuge
During high gravity fermentation, a set of hexose transporters in yeasts plays an important role in efficient sugar transport. However, hexose transporters have been studied mainly in the Saccharomyces cerevisiae model and at low or moderate sugar concentrations. The hexose transporters are still poorly understood in the industrial glycerol producer Candida glycerinogenes, which assimilates sugar efficiently at high glucose concentration. To explore these hexose transporters, 14 candidates were identified using a hidden Markov model and characterized...
April 28, 2018: Applied Microbiology and Biotechnology
Yves Rybarczyk, Jan Kleine Deters, Clément Cointe, Danilo Esparza
The enhancement of ubiquitous and pervasive computing brings new perspectives in medical rehabilitation. In that sense, the present study proposes a smart, web-based platform to promote the reeducation of patients after hip replacement surgery. This project focuses on two fundamental aspects in the development of a suitable tele-rehabilitation application, which are: (i) being based on an affordable technology, and (ii) providing the patients with a real-time assessment of the correctness of their movements...
April 26, 2018: Sensors
Siddhartha Kundu
The accurate annotation of an unknown protein sequence depends on extant data of template sequences. This could be empirical or sets of reference sequences, and provides an exhaustive pool of probable functions. Individual methods of predicting dominant function possess shortcomings such as varying degrees of inter-sequence redundancy, arbitrary domain inclusion thresholds, heterogeneous parameterization protocols, and ill-conditioned input channels. Here, I present a rigorous theoretical derivation of various steps of a generic algorithm that integrates and utilizes several statistical methods to predict the dominant function in unknown protein sequences...
April 26, 2018: Acta Biotheoretica
Antonio Punzo, Salvatore Ingrassia, Antonello Maruotti
A time-varying latent variable model is proposed to jointly analyze multivariate mixed-support longitudinal data. The proposal can be viewed as an extension of hidden Markov regression models with fixed covariates (HMRMFCs), which is the state of the art for modelling longitudinal data, with a special focus on the underlying clustering structure. HMRMFCs are inadequate for applications in which a clustering structure can be identified in the distribution of the covariates, as the clustering is independent from the covariates distribution...
April 22, 2018: Statistics in Medicine
