Read by QxMD


Mohamed Salem, Rafet Al-Tobasei, Ali Ali, Daniela Lourenco, Guangtu Gao, Yniv Palti, Brett Kenney, Timothy D Leeds
Detection of coding/functional SNPs that change the biological function of a gene may lead to identification of putative causative alleles within QTL regions and discovery of genetic markers with large effects on phenotypes. This study has two objectives, the first being to develop and validate a 50K transcribed-gene SNP chip using RNA-Seq data. To achieve this objective, two bioinformatics pipelines, GATK and SAMtools, were used to identify ~21K transcribed SNPs with allelic imbalances associated with important aquaculture production traits, including body weight, muscle yield, muscle fat content, shear force, and whiteness, in addition to resistance/susceptibility to bacterial cold-water disease (BCWD)...
2018: Frontiers in Genetics
Jerome Kelleher, Mike Lin, C H Albach, Ewan Birney, Robert Davies, Marina Gourtovaia, David Glazer, Cristina Y Gonzalez, David K Jackson, Aaron Kemp, John Marshall, Andrew Nowak, Alexander Senf, Jaime M Tovar-Corona, Alexander Vikhorev, Thomas M Keane
Summary: Standardised interfaces for efficiently accessing high-throughput sequencing data are a fundamental requirement for large-scale genomic data sharing. We have developed htsget, a protocol for secure, efficient and reliable access to sequencing read and variation data. We demonstrate four independent client and server implementations, and the results of a comprehensive interoperability demonstration. Availability and implementation: http://samtools.github...
June 19, 2018: Bioinformatics
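As we read the GA4GH htsget specification, access is a two-step exchange. The client first requests a JSON "ticket" for a genomic region. It then fetches the listed blocks in order and concatenates them into a valid BAM/CRAM (or VCF/BCF) stream. The Python sketch below shows both steps. The server base URL, the function names `ticket_url` and `assemble`, and the injected `fetch` callback are hypothetical. Only the reads endpoint is shown.

```python
import base64
from typing import Callable, Dict, Optional
from urllib.parse import quote, unquote_to_bytes, urlencode


def ticket_url(base: str, read_id: str, reference_name: Optional[str] = None,
               start: Optional[int] = None, end: Optional[int] = None,
               fmt: str = "BAM") -> str:
    """Build an htsget reads ticket URL; start/end are 0-based, end-exclusive."""
    params = {"format": fmt}
    if reference_name is not None:
        params["referenceName"] = reference_name
    if start is not None:
        params["start"] = str(start)
    if end is not None:
        params["end"] = str(end)
    return f"{base.rstrip('/')}/reads/{quote(read_id, safe='')}?{urlencode(params)}"


def assemble(ticket: dict, fetch: Callable[[str, Dict[str, str]], bytes]) -> bytes:
    """Concatenate the data blocks listed in an htsget ticket, in order."""
    out = bytearray()
    for block in ticket["htsget"]["urls"]:
        url = block["url"]
        if url.startswith("data:"):
            # Inline block, e.g. "data:application/vnd.ga4gh.bam;base64,<payload>".
            meta, _, payload = url.partition(",")
            out += base64.b64decode(payload) if meta.endswith(";base64") else unquote_to_bytes(payload)
        else:
            # Remote block: the caller-supplied fetch performs the HTTP GET,
            # sending any headers (e.g. Range) that the ticket specifies.
            out += fetch(url, block.get("headers", {}))
    return bytes(out)
```

Keeping `fetch` injectable separates protocol logic from the HTTP client. A real client could pass a small `urllib.request` wrapper.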
Guangtu Gao, Torfinn Nome, Devon E Pearse, Thomas Moen, Kerry A Naish, Gary H Thorgaard, Sigbjørn Lien, Yniv Palti
Single-nucleotide polymorphisms (SNPs) are highly abundant markers that are broadly distributed in animal genomes. For rainbow trout (Oncorhynchus mykiss), SNP discovery has previously been performed by sequencing restriction-site associated DNA (RAD) libraries, reduced representation libraries (RRL) and RNA. Recently, we performed high-coverage whole-genome resequencing of 61 unrelated samples representing a wide range of rainbow trout and steelhead populations: 49 new samples added to the 12 aquaculture samples from AquaGen (Norway) that we previously used for SNP discovery...
2018: Frontiers in Genetics
Zhentang Li, Yi Wang, Fei Wang
BACKGROUND: The rapid development of next-generation sequencing (NGS) technology has continually increased the throughput of sequencing data. However, due to the lack of a tool that is both fast and accurate, the analysis of NGS data, especially low-coverage data, remains challenging. RESULTS: We propose a decision-tree-based variant calling algorithm. Experiments on a set of real data indicate that our algorithm achieves high accuracy and sensitivity for SNVs and indels and adapts well to low-coverage data...
April 19, 2018: BMC Bioinformatics
Darrell O Ricke, Anna Shcherbina, Adam Michaleas, Philip Fremont-Smith
High-throughput sequencing (HTS) of single nucleotide polymorphisms (SNPs) enables additional DNA forensic capabilities not attainable using traditional STR panels. However, the inclusion of sets of loci selected for mixture analysis, extended kinship, phenotype, biogeographic ancestry prediction, etc., can result in large panel sizes that are difficult to analyze in a rapid fashion. GrigoraSNP was developed to address the allele-calling bottleneck that was encountered when analyzing SNP panels with more than 5000 loci using HTS...
November 2018: Journal of Forensic Sciences
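The abstract does not describe GrigoraSNP's calling rule. The Python sketch below roughly illustrates what per-locus allele calling from HTS read counts involves. It assumes a diploid sample, a minimum depth of 10 reads and a heterozygous band of 20–80% alternative reads. These thresholds and the names `call_genotype` and `call_panel` are hypothetical, not GrigoraSNP's.

```python
from typing import Dict, Optional, Tuple


def call_genotype(ref_count: int, alt_count: int, min_depth: int = 10,
                  het_low: float = 0.2, het_high: float = 0.8) -> Optional[str]:
    """Call a diploid SNP genotype from allele read counts.

    Returns "ref/ref", "ref/alt", "alt/alt", or None when depth is too low.
    The thresholds are illustrative defaults, not validated forensic values.
    """
    depth = ref_count + alt_count
    if depth < min_depth:
        return None
    alt_fraction = alt_count / depth
    if alt_fraction < het_low:
        return "ref/ref"
    if alt_fraction > het_high:
        return "alt/alt"
    return "ref/alt"


def call_panel(counts: Dict[str, Tuple[int, int]]) -> Dict[str, Optional[str]]:
    """Apply call_genotype to {locus_id: (ref_count, alt_count)} for a whole panel."""
    return {locus: call_genotype(r, a) for locus, (r, a) in counts.items()}
```

Each locus is independent, so a panel of thousands of loci is a single linear pass over the count table.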
Samira Asgharzade, Mohammad Amin Tabatabaiefar, Javad Mohammadi-Asl, Morteza Hashemzadeh Chaleshtori
BACKGROUND: Recent studies have confirmed the utility of targeted next-generation sequencing (NGS), which provides a remarkable opportunity to find variants in known disease genes, especially in genetically heterogeneous disorders such as hearing loss (HL). METHODS: After excluding mutations in the most common autosomal recessive non-syndromic HL (ARNSHL) genes via Sanger sequencing and genetic linkage analysis, we performed NGS on the proband of an Iranian family with ARNSHL...
May 2018: International Journal of Pediatric Otorhinolaryngology
Nam S Vo, Vinhthuy Phan
Motivation: The detection of genomic variants is of great significance in genomics, bioinformatics, biomedical research and their applications. However, despite considerable effort, indels and structural variants remain under-characterized compared with SNPs. Current approaches based on next-generation sequencing data usually require large numbers of reads (high coverage) to detect these types of variants accurately. Even so, indels, especially those close to each other, are still hard to detect accurately...
September 1, 2018: Bioinformatics
Taemook Kim, Hogyu David Seo, Lothar Hennighausen, Daeyoup Lee, Keunsoo Kang
Octopus-toolkit is a stand-alone application for retrieving and processing large sets of next-generation sequencing (NGS) data in a single step. Octopus-toolkit is an automated set-up-and-analysis pipeline utilizing the Aspera, SRA Toolkit, FastQC, Trimmomatic, HISAT2, STAR, Samtools, and HOMER applications. All of these applications are installed on the user's computer when the program starts. After installation, it can automatically retrieve original files of various epigenomic and transcriptomic data sets, including ChIP-seq, ATAC-seq, DNase-seq, MeDIP-seq, MNase-seq and RNA-seq, from the Gene Expression Omnibus data repository...
May 18, 2018: Nucleic Acids Research
K Yu Tsukanov, A Yu Krasnenko, D A Plakhina, D O Korostin, A V Churov, O S Druzhilovskaya, D V Rebrikov, V V Ilinsky
We aimed to develop a pipeline for the bioinformatic analysis and interpretation of NGS data and the detection of a wide range of single-nucleotide somatic mutations in tumor DNA. First, the NGS reads were submitted to a quality-control check with the Cutadapt program, and low-quality 3′-nucleotides were removed. The reads were then mapped to the hg19 (GRCh37.p13) reference genome with BWA. SAMtools was used to remove duplicates, and MuTect was used for SNV calling. The functional effect of SNVs was evaluated with an algorithm that included annotation and pathogenicity assessment by SnpEff and analysis of databases such as COSMIC, dbNSFP, ClinVar, and OMIM...
October 2017: Biomedit︠s︡inskai︠a︡ Khimii︠a︡
Jie Qiu, Wenwei Zhang, Qingsheng Xia, Fuxue Liu, Shuwei Zhao, Kailing Zhang, Min Chen, Chuanshan Zang, Ruifeng Ge, Dapeng Liang, Yan Sun
Papillary thyroid cancer (PTC), the predominant form of thyroid cancer, accounts for 75–85% of thyroid cancer cases. This research aimed to investigate transcriptomic changes and key genes in PTC. Using RNA-sequencing technology, the transcriptional profiles of 5 thyroid tumor tissues and 5 adjacent normal tissues were obtained. Single nucleotide polymorphisms (SNPs) were identified with SAMtools and then annotated with ANNOVAR. After differentially expressed genes (DEGs) were selected with edgeR, they were further investigated by enrichment analysis, protein domain analysis, and protein-protein interaction (PPI) network analysis...
November 2017: Molecular Medicine Reports
Boyan Zhou, Shaoqing Wen, Lingxiang Wang, Li Jin, Hui Li, Hong Zhang
Ancient DNA obtained from ancient samples, such as sediments, bones, and teeth, is an important genetic resource that can be used to reconstruct the evolutionary history of humans, animals, and plants. High-throughput sequencing enables ancient DNA research to be conducted at whole-genome scale. However, post-mortem DNA damage, mainly caused by deamination of cytosine to uracil (or of methylated cytosine to thymine), may confound variant calling and downstream analysis. In this article, we develop a Python program implementing a new variant caller, "AntCaller", which extracts information on nucleotide substitutions from sequencing data and calculates the probability of each genotype using Bayes' rule...
December 2017: Molecular Genetics and Genomics: MGG
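The abstract does not give AntCaller's actual model. The Python sketch below shows how Bayes' rule can account for deamination. It assumes a single, position-independent C→T damage rate applied before sequencing error. It models only the forward-strand C→T signal and ignores G→A on the reverse strand and read-position effects. The damage rate, the priors and the names `emission` and `genotype_posteriors` are hypothetical.

```python
from math import prod


def emission(observed: str, allele: str, error: float, damage: float) -> float:
    """P(observed base | true allele): C->T deamination, then sequencing error."""
    # Base carried by the sequenced molecule after post-mortem damage
    # (hypothetical single damage rate; only C->T is modelled).
    if allele == "C":
        molecule = {"C": 1 - damage, "T": damage}
    else:
        molecule = {allele: 1.0}
    # Sequencing error: correct with prob 1 - error, otherwise one of the other 3 bases.
    return sum(p * ((1 - error) if observed == base else error / 3)
               for base, p in molecule.items())


def genotype_posteriors(reads: str, priors: dict, error: float = 0.01,
                        damage: float = 0.0) -> dict:
    """P(genotype | reads) by Bayes' rule, for diploid genotypes such as "CT"."""
    joint = {}
    for genotype, prior in priors.items():
        a1, a2 = genotype
        # Each read comes from either chromosome with probability 1/2.
        likelihood = prod(0.5 * emission(b, a1, error, damage)
                          + 0.5 * emission(b, a2, error, damage) for b in reads)
        joint[genotype] = prior * likelihood
    evidence = sum(joint.values())
    return {g: v / evidence for g, v in joint.items()}
```

Consider a C site read as eight C and two T bases with priors `{"CC": 0.98, "CT": 0.01, "TT": 0.01}`. Without damage, the CC and CT posteriors are close. Raising `damage` lets deamination explain the T reads, so the posterior shifts strongly toward CC.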
Bo-Young Kim, Jung Hoon Park, Hye-Yeong Jo, Soo Kyung Koo, Mi-Hyun Park
Insertion and deletion (INDEL) mutations, the most common type of structural variation, are associated with several human diseases. The detection of INDELs through next-generation sequencing (NGS) is becoming more common owing to falling costs and the improvements in efficiency and sensitivity demonstrated by various sequencing platforms and analytical tools. However, INDEL variant calling is still error-prone, and distinguishing true INDELs from NGS errors remains challenging...
2017: PloS One
Rafet Al-Tobasei, Ali Ali, Timothy D Leeds, Sixin Liu, Yniv Palti, Brett Kenney, Mohamed Salem
BACKGROUND: Coding/functional SNPs change the biological function of a gene and could therefore serve as "large-effect" genetic markers. In this study, we used two bioinformatics pipelines, GATK and SAMtools, to discover coding/functional SNPs with allelic imbalances associated with total body weight, muscle yield, muscle fat content, shear force, and whiteness. Phenotypic data were collected for approximately 500 fish, representing 98 families (5 fish/family), from a growth-selected line, and the muscle transcriptome was sequenced from 22 families with divergent phenotypes (4 low- versus 4 high-ranked families per trait)...
August 7, 2017: BMC Genomics
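Allelic imbalance at a heterozygous transcribed SNP is usually assessed by asking whether the reference and alternative read counts depart from the 1:1 ratio expected under balanced allelic expression. Below is a minimal Python sketch of an exact two-sided binomial test, assuming per-allele read counts at a single site. The abstract does not state which statistic the study used, and the function names are hypothetical. The computation works in log space so that deep RNA-Seq coverage does not overflow the binomial coefficients.

```python
from math import exp, lgamma, log, log1p


def binom_logpmf(k: int, n: int, p: float) -> float:
    """log P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log1p(-p))


def allelic_imbalance_pvalue(ref_reads: int, alt_reads: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test of ref vs alt read counts.

    The p-value sums the probabilities of all outcomes that are no more
    likely than the observed one (the minimum-likelihood definition).
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover the site")
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    logp = [binom_logpmf(k, n, p) for k in range(n + 1)]
    cutoff = logp[ref_reads] + 1e-7  # tolerance for floating-point ties
    return min(1.0, sum(exp(lp) for lp in logp if lp <= cutoff))
```

For example, 18 reference versus 2 alternative reads gives a p-value of about 4e-4, whereas a 10:10 split gives 1.0. In practice such per-site p-values would still need multiple-testing correction across thousands of SNPs.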
Peizhou Liao, Glen A Satten, Yi-Juan Hu
A fundamental challenge in analyzing next-generation sequencing (NGS) data is to determine an individual's genotype accurately, as the accuracy of the inferred genotype is essential to downstream analyses. Correctly estimating the base-calling error rate is critical to accurate genotype calls. Phred scores that accompany each call can be used to decide which calls are reliable. Some genotype callers, such as GATK and SAMtools, directly calculate the base-calling error rates from phred scores or recalibrated base quality scores...
July 2017: Genetic Epidemiology
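The Phred relationship referred to above is Q = -10 log10(e), so e = 10^(-Q/10). Q20 corresponds to a 1% error rate and Q30 to 0.1%. The Python sketch below converts base qualities to error rates. It then computes diploid genotype log-likelihoods under the common simplifying assumption that a miscalled base is equally likely to be any of the other three. This is a textbook simplification, not the exact model of GATK, SAMtools or the method proposed in this paper, and the function names are illustrative.

```python
from itertools import combinations_with_replacement
from math import log10
from typing import Dict, List


def phred_to_error(q: float) -> float:
    """Convert a Phred quality score Q to an error probability, e = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def genotype_log10_likelihoods(bases: str, quals: List[float]) -> Dict[str, float]:
    """log10 P(reads | genotype) for all 10 diploid genotypes over ACGT.

    Each base is assumed correct with probability 1 - e and, if wrong,
    equally likely to be any of the other three bases.
    """
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have equal length")
    out = {}
    for a1, a2 in combinations_with_replacement("ACGT", 2):
        total = 0.0
        for b, q in zip(bases, quals):
            e = phred_to_error(q)
            p1 = 1 - e if b == a1 else e / 3
            p2 = 1 - e if b == a2 else e / 3
            # Each read is drawn from either chromosome with probability 1/2.
            total += log10(0.5 * p1 + 0.5 * p2)
        out[a1 + a2] = total
    return out
```

A single low-quality mismatching base (say Q5) barely moves the likelihood away from the homozygous genotype. That is why miscalibrated quality scores, the concern raised in this abstract, propagate directly into genotype calls.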
Arpita Konar, Olivia Choudhury, Rebecca Bullis, Lauren Fiedler, Jacqueline M Kruser, Melissa T Stephens, Oliver Gailing, Scott Schlarbaum, Mark V Coggeshall, Margaret E Staton, John E Carlson, Scott Emrich, Jeanne Romero-Severson
BACKGROUND: Restriction site associated DNA sequencing (RADseq) has the potential to be a broadly applicable, low-cost approach for high-quality genetic linkage mapping in forest trees lacking a reference genome. The statistical inference of linear order must be as accurate as possible for the correct ordering of sequence scaffolds and contigs to chromosomal locations. Accurate maps also facilitate the discovery of chromosome segments containing allelic variants conferring resistance to the biotic and abiotic stresses that threaten forest trees worldwide...
May 30, 2017: BMC Genomics
Rigbe G Weldatsadik, Jingwen Wang, Kai Puhakainen, Hong Jiao, Jari Jalava, Kati Räisänen, Neeta Datta, Tiina Skoog, Jaana Vuopio, T Sakari Jokiranta, Juha Kere
Knowledge of the genomic variation among different strains of a pathogenic microbial species can help in selecting optimal candidates for diagnostic assays and vaccine development. Pooled sequencing (Pool-seq) is a cost-effective approach for population-level genetic studies that require large numbers of samples, such as various strains of a microbe. To test the use of Pool-seq in identifying variation, we pooled DNA from 100 Streptococcus pyogenes strains of different emm types into two pools, each containing 50 strains...
March 31, 2017: Scientific Reports
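For haploid bacterial strains pooled in equal amounts, the fraction of reads carrying an allele estimates the fraction of strains carrying it. Below is a minimal Python sketch under two assumptions: each strain contributes equal DNA, and reads are sampled independently. The standard error therefore reflects read sampling only. The default pool size of 50 matches the design described above, and the function name is hypothetical.

```python
from math import sqrt


def pool_allele_estimate(alt_reads: int, depth: int, pool_size: int = 50) -> dict:
    """Estimate the alternative-allele frequency in a pool of haploid genomes.

    Assumes every strain contributes equal DNA and reads are sampled
    independently, so only read-sampling error is reflected in the SE.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    if not 0 <= alt_reads <= depth:
        raise ValueError("need 0 <= alt_reads <= depth")
    freq = alt_reads / depth
    return {
        "frequency": freq,
        "std_error": sqrt(freq * (1 - freq) / depth),
        # Point estimate of how many strains in the pool carry the alt allele.
        "strains_with_alt": round(freq * pool_size),
    }
```

Unequal DNA input between strains adds variance that this binomial standard error does not capture.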
Anna V Klepikova, Artem S Kasianov, Mikhail S Chesnokov, Natalia L Lazarevich, Aleksey A Penin, Maria Logacheva
BACKGROUND: RNA-seq is a useful tool for analysis of gene expression. However, its robustness is greatly affected by a number of artifacts, one of which is the presence of duplicated reads. RESULTS: To assess how different methods of removing duplicated reads affect the estimation of gene expression in cancer genomics, we analyzed paired samples of hepatocellular carcinoma (HCC) and non-tumor liver tissue. Four data-analysis protocols were applied to each sample: processing without deduplication, deduplication using the method implemented in SAMtools, and deduplication based on one or two molecular indices (MI)...
2017: PeerJ
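Position-based deduplication, as in SAMtools, treats reads aligned to the same position and strand as copies of one molecule. Molecular-index (MI) deduplication additionally requires the reads to carry the same index. The Python sketch below shows the difference for single-end reads. It assumes the 5′ position has already been adjusted for strand and keeps the read with the highest mapping quality in each group. The `Read` class and `deduplicate` function are hypothetical. The real samtools commands handle read pairs and tie-breaking in their own ways.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Read:
    name: str
    chrom: str
    pos5: int                  # 5' alignment position, already strand-adjusted
    strand: str                # "+" or "-"
    mapq: int
    umi: Optional[str] = None  # molecular index, if the library has one


def deduplicate(reads: Iterable[Read], use_umi: bool = False) -> List[Read]:
    """Keep one read per duplicate group, choosing the highest MAPQ.

    Without UMIs, reads sharing chromosome, 5' position and strand are
    treated as duplicates; with use_umi=True the UMI is added to the key,
    so reads from distinct molecules at the same position are retained.
    """
    best = {}
    for r in reads:
        key = (r.chrom, r.pos5, r.strand) + ((r.umi,) if use_umi else ())
        if key not in best or r.mapq > best[key].mapq:
            best[key] = r
    return sorted(best.values(), key=lambda r: (r.chrom, r.pos5, r.strand, r.name))
```

In highly expressed genes, many independent fragments start at the same position by chance. Position-only deduplication collapses them and can deflate expression estimates, which is the effect the MI-based protocols are designed to avoid.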
Sarah Sandmann, Aniek O de Graaf, Mohsen Karimi, Bert A van der Reijden, Eva Hellström-Lindberg, Joop H Jansen, Martin Dugas
Valid variant calling results are crucial for the use of next-generation sequencing in clinical routine. However, there are numerous variant calling tools, which usually differ in their algorithms, filtering strategies and recommendations, and thus also in their output. We evaluated eight open-source tools regarding their ability to call single nucleotide variants and short indels with allelic frequencies as low as 1% in non-matched next-generation sequencing data: GATK HaplotypeCaller, Platypus, VarScan, LoFreq, FreeBayes, SNVer, SAMtools and VarDict...
February 24, 2017: Scientific Reports
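Calling variants at allele frequencies near 1% depends on separating true alternative reads from the sequencing-error background. One simple framing is a binomial upper-tail test, assuming a single fixed per-base error rate. The evaluated tools use their own, more elaborate models, and `upper_tail_pvalue` is a hypothetical name. The sum is computed in log space to stay finite at high depth.

```python
from math import exp, lgamma, log, log1p


def upper_tail_pvalue(alt_reads: int, depth: int, error_rate: float) -> float:
    """P(X >= alt_reads) for X ~ Binomial(depth, error_rate).

    A small value suggests the alternative reads are unlikely to be
    explained by sequencing error alone (under the fixed-rate assumption).
    """
    if not 0 <= alt_reads <= depth:
        raise ValueError("need 0 <= alt_reads <= depth")
    if not 0 < error_rate < 1:
        raise ValueError("error_rate must lie strictly between 0 and 1")

    def logpmf(k: int) -> float:
        return (lgamma(depth + 1) - lgamma(k + 1) - lgamma(depth - k + 1)
                + k * log(error_rate) + (depth - k) * log1p(-error_rate))

    return min(1.0, sum(exp(logpmf(k)) for k in range(alt_reads, depth + 1)))
```

For example, take 1000x depth and an assumed 0.5% error rate. Ten alternative reads (1% VAF) are only weakly distinguishable from noise, whereas twenty are not plausibly explained by error alone. This illustrates why tools diverge most at the lowest allele frequencies.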
Chun Hang Au, Anskar Y H Leung, Ava Kwong, Tsun Leung Chan, Edmond S K Ma
BACKGROUND: Complex insertions and deletions (indels) in next-generation sequencing (NGS) data are prone to escaping detection by currently available variant callers, as shown by large-scale human genomics studies. Somatic and germline complex indels in key disease driver genes could be missed in NGS-based genomics studies. RESULTS: INDELseek is an open-source complex indel caller designed for NGS data from random fragments and PCR amplicons. Its key differentiating feature is that each NGS read alignment is examined as a whole, rather than as a "pileup" of each reference position across multiple alignments...
January 5, 2017: BMC Genomics
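The contrast drawn above is between pileup-based callers, which look at one reference position at a time, and examining a whole read alignment. The Python sketch below illustrates the whole-alignment idea, assuming a SAM-style CIGAR string and a 1-based leftmost position. It walks a single alignment and merges insertions and deletions that lie within `max_gap` reference bases of each other. A deletion closely followed by an insertion therefore surfaces as one complex event. This is not INDELseek's implementation. It ignores mismatches, which would require the reference or an MD tag, and `indel_clusters` and `max_gap` are hypothetical.

```python
import re
from typing import List, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string such as "50M3D2M4I40M" into (length, op) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def indel_clusters(pos: int, cigar: str, max_gap: int = 5) -> List[Tuple[int, int, int, int]]:
    """Merge nearby insertions/deletions within one read alignment.

    Returns (ref_start, deleted_bases, inserted_bases, n_events) per cluster.
    A cluster with both deleted and inserted bases, or with several events,
    hints at a complex indel rather than a simple one.
    """
    clusters = []  # entries: [ref_start, ref_end, deleted, inserted, n_events]
    ref = pos
    for n, op in parse_cigar(cigar):
        if op in "M=XN":  # consume reference bases, no indel event
            ref += n
        elif op in "ID":
            end = ref + n if op == "D" else ref
            dels, ins = (n, 0) if op == "D" else (0, n)
            if clusters and ref - clusters[-1][1] <= max_gap:
                c = clusters[-1]
                c[1], c[2], c[3], c[4] = end, c[2] + dels, c[3] + ins, c[4] + 1
            else:
                clusters.append([ref, end, dels, ins, 1])
            ref = end
        # S, H and P consume no reference bases and are skipped.
    return [(s, d, i, k) for s, _, d, i, k in clusters]
```

For a read at position 100 with CIGAR `50M3D2M4I40M`, the 3-base deletion and the 4-base insertion two bases later are reported as one cluster. A pileup at any single reference position would show only part of that event.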
Ariane L Hofmann, Jonas Behr, Jochen Singer, Jack Kuipers, Christian Beisel, Peter Schraml, Holger Moch, Niko Beerenwinkel
BACKGROUND: Next-generation sequencing of matched tumor and normal biopsy pairs has become a technology of paramount importance for precision cancer treatment. Sequencing costs have dropped tremendously, allowing the whole exome of a tumor to be sequenced for just a fraction of the total treatment costs. However, clinicians and scientists cannot take full advantage of the generated data because the accuracy of analysis pipelines is limited. This particularly concerns the reliable identification of subclonal mutations present at very low frequencies in a cancer tissue sample, which may be clinically relevant...
January 3, 2017: BMC Bioinformatics