GPUs

https://www.readbyqxmd.com/read/29104852/a-streaming-multi-gpu-implementation-of-image-simulation-algorithms-for-scanning-transmission-electron-microscopy
#1
Alan Pryor, Colin Ophus, Jianwei Miao
Simulation of atomic-resolution image formation in scanning transmission electron microscopy can require significant computation times using traditional methods. A recently developed method, termed plane-wave reciprocal-space interpolated scattering matrix (PRISM), demonstrates potential for significant acceleration of such simulations with negligible loss of accuracy. Here, we present a software package called Prismatic for parallelized simulation of image formation in scanning transmission electron microscopy (STEM) using both the PRISM and multislice methods...
2017: Advanced Structural and Chemical Imaging
https://www.readbyqxmd.com/read/29104714/a-hybrid-task-graph-scheduler-for-high-performance-image-processing-workflows
#2
Timothy Blattner, Walid Keyrouz, Shuvra S Bhattacharyya, Milton Halem, Mary Brady
Designing applications for scalability is key to improving their performance in hybrid and cluster computing. Scheduling code to utilize parallelism is difficult, particularly when dealing with data dependencies, memory management, data motion, and processor occupancy. The Hybrid Task Graph Scheduler (HTGS) is an abstract execution model, framework, and API that improves programmer productivity when implementing hybrid workflows for multi-core and multi-GPU systems...
December 2017: Journal of Signal Processing Systems
https://www.readbyqxmd.com/read/29095910/preliminary-research-on-flow-rate-and-free-surface-of-the-accelerator-driven-subcritical-system-gravity-driven-dense-granular-flow-target
#3
Xiaodong Li, Jiangfeng Wan, Sheng Zhang, Ping Lin, Yanshi Zhang, Guanghui Yang, Mengke Wang, Wenshan Duan, Jian'an Sun, Lei Yang
A spallation target is one of the three core parts of the accelerator driven subcritical system (ADS), which has been investigated for decades. Recently, a gravity-driven Dense Granular-flow Target (DGT) was proposed, which consists of a cylindrical hopper and an internal coaxial cylindrical beam pipe. Research on the flow rate and free surface is important for the design of the target, whether for Heavy Liquid Metal (HLM) targets or the DGT. In this paper, the relations between the flow rate and the geometry of the DGT are investigated...
2017: PLoS One
https://www.readbyqxmd.com/read/29079118/full-monte-carlo-based-biologic-treatment-plan-optimization-system-for-intensity-modulated-carbon-ion-therapy-on-graphics-processing-unit
#4
Nan Qin, Chenyang Shen, Min-Yu Tsai, Marco Pinto, Zhen Tian, Georgios Dedes, Arnold Pompos, Steve B Jiang, Katia Parodi, Xun Jia
PURPOSE: One of the major benefits of carbon ion therapy is enhanced biological effectiveness at the Bragg peak region. For intensity modulated carbon ion therapy (IMCT), it is desirable to use Monte Carlo (MC) methods to compute the properties of each pencil beam spot for treatment planning, because of their accuracy in modeling physics processes and estimating biological effects. We previously developed goCMC, a graphics processing unit (GPU)-oriented MC engine for carbon ion therapy...
September 12, 2017: International Journal of Radiation Oncology, Biology, Physics
https://www.readbyqxmd.com/read/29068678/efficient-and-accurate-born-oppenheimer-molecular-dynamics-for-large-molecular-systems
#5
Laurens D M Peters, Jörg Kussmann, Christian Ochsenfeld
An efficient scheme for the calculation of Born-Oppenheimer molecular dynamics (BOMD) simulations is introduced. It combines the corrected small basis set Hartree-Fock (HF-3c) method by Sure and Grimme [J. Comput. Chem. 2013, 34, 1672], extended Lagrangian BOMD (XL-BOMD) by Niklasson et al. [J. Chem. Phys. 2009, 130, 214109], and the calculation of the two-electron integrals on graphics processing units (GPUs) [J. Chem. Phys. 2013, 138, 134114; J. Chem. Theory Comput. 2015, 11, 918]. To explore the parallel performance of our strong-scaling implementation of the method, we present timings and extract, as its validation and first illustrative application, high-quality vibrational spectra from simulated trajectories of β-carotene, paclitaxel, and liquid water (up to 500 atoms)...
October 25, 2017: Journal of Chemical Theory and Computation
https://www.readbyqxmd.com/read/29036886/model-based-real-time-non-rigid-tracking
#6
Sebastián Bronte, Luis M Bergasa, Daniel Pizarro, Rafael Barea
This paper presents a sequential non-rigid reconstruction method that recovers the 3D shape and the camera pose of a deforming object from a video sequence and a previous shape model of the object. We take PTAM (Parallel Tracking and Mapping), a state-of-the-art sequential real-time SfM (Structure-from-Motion) engine, and we upgrade it to solve non-rigid reconstruction. Our method provides a good trade-off between processing time and reconstruction error without the need for specific processing hardware, such as GPUs...
October 14, 2017: Sensors
https://www.readbyqxmd.com/read/29035209/an-efficient-approach-for-accelerating-bucket-elimination-on-gpus
#7
Filippo Bistaffa, Nicola Bombieri, Alessandro Farinelli
Bucket elimination (BE) is a framework that encompasses several algorithms, including belief propagation (BP) and variable elimination for constraint optimization problems (COPs). BE has significant computational requirements that can be addressed by using graphics processing units (GPUs) to parallelize its fundamental operations, i.e., composition and marginalization, which operate on functions represented by large tables. We propose a novel approach to parallelize these operations with GPUs, which optimizes the table layout so as to achieve better performance in terms of increased speedup and scalability...
November 2017: IEEE Transactions on Cybernetics
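The two table operations named in the abstract above, composition and marginalization, can be illustrated on very small tables. The following is a minimal CPU-only sketch in Python, not the authors' GPU table layout. It assumes binary variables, additive composition, and max-marginalization (one common choice for COPs; belief propagation would use product and sum instead). The function names `compose` and `marginalize` and the tuple-plus-dictionary table format are hypothetical, chosen only for illustration.

```python
from itertools import product

# A table is a pair (variables, values): a tuple of variable names and a
# dict mapping one assignment tuple (in that variable order) to a number.

def compose(t1, t2, domain=(0, 1)):
    """Join two tables by adding their entries over the union of variables."""
    vars1, vals1 = t1
    vars2, vals2 = t2
    out_vars = vars1 + tuple(v for v in vars2 if v not in vars1)
    out_vals = {}
    for assignment in product(domain, repeat=len(out_vars)):
        env = dict(zip(out_vars, assignment))
        key1 = tuple(env[v] for v in vars1)
        key2 = tuple(env[v] for v in vars2)
        out_vals[assignment] = vals1[key1] + vals2[key2]
    return out_vars, out_vals

def marginalize(table, var, reduce_fn=max):
    """Eliminate `var` by reducing over its values (max by default)."""
    vars_, vals = table
    idx = vars_.index(var)
    out_vars = vars_[:idx] + vars_[idx + 1:]
    out_vals = {}
    for assignment, value in vals.items():
        key = assignment[:idx] + assignment[idx + 1:]
        if key in out_vals:
            out_vals[key] = reduce_fn(out_vals[key], value)
        else:
            out_vals[key] = value
    return out_vars, out_vals

# Two illustrative cost tables sharing the variable "b".
f = (("a", "b"), {(0, 0): 1, (0, 1): 4, (1, 0): 2, (1, 1): 0})
g = (("b", "c"), {(0, 0): 3, (0, 1): 0, (1, 0): 1, (1, 1): 5})

joined = compose(f, g)             # table over (a, b, c)
result = marginalize(joined, "b")  # table over (a, c)
```

Every output entry of `compose` and every output key of `marginalize` is computed independently of the others, which is the property that makes these operations candidates for data-parallel execution.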
https://www.readbyqxmd.com/read/29016700/a-parallel-approximate-string-matching-under-levenshtein-distance-on-graphics-processing-units-using-warp-shuffle-operations
#8
ThienLuan Ho, Seung-Rohk Oh, HyunJin Kim
Approximate string matching with k-differences has a number of practical applications, ranging from pattern recognition to computational biology. This paper proposes an efficient memory-access algorithm for parallel approximate string matching with k-differences on Graphics Processing Units (GPUs). In the proposed algorithm, all threads in the same GPU warp share data using warp-shuffle operations instead of accessing the shared memory. Moreover, we implement the proposed algorithm by exploiting the memory structure of GPUs to optimize its performance...
2017: PLoS One
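For readers unfamiliar with the k-differences problem, a sequential baseline may help. The sketch below is a plain Python version of the classic dynamic-programming formulation (often attributed to Sellers) under Levenshtein distance. It is not the warp-shuffle GPU algorithm from the paper, and the function name `k_difference_matches` is hypothetical.

```python
def k_difference_matches(pattern, text, k):
    """Return 1-based end positions in `text` where some substring ending
    there is within Levenshtein distance `k` of `pattern`."""
    m = len(pattern)
    # prev[i]: best edit distance between pattern[:i] and any substring of
    # the text ending at the previous position. Before any text is read,
    # matching pattern[:i] against the empty string costs i deletions.
    prev = list(range(m + 1))
    matches = []
    for j, ch in enumerate(text, start=1):
        # curr[0] = 0 because a match may start at any text position.
        curr = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            curr[i] = min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # extra character in the text
                curr[i - 1] + 1,     # missing pattern character
            )
        if curr[m] <= k:
            matches.append(j)
        prev = curr
    return matches

exact = k_difference_matches("abc", "xxabcxx", 0)
fuzzy = k_difference_matches("abc", "xxabcxx", 1)
```

Each column of the table depends only on the previous column, so a column (or a diagonal of cells) can be shared between cooperating threads. That data exchange is the step for which the abstract says warp-shuffle operations replace shared memory.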
https://www.readbyqxmd.com/read/28989754/accelerating-adaptive-inverse-distance-weighting-interpolation-algorithm-on-a-graphics-processing-unit
#9
Gang Mei, Liangliang Xu, Nengxiong Xu
This paper focuses on designing and implementing parallel adaptive inverse distance weighting (AIDW) interpolation algorithms by using the graphics processing unit (GPU). The AIDW is an improved version of the standard IDW, which can adaptively determine the power parameter according to the data points' spatial distribution pattern and achieve more accurate predictions than the standard IDW. In this paper, we first present two versions of the GPU-accelerated AIDW, i.e. the naive version without profiting from the shared memory and the tiled version taking advantage of the shared memory...
September 2017: Royal Society Open Science
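As background for the entry above, standard IDW predicts a value at a query location as a weighted average of known values, with weights proportional to 1/d^p for distance d and power p. The sketch below is a sequential Python illustration of standard IDW with a fixed power. It does not reproduce the adaptive choice of the power parameter that distinguishes AIDW, nor the GPU kernels. The function name `idw_interpolate` and the `(x, y, value)` point format are assumptions made for this example.

```python
import math

def idw_interpolate(points, query, power=2.0):
    """Standard inverse distance weighting at one 2D query location.

    points: list of (x, y, value) samples; query: (x, y).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in points:
        d = math.hypot(query[0] - x, query[1] - y)
        if d == 0.0:
            # The query coincides with a sample: return its value directly.
            return value
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total

samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
midpoint_value = idw_interpolate(samples, (1.0, 0.0))
```

Each query location is computed independently of all others, so the outer loop over query points parallelizes naturally. In AIDW the `power` argument would vary per query point according to the local sample density rather than being fixed.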
https://www.readbyqxmd.com/read/28875524/iterative-hard-thresholding-for-model-selection-in-genome-wide-association-studies
#10
Kevin L Keys, Gary K Chen, Kenneth Lange
A genome-wide association study (GWAS) correlates marker and trait variation in a study sample. Each subject is genotyped at a multitude of SNPs (single nucleotide polymorphisms) spanning the genome. Here, we assume that subjects are randomly collected unrelated individuals and that trait values are normally distributed or can be transformed to normality. Over the past decade, geneticists have been remarkably successful in applying GWAS analysis to hundreds of traits. The massive amount of data produced in these studies presents unique computational challenges...
December 2017: Genetic Epidemiology
https://www.readbyqxmd.com/read/28868521/an-out-of-core-gpu-based-dimensionality-reduction-algorithm-for-big-mass-spectrometry-data-and-its-application-in-bottom-up-proteomics
#11
Muaaz Gul Awan, Fahad Saeed
Modern high-resolution Mass Spectrometry instruments can generate millions of spectra in a single systems biology experiment. Each spectrum consists of thousands of peaks, but only a small number of peaks actively contribute to deduction of peptides. Therefore, pre-processing of MS data to detect noisy and non-useful peaks is an active area of research. Most of the sequential noise-reducing algorithms are impractical to use as a pre-processing step due to high time-complexity. In this paper, we present a GPU-based dimensionality-reduction algorithm, called G-MSR, for MS2 spectra...
August 2017: ACM-BCB: ACM Conference on Bioinformatics, Computational Biology and Biomedicine
https://www.readbyqxmd.com/read/28866532/sparseleap-efficient-empty-space-skipping-for-large-scale-volume-rendering
#12
Markus Hadwiger, Ali K Al-Awami, Johanna Beyer, Marco Agus, Hanspeter Pfister
Recent advances in data acquisition produce volume data of very high resolution and large size, such as terabyte-sized microscopy volumes. These data often contain many fine and intricate structures, which pose huge challenges for volume rendering, and make it particularly important to efficiently skip empty space. This paper addresses two major challenges: (1) The complexity of large volumes containing fine structures often leads to highly fragmented space subdivisions that make empty regions hard to skip efficiently...
August 29, 2017: IEEE Transactions on Visualization and Computer Graphics
https://www.readbyqxmd.com/read/28859833/massively-parallel-simulator-of-optical-coherence-tomography-of-inhomogeneous-turbid-media
#13
Siavash Malektaji, Ivan T Lima, Mauricio R Escobar I, Sherif S Sherif
BACKGROUND AND OBJECTIVE: An accurate and practical simulator for Optical Coherence Tomography (OCT) could be an important tool to study the underlying physical phenomena in OCT such as multiple light scattering. Recently, many researchers have investigated simulation of OCT of turbid media, e.g., tissue, using Monte Carlo methods. The main drawback of these earlier simulators is the long computational time required to produce accurate results. We developed a massively parallel simulator of OCT of inhomogeneous turbid media that obtains both Class I diffusive reflectivity, due to ballistic and quasi-ballistic scattered photons, and Class II diffusive reflectivity due to multiply scattered photons...
October 2017: Computer Methods and Programs in Biomedicine
https://www.readbyqxmd.com/read/28835734/embedded-based-graphics-processing-unit-cluster-platform-for-multiple-sequence-alignments
#14
Jyh-Da Wei, Hui-Jun Cheng, Chun-Yuan Lin, Jin Ye, Kuan-Yu Yeh
High-end graphics processing units (GPUs), such as NVIDIA Tesla/Fermi/Kepler series cards with thousands of cores per chip, have been widely applied to high-performance computing fields over the past decade. These desktop GPU cards must be installed in personal computers/servers with desktop CPUs, and the cost and power consumption of constructing a GPU cluster platform are very high. In recent years, NVIDIA released an embedded board, called Jetson Tegra K1 (TK1), which contains 4 ARM Cortex-A15 CPUs and 192 Compute Unified Device Architecture cores (belonging to the Kepler GPU family)...
2017: Evolutionary Bioinformatics Online
https://www.readbyqxmd.com/read/28830153/efficient-algorithms-for-large-scale-quantum-transport-calculations
#15
Sascha Brück, Mauro Calderara, Mohammad Hossein Bani-Hashemian, Joost VandeVondele, Mathieu Luisier
Massively parallel algorithms are presented in this paper to reduce the computational burden associated with quantum transport simulations from first-principles. The power of modern hybrid computer architectures is harvested in order to determine the open boundary conditions that connect the simulation domain with its environment and to solve the resulting Schrödinger equation. While the former operation takes the form of an eigenvalue problem that is solved by a contour integration technique on the available central processing units (CPUs), the latter can be cast into a linear system of equations that is simultaneously processed by SplitSolve, a two-step algorithm, on general-purpose graphics processing units (GPUs)...
August 21, 2017: Journal of Chemical Physics
https://www.readbyqxmd.com/read/28818036/ecccl-parallelized-gpu-implementation-of-ensemble-classifier-chains
#16
Mona Riemenschneider, Alexander Herbst, Ari Rasch, Sergei Gorlatch, Dominik Heider
BACKGROUND: Multi-label classification has recently gained great attention in diverse fields of research, e.g., in biomedical applications such as protein function prediction or drug resistance testing in HIV. In this context, the concept of Classifier Chains has been shown to improve prediction accuracy, especially when applied as Ensemble Classifier Chains. However, these techniques lack computational efficiency when applied to large amounts of data, e.g., derived from next-generation sequencing experiments...
August 17, 2017: BMC Bioinformatics
https://www.readbyqxmd.com/read/28768689/accelerating-wright-fisher-forward-simulations-on-the-graphics-processing-unit
#17
David S Lawrie
Forward Wright-Fisher simulations are powerful in their ability to model complex demography and selection scenarios, but suffer from slow execution on the Central Processing Unit (CPU), thus limiting their usefulness. However, the single-locus Wright-Fisher forward algorithm is exceedingly parallelizable, with many steps that are so-called "embarrassingly parallel," consisting of a vast number of individual computations that are all independent of each other and thus capable of being performed concurrently. The rise of modern Graphics Processing Units (GPUs) and programming languages designed to leverage the inherent parallel nature of these processors has allowed researchers to dramatically speed up many programs that have such high arithmetic intensity and intrinsic concurrency...
September 7, 2017: G3: Genes|Genomes|Genetics
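To make the single-locus forward algorithm mentioned above concrete, the sketch below simulates allele-frequency change at one biallelic locus in Python. It is an illustration under stated assumptions, not the author's GPU code: a constant diploid population, genic selection with relative fitness 1 + s for the tracked allele, no mutation or migration, and binomial sampling done by repeated Bernoulli draws. The function name `wright_fisher` and its parameters are hypothetical.

```python
import random

def wright_fisher(pop_size, init_freq, generations, selection=0.0, seed=None):
    """Simulate the frequency of one allele forward in time.

    Returns a list of frequencies, one per generation, starting with init_freq.
    """
    rng = random.Random(seed)
    n_copies = 2 * pop_size  # diploid population: 2N gene copies
    freq = init_freq
    trajectory = [freq]
    for _ in range(generations):
        # Deterministic step: expected frequency after selection.
        expected = freq * (1 + selection) / (1 + selection * freq)
        # Stochastic step (drift): binomial sampling of 2N gene copies.
        count = sum(1 for _ in range(n_copies) if rng.random() < expected)
        freq = count / n_copies
        trajectory.append(freq)
    return trajectory

neutral_run = wright_fisher(pop_size=50, init_freq=0.5, generations=20, seed=1)
```

Different loci evolve independently in this model, so a simulation of many loci repeats the same update with no communication between them. That independence is what the abstract calls "embarrassingly parallel".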
https://www.readbyqxmd.com/read/28749354/real-time-cloth-rendering-with-fiber-level-detail
#18
Kui Wu, Cem Yuksel
Modeling cloth with fiber-level geometry can produce highly realistic details. However, rendering fiber-level cloth models not only has a high memory cost but it also has a high computation cost even for offline rendering applications. In this paper we present a real-time fiber-level cloth rendering method for current GPUs. Our method procedurally generates fiber-level geometric details on-the-fly using yarn-level control points for minimizing the data transfer to the GPU. We also reduce the rasterization operations by collectively representing the fibers near the center of each ply that form the yarn structure...
July 26, 2017: IEEE Transactions on Visualization and Computer Graphics
https://www.readbyqxmd.com/read/28746339/openmm-7-rapid-development-of-high-performance-algorithms-for-molecular-dynamics
#19
Peter Eastman, Jason Swails, John D Chodera, Robert T McGibbon, Yutong Zhao, Kyle A Beauchamp, Lee-Ping Wang, Andrew C Simmonett, Matthew P Harrigan, Chaya D Stern, Rafal P Wiewiora, Bernard R Brooks, Vijay S Pande
OpenMM is a molecular dynamics simulation toolkit with a unique focus on extensibility. It allows users to easily add new features, including forces with novel functional forms, new integration algorithms, and new simulation protocols. Those features automatically work on all supported hardware types (including both CPUs and GPUs) and perform well on all of them. In many cases they require minimal coding, just a mathematical description of the desired function. They also require no modification to OpenMM itself and can be distributed independently of OpenMM...
July 2017: PLoS Computational Biology
https://www.readbyqxmd.com/read/28692677/fux-sim-implementation-of-a-fast-universal-simulation-reconstruction-framework-for-x-ray-systems
#20
Monica Abella, Estefania Serrano, Javier Garcia-Blas, Ines García, Claudia de Molina, Jesus Carretero, Manuel Desco
The availability of digital X-ray detectors, together with advances in reconstruction algorithms, creates an opportunity for bringing 3D capabilities to conventional radiology systems. The downside is that reconstruction algorithms for non-standard acquisition protocols are generally based on iterative approaches that involve a high computational burden. The development of new flexible X-ray systems could benefit from computer simulations, which may enable performance to be checked before expensive real systems are implemented...
2017: PLoS One