
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing

https://read.qxmd.com/read/38682049/towards-interpretable-seizure-detection-using-wearables
#1
JOURNAL ARTICLE
Irfan Al-Hussaini, Cassie S Mitchell
Seizure detection using machine learning is a critical problem for the timely intervention and management of epilepsy. We propose SeizFt, a robust seizure detection framework using EEG from a wearable device. It uses features paired with an ensemble of trees, thus enabling further interpretation of the model's results. The efficacy of the underlying augmentation and class-balancing strategy is also demonstrated. This study was performed for the Seizure Detection Challenge 2023, an ICASSP Grand Challenge.
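The abstract does not detail SeizFt's augmentation and class-balancing strategy. As one common baseline for the class-imbalance part, minority-class oversampling can be sketched as follows; the function name and the toy windows are illustrative, not from the paper:

```python
import random

def oversample_minority(samples, labels, seed=0):
    """Balance classes by resampling under-represented classes with replacement."""
    rng = random.Random(seed)
    by_class = {}
    for s, l in zip(samples, labels):
        by_class.setdefault(l, []).append(s)
    target = max(len(g) for g in by_class.values())
    out_samples, out_labels = [], []
    for l, group in by_class.items():
        padded = group + [rng.choice(group) for _ in range(target - len(group))]
        out_samples.extend(padded)
        out_labels.extend([l] * target)
    return out_samples, out_labels

# Three non-seizure windows and one seizure window become balanced 3 vs 3:
X, y = oversample_minority(["w1", "w2", "w3", "w4"], [0, 0, 0, 1])
```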
June 2023: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
https://read.qxmd.com/read/37701064/acoustically-driven-phoneme-removal-that-preserves-vocal-affect-cues
#2
JOURNAL ARTICLE
Camille Noufi, Jonathan Berger, Michael Frank, Karen Parker, Daniel L Bowling
In this paper, we propose a method for removing linguistic information from speech for the purpose of isolating paralinguistic indicators of affect. The immediate utility of this method lies in clinical tests of sensitivity to vocal affect that are not confounded by language, which is impaired in a variety of clinical populations. The method is based on simultaneous recordings of speech audio and electroglottographic (EGG) signals. The speech audio signal is used to estimate the average vocal tract filter response and amplitude envelop...
June 2023: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
https://read.qxmd.com/read/37584045/robust-time-series-recovery-and-classification-using-test-time-noise-simulator-networks
#3
JOURNAL ARTICLE
Eun Som Jeon, Suhas Lohit, Rushil Anirudh, Pavan Turaga
Time-series are commonly susceptible to various types of corruption due to sensor-level changes and defects, which can result in missing samples, sensor and quantization noise, unknown calibration, unknown phase shifts, etc. These corruptions cannot be easily corrected, as the noise model may be unknown at the time of deployment. This also makes it impossible to employ pre-trained classifiers trained on (clean) source data. In this paper, we present a general framework and models for time-series that can make use of (unlabeled) test samples to estimate the noise model, entirely at test time...
June 2023: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
https://read.qxmd.com/read/37577180/online-binaural-speech-separation-of-moving-speakers-with-a-wavesplit-network
#4
JOURNAL ARTICLE
Cong Han, Nima Mesgarani
Binaural speech separation in real-world scenarios often involves moving speakers. Most current speech separation methods use utterance-level permutation invariant training (u-PIT). At inference time, however, the order of outputs can be inconsistent over time, particularly in long-form speech separation. This situation, referred to as the speaker swap problem, is even more problematic when speakers constantly move in space, and therefore poses a challenge for consistent placement of speakers in output channels...
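For readers unfamiliar with u-PIT, the core idea can be sketched in a few lines: the training loss is the minimum, over all assignments of outputs to reference speakers, of the summed pairwise error. The toy MSE criterion and the names below are illustrative, not the paper's implementation:

```python
from itertools import permutations

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def upit_loss(estimates, references):
    """u-PIT: minimize the summed pairwise error over all output-to-speaker
    assignments; return the best permutation and its total loss."""
    best_perm, best_loss = None, float("inf")
    for perm in permutations(range(len(references))):
        total = sum(mse(estimates[i], references[p]) for i, p in enumerate(perm))
        if total < best_loss:
            best_perm, best_loss = perm, total
    return best_perm, best_loss

# Estimates that match the references in swapped order are still a perfect fit,
# which is exactly why output order can flip between chunks at inference time:
perm, loss = upit_loss([[1.0, 2.0], [0.0, 0.5]], [[0.0, 0.5], [1.0, 2.0]])
```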
June 2023: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
https://read.qxmd.com/read/37577179/phoneme-level-bert-for-enhanced-prosody-of-text-to-speech-with-grapheme-predictions
#5
JOURNAL ARTICLE
Yinghao Aaron Li, Cong Han, Xilin Jiang, Nima Mesgarani
Large-scale pre-trained language models have been shown to be helpful in improving the naturalness of text-to-speech (TTS) models by enabling them to produce more naturalistic prosodic patterns. However, these models are usually word-level or sub-phoneme-level and jointly trained with phonemes, making them inefficient for the downstream TTS task where only phonemes are needed. In this work, we propose a phoneme-level BERT (PL-BERT) with a pretext task of predicting the corresponding graphemes along with the regular masked phoneme predictions...
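The masked-prediction pretext task can be sketched as follows. The masking rate, mask token, and ignore index are illustrative assumptions; PL-BERT additionally predicts graphemes at each position, which this toy sketch omits:

```python
import random

def mask_phonemes(tokens, mask_id, p=0.5, seed=0):
    """Replace a fraction p of phoneme tokens with mask_id; targets keep the
    original token at masked positions and an ignore index (-100) elsewhere."""
    rng = random.Random(seed)
    masked, targets = [], []
    for t in tokens:
        if rng.random() < p:
            masked.append(mask_id)
            targets.append(t)      # the model must predict this phoneme
        else:
            masked.append(t)
            targets.append(-100)   # position ignored by the loss
    return masked, targets

masked, targets = mask_phonemes([1, 2, 3, 4, 5, 6, 7], mask_id=0)
```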
June 2023: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
https://read.qxmd.com/read/37388235/multimodal-microscopy-image-alignment-using-spatial-and-shape-information-and-a-branch-and-bound-algorithm
#6
JOURNAL ARTICLE
Shuonan Chen, Bovey Y Rao, Stephanie Herrlinger, Attila Losonczy, Liam Paninski, Erdem Varol
Multimodal microscopy experiments that image the same population of cells under different experimental conditions have become a widely used approach in systems and molecular neuroscience. The main obstacle is to align the different imaging modalities to obtain complementary information about the observed cell population (e.g., gene expression and calcium signal). Traditional image registration methods perform poorly when only a small subset of cells are present in both images, as is common in multimodal experiments...
June 2023: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
https://read.qxmd.com/read/37388234/robust-online-multiband-drift-estimation-in-electrophysiology-data
#7
JOURNAL ARTICLE
Charlie Windolf, Angelique C Paulk, Yoav Kfir, Eric Trautmann, Domokos Meszéna, William Muñoz, Irene Caprara, Mohsen Jamali, Julien Boussard, Ziv M Williams, Sydney S Cash, Liam Paninski, Erdem Varol
High-density electrophysiology probes have opened new possibilities for systems neuroscience in human and non-human animals, but probe motion poses a challenge for downstream analyses, particularly in human recordings. We improve on the state of the art for tracking this motion with four major contributions. First, we extend previous decentralized methods to use multiband information, leveraging the local field potential (LFP) in addition to spikes. Second, we show that the LFP-based approach enables registration at sub-second temporal resolution...
June 2023: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
https://read.qxmd.com/read/37266485/glacier-glass-box-transformer-for-interpretable-dynamic-neuroimaging
#8
JOURNAL ARTICLE
Usman Mahmood, Zening Fu, Vince Calhoun, Sergey Plis
Deep learning models can perform as well as or better than humans in many tasks, especially vision-related ones. Almost exclusively, these models are used to perform classification or prediction. However, deep learning models are usually black boxes, and it is often difficult to interpret the model or its features. This lack of interpretability discourages the application of deep learning in fields such as neuroimaging, where results must be transparent and interpretable. We therefore present a 'glass-box' deep learning model and apply it to the field of neuroimaging...
June 2023: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
https://read.qxmd.com/read/37193061/optimize-wav2vec2s-architecture-for-small-training-set-through-analyzing-its-pre-trained-models-attention-pattern
#9
JOURNAL ARTICLE
Liu Chen, Meysam Asgari, Hiroko H Dodge
Transformer-based automatic speech recognition (ASR) systems have shown success in the presence of large datasets. In medical research, however, we must build ASR for atypical populations, e.g., pre-school children with speech disorders, from small training datasets. To increase training efficiency on small datasets, we optimize the architecture of Wav2Vec 2.0, a Transformer variant, by analyzing its pre-trained model's block-level attention pattern. We show that block-level patterns can serve as an indicator for narrowing down the optimization direction...
May 2022: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
https://read.qxmd.com/read/37064829/towards-interpretability-of-speech-pause-in-dementia-detection-using-adversarial-learning
#10
JOURNAL ARTICLE
Youxiang Zhu, Bang Tran, Xiaohui Liang, John A Batsis, Robert M Roth
Speech pauses are an effective biomarker in dementia detection. Recent deep learning models have exploited speech pauses to achieve highly accurate dementia detection, but have not exploited their interpretability, i.e., what and how the positions and lengths of speech pauses affect the result of dementia detection. In this paper, we study the positions and lengths of dementia-sensitive pauses using adversarial learning approaches. Specifically, we first utilize an adversarial attack that adds perturbations to the speech pauses of test samples, aiming to reduce the confidence levels of the detection model...
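The confidence-reducing attack step can be illustrated with an FGSM-style perturbation on a toy logistic detector. The weights, features, and step size below are made up for illustration and stand in for the paper's perturbations of pause segments:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def confidence(x, w, b):
    """Toy detector: positive-class confidence of a linear-logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm_perturb(x, w, eps):
    """Shift each feature against the sign of the score gradient, which for a
    linear model is just sign(w), so the positive-class confidence drops."""
    return [xi - eps * (1.0 if wi > 0 else -1.0) for xi, wi in zip(x, w)]

w, b = [2.0, -1.0], 0.0   # hypothetical pause-feature weights
x = [1.0, 0.5]            # hypothetical pause features of one test sample
x_adv = fgsm_perturb(x, w, eps=0.3)
```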
May 2022: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
https://read.qxmd.com/read/37064828/speech-tasks-relevant-to-sleepiness-determined-with-deep-transfer-learning
#11
JOURNAL ARTICLE
Bang Tran, Youxiang Zhu, Xiaohui Liang, James W Schwoebel, Lindsay A Warrenburg
Excessive sleepiness in attention-critical contexts can lead to adverse events, such as car crashes. Detecting and monitoring sleepiness can help prevent these adverse events from happening. In this paper, we use the Voiceome dataset to extract speech from 1,828 participants to develop a deep transfer learning model using Hidden-Unit BERT (HuBERT) speech representations to detect sleepiness from individuals. Speech is an under-utilized source of data in sleep detection, but as speech collection is easy, cost-effective, and non-invasive, it provides a promising resource for sleepiness detection...
May 2022: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
https://read.qxmd.com/read/36628172/embedding-signals-on-graphs-with-unbalanced-diffusion-earth-mover-s-distance
#12
JOURNAL ARTICLE
Alexander Tong, Guillaume Huguet, Dennis Shung, Amine Natik, Manik Kuchroo, Guillaume Lajoie, Guy Wolf, Smita Krishnaswamy
In modern relational machine learning it is common to encounter large graphs that arise via interactions or similarities between observations in many domains. Further, in many cases the target entities for analysis are actually signals on such graphs. We propose to compare and organize such datasets of graph signals by using an earth mover's distance (EMD) with a geodesic cost over the underlying graph. Typically, EMD is computed by optimizing over the cost of transporting one probability distribution to another over an underlying metric space...
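In the special case of a path graph with unit-cost edges, this EMD has a closed form: the L1 distance between the two cumulative distributions. The sketch below computes only that special case; the paper's contribution is an efficient diffusion-based scheme for general graphs:

```python
def path_graph_emd(p, q):
    """Exact EMD between two distributions on a path graph with unit-cost
    edges: the L1 distance between their cumulative distributions."""
    cum, total = 0.0, 0.0
    for pi, qi in zip(p, q):
        cum += pi - qi
        total += abs(cum)
    return total

# Moving one unit of mass from node 0 to node 2 crosses two edges, so cost 2:
cost = path_graph_emd([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
```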
May 2022: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
https://read.qxmd.com/read/36311383/infant-crying-detection-in-real-world-environments
#13
JOURNAL ARTICLE
Xuewen Yao, Megan Micheletti, Mckensey Johnson, Edison Thomaz, Kaya de Barbaro
Most existing cry detection models have been tested with data collected in controlled settings. Thus, the extent to which they generalize to noisy, lived environments is unclear. In this paper, we evaluate several established machine learning approaches, including a model leveraging both deep spectrum and acoustic features. This model recognized crying events with an F1 score of 0.613 (precision: 0.672, recall: 0.552), showing improved external validity over existing methods at cry detection in everyday real-world settings...
May 2022: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
https://read.qxmd.com/read/36212702/cmri2spec-cine-mri-sequence-to-spectrogram-synthesis-via-a-pairwise-heterogeneous-translator
#14
JOURNAL ARTICLE
Xiaofeng Liu, Fangxu Xing, Maureen Stone, Jerry L Prince, Jangwon Kim, Georges El Fakhri, Jonghye Woo
Multimodal representation learning using visual movements from cine magnetic resonance imaging (MRI) and their acoustics has shown great potential to learn shared representation and to predict one modality from another. Here, we propose a new synthesis framework to translate from cine MRI sequences to spectrograms with a limited dataset size. Our framework hinges on a novel fully convolutional heterogeneous translator, with a 3D CNN encoder for efficient sequence encoding and a 2D transpose convolution decoder...
May 2022: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
https://read.qxmd.com/read/36093040/scattering-statistics-of-generalized-spatial-poisson-point-processes
#15
JOURNAL ARTICLE
Michael Perlmutter, Jieqian He, Matthew Hirn
We present a machine learning model for the analysis of randomly generated discrete signals, modeled as the points of an inhomogeneous, compound Poisson point process. Like the wavelet scattering transform introduced by Mallat, our construction is naturally invariant to translations and reflections, but it decouples the roles of scale and frequency, replacing wavelets with Gabor-type measurements. We show that, with suitable nonlinearities, our measurements distinguish Poisson point processes from common self-similar processes, and separate different types of Poisson point processes...
May 2022: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
https://read.qxmd.com/read/36035505/improving-phase-rectified-signal-averaging-for-fetal-heart-rate-analysis
#16
JOURNAL ARTICLE
Tong Chen, Guanchao Feng, Cassandra Heiselman, J Gerald Quirk, Petar M Djurić
Low umbilical artery pH is a marker for neonatal acidosis and is associated with an increased risk of neonatal complications. Phase-rectified signal averaging (PRSA) features have demonstrated superior discriminatory or diagnostic ability and good interpretability in many biomedical applications, including fetal heart rate analysis. However, the performance of the PRSA method is sensitive to the values of the selected parameters, which in the literature are usually chosen either by grid search or empirically...
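A minimal sketch of PRSA, assuming the standard "increase" anchor criterion and a symmetric window of half-length L; both the criterion and L are exactly the kind of parameters whose selection the paper addresses:

```python
def prsa(x, L):
    """Phase-rectified signal averaging: anchor on samples that increase
    relative to their predecessor, then average the windows of length 2L
    centered on those anchors."""
    anchors = [i for i in range(L, len(x) - L) if x[i] > x[i - 1]]
    if not anchors:
        return None
    window = [0.0] * (2 * L)
    for i in anchors:
        for k in range(-L, L):
            window[k + L] += x[i + k]
    return [v / len(anchors) for v in window]

# An alternating signal has identical windows around every anchor:
avg = prsa([0, 1, 0, 1, 0, 1, 0, 1], 2)
```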
May 2022: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
https://read.qxmd.com/read/36035504/unsupervised-clustering-and-analysis-of-contraction-dependent-fetal-heart-rate-segments
#17
JOURNAL ARTICLE
Liu Yang, Cassandra Heiselman, J Gerald Quirk, Petar M Djurić
The computer-aided interpretation of fetal heart rate (FHR) and uterine contractions (UC) has not been developed well enough for wide use in delivery rooms. The main challenges still lie in the unclear and nonstandard labels for cardiotocography (CTG) recordings, and in the timely prediction of fetal state during monitoring. Rather than taking traditional supervised approaches to FHR classification, this paper demonstrates a way to understand UC-dependent FHR responses in an unsupervised manner. In this work, we provide a complete method for FHR-UC segment clustering and analysis via the Gaussian process latent variable model and density-based spatial clustering...
May 2022: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
https://read.qxmd.com/read/35990520/boost-ensemble-learning-for-classification-of-ctg-signals
#18
JOURNAL ARTICLE
Marzieh Ajirak, Cassandra Heiselman, J Gerald Quirk, Petar M Djurić
During the process of childbirth, fetal distress caused by hypoxia can lead to various abnormalities. Cardiotocography (CTG), which consists of continuous recording of the fetal heart rate (FHR) and uterine contractions (UC), is routinely used for classifying the fetuses as hypoxic or non-hypoxic. In practice, we face highly imbalanced data, where the hypoxic fetuses are significantly underrepresented. We propose to address this problem by boost ensemble learning, where for learning, we use the distribution of classification error over the dataset...
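The reweighting at the heart of boosting-style ensembles can be sketched as follows. This is the generic AdaBoost-style update, shown to illustrate how misclassified samples (e.g., rare hypoxic cases) gain weight; the paper's exact scheme may differ:

```python
import math

def update_weights(weights, errors, alpha):
    """AdaBoost-style update: multiply the weight of each misclassified
    sample (errors[i] == 1) by exp(alpha), then renormalize to sum to one."""
    new = [w * math.exp(alpha * e) for w, e in zip(weights, errors)]
    z = sum(new)
    return [w / z for w in new]

# Four samples, uniform weights; only the first (say, a rare hypoxic case)
# was misclassified, so its weight grows from 0.25 to 0.5:
w = update_weights([0.25] * 4, [1, 0, 0, 0], alpha=math.log(3))
```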
May 2022: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
https://read.qxmd.com/read/35645618/mri-recovery-with-a-self-calibrated-denoiser
#19
JOURNAL ARTICLE
Sizhuo Liu, Philip Schniter, Rizwan Ahmad
Plug-and-play (PnP) methods that employ application-specific denoisers have been proposed to solve inverse problems, including MRI reconstruction. However, training application-specific denoisers is not feasible for many applications due to the lack of training data. In this work, we propose a PnP-inspired recovery method that does not require data beyond the single, incomplete set of measurements. The proposed self-supervised method, called recovery with a self-calibrated denoiser (ReSiDe), trains the denoiser from the patches of the image being recovered...
May 2022: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
https://read.qxmd.com/read/35645617/expectation-consistent-plug-and-play-for-mri
#20
JOURNAL ARTICLE
Saurav K Shastri, Rizwan Ahmad, Christopher A Metzler, Philip Schniter
For image recovery problems, plug-and-play (PnP) methods have been developed that replace the proximal step in an optimization algorithm with a call to an application-specific denoiser, often implemented using a deep neural network. Although such methods have been successful, they can be improved. For example, the denoiser is often trained using white Gaussian noise, while PnP's denoiser input error is often far from white and Gaussian, with statistics that are difficult to predict from iteration to iteration...
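A stripped-down plug-and-play proximal-gradient iteration, with a toy masking operator for A and a 3-tap moving average standing in for the learned denoiser; both are placeholders for the application-specific components the paper discusses, not its method:

```python
def moving_average_denoiser(x):
    """Toy 3-tap smoother standing in for a trained denoising network."""
    n = len(x)
    out = []
    for i in range(n):
        window = x[max(0, i - 1):min(n, i + 2)]
        out.append(sum(window) / len(window))
    return out

def pnp_pgm(y, mask, steps=100, eta=0.5):
    """Plug-and-play proximal gradient for y = A x, where A keeps entries
    with mask[i] == 1: a gradient step on the data-fidelity term, then the
    denoiser in place of the proximal operator."""
    x = list(y)
    for _ in range(steps):
        x = [xi - eta * m * (xi - yi) for m, xi, yi in zip(mask, x, y)]
        x = moving_average_denoiser(x)
    return x

# Recover a constant signal from measurements with one missing entry:
y = [1.0, 1.0, 0.0, 1.0, 1.0]
mask = [1, 1, 0, 1, 1]
x_hat = pnp_pgm(y, mask)
```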
May 2022: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing