https://read.qxmd.com/read/38656855/neuralrecon-real-time-coherent-3d-scene-reconstruction-from-monocular-video
#1
JOURNAL ARTICLE
Xi Chen, Jiaming Sun, Yiming Xie, Hujun Bao, Xiaowei Zhou
We present a novel framework named NeuralRecon for real-time 3D scene reconstruction from a monocular video. Unlike previous methods that estimate single-view depth maps separately on each key-frame and fuse them later, we propose to directly reconstruct local surfaces represented as sparse TSDF volumes for each video fragment sequentially by a neural network. A learning-based TSDF fusion module based on gated recurrent units is used to guide the network to fuse features from previous fragments. This design allows the network to capture local smoothness prior and global shape prior of 3D surfaces when sequentially reconstructing the surfaces, resulting in accurate, coherent, and real-time surface reconstruction...
April 24, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
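The GRU-based TSDF fusion module described above can be illustrated with a minimal sketch. Everything here (shapes, weights, the per-voxel gating) is a made-up toy, not NeuralRecon's actual architecture; it only shows how a GRU-style update gate blends a running per-voxel feature volume with features from each new video fragment.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_tsdf_fuse(h_prev, x_frag, Wz, Uz, Wh, Uh):
    """One gated fusion step: h_prev is the running per-voxel feature
    volume, x_frag the features from the current fragment. A GRU-style
    update gate decides, per voxel, how much of the new observation to
    blend in (illustrative, not NeuralRecon's exact cell)."""
    z = sigmoid(x_frag @ Wz + h_prev @ Uz)          # update gate
    h_tilde = np.tanh(x_frag @ Wh + h_prev @ Uh)    # candidate state
    return (1 - z) * h_prev + z * h_tilde           # gated blend

rng = np.random.default_rng(0)
n_voxels, dim = 128, 8
h = np.zeros((n_voxels, dim))
W = [rng.normal(scale=0.1, size=(dim, dim)) for _ in range(4)]
for _ in range(3):                                  # three video fragments
    x = rng.normal(size=(n_voxels, dim))
    h = gru_tsdf_fuse(h, x, *W)
print(h.shape)  # (128, 8)
```

Because the gate is computed per voxel, stable surface regions from earlier fragments can be kept while voxels the new fragment observes better are overwritten.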
https://read.qxmd.com/read/38652604/exploring-the-semantic-inconsistency-effect-in-scenes-using-a-continuous-measure-of-linguistic-semantic-similarity
#2
JOURNAL ARTICLE
Claudia Damiano, Maarten Leemans, Johan Wagemans
Viewers use contextual information to visually explore complex scenes. Object recognition is facilitated by exploiting object-scene relations (which objects are expected in a given scene) and object-object relations (which objects are expected because of the occurrence of other objects). Semantically inconsistent objects deviate from these expectations, so they tend to capture viewers' attention (the semantic-inconsistency effect). Some objects fit the identity of a scene more or less than others, yet semantic inconsistencies have hitherto been operationalized as binary (consistent vs...
April 23, 2024: Psychological Science
https://read.qxmd.com/read/38648137/deepmesh-differentiable-iso-surface-extraction
#3
JOURNAL ARTICLE
Benoit Guillard, Edoardo Remelli, Artem Lukoianov, Pierre Yvernay, Stephan R Richter, Timur Bagautdinov, Pierre Baque, Pascal Fua
Geometric Deep Learning has recently made striking progress with the advent of continuous deep implicit fields. They allow for detailed modeling of watertight surfaces of arbitrary topology while not relying on a 3D Euclidean grid, resulting in a learnable parameterization that is unlimited in resolution. Unfortunately, these methods are often unsuitable for applications that require an explicit mesh-based surface representation because converting an implicit field to such a representation relies on the Marching Cubes algorithm, which cannot be differentiated with respect to the underlying implicit field...
April 22, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
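The differentiability issue this abstract raises can be seen in one dimension: the linear zero-crossing along a grid edge is a smooth function of the two field samples, and its derivatives exist in closed form. The following toy sketch (not the paper's method) only demonstrates that observation, with a finite-difference check.

```python
import numpy as np

def edge_crossing(f0, f1):
    """Linear zero-crossing along a grid edge with field values f0, f1
    of opposite sign: returns t in (0, 1) such that the interpolated
    field vanishes at x0 + t * (x1 - x0)."""
    return f0 / (f0 - f1)

def edge_crossing_grad(f0, f1):
    """Analytic derivatives of t with respect to the field samples --
    the quantity plain Marching Cubes does not expose (illustrative)."""
    denom = (f0 - f1) ** 2
    return -f1 / denom, f0 / denom   # dt/df0, dt/df1

t = edge_crossing(-0.2, 0.6)
g0, g1 = edge_crossing_grad(-0.2, 0.6)
eps = 1e-6                           # finite-difference check of dt/df0
fd = (edge_crossing(-0.2 + eps, 0.6) - t) / eps
print(t, g0, fd)  # t = 0.25; g0 and fd agree
```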
https://read.qxmd.com/read/38641188/a-multi-featured-expression-recognition-model-incorporating-attention-mechanism-and-object-detection-structure-for-psychological-problem-diagnosis
#4
JOURNAL ARTICLE
Xiufeng Zhang, Bingyi Li, Guobin Qi
Expression is the main method for judging the emotional state and psychological condition of a person, and predicting changes in facial expressions can effectively assess mental health, thus avoiding serious psychological or psychiatric disorders caused by early negligence. From a computer vision perspective, most researchers have focused on facial expression analysis, and in some cases body posture is also considered. However, their performance is more limited under unconstrained natural conditions, which requires more information to be used in human emotion analysis...
April 17, 2024: Physiology & Behavior
https://read.qxmd.com/read/38637993/macaque-claustrum-pulvinar-and-putative-dorsolateral-amygdala-support-the-cross-modal-association-of-social-audio-visual-stimuli-based-on-meaning
#5
JOURNAL ARTICLE
Mathilda Froesel, Maëva Gacoin, Simon Clavagnier, Marc Hauser, Quentin Goudard, Suliann Ben Hamed
Social communication draws on several cognitive functions such as perception, emotion recognition and attention. The association of audio-visual information is essential to the processing of species-specific communication signals. In this study, we use functional magnetic resonance imaging in order to identify the subcortical areas involved in the cross-modal association of visual and auditory information based on their common social meaning. We identified three subcortical regions involved in audio-visual processing of species-specific communicative signals: the dorsolateral amygdala, the claustrum and the pulvinar...
April 18, 2024: European Journal of Neuroscience
https://read.qxmd.com/read/38637870/beyond-visual-integration-sensitivity-of-the-temporal-parietal-junction-for-objects-places-and-faces
#6
JOURNAL ARTICLE
Johannes Rennig, Christina Langenberger, Hans-Otto Karnath
One important role of the TPJ is the contribution to perception of the global gist in hierarchically organized stimuli where individual elements create a global visual percept. However, the link between clinical findings in simultanagnosia and neuroimaging in healthy subjects is missing for real-world global stimuli, like visual scenes. It is well-known that hierarchical, global stimuli activate TPJ regions and that simultanagnosia patients show deficits during the recognition of hierarchical stimuli and real-world visual scenes...
April 18, 2024: Behavioral and Brain Functions: BBF
https://read.qxmd.com/read/38636427/mediating-sequential-turn-on-and-turn-off-fluorescence-signals-for-discriminative-detection-of-ag-and-hg-2-via-readily-available-cdse-quantum-dots
#7
JOURNAL ARTICLE
Rong Wang, Zi Yi Xu, Ting Li, Nian Bing Li, Hong Qun Luo
Realizing the accurate recognition and quantification of heavy metal ions is pivotal but challenging in the environmental, biological, and physiological science fields. In this work, orange-fluorescence-emitting quantum dots (OQDs) have been facilely synthesized by a one-step method. The participation of silver ions (Ag+) can evoke the unique aggregation-induced emission (AIE) of OQDs, resulting in prominent fluorescence enhancement, which has scarcely been reported previously. Moreover, the Ag+-triggered turn-on fluorescence can be continuously shut down by mercury ions (Hg2+)...
April 13, 2024: Spectrochimica Acta. Part A, Molecular and Biomolecular Spectroscopy
https://read.qxmd.com/read/38633658/multiscale-apple-recognition-method-based-on-improved-centernet
#8
JOURNAL ARTICLE
Han Zhou
Traditional apple-picking robots are unable to detect apples in real time in complex environments. To improve detection efficiency, a fast CenterNet-based recognition method for multiple apple targets in dense scenes is proposed. This method can quickly and accurately identify multiple apple targets in dense scenes. The backbone network mainly consists of a ResNet-44 fully convolutional network, a region proposal network (RPN), and region-of-interest (ROI) pooling. The experimental results show that the improved network model has a higher recognition accuracy of 94...
April 15, 2024: Heliyon
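As background to the CenterNet-style detection this abstract builds on, center points are typically read off a keypoint heatmap by local-maximum suppression. A toy sketch of that step (illustrative only; the paper's improved pipeline adds further stages):

```python
import numpy as np

def heatmap_peaks(heat, thresh=0.3):
    """Extract center-point detections from a keypoint heatmap by
    3x3 local-maximum suppression, as in CenterNet-style detectors."""
    h, w = heat.shape
    padded = np.pad(heat, 1, constant_values=-np.inf)
    # 3x3 neighbourhood maximum at every location
    neigh = np.max(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)],
        axis=0,
    )
    keep = (heat == neigh) & (heat >= thresh)
    ys, xs = np.nonzero(keep)
    return list(zip(ys.tolist(), xs.tolist(), heat[keep].tolist()))

heat = np.zeros((8, 8))
heat[2, 3] = 0.9   # one apple center
heat[6, 6] = 0.7   # another
heat[6, 5] = 0.4   # suppressed: adjacent to a stronger peak
print(heatmap_peaks(heat))
```

Only locations that are both above threshold and strict 3x3 maxima survive, so the weaker response next to the second peak is dropped.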
https://read.qxmd.com/read/38626617/weakly-supervised-temporal-action-localization-with-actionness-guided-false-positive-suppression
#9
JOURNAL ARTICLE
Zhilin Li, Zilei Wang, Qinying Liu
Weakly supervised temporal action localization aims to locate the temporal boundaries of action instances in untrimmed videos using video-level labels and assign them the corresponding action category. Generally, it is solved by a pipeline called "localization-by-classification", which finds the action instances by classifying video snippets. However, since this approach optimizes the video-level classification objective, the generated activation sequences often suffer interference from class-related scenes, resulting in a large number of false positives in the prediction results...
April 15, 2024: Neural Networks: the Official Journal of the International Neural Network Society
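The "localization-by-classification" pipeline the abstract criticizes reduces, at its simplest, to thresholding a per-snippet class activation sequence and merging runs of above-threshold snippets into proposals. A toy sketch (function name and threshold are illustrative); scene-driven activations surviving this threshold are exactly the false positives the paper targets:

```python
import numpy as np

def activations_to_segments(cas, thresh=0.5):
    """Turn a 1-D class activation sequence (one score per video
    snippet) into (start, end) action proposals by thresholding and
    merging consecutive above-threshold snippets."""
    above = cas >= thresh
    segments, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            segments.append((start, i))   # end-exclusive
            start = None
    if start is not None:
        segments.append((start, len(cas)))
    return segments

cas = np.array([0.1, 0.7, 0.8, 0.2, 0.6, 0.6, 0.9, 0.1])
print(activations_to_segments(cas))  # [(1, 3), (4, 7)]
```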
https://read.qxmd.com/read/38625775/fast-building-instance-proxy-reconstruction-for-large-urban-scenes
#10
JOURNAL ARTICLE
Jianwei Guo, Haobo Qin, Yinchang Zhou, Xin Chen, Liangliang Nan, Hui Huang
Digitalization of large-scale urban scenes (in particular buildings) has been a long-standing open problem, owing to challenges in data acquisition such as incomplete scene coverage, lack of semantics, low efficiency, and low reliability in path planning. In this paper, we address these challenges in urban building reconstruction from aerial images, and we propose an effective workflow and a few novel algorithms for efficient 3D building instance proxy reconstruction for large urban scenes. Specifically, we propose a novel learning-based approach to instance segmentation of urban buildings from aerial images, followed by a voting-based algorithm that fuses the multi-view instance information into a sparse point cloud (reconstructed using a standard Structure-from-Motion pipeline)...
April 16, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
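The voting-based fusion step can be illustrated with a toy majority vote over per-view instance labels. The paper's algorithm operates on real multi-view segmentations; this sketch, with a made-up label convention, only shows the voting idea:

```python
import numpy as np

def vote_point_instances(view_labels):
    """Fuse per-view instance labels into one label per 3-D point by
    majority vote. view_labels: (n_points, n_views) integer array;
    -1 marks 'point not visible / not segmented in this view'."""
    fused = np.empty(view_labels.shape[0], dtype=int)
    for i, votes in enumerate(view_labels):
        votes = votes[votes >= 0]                  # drop invisible views
        if votes.size == 0:
            fused[i] = -1                          # never observed
            continue
        vals, counts = np.unique(votes, return_counts=True)
        fused[i] = vals[np.argmax(counts)]         # majority label
    return fused

labels = np.array([[0, 0, 1, -1],    # mostly building 0
                   [2, 2, 2, 2],     # unanimous building 2
                   [-1, -1, -1, -1], # never segmented
                   [1, 0, 1, 1]])    # mostly building 1
print(vote_point_instances(labels))  # [ 0  2 -1  1]
```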
https://read.qxmd.com/read/38625774/bridging-visual-and-textual-semantics-towards-consistency-for-unbiased-scene-graph-generation
#11
JOURNAL ARTICLE
Ruonan Zhang, Gaoyun An, Yiqing Hao, Dapeng Oliver Wu
Scene Graph Generation (SGG) aims to detect visual relationships in an image. However, due to long-tailed bias, SGG is far from practical. Most methods depend heavily on statistical co-occurrence to generate a balanced dataset, so they are dataset-specific and easily affected by noise. The fundamental cause is that SGG is simplified as a classification task rather than a reasoning task, so the ability to capture fine-grained details is limited and the difficulty of handling ambiguity is increased...
April 16, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38616235/figure-ground-segmentation-based-on-motion-in-the-archerfish
#12
JOURNAL ARTICLE
Svetlana Volotsky, Ronen Segev
Figure-ground segmentation is a fundamental process in visual perception that involves separating visual stimuli into distinct meaningful objects and their surrounding context, thus allowing the brain to interpret and understand complex visual scenes. Mammals exhibit varying figure-ground segmentation capabilities, ranging from primates that can perform well on figure-ground segmentation tasks to rodents that perform poorly. To explore figure-ground segmentation capabilities in teleost fish, we studied how the archerfish, an expert visual hunter, performs figure-ground segmentation...
April 15, 2024: Animal Cognition
https://read.qxmd.com/read/38615047/using-a-flashlight-contingent-window-paradigm-to-investigate-visual-search-and-object-memory-in-virtual-reality-and-on-computer-screens
#13
JOURNAL ARTICLE
Julia Beitner, Jason Helbing, Erwan Joël David, Melissa Lê-Hoa Võ
A popular technique to modulate visual input during search is to use gaze-contingent windows. However, these are often rather discomforting, providing the impression of visual impairment. To counteract this, we asked participants in this study to search through illuminated as well as dark three-dimensional scenes using a more naturalistic flashlight with which they could illuminate the rooms. In a surprise incidental memory task, we tested the identities and locations of objects encountered during search. Importantly, we tested this study design in both immersive virtual reality (VR; Experiment 1) and on a desktop-computer screen (Experiment 2)...
April 13, 2024: Scientific Reports
https://read.qxmd.com/read/38610574/wi-char-a-wifi-sensing-approach-with-focus-on-both-scenes-and-restricted-data
#14
JOURNAL ARTICLE
Zhanjun Hao, Kaikai Han, Zinan Zhang, Xiaochao Dang
Significant strides have been made in the field of WiFi-based human activity recognition, yet recent wireless sensing methodologies still grapple with the reliance on copious amounts of data. When assessed in unfamiliar domains, the majority of models experience a decline in accuracy. To address this challenge, this study introduces Wi-CHAR, a novel few-shot learning-based cross-domain activity recognition system. Wi-CHAR is meticulously designed to tackle both the intricacies of specific sensing environments and pertinent data-related issues...
April 8, 2024: Sensors
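Few-shot cross-domain recognition of the kind Wi-CHAR targets is often built on prototype-style classifiers: each activity class is summarized by the mean of its few support embeddings, and a query is assigned to the nearest prototype. A toy sketch with synthetic "CSI features" (not Wi-CHAR's actual model):

```python
import numpy as np

def prototype_classify(support_x, support_y, query_x):
    """Few-shot classification in the style of prototypical networks:
    class prototypes are support-set means; queries are labelled by
    nearest prototype in Euclidean distance."""
    classes = np.unique(support_y)
    protos = np.stack([support_x[support_y == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(query_x[:, None, :] - protos[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]

rng = np.random.default_rng(2)
# 3 activities, 5 support samples each ("few-shot"), 16-D feature vectors
centers = rng.normal(scale=3.0, size=(3, 16))
support_x = np.concatenate([c + rng.normal(size=(5, 16)) for c in centers])
support_y = np.repeat([0, 1, 2], 5)
query_x = centers + rng.normal(scale=0.5, size=(3, 16))  # one query per class
print(prototype_classify(support_x, support_y, query_x))  # [0 1 2]
```

The appeal in a cross-domain setting is that only a handful of labelled samples per class are needed to form the prototypes in the new environment.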
https://read.qxmd.com/read/38610414/an-appearance-semantic-descriptor-with-coarse-to-fine-matching-for-robust-vpr
#15
JOURNAL ARTICLE
Jie Chen, Wenbo Li, Pengshuai Hou, Zipeng Yang, Haoyu Zhao
In recent years, semantic segmentation has made significant progress in visual place recognition (VPR) by using semantic information that is relatively invariant to appearance and viewpoint, demonstrating great potential. However, in some extreme scenarios, there may be semantic occlusion and semantic sparsity, which can lead to confusion when relying solely on semantic information for localization. Therefore, this paper proposes a novel VPR framework that employs a coarse-to-fine image matching strategy, combining semantic and appearance information to improve algorithm performance...
March 29, 2024: Sensors
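The coarse-to-fine matching strategy can be sketched as a two-stage retrieval: a cheap semantic descriptor shortlists candidate places, then a richer appearance descriptor re-ranks the shortlist. All descriptors below are random stand-ins, not the paper's:

```python
import numpy as np

def coarse_to_fine_match(query_sem, query_app, db_sem, db_app, k=3):
    """Two-stage place retrieval: semantic cosine similarity selects a
    top-k shortlist, appearance similarity picks the final match."""
    def cosine(a, b):                       # rows of b vs. vector a
        return (b @ a) / (np.linalg.norm(b, axis=1) * np.linalg.norm(a) + 1e-12)
    coarse = cosine(query_sem, db_sem)
    shortlist = np.argsort(-coarse)[:k]     # top-k by semantic similarity
    fine = cosine(query_app, db_app[shortlist])
    return int(shortlist[np.argmax(fine)])  # best candidate after re-ranking

rng = np.random.default_rng(1)
db_sem = rng.normal(size=(100, 16))         # 100 mapped places
db_app = rng.normal(size=(100, 64))
true_idx = 42
q_sem = db_sem[true_idx] + 0.05 * rng.normal(size=16)   # noisy revisit
q_app = db_app[true_idx] + 0.05 * rng.normal(size=64)
print(coarse_to_fine_match(q_sem, q_app, db_sem, db_app))  # 42
```

The division of labour mirrors the abstract's point: the appearance stage can disambiguate places where the semantic descriptor alone is sparse or occluded.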
https://read.qxmd.com/read/38609405/cataract-1k-dataset-for-deep-learning-assisted-analysis-of-cataract-surgery-videos
#16
JOURNAL ARTICLE
Negin Ghamsarian, Yosuf El-Shabrawi, Sahar Nasirihaghighi, Doris Putzgruber-Adamitsch, Martin Zinkernagel, Sebastian Wolf, Klaus Schoeffmann, Raphael Sznitman
In recent years, the landscape of computer-assisted interventions and post-operative surgical video analysis has been dramatically reshaped by deep-learning techniques, resulting in significant advancements in surgeons' skills, operation room management, and overall surgical outcomes. However, the progression of deep-learning-powered surgical technologies is profoundly reliant on large-scale datasets and annotations. In particular, surgical scene understanding and phase recognition stand as pivotal pillars within the realm of computer-assisted surgery and post-operative assessment of cataract surgery videos...
April 12, 2024: Scientific Data
https://read.qxmd.com/read/38607711/vote2cap-detr-decoupling-localization-and-describing-for-end-to-end-3d-dense-captioning
#17
JOURNAL ARTICLE
Sijin Chen, Hongyuan Zhu, Mingsheng Li, Xin Chen, Peng Guo, Yinjie Lei, Gang Yu, Taihao Li, Tao Chen
3D dense captioning requires a model to translate its understanding of an input 3D scene into several captions associated with different object regions. Existing methods adopt a sophisticated "detect-then-describe" pipeline, which builds explicit relation modules upon a 3D detector with numerous hand-crafted components. While these methods have achieved initial success, the cascade pipeline tends to accumulate errors because of duplicated and inaccurate box estimations and messy 3D scenes. In this paper, we first propose Vote2Cap-DETR, a simple-yet-effective transformer framework that decouples the decoding process of caption generation and object localization through parallel decoding...
April 12, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38602815/anhedonia-reflects-an-encoding-deficit-for-pleasant-stimuli-in-schizophrenia-evidence-from-the-emotion-induced-memory-trade-off-eye-tracking-paradigm
#18
JOURNAL ARTICLE
Kayla M Whearty, Ivan Ruiz, Anna R Knippenberg, Gregory P Strauss
OBJECTIVE: The present study explored the hypothesis that anhedonia reflects an emotional memory impairment for pleasant stimuli, rather than diminished hedonic capacity, in individuals with schizophrenia (SZ). METHOD: Participants included 30 SZ patients and 30 healthy control (HC) subjects who completed an eye-tracking emotion-induced memory trade-off task in which contextually relevant pleasant, unpleasant, or neutral items were inserted into the foreground of neutral background scenes...
April 11, 2024: Neuropsychology
https://read.qxmd.com/read/38598389/perf-panoramic-neural-radiance-field-from-a-single-panorama
#19
JOURNAL ARTICLE
Guangcong Wang, Peng Wang, Zhaoxi Chen, Wenping Wang, Chen Change Loy, Ziwei Liu
Neural Radiance Field (NeRF) has achieved substantial progress in novel view synthesis given multi-view images. Recently, some works have attempted to train a NeRF from a single image with 3D priors. They mainly focus on a limited field of view with a few occlusions, which greatly limits their scalability to real-world 360-degree panoramic scenarios with large-size occlusions. In this paper, we present PERF, a 360-degree novel view synthesis framework that trains a panoramic neural radiance field from a single panorama...
April 10, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
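For context on the NeRF machinery PERF builds on, novel views are produced by the standard volume-rendering quadrature, which composites per-sample densities and colors along each ray. A minimal sketch of that quadrature (sample placement and values are made up):

```python
import numpy as np

def volume_render(sigmas, colors, deltas):
    """Composite per-sample densities and colors along one ray using
    the standard NeRF quadrature:
    C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i,
    with transmittance T_i = exp(-sum_{j<i} sigma_j * delta_j)."""
    alphas = 1.0 - np.exp(-sigmas * deltas)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    return weights @ colors, weights

sigmas = np.array([0.0, 5.0, 50.0, 5.0])      # density spikes mid-ray
colors = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
deltas = np.full(4, 0.1)                      # sample spacing along ray
rgb, w = volume_render(sigmas, colors, deltas)
print(rgb, w.sum())  # green sample dominates; weights sum to <= 1
```

Occlusion falls out of the transmittance term: the dense green sample absorbs most of the ray, so the blue sample behind it contributes almost nothing.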
https://read.qxmd.com/read/38598036/on-investigating-drivers-attention-allocation-during-partially-automated-driving
#20
JOURNAL ARTICLE
Reem Jalal Eddine, Claudio Mulatti, Francesco N Biondi
The use of partially-automated systems requires drivers to supervise the system's functioning and resume manual control whenever necessary. Yet the literature on vehicle automation shows that drivers may spend more time looking away from the road when the partially-automated system is operational. In this study we ask whether this pattern is a manifestation of inattentional blindness or whether, more dangerously, it is also accompanied by greater attentional processing of the driving scene. Participants drove a simulated vehicle in manual or partially-automated mode...
April 10, 2024: Cognitive Research: Principles and Implications