Article

Efficient segmentation-free keyword spotting in historical document collections

PATTERN RECOGNITION (2015)

Journal

PATTERN RECOGNITION

Volume 48, Issue 2, Pages 545-555

Publisher

ELSEVIER SCI LTD

DOI: 10.1016/j.patcog.2014.08.021

Keywords

Historical documents; Keyword spotting; Segmentation-free; Dense SIFT features; Latent semantic analysis; Product quantization

Categories

Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic

Funding

Spanish Ministry of Education and Science [TIN2011-25606, TIN2012-37475-C02-02]
European project [ERC-2010-AdG-20100407-269796]
People Programme (Marie Curie Actions) of the Seventh Framework Programme of the European Union (FP7) under REA grant [600388]
Agency of Competitiveness for Companies of the Government of Catalonia, ACCIO

Abstract

In this paper we present an efficient segmentation-free word spotting method for historical document collections that follows the query-by-example paradigm. We use a patch-based framework in which local patches are described by a bag-of-visual-words model built on SIFT descriptors. By projecting the patch descriptors to a topic space with latent semantic analysis and compressing them with product quantization, we can index the document information efficiently in terms of both memory and time. The proposed method is evaluated on four collections of historical documents and performs well in both handwritten and typewritten scenarios, outperforming recent state-of-the-art keyword spotting approaches. © 2014 Elsevier Ltd. All rights reserved.
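
The indexing idea in the abstract (project bag-of-visual-words patch descriptors to a topic space, then compress them with product quantization) can be sketched roughly as below. This is only an illustrative NumPy sketch, not the authors' implementation: the random histograms stand in for real SIFT-based descriptors, and the sizes (500 patches, 64 visual words, 16 topics, 4 sub-quantizers of 16 centroids) and all function names are assumptions chosen for the example. Any term weighting usually applied before latent semantic analysis is omitted.

```python
import numpy as np

def lsa_fit(X, n_topics):
    """Truncated SVD of the patch-by-visual-word matrix; returns a (words x topics) projection."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:n_topics].T

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means, used here to learn one sub-codebook."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def pq_fit(Z, n_sub, k):
    """Learn one k-centroid codebook per sub-vector (product quantization)."""
    return [kmeans(s, k, seed=i) for i, s in enumerate(np.split(Z, n_sub, axis=1))]

def pq_encode(Z, codebooks):
    """Replace each sub-vector by the index of its nearest centroid."""
    subs = np.split(Z, len(codebooks), axis=1)
    codes = [((s[:, None, :] - c[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
             for s, c in zip(subs, codebooks)]
    return np.stack(codes, axis=1).astype(np.uint8)

def pq_distances(query, codes, codebooks):
    """Approximate squared distances from an uncompressed query to all encoded patches."""
    q_subs = np.split(query, len(codebooks))
    # One lookup table per sub-space: query sub-vector to every centroid.
    tables = [((c - q) ** 2).sum(axis=1) for q, c in zip(q_subs, codebooks)]
    return sum(t[codes[:, m]] for m, t in enumerate(tables))

# Hypothetical data: 500 patches described by 64-bin visual-word histograms.
rng = np.random.default_rng(0)
X = rng.random((500, 64))

P = lsa_fit(X, n_topics=16)          # visual words -> topic space
Z = X @ P                            # patch descriptors in topic space
codebooks = pq_fit(Z, n_sub=4, k=16)
codes = pq_encode(Z, codebooks)      # 4 bytes per patch instead of 16 floats

dists = pq_distances(Z[0], codes, codebooks)  # query by example with patch 0
ranking = np.argsort(dists)                   # most similar patches first
```

In this sketch each patch is stored as four one-byte codes, and a query is compared against the whole index through small lookup tables rather than full descriptor distances, which is where the memory and time savings mentioned in the abstract would come from.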

Recommended

Article Engineering, Electrical & Electronic

Error-Diffusion Based Speech Feature Quantization for Small-Footprint Keyword Spotting

Mengjie Luo, Dingyi Wang, Xiaoqin Wang, Shushan Qiao, Yumei Zhou

Summary: This letter proposes an error-diffusion based speech feature quantization method that adapts image processing to quantize the input speech feature maps in arbitrary bits. Experimental results show that the method achieves good accuracy and adaptability in keyword spotting tasks.

IEEE SIGNAL PROCESSING LETTERS (2022)

Article Computer Science, Artificial Intelligence

Learning-free handwritten word spotting method for historical handwritten documents

Hanadi Hassen Mohammed, Nandhini Subramanian, Somaya Al-Madeed

Summary: This study introduces a new technique for multi-language word spotting on degraded and noisy historical documents, utilizing different feature extraction methods to improve performance. The use of cross-correlation measure, feature extraction, and matching methods in extracting and ranking regions of interest is highlighted.

IET IMAGE PROCESSING (2021)

Article Computer Science, Information Systems

Evaluating Robustness to Noise and Compression of Deep Neural Networks for Keyword Spotting

Pedro H. Pereira, Wesley Beccaro, Miguel A. Ramirez

Summary: Keyword Spotting (KWS) has gained attention in embedded systems for command recognition. This study evaluates keyword recognition using deep learning models and explores transfer learning, pruning, and quantization strategies. Compression techniques like pruning and quantization are also assessed. The approach achieves 94.6% accuracy with a 70% reduction in model size by pruning 80% of the parameters in the SqueezeNet network using Google's Speech Commands dataset and additive babble noise signal.

IEEE ACCESS (2023)

Article Engineering, Electrical & Electronic

Sub-mW Keyword Spotting on an MCU: Analog Binary Feature Extraction and Binary Neural Networks

Gianmarco Cerutti, Lukas Cavigelli, Renzo Andri, Michele Magno, Elisabetta Farella, Luca Benini

Summary: Keyword spotting (KWS) is a crucial function for interacting with smart devices, and this study focuses on improving KWS energy efficiency on low-cost microcontroller units (MCUs). By combining analog binary feature extraction with binary neural networks, the energy consumption for data acquisition and preprocessing is significantly reduced. Experimental results show that the proposed system outperforms state-of-the-art accuracy and energy efficiency, offering a compelling accuracy-energy trade-off.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS (2022)

Article Mathematics, Interdisciplinary Applications

Keyword Extraction for Medium-Sized Documents Using Corpus-Based Contextual Semantic Smoothing

Osama A. Khan, Shaukat Wasi, Muhammad Shoaib Siddiqui, Asim Karim

Summary: Keyword extraction is the process of selecting the most significant, relevant, and descriptive terms as keywords from a single document, with major applications in various domains of information retrieval. This paper presents a novel supervised technique called CCSS for keyword extraction, which outperforms other existing techniques according to experiments on the INSPEC dataset.

COMPLEXITY (2022)

Article Engineering, Civil

Transfer Beyond the Field of View: Dense Panoramic Semantic Segmentation via Unsupervised Domain Adaptation

Jiaming Zhang, Chaoxiang Ma, Kailun Yang, Alina Roitberg, Kunyu Peng, Rainer Stiefelhagen

Summary: This study introduces panoramic semantic segmentation through the perspective of domain adaptation, establishing a new dataset and framework to address the issue of annotated training data scarcity for panoramic images while achieving unsupervised domain adaptation from conventional pinhole camera images.

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS (2022)

Article Computer Science, Artificial Intelligence

W-Net: Dense and diagnostic semantic segmentation of subcutaneous and breast tissue in ultrasound images by incorporating ultrasound RF waveform data

Gautam Rajendrakumar Gare, Jiayuan Li, Rohan Joshi, Rishikesh Magar, Mrunal Prashant Vaze, Michael Yousefpour, Ricardo Luis Rodriguez, John Michael Galeotti

Summary: This study focuses on the semantic segmentation of ultrasound scans using raw ultrasound waveforms. The W-Net CNN framework is introduced as the first deep-learning approach to analyze ultrasound RF data, showing improved segmentation performance. Subcutaneous tissue segmentation is chosen as the primary clinical goal, with potential applications in plastic surgery and other areas. The impact of RF data on dense labeling and generalization of the networks are explored, showcasing the diagnostic capabilities of W-Net.

MEDICAL IMAGE ANALYSIS (2022)

Article Agriculture, Multidisciplinary

Pixelwise instance segmentation of leaves in dense foliage

Jehan-Antoine Vayssade, Gawain Jones, Christelle Gee, Jean-Noel Paoli

Summary: Detecting and identifying plants using image analysis is crucial in precision agriculture. This study proposes a pixelwise instance segmentation method based on Convolutional Neural Networks to detect leaves in dense foliage. The method combines several techniques such as deep contour aware, leaf segmentation through edge classification, and Pyramid CNN for Dense Leaves. Experimental results show that the proposed method performs well in leaf segmentation challenges and outperforms the traditional RCNN method on a new dataset.

COMPUTERS AND ELECTRONICS IN AGRICULTURE (2022)

Article Chemistry, Multidisciplinary

Rwin-FPN++: Rwin Transformer with Feature Pyramid Network for Dense Scene Text Spotting

Chengbin Zeng, Yi Liu, Chunli Song

Summary: The study proposes a method called Rwin-FPN++ that incorporates the features of the Rwin Transformer into the FPN to improve the performance of scene text detection. The research conducts experiments on a dense scene text dataset and shows that the Rwin-FPN++ network outperforms other methods.

APPLIED SCIENCES-BASEL (2022)

Article Business

Keyword Selection Strategies in Search Engine Optimization: How Relevant is Relevance?

Mayank Nagpal, J. Andrew Petersen

Summary: By studying the interaction of search characteristics and website characteristics, we can affect the expected organic clicks and rank a website receives from the SERP. Content relevance has an impact on SEO effectiveness, but its role varies in different stages of the customer journey.

JOURNAL OF RETAILING (2021)

Article Computer Science, Artificial Intelligence

Adversarial Dense Contrastive Learning for Semi-Supervised Semantic Segmentation

Ying Wang, Ziwei Xuan, Chiuman Ho, Guo-Jun Qi

Summary: Semi-supervised dense prediction tasks, such as semantic segmentation, can be greatly improved through the use of contrastive learning. This approach faces challenges in selecting informative negative samples and implementing effective data augmentation. To address these challenges, an adversarial contrastive learning method is proposed for semi-supervised semantic segmentation. The method incorporates direct learning of adversarial negatives, an advanced data augmentation strategy called AdverseMix, and the use of auxiliary labels and classifiers.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2023)

Article Computer Science, Artificial Intelligence

Rethinking Local and Global Feature Representation for Dense Prediction

Mohan Chen, Li Zhang, Rui Feng, Xiangyang Xue, Jianfeng Feng

Summary: Although FCNs have limitations in capturing long-range structured relationship, recent Transformer-based models have achieved success in computer vision tasks. This study proposes a Dual-Stream Convolution-Transformer architecture to combine local and global feature representation for powerful dense prediction.

PATTERN RECOGNITION (2023)

Article Computer Science, Artificial Intelligence

Dense Pixel-Level Interpretation of Dynamic Scenes With Video Panoptic Segmentation

Dahun Kim, Sanghyun Woo, Joon-Young Lee, In So Kweon

Summary: This paper proposes a new computer vision benchmark called Video Panoptic Segmentation (VPS), and introduces two datasets, a new evaluation metric, and an advanced video panoptic segmentation network called VPSNet++. VPSNet++ achieves state-of-the-art results on both Cityscapes-VPS and VIPER datasets.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2022)

Article Computer Science, Artificial Intelligence

DyGAT: Dynamic stroke classification of online handwritten documents and sketches

Yu-Ting Yang, Yan-Ming Zhang, Xiao-Long Yun, Fei Yin, Cheng-Lin Liu

Summary: Online handwriting is widely used in various domains. This paper introduces a method called Dynamic Graph ATtention network (DyGAT) to solve the dynamic stroke classification problem. The core idea is to formalize a document/sketch as a multifeature graph with nodes representing strokes and edges representing their relationships. The proposed method is applicable to different types of online handwritten data and achieves competitive performance in various tasks.

PATTERN RECOGNITION (2023)

Article Remote Sensing

Dense context distillation network for semantic parsing of oblique UAV images

Youli Ding, Xianwei Zheng, Yiping Chen, Shuhan Shen, Hanjiang Xiong

Summary: In this paper, a dense context distillation network (DCDNet) is proposed for semantic segmentation of oblique unmanned aerial vehicle (UAV) images. DCDNet effectively learns distortion-robust feature representation by densely and selectively gathering useful context from dual-scale feature maps. It also incorporates joint supervision and multi-scale feature aggregation for better learning and prediction, achieving a state-of-the-art segmentation performance on the challenging UAVid dataset with a mIoU score of 72.38%.

INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION (2022)

Article Computer Science, Artificial Intelligence

Real-time Lexicon-free Scene Text Retrieval

Andres Mafla, Ruben Tito, Sounak Dey, Lluis Gomez, Marcal Rusinol, Ernest Valveny, Dimosthenis Karatzas

Summary: In this study, the task of scene text retrieval is addressed by proposing a single shot CNN architecture for predicting bounding boxes and building compact representations of spotted words. Experimental results demonstrate that the proposed model outperforms previous state-of-the-art while offering significant increase in processing speed and unmatched expressiveness.

PATTERN RECOGNITION (2021)

Article Computer Science, Artificial Intelligence

Candidate fusion: Integrating language modelling into a sequence-to-sequence handwritten word recognition architecture

Lei Kang, Pau Riba, Mauricio Villegas, Alicia Fornes, Marcal Rusinol

Summary: Sequence-to-sequence models have gained popularity for handwritten word recognition, but integrating an external language model effectively remains a challenge. Candidate Fusion introduces a new input to the recognizer, improving the performance of handwritten word recognition tasks.

PATTERN RECOGNITION (2021)

Editorial Material Computer Science, Artificial Intelligence

Editorial for special issue on Advanced Topics in Document Analysis and Recognition

Josep Llados, Daniel Lopresti, Seiichi Uchida

INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION (2021)

Article Computer Science, Artificial Intelligence

Pay attention to what you read: Non-recurrent handwritten text-Line recognition

Lei Kang, Pau Riba, Marcal Rusinol, Alicia Fornes, Mauricio Villegas

Summary: This paper introduces a novel method that bypasses recurrence during the training process using transformer models for handwriting recognition. By utilizing multi-head self-attention layers, the model is able to handle character recognition and learn the language-related dependencies of character sequences to be decoded. The model is capable of recognizing out-of-vocabulary words.

PATTERN RECOGNITION (2022)

Article Computer Science, Artificial Intelligence

Content and Style Aware Generation of Text-Line Images for Handwriting Recognition

Lei Kang, Pau Riba, Marcal Rusinol, Alicia Fornes, Mauricio Villegas

Summary: Handwritten Text Recognition has achieved impressive results in public benchmarks, but the high variability between handwriting styles requires a large amount of labeled training data. To address this, synthetic data has been used to increase the training data volume and style variability. However, there is a style bias between synthetic and real data that hinders recognition performance improvement. To overcome this, a generative method that considers visual appearance and textual content is proposed, able to produce diverse handwritten text-line samples.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2022)

Article Computer Science, Artificial Intelligence

VLCDoC: Vision-Language contrastive pre-training model for cross-Modal document classification

Souhail Bakkali, Zuheng Ming, Mickael Coustaty, Marcal Rusinol, Oriol Ramos Terrades

Summary: This paper proposes a method for document classification by learning cross-modal representations through language and vision cues, focusing on intra-and inter-modality relationships. Instead of merging features, the method exploits high-level interactions and attention flows within and across modalities to learn relevant semantic information. The proposed learning objective computes the similarity distribution between intra-and inter-modality alignment tasks using positive and negative sample pairs in the joint representation space. Extensive experiments on benchmark datasets demonstrate the effectiveness and generality of the model.

PATTERN RECOGNITION (2023)

Proceedings Paper Computer Science, Artificial Intelligence

Easing Automatic Neurorehabilitation via Classification and Smoothness Analysis

Asma Bensalah, Alicia Fornes, Cristina Carmona-Duarte, Josep Llados

Summary: Assessing the quality of movements for post-stroke patients during rehabilitation is crucial due to the lack of a standardized rehabilitation plan. To address this, an automatic assessment pipeline is proposed using deep learning to recognize patients' movements and measure their quality. The clinical relevance of the dataset used allows for detecting the contrast between healthy and patient movements, as well as evaluating patients' progress during rehabilitation sessions.

INTERTWINING GRAPHONOMICS WITH HUMAN MOVEMENTS, IGS 2021 (2022)

Proceedings Paper Computer Science, Artificial Intelligence

The RPM3D Project: 3D Kinematics for Remote Patient Monitoring

Alicia Fornes, Asma Bensalah, Cristina Carmona-Duarte, Jialuo Chen, Miguel A. Ferrer, Andreas Fischer, Josep Llados, Cristina Martin, Eloy Opisso, Rejean Plamondon, Anna Scius-Bertrand, Josep Maria Tormos

Summary: This project explores the feasibility of remote patient monitoring based on the analysis of 3D movements captured with smart-watches. The research has been validated in a real case scenario for stroke rehabilitation, showing promising results. The work could have a great impact in remote healthcare applications, improving medical efficiency and reducing healthcare costs. Future steps include more clinical validation, developing multi-modal analysis architectures, and exploring the application of the technology to monitor other neurodegenerative diseases.

INTERTWINING GRAPHONOMICS WITH HUMAN MOVEMENTS, IGS 2021 (2022)

Proceedings Paper Computer Science, Artificial Intelligence

A Few Shot Multi-representation Approach for N-Gram Spotting in Historical Manuscripts

Giuseppe De Gregorio, Sanket Biswas, Mohamed Ali Souibgui, Asma Bensalah, Josep Llados, Alicia Fornes, Angelo Marcelli

Summary: Despite advances in automatic text recognition, historical manuscripts remain challenging due to the lack of labelled data. This paper proposes a few-shot learning paradigm to reduce dependency on vocabulary in Handwritten Text Recognition (HTR), achieving promising results.

FRONTIERS IN HANDWRITING RECOGNITION, ICFHR 2022 (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Read While You Drive - Multilingual Text Tracking on the Road

Sergi Garcia-Bordils, George Tom, Sangeeth Reddy, Minesh Mathew, Marcal Rusinol, C. Jawahar, Dimosthenis Karatzas

Summary: This paper presents RoadText-3K, a large driving video dataset with fully annotated text, which is three times bigger than its predecessor and contains data from varied geographical locations, unconstrained driving conditions, and multiple languages and scripts. The article also offers a comprehensive analysis of the limitations of state-of-the-art text detection methods and proposes a new tracking model that achieves state-of-the-art results.

DOCUMENT ANALYSIS SYSTEMS, DAS 2022 (2022)

Proceedings Paper Computer Science, Artificial Intelligence

A Generic Image Retrieval Method for Date Estimation of Historical Document Collections

Adria Molina, Lluis Gomez, Oriol Ramos Terrades, Josep Llados

Summary: This paper presents a robust date estimation system that performs well on different types of real document images. The system can be used for historical contextual retrieval, allowing scholars to compare and analyze historical images.

DOCUMENT ANALYSIS SYSTEMS, DAS 2022 (2022)

Proceedings Paper Computer Science, Information Systems

Date Estimation in the Wild of Scanned Historical Photos: An Image Retrieval Approach

Adria Molina, Pau Riba, Lluis Gomez, Oriol Ramos-Terrades, Josep Llados

Summary: This paper presents a novel method that formulates date estimation of historical photographs as a retrieval task, ranking images based on estimated date similarity. Unlike traditional models, the proposed method uses a learning objective based on the nDCG ranking metric. Experimental results on the DEW public database show improved performance in date estimation and date-sensitive image retrieval tasks compared to baseline methods.

DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT II (2021)

Proceedings Paper Computer Science, Information Systems

Learning to Rank Words: Optimizing Ranking Metrics for Word Spotting

Pau Riba, Adria Molina, Lluis Gomez, Oriol Ramos-Terrades, Josep Llados

Summary: This paper explores and evaluates the use of ranking-based objective functions for learning word string and image encoders simultaneously, achieving competitive performance in word spotting tasks.

DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT II (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Graph-Based Deep Generative Modelling for Document Layout Generation

Sanket Biswas, Pau Riba, Josep Llados, Umapada Pal

Summary: This study introduces an automated deep generative model using Graph Neural Networks to generate synthetic data for training document interpretation systems, especially in digital mailroom applications. It is the first graph-based approach experimented on administrative document images, including invoices, for document layout generation tasks.

DOCUMENT ANALYSIS AND RECOGNITION, ICDAR 2021, PT II (2021)

Proceedings Paper Computer Science, Artificial Intelligence

DocSynth: A Layout Guided Approach for Controllable Document Image Synthesis

Sanket Biswas, Pau Riba, Josep Llados, Umapada Pal

Summary: This paper introduces a novel approach called DocSynth for automatically synthesizing document images based on a given layout. The model learns to generate realistic document images consistent with the defined layout and can be used as a superior baseline model for creating synthetic document image datasets. The results show that the model successfully generates realistic and diverse document images with multiple objects, making it a significant advancement in image generation tasks.

DOCUMENT ANALYSIS AND RECOGNITION, ICDAR 2021, PT III (2021)

Article Computer Science, Artificial Intelligence

Exploiting sublimated deep features for image retrieval

Guang-Hai Liu, Zuo-Yong Li, Jing-Yu Yang, David Zhang

Summary: This article introduces a novel image retrieval method that improves retrieval performance by using sublimated deep features. The method incorporates orientation-selective features and color perceptual features, effectively mimicking these mechanisms to provide a more discriminating representation.

PATTERN RECOGNITION (2024)

Article Computer Science, Artificial Intelligence

Region-adaptive and context-complementary cross modulation for RGB-T semantic segmentation

Fengguang Peng, Zihan Ding, Ziming Chen, Gang Wang, Tianrui Hui, Si Liu, Hang Shi

Summary: RGB-Thermal (RGB-T) semantic segmentation is an emerging task that aims to improve the robustness of segmentation methods under extreme imaging conditions by using thermal infrared modality. The challenges of foreground-background distinguishment and complementary information mining are addressed by proposing a cross modulation process with two collaborative components. Experimental results show that the proposed method achieves state-of-the-art performances on current RGB-T segmentation benchmarks.

PATTERN RECOGNITION (2024)

Article Computer Science, Artificial Intelligence

F-SCP: An automatic prompt generation method for specific classes based on visual language pre-training models

Baihong Han, Xiaoyan Jiang, Zhijun Fang, Hamido Fujita, Yongbin Gao

Summary: This paper proposes a novel automatic prompt generation method called F-SCP, which focuses on generating accurate prompts for low-accuracy classes and similar classes. Experimental results show that our approach outperforms state-of-the-art methods on six multi-domain datasets.

PATTERN RECOGNITION (2024)

Article Computer Science, Artificial Intelligence

Residual Deformable Convolution for better image de-weathering

Huikai Liu, Ao Zhang, Wenqian Zhu, Bin Fu, Bingjian Ding, Shengwu Xiong

Summary: Adverse weather conditions present challenges for computer vision tasks, and image de-weathering is an important component of image restoration. This paper proposes a multi-patch skip-forward structure and a Residual Deformable Convolutional module to improve feature extraction and pixel-wise reconstruction.

PATTERN RECOGNITION (2024)

Article Computer Science, Artificial Intelligence

A linear transportation L^p distance for pattern recognition

Oliver M. Crook, Mihai Cucuringu, Tim Hurst, Carola-Bibiane Schonlieb, Matthew Thorpe, Konstantinos C. Zygalakis

Summary: The transportation L^p distance (TL^p) is a generalization of the Wasserstein W^p distance that can be applied directly to color or multi-channelled images, as well as multivariate time-series. TL^p interprets signals as functions, while W^p interprets signals as measures. Although both distances are powerful tools in modeling data with spatial or temporal perturbations, their computational cost can be prohibitively high for moderate pattern recognition tasks. The linear Wasserstein distance offers a method for projecting signals into a Euclidean space, and in this study, we propose linear versions of the TL^p distance (LTL^p) that show significant improvement over the linear W^p distance in signal processing tasks while being several orders of magnitude faster to compute than the TL^p distance.

PATTERN RECOGNITION (2024)

Article Computer Science, Artificial Intelligence

Learning a target-dependent classifier for cross-domain semantic segmentation: Fine-tuning versus meta-learning

Haitao Tian, Shiru Qu, Pierre Payeur

Summary: This paper proposes a method of target-dependent classifier, which optimizes the joint hypothesis of domain adaptation into a target-dependent hypothesis that better fits with the target domain clusters through an unsupervised fine-tuning strategy and the concept of meta-learning. Experimental results demonstrate that this method outperforms existing techniques in synthetic-to-real adaptation and cross-city adaptation benchmarks.

PATTERN RECOGNITION (2024)

Article Computer Science, Artificial Intelligence

KGSR: A kernel guided network for real-world blind super-resolution

Qingsen Yan, Axi Niu, Chaoqun Wang, Wei Dong, Marcin Wozniak, Yanning Zhang

Summary: Deep learning-based methods have achieved remarkable results in the field of super-resolution. Because paired training image sets are limited, researchers have explored self-supervised learning, but inaccurate assumptions about the downscaling kernel often lead to degraded results. To address this issue, this paper introduces KGSR, a kernel-guided network that trains both upscaling and downscaling networks to generate high-quality high-resolution images even without knowing the actual downscaling process.

PATTERN RECOGNITION (2024)

Article Computer Science, Artificial Intelligence

Gait feature learning via spatio-temporal two-branch networks

Yifan Chen, Xuelong Li

Summary: Gait recognition is a popular technology for identification due to its ability to capture gait features over long distances without cooperation. However, current methods face challenges as they use a single network to extract both temporal and spatial features. To solve this problem, we propose a two-branch network that focuses on spatial and temporal feature extraction separately. By combining these features, we can effectively learn the spatio-temporal information of gait sequences.

PATTERN RECOGNITION (2024)

Article Computer Science, Artificial Intelligence

PAMI: Partition Input and Aggregate Outputs for Model Interpretation

Wei Shi, Wentao Zhang, Wei-shi Zheng, Ruixuan Wang

Summary: This article proposes a simple yet effective visualization framework called PAMI, which does not require detailed model structure and parameters to obtain visualization results. It can be applied to various prediction tasks with different model backbones and input formats.

PATTERN RECOGNITION (2024)

Article Computer Science, Artificial Intelligence

Disturbance rejection with compensation on features

Xiaobo Hu, Jianbo Su, Jun Zhang

Summary: This paper reviews the latest technologies in pattern recognition, highlighting their instabilities and failures in practical applications. From a control perspective, the significance of disturbance rejection in pattern recognition is discussed, and the existing problems are summarized. Finally, potential solutions related to the application of compensation on features are discussed to emphasize future research directions.

PATTERN RECOGNITION (2024)

Article Computer Science, Artificial Intelligence

ECLAD: Extracting Concepts with Local Aggregated Descriptors

Andres Felipe Posada-Moreno, Nikita Surya, Sebastian Trimpe

Summary: Convolutional neural networks are widely used in critical systems, and explainable artificial intelligence has proposed methods for generating high-level explanations. However, these methods lack the ability to determine the location of concepts. To address this, we propose a novel method for automatic concept extraction and localization based on pixel-wise aggregations, and validate it using synthetic datasets.

PATTERN RECOGNITION (2024)

Article Computer Science, Artificial Intelligence

Dynamic Graph Contrastive Learning via Maximize Temporal Consistency

Peng Bao, Jianian Li, Rong Yan, Zhongyi Liu

Summary: In this paper, a novel Dynamic Graph Contrastive Learning framework, DyGCL, is proposed to capture the temporal consistency in dynamic graphs and achieve good performance in node representation learning.

PATTERN RECOGNITION (2024)

Article Computer Science, Artificial Intelligence

ConvGeN: A convex space learning approach for deep-generative oversampling and imbalanced classification of small tabular datasets

Kristian Schultz, Saptarshi Bej, Waldemar Hahn, Markus Wolfien, Prashant Srivastava, Olaf Wolkenhauer

Summary: Research indicates that deep generative models perform poorly compared to linear interpolation-based methods for synthetic data generation on small, imbalanced tabular datasets. To address this, a new approach called ConvGeN, combining convex space learning with deep generative models, has been proposed. ConvGeN improves imbalanced classification on small datasets while remaining competitive with existing linear interpolation methods.

PATTERN RECOGNITION (2024)

Article Computer Science, Artificial Intelligence

H-CapsNet: A capsule network for hierarchical image classification

Khondaker Tasrif Noor, Antonio Robles-Kelly

Summary: In this paper, the authors propose H-CapsNet, a capsule network designed for hierarchical image classification. The network effectively captures hierarchical relationships using dedicated capsules for each class hierarchy. A modified hinge loss is utilized to enforce consistency among the involved hierarchies. Additionally, a strategy for dynamically adjusting training parameters is presented to achieve better balance between the class hierarchies. Experimental results demonstrate that H-CapsNet outperforms competing hierarchical classification networks.

PATTERN RECOGNITION (2024)

Article Computer Science, Artificial Intelligence

CS-net: Conv-simpleformer network for agricultural image segmentation

Lei Liu, Guorun Li, Yuefeng Du, Xiaoyu Li, Xiuheng Wu, Zhi Qiao, Tianyi Wang

Summary: This study proposes a new agricultural image segmentation model called CS-Net, which uses Simple-Attention Block and Simpleformer to improve accuracy and inference speed, and addresses the issue of performance collapse of Transformers in agricultural image processing.

PATTERN RECOGNITION (2024)
