Article
Computer Science, Theory & Methods
Thi Tuyet Hai Nguyen, Adam Jatowt, Mickael Coustaty, Antoine Doucet
Summary: The article highlights the importance of improving the quality of OCR results, as OCR often performs poorly on historical materials, which then require post-correction. It defines the post-OCR processing problem, describes its typical pipeline, and reviews the latest post-OCR processing methods, along with evaluation metrics, accessible datasets, language resources, and toolkits. The work also identifies the current trend and outlines research directions in this field.
ACM COMPUTING SURVEYS
(2021)
Article
Computer Science, Artificial Intelligence
Shruti Rijhwani, Daisy Rosenblum, Antonios Anastasopoulos, Graham Neubig
Summary: The paper introduces a semi-supervised learning method that improves OCR system performance by using raw images and self-training, and proposes a lexically aware decoding method. Results show that self-training and lexically aware decoding are essential for achieving consistent improvements.
TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS
(2021)
Article
Computer Science, Information Systems
Niddal H. Imam, Vassilios G. Vassilakis, Dimitris Kolovos
Summary: This paper presents an OCR-based system for detecting images with embedded text shared on social networks and proposes an OCR post-correction algorithm to improve the system's robustness. Experimental results demonstrate the effectiveness of the algorithm in detecting and correcting adversarial text images, leading to improved performance of the OCR system.
JOURNAL OF INFORMATION SECURITY AND APPLICATIONS
(2022)
Article
Computer Science, Artificial Intelligence
M. C. Shunmuga Priya, D. Karthika Renuka, L. Ashok Kumar
Summary: Speech recognition is widely used but still faces the challenge of spelling errors. This research proposes a BERT-based spelling correction module to enhance ASR system performance. Experimental results demonstrate the efficacy of this module in detecting and correcting spelling errors.
JOURNAL OF INTELLIGENT & FUZZY SYSTEMS
(2022)
Review
Chemistry, Multidisciplinary
Rayyan Najam, Safiullah Faizullah
Summary: Arabic handwritten-text recognition uses OCR and text-correction techniques for accurate text extraction from images. Deep learning has been widely used in OCR, but recent deep-learning techniques for Arabic handwritten OCR and text correction have not been adequately studied or analyzed. This analysis fills this gap by uncovering recent developments and limitations, providing valuable insights for researchers, practitioners, and interested readers. The study finds that CNN-LSTM-CTC is the most suitable architecture for OCR, and DL models improve accuracy in OCR text correction. The study highlights the potential for applying text-embedding models to correct OCR results in Arabic OCR and emphasizes the need for high-quality datasets and future research in this area.
APPLIED SCIENCES-BASEL
(2023)
Article
Chemistry, Multidisciplinary
Wei Gou, Zheng Chen
Summary: Chinese Spelling Error Correction is a hot topic in natural language processing, with many solutions from rule-based to deep learning methods. Although SpellGCN has achieved the best results, it produces many false error correction results in practical tasks. The proposed post-processing method aims to improve performance by filtering out these false results.
APPLIED SCIENCES-BASEL
(2021)
Article
Computer Science, Artificial Intelligence
Lijun Lyu, Maria Koutraki, Martin Krickl, Besnik Fetahu
Summary: Optical character recognition is crucial for accessing historical collections, but it faces challenges such as orthographic variations and language evolution leading to transcription errors. A neural network approach is proposed to correct OCR errors, which significantly reduces the word error rate.
TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS
(2021)
Article
Engineering, Civil
Babak Alizadeh, Alireza Ghaderi Bafti, Hamid Kamangir, Yu Zhang, Daniel B. Wright, Kristie J. Franz
Summary: The study introduces a novel deep learning model, SAINA-LSTM, which improves streamflow forecasting performance by incorporating an attention mechanism into LSTM cells. SAINA-LSTM outperforms other models in various climatological basins and for 1- to 7-day ahead forecasts in different flow ranges.
JOURNAL OF HYDROLOGY
(2021)
Article
Mathematics
Vasyl Lytvyn, Petro Pukach, Victoria Vysotska, Myroslava Vovk, Nataliia Kholodna
Summary: A machine learning model has been developed to correct errors in Ukrainian texts. The neural network has the ability to correct simple sentences in Ukrainian, but a complete system requires the use of spell-checking dictionaries and rule checking. A pre-trained BERT neural network was used to save computing resources and showed satisfactory results in correcting grammatical and stylistic errors. Among the pre-trained models, the mT5 model performed the best according to BLEU and METEOR metrics.
Article
Chemistry, Multidisciplinary
Hitarth Choubisa, Md Azimul Haque, Tong Zhu, Lewei Zeng, Maral Vafaie, Derya Baran, Edward H. Sargent
Summary: The exploration of thermoelectric materials is challenging due to the large materials space and the complexity of synthesis. By incorporating historical data and using error-correction learning, this study discovers a previously unexplored family of thermoelectric materials and finds an optimized material with a significantly improved power factor. A closed-loop experimentation strategy reduces the required number of experiments by up to 3 times compared to high-throughput searches powered by state-of-the-art machine-learning models.
ADVANCED MATERIALS
(2023)
Article
Physics, Multidisciplinary
Thomas Wagner, Hermann Kampermann, Dagmar Bruss, Martin Kliesch
Summary: The characterization of quantum devices is important but costly. This study focuses on the characterization of quantum computers in the context of stabilizer quantum error correction. It is shown that the logical error channel induced by Pauli noise can be estimated from syndrome data under minimal conditions for different types of codes.
PHYSICAL REVIEW LETTERS
(2023)
Article
Engineering, Electrical & Electronic
Srinidhi Karthikeyan, Alba G. Seco de Herrera, Faiyaz Doctor, Asim Mirza
Summary: The COVID-19 pandemic has placed a significant burden on the global healthcare sector, driving digital transformation efforts for improved efficiency. The generation of medical data has increased dramatically, with much of it being unstructured and stored as part of patients' medical reports. Optical Character Recognition (OCR) is used to digitize this unstructured data, but OCR engines often struggle with accurately transcribing scanned or handwritten documents. The proposed method utilizes a deep neural network pre-training technique called RoBERTa to predict and fill in the gaps in non-transcribable sections of the documents. Evaluation on domain-specific datasets, including real medical documents, demonstrates a significantly reduced word error rate and showcases the effectiveness of this approach.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
(2022)
Article
Computer Science, Artificial Intelligence
Yu Wang, Yuelin Wang, Kai Dang, Jie Liu, Zhuo Liu
Summary: This study provides a comprehensive review of the literature in the field of grammatical error correction (GEC), covering task definition, basic approaches, performance boosting techniques, data augmentation methods, and evaluation results. Emphasis is placed on approaches related to machine translation, with an analysis of error types and system advancements for a clear view of progress in GEC. Future research directions in GEC are also discussed.
ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY
(2021)
Article
Chemistry, Multidisciplinary
Ahmad Musyafa, Ying Gao, Aiman Solyman, Chaojie Wu, Siraj Khan
Summary: This paper proposes an automatic model for Indonesian grammar correction based on the Transformer architecture, addressing the lack of research on the GEC task for low-resource languages (especially Indonesian). It also builds a large corpus of the Indonesian language for evaluating future Indonesian GEC tasks. Experimental results demonstrate significant and satisfactory performance of the Transformer-based automatic error correction model.
APPLIED SCIENCES-BASEL
(2022)
Article
Neurosciences
Julia Moser, Laura Batterink, Yiwen Li Hegner, Franziska Schleger, Christoph Braun, Ken A. Paller, Hubert Preissl
Summary: Humans are highly sensitive to patterns in the environment and use statistical learning for cognition. This study examined the neural mechanisms of statistical learning using an auditory nonlinguistic paradigm. Neural entrainment reflects implicit learning of patterns, while the emergence of explicit knowledge varies across individuals depending on factors such as attention and exposure time.
Article
Computer Science, Information Systems
Sang-Bing Tsai, Xusen Cheng, Yanwu Yang, Jason Xiong, Alex Zarifis
Summary: This article gives a structured summary of the methods proposed and evidenced for developing digital entrepreneurship from a socio-technical perspective. From a technical perspective, the technology itself and the process of its utilization should be carefully considered. From a social perspective, fulfilling the needs of customers in social interaction and nurturing characteristics and social skills for the digital work environment are crucial.
INFORMATION PROCESSING & MANAGEMENT
(2024)
Article
Computer Science, Information Systems
Xiaochang Fang, Hongchen Wu, Jing Jing, Yihong Meng, Bing Yu, Hongzhu Yu, Huaxiang Zhang
Summary: This study proposes a novel fake news detection framework, utilizing news semantic environment perception (NSEP) to identify fake news content. The framework consists of steps such as dividing the semantic environment into macro and micro levels, applying graph convolutional networks, and utilizing multihead attention. Empirical experiments show that the NSEP framework achieves high accuracy in detecting Chinese fake news, outperforming other baseline methods and highlighting the importance of both micro and macro semantic environments in early detection of fake news.
INFORMATION PROCESSING & MANAGEMENT
(2024)
Article
Computer Science, Information Systems
Xudong Sun, Alladoumbaye Ngueilbaye, Kaijing Luo, Yongda Cai, Dingming Wu, Joshua Zhexue Huang
Summary: This paper proposes a scalable distributed frequent itemset mining (ScaDistFIM) algorithm to address the data scalability and flexibility issues in basket analysis in the big data era. Experimental results demonstrate that the ScaDistFIM algorithm is more efficient than the Spark FP-Growth algorithm.
INFORMATION PROCESSING & MANAGEMENT
(2024)
Article
Computer Science, Information Systems
Boxu Guan, Xinhua Zhu, Shangbo Yuan
Summary: This paper aims to improve the interpretability of machine reading comprehension models by using the pre-trained T5 model for evidence inference. It proposes an interpretable reading comprehension model based on T5, which is trained on a more accurate evidence corpus and can infer precise interpretations for answers. Experimental results show that the model outperforms the baseline BERT model on the SQuAD1.1 task.
INFORMATION PROCESSING & MANAGEMENT
(2024)
Article
Computer Science, Information Systems
Yanhao Wang, Baohua Zhang, Weikang Liu, Jiahao Cai, Huaping Zhang
Summary: In this study, we propose a data augmentation-based semantic text matching model called STMAP. By using Gaussian noise and noise mask signal for data augmentation, as well as employing an adaptive optimization network for training target optimization, our model achieves good performance in few-shot learning and semantic deviation problems.
INFORMATION PROCESSING & MANAGEMENT
(2024)
Article
Computer Science, Information Systems
Jiahao Yang, Shuo Feng, Wenkai Zhang, Ming Zhang, Jun Zhou, Pengyuan Zhang
Summary: To pursue profit from stock markets, researchers use deep learning methods to forecast asset price movements. However, current research has two issues: the discrepancy between forecasting results and profits, and a heavy reliance on prior knowledge. To address these issues, the study proposes a novel optimization objective and modeling method, and conducts experiments to validate the approach.
INFORMATION PROCESSING & MANAGEMENT
(2024)
Article
Computer Science, Information Systems
Heng Zhang, Chengzhi Zhang, Yuzhuo Wang
Summary: This study provides an accurate analysis of technology development in the field of Natural Language Processing (NLP) from an entity-centric perspective. The findings indicate an increase in the average number of entities per paper, with pre-trained language models becoming mainstream and the impact of the Wikipedia dataset and the BLEU metric continuing to rise. There has been a surge in popularity for new high-impact technologies in recent years, with researchers accepting them at an unprecedented speed.
INFORMATION PROCESSING & MANAGEMENT
(2024)
Article
Computer Science, Information Systems
Davide Buscaldi, Danilo Dessi, Enrico Motta, Marco Murgia, Francesco Osborne, Diego Reforgiato Recupero
Summary: In scientific papers, citing other articles is a common practice to support claims and provide evidence. This paper proposes two automatic methods using Transformer models to address citation placement, and achieves significant improvements in experiments.
INFORMATION PROCESSING & MANAGEMENT
(2024)
Article
Computer Science, Information Systems
Baozhuang Niu, Lingfeng Wang, Xinhu Yu, Beibei Feng
Summary: This paper examines whether the incumbent brand should adopt digital technology to forecast demand and adjust order decisions in the face of soaring demand for medical supplies caused by frequent outbreaks of regional COVID-19 epidemics. The study finds that digital transformation can lead to a triple-win situation among the incumbent brand, social welfare, and consumer surplus, as well as bring benefits to the manufacturer. Furthermore, the research provides insights for firms' digital entrepreneurship decisions through theoretical optimization and data processing/policy simulation.
INFORMATION PROCESSING & MANAGEMENT
(2024)
Article
Computer Science, Information Systems
Xueyang Qin, Lishang Li, Fei Hao, Meiling Ge, Guangyao Pang
Summary: Image-text retrieval is important in connecting vision and language. This paper proposes a method that utilizes prior knowledge to enhance feature representations and optimize network training for better retrieval results.
INFORMATION PROCESSING & MANAGEMENT
(2024)
Review
Computer Science, Information Systems
Gang Ren, Lei Diao, Fanjia Guo, Taeho Hong
Summary: This paper proposes a novel approach for predicting the helpfulness of reviews by utilizing both textual and image features. The proposed method considers the correlation between features through self-attention and co-attention mechanisms, and fuses multi-modal features for prediction. Experimental results demonstrate the superior performance of the proposed method compared to benchmark methods.
INFORMATION PROCESSING & MANAGEMENT
(2024)
Article
Computer Science, Information Systems
Zhongquan Jian, Jiajian Li, Qingqiang Wu, Junfeng Yao
Summary: Aspect-Level Sentiment Classification (ALSC) is a crucial challenge in Natural Language Processing (NLP). Most existing methods fail to consider the correlations between different instances, leading to a lack of global viewpoint. To address this issue, we propose a Retrieval Contrastive Learning (RCL) framework that extracts intrinsic knowledge across instances for improved instance representation. Experimental results demonstrate that training ALSC models with RCL leads to substantial performance improvements.
INFORMATION PROCESSING & MANAGEMENT
(2024)
Article
Computer Science, Information Systems
Ying Hu, Yanping Chen, Ruizhang Huang, Yongbin Qin, Qinghua Zheng
Summary: Biomedical relation extraction aims to extract the interactive relations between biomedical entities in a sentence. This study proposes a hierarchical convolutional model to address the semantic overlapping and data imbalance problems. The model encodes both local contextual features and global semantic dependencies, enhancing the discriminability of the neural network for biomedical relation extraction.
INFORMATION PROCESSING & MANAGEMENT
(2024)
Article
Computer Science, Information Systems
Zhou Yang, Yucai Pang, Xuehong Li, Qian Li, Shihong Wei, Rong Wang, Yunpeng Xiao
Summary: This study proposes a rumor detection model based on topic audiolization, which transforms the topic space into audio-like signals. Experimental results show that the model achieves significant performance improvements in rumor identification.
INFORMATION PROCESSING & MANAGEMENT
(2024)
Article
Computer Science, Information Systems
Alistair Moffat
Summary: This paper proposes the buying power metric for assessing the quality of product rankings on e-commerce sites. It discusses the relationship between the buying power metric and user reactions, and introduces an alternative product ranking effectiveness metric.
INFORMATION PROCESSING & MANAGEMENT
(2024)