Article

Robust semantic text similarity using LSA, machine learning, and linguistic resources

LANGUAGE RESOURCES AND EVALUATION (2016)

Journal

LANGUAGE RESOURCES AND EVALUATION

Volume 50, Issue 1, Pages 125-161

Publisher

SPRINGER

DOI: 10.1007/s10579-015-9319-2

Keywords

Latent semantic analysis; WordNet; Term alignment; Semantic similarity

Category

Computer Science, Interdisciplinary Applications

Funding

US National Science Foundation [1228198, 1250627, 0910838]
NSF Directorate for Computer & Information Science & Engineering, Division of Computer and Network Systems [1228673]
NSF Directorate for Computer & Information Science & Engineering, Division of Information & Intelligent Systems [1250627, 0910838]
NSF Directorate for Computer & Information Science & Engineering, Division of Computer and Network Systems [1228198]

Abstract

Semantic textual similarity is a measure of the degree of semantic equivalence between two pieces of text. We describe the SemSim system and its performance in the *SEM 2013 and SemEval-2014 tasks on semantic textual similarity. At the core of our system lies a robust distributional word similarity component that combines latent semantic analysis and machine learning, augmented with data from several linguistic resources. We used a simple term alignment algorithm to handle longer pieces of text. Additional wrappers and resources were used to handle task-specific challenges, including processing Spanish text, comparing text sequences of different lengths, handling informal words and phrases, and matching words with sense definitions. In the *SEM 2013 task on Semantic Textual Similarity, our best performing system ranked first among the 89 submitted runs. In the SemEval-2014 task on Multilingual Semantic Textual Similarity, we ranked a close second in both the English and Spanish subtasks. In the SemEval-2014 task on Cross-Level Semantic Similarity, we ranked first in the Sentence-Phrase, Phrase-Word, and Word-Sense subtasks and second in the Paragraph-Sentence subtask.
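The abstract names two ingredients without detailing them here: a word-level similarity score and a simple term alignment that lifts it to longer texts. The sketch below shows one common way such a pipeline can be wired together; it is not SemSim's implementation. The vectors in `TOY_VECTORS` are hypothetical stand-ins for LSA-derived word vectors, and `word_sim`, `directional_score`, and `text_similarity` are illustrative names. The alignment is a plain best-match average computed in both directions; the paper's own alignment may add constraints or penalties that are not reproduced here.

```python
import math

# Hypothetical toy word vectors standing in for LSA-derived vectors.
# SemSim's real word model (LSA plus machine learning and linguistic
# resources) is NOT reproduced here.
TOY_VECTORS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "drives": [0.1, 0.9, 0.1],
    "moves": [0.2, 0.8, 0.2],
    "fast": [0.0, 0.2, 0.9],
    "quickly": [0.05, 0.25, 0.85],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def word_sim(w1, w2, vectors):
    """Word similarity in [0, 1]; identical words score 1, unknown words 0."""
    if w1 == w2:
        return 1.0
    if w1 not in vectors or w2 not in vectors:
        return 0.0
    return max(0.0, cosine(vectors[w1], vectors[w2]))


def directional_score(src, tgt, vectors):
    """Align each src word to its best-matching tgt word and average the scores."""
    if not src or not tgt:
        return 0.0
    total = sum(max(word_sim(s, t, vectors) for t in tgt) for s in src)
    return total / len(src)


def text_similarity(a, b, vectors):
    """Symmetric text score: mean of the two directional alignment scores."""
    ta, tb = a.lower().split(), b.lower().split()
    return 0.5 * (directional_score(ta, tb, vectors)
                  + directional_score(tb, ta, vectors))


if __name__ == "__main__":
    print(text_similarity("car drives fast", "automobile moves quickly", TOY_VECTORS))
```

Averaging both directions keeps the score symmetric, so a short text that matches only part of a longer one is not scored as a full match.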

Recommended

Article Computer Science, Information Systems

Evaluating semantic similarity and relatedness between concepts by combining taxonomic and non-taxonomic semantic features of WordNet and Wikipedia

Muhammad Jawad Hussain, Heming Bai, Shahbaz Hassan Wasti, Guangjian Huang, Yuncheng Jiang

Summary: This paper proposes a comprehensive method for semantic similarity and relatedness based on WordNet and Wikipedia. By integrating the semantic knowledge of both resources at the feature level, the proposed method combines semantic similarity and relatedness into a single measure. Experimental results demonstrate its effectiveness over existing measures on various benchmarks.

INFORMATION SCIENCES (2023)

Article Computer Science, Artificial Intelligence

Multi-knowledge resources-based semantic similarity models with application for movie recommender system

Guangjian Huang, Xingtu Zhu, Shahbaz Hassan Wasti, Yuncheng Jiang

Summary: Researchers have proposed feature-based methods using knowledge resources like Wikipedia and WordNet to measure semantic similarity. While Wikipedia has limitations such as limited content and concept ambiguity, WordNet offers unambiguous terms and can enrich the limited content of Wikipedia articles. Combining both resources can enhance previous methods of semantic similarity.

ARTIFICIAL INTELLIGENCE REVIEW (2023)

Article Psychology, Mathematical

Calculating semantic relatedness of lists of nouns using WordNet path length

Tyler M. Ensor, Molly B. MacMillan, Ian Neath, Aimee M. Surprenant

Summary: The study conducted three experiments to evaluate various measures of semantic relatedness and their ability to predict the recall of related and unrelated word lists in immediate memory tests. The results showed that lists of semantically related words are better recalled than lists of unrelated words. Different measures had slightly different predictions on the recall of related and unrelated word lists.

BEHAVIOR RESEARCH METHODS (2021)

Article Computer Science, Artificial Intelligence

Extending latent semantic analysis to manage its syntactic blindness

Raja Muhammad Suleman, Ioannis Korkontzelos

Summary: Natural Language Processing (NLP) is a sub-field of Artificial Intelligence focused on automatically understanding and analyzing human language. Semantic analysis techniques such as Latent Semantic Analysis (LSA) are important for extracting meaning from text. However, LSA suffers from syntactic blindness: because it ignores word order, it cannot reliably distinguish between sentences that contain the same words but have opposite meanings.

EXPERT SYSTEMS WITH APPLICATIONS (2021)

Article Computer Science, Artificial Intelligence

Exploiting non-taxonomic relations for measuring semantic similarity and relatedness in WordNet

Mohannad AlMousa, Rachid Benlamri, Richard Khoury

Summary: This paper discusses the benefits of using all types of non-taxonomic relations to enhance semantic similarity measures, proposing a comprehensive poly-relational approach. Experimental results show significant improvements over existing methods on various gold standard datasets.

KNOWLEDGE-BASED SYSTEMS (2021)

Article Computer Science, Artificial Intelligence

A new ontology-based similarity approach for measuring caching coverages provided by mediation systems

Ouafa Ajarroud, Ahmed Zellou, Ali Idri

Summary: Most mediation systems use caching policies, with semantic caching being a widely adopted strategy. However, the current semantic caching approach compares syntax rather than semantics, leading to delays when multiple requests are stored in the cache. This work proposes a new ontology-based semantic approach and algorithm to filter regions in the cache that do not semantically cover a user query, optimizing cache usage for faster retrieval.

KNOWLEDGE AND INFORMATION SYSTEMS (2023)

Article Computer Science, Artificial Intelligence

A novel model for semantic similarity measurement based on wordnet and word embedding

Fuqiang Zhao, Zhengyu Zhu, Ping Han

Summary: This paper proposes a novel model, DFRVec, that encodes multiple kinds of semantic information about a word in WordNet into a vector space for measuring semantic similarity between words. By combining its sub-models with existing word embeddings, the authors introduce DFRVec+Path, which also exploits path information in WordNet and outperforms many existing methods in experiments on ten benchmark datasets.

JOURNAL OF INTELLIGENT & FUZZY SYSTEMS (2021)

Article Computer Science, Interdisciplinary Applications

Validation of scientific topic models using graph analysis and corpus metadata

Manuel A. Vazquez, Jorge Pereira-Delgado, Jesus Cid-Sueiro, Jeronimo Arenas-Garcia

Summary: Probabilistic topic modeling algorithms, such as Latent Dirichlet Allocation (LDA), have become powerful tools in the analysis of large collections of documents. However, selecting the right hyperparameters for a specific application is not easy. This study proposes two graph metrics to optimize the similarity metrics derived from the topic model, aiming to select appropriate hyperparameters. Experimental results on various corpora related to science, technology, and innovation (STI) show that these metrics provide relevant indicators for selecting the number of topics and building persistent topic models consistent with the metadata. This approach can be extended beyond LDA and facilitate the systematic adoption of similar techniques in STI policy analysis and design.

SCIENTOMETRICS (2022)

Article Computer Science, Information Systems

Mapping WordNet onto human brain connectome in emotion processing and semantic similarity recognition

Jan Kocon, Marek Maziarz

Summary: In this study, a WordNet structure was expanded to link synsets to Desikan's brain regions, mapping from synset semantic categories to behavioral and cognitive functions to brain lobes. Transition probabilities between brain regions were captured using a human brain connectome (HBC) adjacency matrix. The new structure was evaluated in tasks related to semantic similarity and emotion processing, showing that the novel HBC vector representation outperformed proposed baselines.

INFORMATION PROCESSING & MANAGEMENT (2021)

Article Computer Science, Artificial Intelligence

An enhanced guided LDA model augmented with BERT based semantic strength for aspect term extraction in sentiment analysis

Manju Venugopalan, Deepa Gupta

Summary: Aspect level sentiment analysis is a fine-grained task that extracts aspects and their sentiment polarity from opinionated text. This research proposes an unsupervised model that uses minimal aspect seed words to guide the extraction process and enhance the performance. The model incorporates guided inputs, multiple pruning strategies, and semantic filters to improve performance. Evaluation results show competitive and appreciable performance on restaurant domain datasets.

KNOWLEDGE-BASED SYSTEMS (2022)

Article Biochemical Research Methods

HPOFiller: identifying missing protein-phenotype associations by graph convolutional network

Lizhi Liu, Hiroshi Mamitsuka, Shanfeng Zhu

Summary: Exploring the relationship between human proteins and abnormal phenotypes is crucial for disease prevention, diagnosis and treatment. HPOFiller, a graph convolutional network-based approach, aims to predict missing HPO annotations and outperforms other state-of-the-art methods through stringent evaluations.

BIOINFORMATICS (2021)

Article Medical Ethics

Weighted semantic plagiarism detection approach based on AHP decision model

SeyyedMohammad JavadiMoghaddam, Fatemeh Roosta, Asadolla Noroozi

Summary: The rising trend of academic plagiarism poses a social problem that requires collaboration between institutions and publishers. Plagiarists attempt to deceive detection systems by using synonyms and altering word order, prompting algorithmic efforts to address these challenges, particularly in terms of time complexity.

ACCOUNTABILITY IN RESEARCH-POLICIES AND QUALITY ASSURANCE (2022)

Article Computer Science, Information Systems

Content analysis-based documentation and exploration of research articles

Shwe Sin Phyo

Summary: With the wealth of information available on the World Wide Web, it is difficult for anyone, from general users to researchers, to satisfy their information needs. The main challenge is to categorize documents systematically while also taking into account more valuable data such as semantic information. The purpose of this paper is to develop a concept-based search system that leverages external knowledge resources as background knowledge to obtain accurate, meaningful search results efficiently.

DATA TECHNOLOGIES AND APPLICATIONS (2022)

Article Biochemical Research Methods

Evaluating disease similarity based on gene network reconstruction and representation

Yang Li, Wang Keqi, Guohua Wang

Summary: The article introduces a novel approach to computing disease similarity that integrates disease-related genes and the Gene Ontology hierarchy to learn disease representations through deep representation learning. In the experiments, the method achieves an AUC of 0.8074, a 10.1% improvement over the most competitive baseline method.

BIOINFORMATICS (2021)

Article Chemistry, Multidisciplinary

Deep Semantic Parsing with Upper Ontologies

Algirdas Laukaitis, Egidijus Ostasius, Darius Plikynas

Summary: This paper introduces a new method for semantic parsing with upper ontologies using FrameNet annotations and BERT-based sentence context distributed representations, designed for long text parsing. A manually annotated corpus is created as a benchmark for future studies in semantic parsing.

APPLIED SCIENCES-BASEL (2021)

Article Multidisciplinary Sciences

Multi-qubit correction for quantum annealers

Ramin Ayanzadeh, John Dorband, Milton Halem, Tim Finin

Summary: MQC is a novel postprocessing method for quantum annealers that views the evolution of an open system as a Gibbs sampler, reducing excited states to new synthetic states with lower energy values. Experimental results show that MQC finds samples with notably lower energy values and improves reproducibility compared to recent hardware/software advances in quantum annealing, such as spin-reversal transforms and classical postprocessing techniques.

SCIENTIFIC REPORTS (2021)

Article Computer Science, Information Systems

The SEMIOTIC Ecosystem: A Semantic Bridge between IoT Devices and Smart Spaces

Roberto Yus, Georgios Bouloukakis, Sharad Mehrotra, Nalini Venkatasubramanian

Summary: Smart space administration and application development face challenges due to the semantic gap between user requirements and IoT device capabilities. The SEMIOTIC ecosystem provides a holistic approach to IoT smart spaces, enabling application development, space management, and service provision. Using a centralized repository and the SEMIOTIC system deployed in each smart space, developers can advertise their applications and interact with them to provide required information, improving reusability and bridging the semantic gap.

ACM TRANSACTIONS ON INTERNET TECHNOLOGY (2022)

Proceedings Paper Computer Science, Artificial Intelligence

One-Shot Federated Group Collaborative Filtering

Maksim E. Eren, Manish Bhattarai, Nicholas Solovyev, Luke E. Richards, Roberto Yus, Charles Nicholas, Boian S. Alexandrov

Summary: This paper presents the first one-shot federated CF implementation, called One-FedCF, to address the privacy problem and communication bottleneck in collaborative filtering. In this approach, clients first apply local CF in parallel to build independent recommenders, then extract global item patterns through joint factorization and build local models through information retrieval transfer.

2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Knowledge Guided Two-player Reinforcement Learning for Cyber Attacks and Defenses

Aritran Piplai, Mike Anoruo, Kayode Fasaye, Anupam Joshi, Tim Finin, Ahmad Ridley

Summary: Cyber defense exercises are crucial for understanding the technical capacity of organizations in facing cyber-threats and discovering unknown vulnerabilities for better defense mechanisms. This paper introduces a two-player game-based reinforcement learning environment that improves the performance of both attacker and defender agents. The convergence of the agents is accelerated through expert knowledge from Cybersecurity Knowledge Graphs.

2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA (2022)

Article Computer Science, Information Systems

JENNER: Just-in-time Enrichment in Query Processing

Dhrubajyoti Ghosh, Peeyush Gupta, Sharad Mehrotra, Roberto Yus, Yasser Altowim

Summary: This study introduces a strategy called JENNER for interactive analytics over incoming data. JENNER progressively improves query answers by exploiting the tradeoffs between cost and quality. Experimental results show that JENNER performs significantly better than naive strategies.

PROCEEDINGS OF THE VLDB ENDOWMENT (2022)

Proceedings Paper Geosciences, Multidisciplinary

QUANTUM-ASSISTED GREEDY ALGORITHMS

Ramin Ayanzadeh, John Dorband, Milton Halem, Tim Finin

Summary: This paper demonstrates how to improve candidate selection in greedy algorithms by leveraging quantum annealers (QAs). By sampling from the ground state of a problem-dependent Hamiltonian using QAs and estimating the probability distribution of problem variables, the proposed quantum-assisted greedy algorithm (QAGA) scheme outperforms state-of-the-art techniques in quantum annealing.

2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022) (2022)

Review Computer Science, Information Systems

Computational Understanding of Narratives: A Survey

Priyanka Ranade, Sanorita Dey, Anupam Joshi, Tim Finin

Summary: Storytelling and the delivery of societal narratives are important for human communication, connection, and understanding. In today's digital age, narratives are conveyed through online mediums such as social media. This shift has made narratives more fragmented and complex, with the potential to influence cultural sentiments, geopolitical events, and more. Therefore, narratives are being used strategically to shape events and promote ideologies. It is crucial to identify and analyze these narratives in order to understand their themes and intentions.

IEEE ACCESS (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Jointly Identifying and Fixing Inconsistent Readings from Information Extraction Systems

Ankur Padia, Francis Ferraro, Tim Finin

Summary: This paper investigates the problem of errors in information extraction systems' outputs and explores methods to detect and correct these errors. The authors contrast consistency with credibility, define and explore consistency and repair tasks, and present a simple yet effective model. Evaluation on three datasets shows consistent improvement in both consistency and repair using a simple MLP model with attention and lexical features.

PROCEEDINGS OF DEEP LEARNING INSIDE OUT (DEELIO 2022): THE 3RD WORKSHOP ON KNOWLEDGE EXTRACTION AND INTEGRATION FOR DEEP LEARNING ARCHITECTURES (2022)

Proceedings Paper Computer Science, Artificial Intelligence

CAPD: A Context-Aware, Policy-Driven Framework for Secure and Resilient IoBT Operations

Sai Sree Laya Chukkapalli, Anupam Joshi, Tim Finin, Robert F. Erbacher

Summary: The Internet of Battlefield Things (IoBT) enhances the operational effectiveness of infantry units by enabling collaboration, secure information sharing, and resilience to attacks. CAPD provides a framework for data and knowledge exchange among autonomous entities, with an IoBT ontology that facilitates controlled information sharing. It enables situational awareness and mitigation of adversary actions, ensuring the resilience of IoBT systems in contested conditions.

ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR MULTI-DOMAIN OPERATIONS APPLICATIONS IV (2022)

Proceedings Paper Computer Science, Information Systems

SmartSPEC: Customizable Smart Space Datasets via Event-driven Simulations

Andrew Chio, Daokun Jiang, Peeyush Gupta, Georgios Bouloukakis, Roberto Yus, Sharad Mehrotra, Nalini Venkatasubramanian

Summary: This paper presents SmartSPEC, an approach to generate customizable smart space datasets using sensorized spaces. It creates a digital representation of a smart space and generates realistic simulated data. The evaluation results show that the trajectories produced by SmartSPEC are more realistic than synthetic data.

2022 IEEE INTERNATIONAL CONFERENCE ON PERVASIVE COMPUTING AND COMMUNICATIONS (PERCOM) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

CyBERT: Contextualized Embeddings for the Cybersecurity Domain

Priyanka Ranade, Aritran Piplai, Anupam Joshi, Tim Finin

Summary: CyBERT is a domain-specific BERT model fine-tuned with cybersecurity data, providing high accuracy in performing cybersecurity tasks and offering use-cases in the field of cybersecurity.

2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Generating Fake Cyber Threat Intelligence Using Transformer-Based Models

Priyanka Ranade, Aritran Piplai, Sudip Mittal, Anupam Joshi, Tim Finin

Summary: This paper demonstrates the automatic generation of fake CTI text descriptions using transformers for data poisoning attacks. The attacks result in negative impacts such as incorrect reasoning outputs and disruption of AI-based cyber defense systems.

2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) (2021)

Article Computer Science, Information Systems

A BERT Based Approach to Measure Web Services Policies Compliance With GDPR

Lavanya Elluri, Sai Sree Laya Chukkapalli, Karuna Pande Joshi, Tim Finin, Anupam Joshi

Summary: Data confidentiality is increasingly important, with authorities creating new laws to control how web services data is handled. Web service providers face challenges in complying with evolving regulations across jurisdictions and must update their policies. Comparing web service provider privacy policies with regulatory policies is difficult due to the large and complex nature of regulatory texts.

IEEE ACCESS (2021)

Article Computer Science, Information Systems

Understanding Cybersecurity Threat Trends Through Dynamic Topic Modeling

Jennifer Sleeman, Tim Finin, Milton Halem

Summary: Cybersecurity threats are on the rise and understanding the changing vulnerabilities can help combat new threats. Analyzing cybersecurity document collections through dynamic topic modeling reveals the importance of evolving concepts. Integrating different temporal corpora and representing data in a semantic knowledge graph supports integration, inference, and discovery, enhancing the quality of models.

FRONTIERS IN BIG DATA (2021)

Article Urban Studies

Managing cybersecurity at the grassroots: Evidence from the first nationwide survey of local government cybersecurity

Donald F. Norris, Laura Mateczun, Anupam Joshi, Tim Finin

Summary: This paper examines the management of cybersecurity among local governments in the United States based on the first nationwide survey. The study shows that local governments are largely failing to effectively manage cybersecurity, despite the increasing importance of this function due to constant cyberattacks. Recommendations for improving local government cybersecurity management are provided.

JOURNAL OF URBAN AFFAIRS (2021)
