Article

Authorship attribution in the wild

LANGUAGE RESOURCES AND EVALUATION (2011)

Journal

LANGUAGE RESOURCES AND EVALUATION

Volume 45, Issue 1, Pages 83-94

Publisher

SPRINGER

DOI: 10.1007/s10579-009-9111-2

Keywords

Authorship attribution; Open candidate set; Randomized feature set

Category

Computer Science, Interdisciplinary Applications

Abstract

Most previous work on authorship attribution has focused on the case in which we need to attribute an anonymous document to one of a small set of candidate authors. In this paper, we consider authorship attribution as found in the wild: the set of known candidates is extremely large (possibly many thousands) and might not even include the actual author. Moreover, the known texts and the anonymous texts might be of limited length. We show that even in these difficult cases, we can use similarity-based methods along with multiple randomized feature sets to achieve high precision. Moreover, we show the precise relationship between attribution precision and four parameters: the size of the candidate set, the quantity of known text by the candidates, the length of the anonymous text and a certain robustness score associated with an attribution.
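
The abstract names the ingredients (a similarity measure, many randomized feature sets, and a robustness score) without giving details, so the following is only a minimal sketch of how such a scheme could be put together, not the paper's implementation. Character 4-grams, cosine similarity, the 40% feature fraction, the 0.9 threshold, and the function names `char_ngrams`, `cosine`, and `attribute` are all assumptions made for illustration. The score here is the share of random feature subsets on which one candidate is the closest match; when it falls below the threshold the sketch returns no author, which is one way to handle a candidate set that may not include the true author.

```python
import math
import random
from collections import Counter


def char_ngrams(text, n=4):
    """Count overlapping character n-grams (an assumed feature type)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b, features):
    """Cosine similarity of two count vectors restricted to a feature subset."""
    dot = sum(a[f] * b[f] for f in features)
    norm_a = math.sqrt(sum(a[f] ** 2 for f in features))
    norm_b = math.sqrt(sum(b[f] ** 2 for f in features))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def attribute(anonymous_text, known_texts, iterations=100, fraction=0.4,
              threshold=0.9, seed=0):
    """Return (author or None, score) for an anonymous text.

    known_texts maps each candidate author to that author's known text.
    All default parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    anon = char_ngrams(anonymous_text)
    profiles = {author: char_ngrams(text) for author, text in known_texts.items()}
    vocabulary = sorted(set(anon).union(*profiles.values()))
    if not profiles or not vocabulary:
        return None, 0.0

    subset_size = max(1, int(fraction * len(vocabulary)))
    wins = Counter()
    for _ in range(iterations):
        # Draw a fresh random feature subset and find the closest candidate.
        features = rng.sample(vocabulary, subset_size)
        sims = {author: cosine(anon, profile, features)
                for author, profile in profiles.items()}
        best = max(sims, key=sims.get)
        if sims[best] > 0:  # ignore rounds where nothing matches at all
            wins[best] += 1

    if not wins:
        return None, 0.0
    author, count = wins.most_common(1)[0]
    score = count / iterations  # share of rounds won by the top candidate
    return (author if score >= threshold else None), score
```

A call such as `attribute(text, {"alice": alice_text, "bob": bob_text})` returns the winning candidate together with its score, or `None` as the author when no candidate wins often enough. Raising the threshold trades recall for precision, which matches the abstract's emphasis on high precision rather than on attributing every document.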

Recommended

Article Computer Science, Artificial Intelligence

Exploring syntactic and semantic features for authorship attribution

Haiyan Wu, Zhiqiang Zhang, Qingfeng Wu

Summary: This paper discusses the importance of authorship attribution and the limitations of existing methods, proposing a novel approach that combines features from multiple dimensions, with experimental results demonstrating its effectiveness compared to state-of-the-art models.

APPLIED SOFT COMPUTING (2021)

Article Computer Science, Information Systems

Machine Learning and Feature Selection for Authorship Attribution: The Case of Mill, Taylor Mill and Taylor, in the Nineteenth Century

Andreas Neocleous, Antis Loizides

Summary: This article revisits a divisive issue regarding the authorship of John Stuart Mill's corpus, reviewing experts' differing opinions and presenting the research team's methods and experimental results. The trained classifiers attribute the disputed texts to John Stuart Mill.

IEEE ACCESS (2021)

Article Multidisciplinary Sciences

Word synonym relationships for text analysis: A graph-based approach

Hend Alrasheed

Summary: Keyword extraction involves detecting the most relevant terms and expressions in a text. This study applies graph analysis tools to keyword extraction in order to assess topic diversity and sentiment within the text.

PLOS ONE (2021)

Article Computer Science, Information Systems

Multidimensional Domain Knowledge Framework for Poet Profiling

Ai Zhou, Yijia Zhang, Mingyu Lu

Summary: This study proposes an approach to analyzing the authorship of classical Chinese poetry by evaluating the popularity of poets and building a public corpus for authorship profiling. It introduces a novel framework named M-DKPP, which combines authorship attribution knowledge, the stylistic features of the text, and domain knowledge from experts in traditional poetry studies. The validity and applicability of the framework are demonstrated through a case study on Li Bai, and its performance is evaluated on four poem datasets, where it outperforms several baseline approaches for authorship attribution.

ELECTRONICS (2023)

Article Computer Science, Theory & Methods

Psychographic traits identification based on political ideology: An author analysis study on Spanish politicians' tweets posted in 2020

Jose Antonio Garcia-Diaz, Ricardo Colomo-Palacios, Rafael Valencia-Garcia

Summary: This study investigates the reliability of determining psychographic traits concerning political ideology and presents the PoliCorpus-2020 dataset for authorship analysis tasks. The results show that linguistic features are effective indicators for identifying political affiliation and improve the performance of neural network models.

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE (2022)

Article Computer Science, Artificial Intelligence

Open-Set Adversarial Defense with Clean-Adversarial Mutual Learning

Rui Shao, Pramuditha Perera, Pong C. Yuen, Vishal M. Patel

Summary: This paper studies two critical aspects of deep learning: open-set recognition and adversarial defense. It finds that open-set recognition systems are vulnerable to adversarial samples, and adversarial defense mechanisms trained on known classes are ineffective for open-set samples. Based on these findings, the paper proposes an Open-Set Defense Network with Clean-Adversarial Mutual Learning (OSDN-CAML) to address this problem.

INTERNATIONAL JOURNAL OF COMPUTER VISION (2022)

Review Multidisciplinary Sciences

Cracking double-blind review: Authorship attribution with deep learning

Leonard Bauersfeld, Angel Romero, Manasi Muglikar, Davide Scaramuzza

Summary: In this study, a transformer-based neural network architecture is proposed to attribute an anonymous manuscript to an author using only the text content and author names. The largest authorship-identification dataset to date was created by leveraging over 2 million publicly available research papers on arXiv. The method achieves an unprecedented authorship attribution accuracy, correctly attributing up to 73% of papers in subsets with up to 2,000 different authors.

PLOS ONE (2023)

Article Computer Science, Artificial Intelligence

Authorship attribution using author profiling classifiers

Caio Deutsch, Ivandre Paraboni

Summary: Authorship attribution and author profiling are two related fields that can benefit from each other. This paper improves the authorship attribution model by adding author demographics predictions, and evaluates the enriched model in different domains and languages, showing better performance compared to the standard method.

NATURAL LANGUAGE ENGINEERING (2023)

Article Mathematics

Enhancing the Performance of Software Authorship Attribution Using an Ensemble of Deep Autoencoders

Gabriela Czibula, Mihaiela Lupea, Anamaria Briciu

Summary: This paper discusses the problem of code authorship attribution and introduces the AutoSoft model for identifying developers based on their programming style. The model, built using autoencoders, shows superior performance in various test settings compared to existing solutions. AutoSoft not only outperforms other methods in code authorship attribution, but also offers adaptability and extensions.

MATHEMATICS (2022)

Article Medicine, Legal

Verifying authorship for forensic purposes: A computational protocol and its validation

Patrick Juola

Summary: The paper introduces a computer program to identify the author of anonymous or disputed documents, and validates its accuracy through a series of controlled experiments involving English language blogs. The system achieved a measured accuracy of 77% across over 32,000 different document pairs, providing a solution to a key problem in forensic linguistics.

FORENSIC SCIENCE INTERNATIONAL (2021)

Article Computer Science, Information Systems

Text Mining in 19th-Century Essays for Investigating a Possible Collaborative Authorship Problem: John Stuart Mill and Harriet Taylor Mill

Andreas Neocleous, Giorgos Kataliakos, Antis Loizides

Summary: This study uses machine learning techniques to investigate the authorship of two famous essays in the nineteenth century. The classifiers trained in this research show that John Stuart Mill is the primary author of the essays, but also highlight the contribution of Harriet Taylor Mill to certain portions of text.

IEEE ACCESS (2022)

Article Computer Science, Artificial Intelligence

Survey of Authorship Identification Tasks on Arabic Texts

Fatimah Alqahtani, Mischa Dohler

Summary: Authorship identification is the process of analyzing writing styles to determine the author's identity, which is important in digital forensics and cyber investigations. This survey focuses on authorship identification in the Arabic language and reviews 27 prominent studies, considering data, features, methods, and results. The results vary based on features and datasets, and challenges are faced in data preprocessing due to the complexities of Arabic morphology.

ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING (2023)

Article Computer Science, Theory & Methods

Pkg2Vec: Hierarchical package embedding for code authorship attribution

Roni Mateless, Oren Tsur, Robert Moskovitch

Summary: This paper introduces a novel approach for software package authorship attribution called Pkg2Vec, based on a hierarchical deep neural network architecture, which better reflects real-world scenarios where code is organized in packages and written by teams. By utilizing a hierarchical neural network model and resilient features like keywords and API calls, Pkg2Vec outperforms other approaches in a large dataset of public packages.

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE (2021)

Article Computer Science, Artificial Intelligence

A novel self-learning feature selection approach based on feature attributions

Jianting Chen, Shuhan Yuan, Dongdong Lv, Yang Xiang

Summary: Feature selection plays a crucial role in improving the accuracy and generalization of machine learning models, especially for high-dimensional data tasks. In this study, a novel self-learning feature selection approach based on feature attributions was proposed, showing improved search efficiency for optimal feature subset selection. Experimental results demonstrated the effectiveness of the SLFS approach in achieving optimal subsets with fewer iterations and utilizing SHAP values for enhanced search efficiency.

EXPERT SYSTEMS WITH APPLICATIONS (2021)

Article Medicine, General & Internal

Mapping author taxonomies and author criteria: good practices for thinking through complex authorship situations

Lisa M. DeTora

Summary: Applying authorship criteria in complex situations can be challenging. Existing guidelines emphasize intellectual input and accountability, while contributor taxonomies list additional activities that should be credited. However, no publication has mapped specific authorship criteria to contributor taxonomies. Suggestions are needed to differentiate activities that meet author criteria from other contributions outlined in existing taxonomies.

CURRENT MEDICAL RESEARCH AND OPINION (2022)
