Article

Measuring and analyzing code authorship in 1+118 open source projects

Journal

SCIENCE OF COMPUTER PROGRAMMING
Volume 176, Pages 14-32

Publisher

ELSEVIER SCIENCE BV
DOI: 10.1016/j.scico.2019.03.001

Keywords

Code authorship; Linux kernel; Developer networks

Funding

  1. FAPEMIG [PPM-00803-15]
  2. CAPES [88881.131987/2016-01]
  3. CNPq [306554/2015-1, 140205/2017-9]

Abstract

Code authorship is key information about large-scale software projects. Among other things, it reveals the division of work, key collaborators, and developers' profiles. Seeking to better understand authorship in large and successful open source communities, we take the Linux kernel as our first case study. In total, we analyze authorship across 66 stable releases. Our analysis is centered on the Degree-of-Authorship (DOA) metric, which accounts for first authorship events (file creation) as well as further code changes. Authorship along the Linux kernel evolution reveals that (a) only a small portion of developers (26%) makes significant contributions to the code base, and this ratio is almost constant during the Linux kernel evolution; (b) the number of files per author is highly skewed: a small group of top authors (2%) is responsible for hundreds of files, while most authors (75%) are responsible for at most 10 files; (c) most authors in Linux (76%) are specialists, and the relation between specialists and generalists tends to be constant; (d) authors with a high number of co-authorship connections tend to work with authors with fewer connections. Furthermore, we replicate the study in an extended dataset composed of 118 well-known GitHub projects. We identify that most of the authorship patterns observed in the Linux kernel are also common to other open source projects. © 2019 Elsevier B.V. All rights reserved.
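
As a rough illustration of how a DOA-style score can combine file creation with later changes, the Python sketch below assumes a linear model with a first-authorship term, a term for the developer's own changes, and a logarithmic penalty for changes made by others. The default weights, the 0.75 normalized threshold, and the function names `doa` and `authors` are assumptions made for illustration; none of them is stated in the abstract above.

```python
import math
from typing import Dict, List


def doa(dev: str, creator: str, changes: Dict[str, int],
        w0: float = 3.293, w_fa: float = 1.098,
        w_dl: float = 0.164, w_ac: float = 0.321) -> float:
    """Degree-of-authorship of `dev` for a single file.

    `changes` maps each developer to the number of changes made to the
    file after its creation. The weights are assumed defaults, not values
    confirmed by the abstract.
    """
    fa = 1 if dev == creator else 0           # first authorship (file creation)
    dl = changes.get(dev, 0)                  # changes delivered by `dev`
    ac = sum(n for d, n in changes.items() if d != dev)  # changes by others
    return w0 + w_fa * fa + w_dl * dl - w_ac * math.log(1 + ac)


def authors(creator: str, changes: Dict[str, int],
            threshold: float = 0.75) -> List[str]:
    """Developers whose DOA, normalized by the file's highest DOA,
    exceeds `threshold` (an assumed cut-off).

    Assumes the highest DOA for the file is positive, which holds unless
    other developers made tens of thousands of changes.
    """
    devs = set(changes) | {creator}
    scores = {d: doa(d, creator, changes) for d in devs}
    best = max(scores.values())
    return sorted(d for d, s in scores.items() if s / best > threshold)


# Hypothetical file: created by alice, later changed 5 times by alice
# and once by bob.
print(authors("alice", {"alice": 5, "bob": 1}))  # ['alice']
```

Under these assumed weights, creating a file and continuing to change it keeps a developer's score high, while a developer with a single change to a file mostly edited by someone else falls below the cut-off.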

Recommended

Article Computer Science, Software Engineering

You broke my code: understanding the motivations for breaking changes in APIs

Aline Brito, Marco Tulio Valente, Laerte Xavier, Andre Hora

EMPIRICAL SOFTWARE ENGINEERING (2020)

Article Computer Science, Software Engineering

What are the characteristics of popular APIs? A large-scale study on Java, Android, and 165 libraries

Caroline Lima, Andre Hora

SOFTWARE QUALITY JOURNAL (2020)

Article Computer Science, Software Engineering

Characteristics of method extractions in Java: a large scale empirical study

Andre Hora, Romain Robbes

EMPIRICAL SOFTWARE ENGINEERING (2020)

Article Computer Science, Software Engineering

APISonar: Mining API usage examples

Andre Hora

Summary: Developers often search for code examples on the web, manually created or automatically mined from code repositories. Current solutions for automatic mining of API usage examples still have limitations, such as poor quality and duplication. In this article, a new approach called APISonar is proposed to provide readable and reusable API examples. Evaluation shows that APISonar outperforms popular programming websites in terms of quality and has attracted a significant user base globally within a short period of time.

SOFTWARE-PRACTICE & EXPERIENCE (2021)

Article Computer Science, Software Engineering

Characterizing refactoring graphs in Java and JavaScript projects

Aline Brito, Andre Hora, Marco Tulio Valente

Summary: Refactoring is an essential activity in software evolution to improve source code maintainability and quality. The study of refactoring graphs provides quantitative and qualitative investigation into the size, commits, age, composition, ownership, operations, and patterns of refactorings. It can be used to improve code comprehension, detect refactoring patterns, and support software evolution studies.

EMPIRICAL SOFTWARE ENGINEERING (2021)

Article Computer Science, Software Engineering

Characterizing top ranked code examples in Google

Andre Hora

Summary: This study finds that the Google search engine tends to rank pages with multiple code examples higher. However, single code examples that are ranked higher are not necessarily more readable or reusable. When predicting top-ranked examples, generic factors are more important than code quality factors.

JOURNAL OF SYSTEMS AND SOFTWARE (2021)

Article Computer Science, Software Engineering

How and why we end up with complex methods: a multi-language study

Mateus Lopes, Andre Hora

Summary: As software systems become more complex and harder to maintain over time, it is important to understand the reasons behind the persistence of complex methods despite the known drawbacks. This paper provides a multi-language empirical study on the evolution of complex methods and developers' perceptions in JavaScript, Python, Java, C++, and C#. The study finds that programming language plays a significant role in code complexity, and developers' perception of complexity varies across languages. Additionally, the authors discuss insights for researchers and practitioners based on their findings.

EMPIRICAL SOFTWARE ENGINEERING (2022)

Article Computer Science, Software Engineering

JavaScript API Deprecation Landscape: A Survey and Mining Study

Romulo Nascimento, Eduardo Figueiredo, Andre Hora

Summary: This article reports the results of a survey and mining study on JavaScript developers and projects, revealing several solutions for deprecating JavaScript APIs but no standard solution.

IEEE SOFTWARE (2022)

Article Computer Science, Software Engineering

How are framework code samples maintained and used by developers? The case of Android and Spring Boot

Gabriel Menezes, Bruno Cafeo, Andre Hora

Summary: This study analyzes the characteristics, maintenance, and usage of framework code samples in modern software systems. Most code samples are small and simple, providing a working environment for clients and relying on automated build tools. Clients commonly fork code samples, but rarely modify them.

JOURNAL OF SYSTEMS AND SOFTWARE (2022)

Article Computer Science, Software Engineering

How do developers collaborate? Investigating GitHub heterogeneous networks

Gabriel P. Oliveira, Ana Flavia C. Moura, Natercia A. Batista, Michele A. Brandao, Andre Hora, Mirella M. Moro

Summary: Assessing collaboration among GitHub developers through social networks, this study models three aspects: social collaboration, collaboration time in a repository, and technical features. The results indicate that the considered metrics are not correlated, providing new insights into collaboration. The information gathered is beneficial for social developer ranking.

SOFTWARE QUALITY JOURNAL (2023)

Article Computer Science, Software Engineering

Excluding code from test coverage: practices, motivations, and impact

Andre Hora

Summary: Test coverage measures the percentage of code covered by tests. Some code, such as non-runnable, debug-only, defensive, platform-specific, and conditional importing code, tends to be excluded from coverage analysis. Excluding code can decrease test coverage, but following code exclusion recommendations can improve coverage.

EMPIRICAL SOFTWARE ENGINEERING (2023)

Proceedings Paper Computer Science, Software Engineering

Exploring API Deprecation Evolution in JavaScript

Romulo Nascimento, Andre Hora, Eduardo Figueiredo

Summary: This paper presents an empirical study on how API deprecation evolves in JavaScript, analyzing 1,918 releases of 50 popular packages. The results show that the majority of deprecated APIs have an increasing trend, and deprecation events usually occur in minor releases.

2022 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING (SANER 2022) (2022)

Proceedings Paper Computer Science, Software Engineering

How and Why Developers Migrate Python Tests

Livia Barbosa, Andre Hora

Summary: This paper presents the first empirical study on testing framework migration, specifically focusing on the migration from unittest to pytest in the Python ecosystem. The study analyzes the methods and reasons behind developers' migration to pytest, finding that Python projects are moving towards pytest but the migration process is not always straightforward. The study also reveals that the migrated test code is smaller than the original code, and developers migrate to pytest due to reasons such as easier syntax, interoperability, maintenance, and fixture flexibility/reuse, although concerns exist about pytest's implicit mechanics.

2022 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING (SANER 2022) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Characterizing High-Quality Test Methods: A First Empirical Study

Victor Veloso, Andre Hora

Summary: This paper presents an empirical study that assesses the quality of test methods using mutation testing at the method level. The study finds no major differences between high-quality and low-quality test methods in terms of size, number of asserts, and modifications. However, high-quality test methods are less affected by critical test smells.

2022 MINING SOFTWARE REPOSITORIES CONFERENCE (MSR 2022) (2022)

Proceedings Paper Computer Science, Software Engineering

Assessing Mock Classes: An Empirical Study

Gustavo Pereira, Andre Hora

2020 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME 2020) (2020)

Article Computer Science, Software Engineering

A formal approach for the correct deployment of cloud applications

Amel Mammar, Meriem Belguidoum, Saddam Hocine Hiba

Summary: This paper introduces a formal EVENT-B-based approach for modeling and verifying the deployment of component-based applications. By gradually refining an abstract model, a precise specification is built, and mathematical reasoning is used to prove its correctness. The presented approach validates the deployment in a cloud environment using PROB and ensures the construction of a correct system that meets the constraints.

SCIENCE OF COMPUTER PROGRAMMING (2024)

Article Computer Science, Software Engineering

Enhancing test reuse with GUI events deduplication and adaptive semantic matching

Shuqi Liu, Yu Zhou, Longbing Ji, Tingting Han, Taolue Chen

Summary: In this paper, we propose a framework that combines GUI events deduplication with an adaptive semantic matching strategy to enhance the usability of reused tests. Experimental evaluation demonstrates that the framework improves widget mapping performance, significantly reduces event redundancy, and reduces the manual effort of creating tests for similar applications.

SCIENCE OF COMPUTER PROGRAMMING (2024)

Article Computer Science, Software Engineering

A method of test case set generation in the commutativity test of reduce functions

Xiangyu Mu, Lei Liu, Peng Zhang, Jingyao Li, Hui Li

Summary: The aim of this study is to reduce the size of the test case set required to detect the commutativity problem of the reduce function. By determining the pattern of the function and selecting corresponding test cases, the proposed test case generation strategy can achieve the same accuracy with a smaller test case set. It has been shown to be effective and has a high recall rate.

SCIENCE OF COMPUTER PROGRAMMING (2024)

Article Computer Science, Software Engineering

An industrial experience report on model-based, AI-enabled proposal development for an RFP/RFI

Padmalata Nistala, Asha Rajbhoj, Vinay Kulkarni, Sapphire Noronha, Ankit Joshi

Summary: This paper presents an automated proposal development approach using a combination of model-based and AI-enabled techniques, and discusses the successful deployment and user feedback of the system.

SCIENCE OF COMPUTER PROGRAMMING (2024)

Article Computer Science, Software Engineering

Translation certification for smart contracts

Jacco O. G. Krijnen, Manuel M. T. Chakravarty, Gabriele Keller, Wouter Swierstra

Summary: Compiler correctness is a long-standing problem, and it becomes more significant with the rise of smart contracts on blockchains. A translation certification framework can address the trust issue for low-level code on the blockchain, allowing users to have confidence in the compilation process of smart contracts.

SCIENCE OF COMPUTER PROGRAMMING (2024)

Article Computer Science, Software Engineering

OnTrack: Reflecting on domain specific formal methods for railway designs

Phillip James, Faron Moller, Filippos Pantekis

Summary: OnTrack is a tool that supports railway verification workflows using model driven engineering frameworks, allowing railway engineers to interact with verification procedures through encapsulating formal methods.

SCIENCE OF COMPUTER PROGRAMMING (2024)

Article Computer Science, Software Engineering

Generating C: Heterogeneous metaprogramming system description

Oleg Kiselyov

Summary: Heterogeneous metaprogramming systems leverage higher-level host languages to generate lower-level object language code, enabling faster production of high-performant code with correctness guarantees. This paper presents two systems with OCaml as the host language and C as the object language, discussing their implementation and applications.

SCIENCE OF COMPUTER PROGRAMMING (2024)

Article Computer Science, Software Engineering

Reasoning about logical systems in the Coq proof assistant

Conor Reynolds, Rosemary Monahan

Summary: This paper provides a detailed approach to formalize a fragment of the theory of institutions in the Coq proof assistant. The approach is illustrated and evaluated by instantiating the framework with specific institution examples.

SCIENCE OF COMPUTER PROGRAMMING (2024)

Article Computer Science, Software Engineering

Stochastic formal model of PI3K/mTOR pathway in Alzheimer's disease for drug repurposing: An evaluation of rapamycin, LY294002, and NVP-BEZ235

Herbert Rausch Fernandes, Giovanni Freitas Gomes, Antonio Carlos Pinheiro de Oliveira, Sergio Vale Aguiar Campos

Summary: Alzheimer's disease is a common form of dementia with no effective drug treatment available. In this study, a statistical model checking approach was used to analyze protein and drug interactions and evaluate the effects of different drugs on the components contributing to Alzheimer's disease. The results showed that rapamycin could slow down the biological process causing neuronal death, while LY294002 and NVP-BEZ235 may increase tau phosphorylation. These findings provide important insights for the scientific community and raise awareness about potential side effects of PI3K inhibitor drugs.

SCIENCE OF COMPUTER PROGRAMMING (2024)

Article Computer Science, Software Engineering

Denotational and operational semantics for interaction languages: Application to trace analysis

Erwan Mahe, Christophe Gaston, Pascale Le Gall

Summary: This paper presents an Interaction Language to encode Sequence Diagrams (SD) and associates it with three different formal semantics. This allows for direct formal verification of SD, while preserving traceability of SD concepts and executed actions, and addressing the translation of problematic operators.

SCIENCE OF COMPUTER PROGRAMMING (2024)

Article Computer Science, Software Engineering

DescribeML: A dataset description tool for machine learning

Joan Giner-Miguelez, Abel Gomez, Jordi Cabot

Summary: Datasets are crucial for training and evaluating machine learning models, but they can also lead to undesirable behaviors like biased predictions. To tackle this issue, the machine learning community suggests adopting consistent guidelines for dataset descriptions. However, these guidelines rely on natural language descriptions, which hinder automated computation and analysis. To overcome this, we present DescribeML, a language engineering tool that provides precise, structured descriptions of machine learning datasets, including their composition, provenance, and social concerns.

SCIENCE OF COMPUTER PROGRAMMING (2024)

Article Computer Science, Software Engineering

An iterative approach for model-based requirements engineering in large collaborative projects: A detailed experience report

Andrey Sadovykh, Bilal Said, Dragos Truscan, Hugo Bruneliere

Summary: In this paper, the authors report on their 7 years of practical experience with an iterative Model-based Requirements Engineering (MBRE) approach and language in five large European collaborative projects. They demonstrate through significant data sets that this model-based approach provides interesting benefits in terms of scalability, heterogeneity, adaptability, traceability, automation, consistency and quality, and usefulness or usability. Concrete examples from these projects are provided to illustrate the application of the MBRE approach and language, and the authors discuss the general benefits and limitations of using such an approach, as well as the lessons learned over the years.

SCIENCE OF COMPUTER PROGRAMMING (2024)

Article Computer Science, Software Engineering

Exploring complex models with picto web

Alfa Yohannis, Dimitris Kolovos, Antonio Garcia-Dominguez

Summary: Picto Web is a multi-tenant web-based tool that allows exploration of complex models by transforming them into various transient web-based views using rule-based transformations. It uses a lazy view computation approach to efficiently support large models and complex transformations, and includes monitoring and push notification facilities for automatic recomputation of views and updated delivery to clients.

SCIENCE OF COMPUTER PROGRAMMING (2024)

Article Computer Science, Software Engineering

GaMoVR: Gamification-based UML learning environment in virtual reality

Enes Yigitbas, Maximilian Schmidt, Antonio Bucchiarone, Sebastian Gottschalk, Gregor Engels

Summary: UML has become a popular modeling language used in computer science courses, and various interactive learning applications have been developed to improve student engagement and learning outcomes. However, these applications have not successfully created immersive environments for students. Therefore, this study introduces GaMoVR, a VR-based and gamified learning environment, which provides an interactive and fun learning experience for students learning about UML modeling.

SCIENCE OF COMPUTER PROGRAMMING (2024)

Article Computer Science, Software Engineering

How accessibility affects other quality attributes of software? A case study of GitHub

Yaxin Zhao, Lina Gong, Wenhua Yang, Yu Zhou

Summary: Accessible design aims to enable as many people as possible to access software products and services. This study investigates the interaction between accessibility issues and other factors affecting software performance. By analyzing a large number of accessibility issues, the study reveals the characteristics of these issues and their relationship with software quality attributes.

SCIENCE OF COMPUTER PROGRAMMING (2024)