Article

Why and how developers fork what from whom in GitHub

Journal

EMPIRICAL SOFTWARE ENGINEERING
Volume 22, Issue 1, Pages 547-578

Publisher

SPRINGER
DOI: 10.1007/s10664-016-9436-6

Keywords

Fork; Open source software; GitHub

Funding

  1. National Natural Science Foundation of China [61300006]
  2. State Key Laboratory of Software Development Environment [SKLSDE-2015ZX-24]
  3. Beijing Natural Science Foundation [4163074]

Forking is the creation of a new software repository by copying another repository. Though forking is controversial in the traditional open source software (OSS) community, it is encouraged and is a built-in feature in GitHub. Developers freely fork repositories, use the code as their own, and make changes. A deep understanding of repository forking can provide important insights for the OSS community and GitHub. In this paper, we explore why and how developers fork what from whom in GitHub. We collect a dataset containing 236,344 developers and 1,841,324 forks. We conduct surveys and analyze the programming languages and owners of forked repositories. Our main observations are: (1) Developers fork repositories to submit pull requests, fix bugs, add new features, keep copies, etc. Developers find repositories to fork from various sources: search engines, external sites (e.g., Twitter, Reddit), social relationships, etc. More than 42% of the developers we surveyed agree that an automated recommendation tool would be useful in helping them pick repositories to fork, while more than 44.4% do not value such a tool. Developers care about repository owners when they fork repositories. (2) A repository written in a developer's preferred programming language is more likely to be forked. (3) Developers mostly fork repositories from creators. Compared with unattractive repository owners, attractive repository owners include a higher percentage of organizations, have more followers, and registered on GitHub earlier. Our results show that forking is mainly used for contributing to the original repositories and that it is beneficial for the OSS community. Moreover, our results show the value of recommendation and provide important insights for GitHub to recommend repositories.

Recommended

Article Computer Science, Software Engineering

Code Structure-Guided Transformer for Source Code Summarization

Shuzheng Gao, Cuiyun Gao, Yulan He, Jichuan Zeng, Lunyiu Nie, Xin Xia, Michael Lyu

Summary: Code summaries help developers understand programs and save time during software maintenance. Recent studies have used deep learning techniques, such as Transformer-based approaches, to generate accurate code summaries. However, integrating code structure information into Transformers effectively has been under-explored in this task domain.

ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY (2023)

Article Computer Science, Software Engineering

DEGraphCS: Embedding Variable-based Flow Graph for Neural Code Search

Chen Zeng, Yue Yu, Shanshan Li, Xin Xia, Zhiming Wang, Mingyang Geng, Linxiao Bai, Wei Dong, Xiangke Liao

Summary: With the rapid increase of public code repositories, developers have a strong interest in retrieving precise code snippets using natural language. Existing deep learning-based approaches for code search in large-scale repositories still have low accuracy due to limitations in code representation and modeling. In this paper, we propose deGraphCS, a learnable deep graph model that uses an intermediate representation technique to convert source code into variable-based flow graphs, enabling more precise modeling of code semantics. Experimental results show that deGraphCS achieves state-of-the-art performance in accurately retrieving code snippets from a large-scale dataset.

ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY (2023)

Article Computer Science, Software Engineering

Assessing the Alignment between the Information Needs of Developers and the Documentation of Programming Languages: A Case Study on Rust

Filipe Roseiro Cogo, Xin Xia, Ahmed E. Hassan

Summary: Programming language documentation is crucial for supporting application developers in effectively using a programming language. This article presents an automated approach for evaluating the alignment between developers' information needs and the current state of documentation. The approach leverages semi-supervised topic modelling and reveals both similarities and differences between Q&A posts and official documentation. The results show a relatively high level of topical alignment in Rust documentation, while also identifying areas where specific topics, such as network, game, and database development, are lacking in information.

ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY (2023)

Article Computer Science, Software Engineering

Duplicate Bug Report Detection: How Far Are We?

Ting Zhang, Donggyun Han, Venkatesh Vinayakarao, Ivana Clairine Irsan, Bowen Xu, Ferdian Thung, David Lo, Lingxiao Jiang

Summary: Many duplicate bug report detection techniques have been proposed, but they have not been sufficiently compared with one another. This study fills this gap by comparing these techniques, and a new benchmark is prepared to evaluate their performance. Surprisingly, a simpler technique outperforms sophisticated ones, and a simple technique already adopted in practice achieves results comparable to those of a research tool. The study provides insights into the current state of duplicate bug report detection and benefits future research in this area.

ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY (2023)

Article Computer Science, Software Engineering

How to Find Actionable Static Analysis Warnings: A Case Study With FindBugs

Rahul Yedida, Hong Jin Kang, Huy Tu, Xueqi Yang, David Lo, Tim Menzies

Summary: Automatically generated static code warnings commonly have many false alarms. To improve the accuracy of determining which warnings are actionable, analysts should delve deeper into their algorithms and make better choices. This study demonstrates that by locally adjusting the decision boundary, effective predictors for actionable static code warnings can be created, reaching a new benchmark of 92% median AUC across eight open-source Java projects.

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING (2023)

Article Engineering, Electrical & Electronic

Reliable Transmission for NOMA Systems With Randomly Deployed Receivers

Yibo Zhang, Jingjing Wang, Lanjie Zhang, Yufang Zhang, Qi Li, Kwang-Cheng Chen

Summary: In this paper, the reliable transmission scheme of downlink Non-Orthogonal Multiple Access (NOMA) systems is investigated. The base station's coverage area is divided into multiple annular regions, with receivers randomly distributed within them, to achieve NOMA pairing. Bit Error Rate (BER) expressions with Quadrature Phase-Shift Keying (QPSK) modulation are derived. The BER performance of the receiver with the worst channel gain in each region is studied to ensure reliable communications. An optimal power allocation algorithm is proposed to minimize the transmission power while meeting a given BER constraint. Extensive simulations validate the accuracy of the obtained BER expressions and the effectiveness of the proposed algorithm. These findings offer valuable insights for achieving reliable transmission in NOMA systems with randomly deployed receivers.

IEEE TRANSACTIONS ON COMMUNICATIONS (2023)

Article Engineering, Electrical & Electronic

Knowledge Graph Aided Network Representation and Routing Algorithm for LEO Satellite Networks

Chenxi Li, Wenji He, Haipeng Yao, Tianle Mai, Jingjing Wang, Song Guo

Summary: The increasing applications of LEO satellite networks in various domains have highlighted the need for efficient routing algorithms to accommodate the dynamic changes in network topology. In this paper, we propose a knowledge graph-based representation of satellite network topologies and routing architecture to optimize path selection and calculation cost. Our approach incorporates predicting potential relations between data packets and nodes to select the best relay nodes, resulting in improved packet loss ratio and average delay. Extensive simulations are performed to evaluate the performance and availability of the proposed algorithm.

IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY (2023)

Article Engineering, Electrical & Electronic

3U: Joint Design of UAV-USV-UUV Networks for Cooperative Target Hunting

Wei Wei, Jingjing Wang, Zhengru Fang, Jianrui Chen, Yong Ren, Yuhan Dong

Summary: In this paper, a joint design of the UAV-USV-UUV network, also referred to as 3U network, is proposed for cooperative underwater target hunting. An energy-oriented target hunting model is proposed by jointly optimizing the UAV's position, the UUV's trajectory as well as their inter-connectivity. Simulation results show the proposed scheme is suitable for underwater target hunting with a high success rate considering a trade-off between the system energy consumption and inter-connectivity.

IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY (2023)

Article Computer Science, Software Engineering

Revisiting the Identification of the Co-evolution of Production and Test Code

Weifeng Sun, Meng Yan, Zhongxin Liu, Xin Xia, Yan Lei, David Lo

Summary: Many previous studies have focused on the co-evolution of production and test code based on samples mined from software repositories. However, the quality of the mined samples is crucial for reliable research conclusions. We conducted an empirical study and found that the existing assumption used in identifying production-test co-evolution samples is often noisy. Based on our findings, we proposed a method called CHOSEN which outperforms existing identification methods and helps draw more accurate conclusions regarding the co-evolution of production and test code.

ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY (2023)

Article Computer Science, Software Engineering

Experimental comparison of features, analyses, and classifiers for Android malware detection

Lwin Khin Shar, Biniam Fisseha Demissie, Mariano Ceccato, Yan Naing Tun, David Lo, Lingxiao Jiang, Christoph Bienert

Summary: Android malware detection is an active research field, with machine learning-based approaches proposed using different features such as API usage and sequences. The study found that permission use features performed the best, package-level features were generally better than class-level features, and static features generally outperformed dynamic features.

EMPIRICAL SOFTWARE ENGINEERING (2023)

Article Computer Science, Software Engineering

Context-Aware Neural Fault Localization

Zhuo Zhang, Yan Lei, Xiaoguang Mao, Meng Yan, Xin Xia, David Lo

Summary: Numerous fault localization techniques identify suspicious statements that may cause program failures by analyzing the statistical correlation between test results and program executions. However, they often overlook the importance of failure context in fault analysis and localization. To address this, we propose a context-aware neural fault localization approach (CAN) that incorporates failure context into fault localization by constructing a program dependency graph and using graph neural networks. Experimental results on large-sized programs demonstrate that CAN achieves promising results and outperforms existing baselines by a substantial margin.

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING (2023)

Article Engineering, Electrical & Electronic

Age of Information in UAV Aided Wireless Sensor Networks Relying on Blockchain

Houze Feng, Jingjing Wang, Zhengru Fang, Junhui Qian, Kwang-Cheng Chen

Summary: This paper examines the factors influencing the peak age of information in the transaction-confirmation process of blockchain technology in UAV-aided wireless sensor networks. The closed-form expressions for the average peak age of information are provided. The results indicate that reducing the block size and queue length can decrease the peak age of information, and fixed interval network traffic and network traffic with Markovian properties exhibit better aging behavior than other situations.

IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY (2023)

Article Engineering, Electrical & Electronic

UAV-Enabled Covert Federated Learning

Xiangwang Hou, Jingjing Wang, Chunxiao Jiang, Xudong Zhang, Yong Ren, Merouane Debbah

Summary: Integrating unmanned aerial vehicles (UAVs) with federated learning (FL) is a promising approach for handling massive data generated by intelligent devices. This paper proposes a UAV-enabled covert federated learning architecture that emits artificial noise to enhance data security. The effectiveness of the proposed scheme is validated through experiments.

IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS (2023)

Article Computer Science, Information Systems

1+1 >2: Programming Know-What and Know-How Knowledge Fusion, Semantic Enrichment and Coherent Application

Qing Huang, Zhiqiang Yuan, Zhenchang Xing, Zhengkang Zuo, Changjing Wang, Xin Xia

Summary: This article introduces the need for both API reference (know-what) knowledge and programming task (know-how) knowledge in software programming and proposes a fusion of API-KG and Task-KG to construct an API-Task knowledge graph. The study confirms the necessity of combining both types of knowledge to answer API usage problems. The fused and semantically-enriched API-Task KG supports coherent API/Task-centric knowledge search.

IEEE TRANSACTIONS ON SERVICES COMPUTING (2023)
