Easy-to-Deploy API Extraction by Multi-Level Feature Embedding and Transfer Learning

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING (2021)

Journal

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING

Volume 47, Issue 10, Pages 2296-2311

Publisher

IEEE COMPUTER SOC

DOI: 10.1109/TSE.2019.2946830

Keywords

Libraries; Feature extraction; Machine learning; Software; Computer architecture; Training data; Manuals; API extraction; CNN; word embedding; LSTM; transfer learning

Abstract

This paper proposes a multi-layer neural network architecture for API extraction that automatically learns text features, removing the need for manual feature engineering and the dependence on advanced features. Transfer learning is also adopted to reduce the manual effort of labeling training data when processing software texts from multiple programming languages and libraries.

Application Programming Interfaces (APIs) are widely discussed on socio-technical platforms (e.g., Stack Overflow). Extracting API mentions from such informal software texts is a prerequisite for API-centric search and summarization of programming knowledge. Machine learning based API extraction has been shown to outperform rule-based methods on informal software texts that lack consistent writing forms and annotations. However, machine learning based methods incur significant overhead in preparing training data and effective features. In this paper, we propose a multi-layer neural network based architecture for API extraction. Our architecture automatically learns character-, word- and sentence-level features from the input texts, thus removing the need for manual feature engineering and the dependence on advanced features (e.g., API gazetteers) beyond the input texts. We also propose to adopt transfer learning to adapt a model trained on a source library to a target library, thus reducing the overhead of manually labeling training data when software texts of multiple programming languages and libraries need to be processed. We conduct extensive experiments with six libraries of four programming languages, which support diverse functionalities and have different API-naming and API-mention characteristics. Our experiments investigate the performance of our neural architecture for API extraction in informal software texts, the importance of different features, and the effectiveness of transfer learning. Our results confirm not only that our neural architecture outperforms existing machine learning based methods for API extraction in informal software texts, but also that it is easy to deploy.
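
The three feature levels named in the abstract can be illustrated with a small, untrained forward pass. This is only a sketch under assumptions, not the authors' implementation. The keywords suggest a CNN over characters, word embeddings, and an LSTM over the sentence; here the LSTM is replaced by a plain tanh recurrent pass for brevity, the two directions share weights, all sizes are arbitrary, and every name (`Lookup`, `char_features`, `sentence_features`, `embed_sentence`) is hypothetical. Weights are random, so the output shows only how the per-token feature vector is assembled, not a working extractor.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained weights are reproducible

# Illustrative sizes only; none of these values come from the paper.
CHAR_DIM, WORD_DIM, HIDDEN_DIM, KERNEL = 4, 6, 5, 3


def rand_vec(n):
    return [random.uniform(-0.5, 0.5) for _ in range(n)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class Lookup:
    """Embedding table that lazily creates a random vector per new key."""

    def __init__(self, dim):
        self.dim, self.table = dim, {}

    def __call__(self, key):
        if key not in self.table:
            self.table[key] = rand_vec(self.dim)
        return self.table[key]


char_emb = Lookup(CHAR_DIM)
word_emb = Lookup(WORD_DIM)
# One convolution filter per output channel; each spans KERNEL characters.
conv_filters = [rand_vec(KERNEL * CHAR_DIM) for _ in range(CHAR_DIM)]
# Recurrent weights (a tanh RNN standing in for an LSTM; an assumption).
W_in = [rand_vec(CHAR_DIM + WORD_DIM) for _ in range(HIDDEN_DIM)]
W_rec = [rand_vec(HIDDEN_DIM) for _ in range(HIDDEN_DIM)]


def char_features(token):
    """Character level: 1-D convolution over characters, then max-pooling."""
    pad = [[0.0] * CHAR_DIM] * (KERNEL - 1)
    chars = pad + [char_emb(c) for c in token] + pad
    windows = [sum(chars[i:i + KERNEL], [])
               for i in range(len(chars) - KERNEL + 1)]
    return [max(dot(f, w) for w in windows) for f in conv_filters]


def sentence_features(inputs, reverse=False):
    """Sentence level: a recurrent pass so each token sees its context."""
    h, states = [0.0] * HIDDEN_DIM, []
    for x in (reversed(inputs) if reverse else inputs):
        h = [math.tanh(dot(W_in[j], x) + dot(W_rec[j], h))
             for j in range(HIDDEN_DIM)]
        states.append(h)
    return states[::-1] if reverse else states


def embed_sentence(tokens):
    """Concatenate character-, word- and sentence-level features per token."""
    local = [char_features(t) + word_emb(t.lower()) for t in tokens]
    fwd = sentence_features(local)
    bwd = sentence_features(local, reverse=True)
    return [l + f + b for l, f, b in zip(local, fwd, bwd)]


tokens = "use df.apply() to apply a function".split()
features = embed_sentence(tokens)
print(len(features), len(features[0]))  # prints "6 20"
```

In a trained model, a classifier on top of each token's vector would decide whether the token is an API mention. The sentence-level part is what lets the same surface form (for example the plain verb "apply" versus the method in `df.apply()`) receive different representations depending on its context. Under the transfer-learning setting described in the abstract, the learned weights of such layers would be carried over from a source library and adapted to the target library; which layers are reused or retrained is not specified here.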
