Article

Easy-to-Deploy API Extraction by Multi-Level Feature Embedding and Transfer Learning

Journal

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING
Volume 47, Issue 10, Pages 2296-2311

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TSE.2019.2946830

Keywords

Libraries; Feature extraction; Machine learning; Software; Computer architecture; Training data; Manuals; API extraction; CNN; word embedding; LSTM; transfer learning

This paper proposes a multi-layer neural network architecture for API extraction that automatically learns text features, removing the need for manual feature engineering and the dependence on advanced features. Transfer learning is also adopted to reduce the overhead of manually labeling training data when software texts from multiple programming languages and libraries are processed.
Application Programming Interfaces (APIs) have been widely discussed on social-technical platforms (e.g., Stack Overflow). Extracting API mentions from such informal software texts is the prerequisite for API-centric search and summarization of programming knowledge. Machine learning based API extraction has demonstrated better performance than rule-based methods on informal software texts that lack consistent writing forms and annotations. However, machine learning based methods have a significant overhead in preparing training data and effective features. In this paper, we propose a multi-layer neural network based architecture for API extraction. Our architecture automatically learns character-, word- and sentence-level features from the input texts, thus removing the need for manual feature engineering and the dependence on advanced features (e.g., API gazetteers) beyond the input texts. We also propose to adopt transfer learning to adapt a model trained on a source library to a target library, thus reducing the overhead of manual training-data labeling when software texts of multiple programming languages and libraries need to be processed. We conduct extensive experiments with six libraries of four programming languages that support diverse functionalities and have different API-naming and API-mention characteristics. Our experiments investigate the performance of our neural architecture for API extraction in informal software texts, the importance of different features, and the effectiveness of transfer learning. Our results confirm not only the superior performance of our neural architecture over existing machine learning based methods for API extraction in informal software texts, but also the easy-to-deploy characteristic of our neural architecture.
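
As a rough illustration of what character- and word-level inputs could look like before they reach the embedding layers, the sketch below encodes a sentence at both levels. It is not the paper's preprocessing: the whitespace tokenizer, the `<pad>`/`<unk>` index convention, the toy corpus, and the helper names `build_vocab` and `encode_sentence` are all assumptions made for this example. The sentence-level view is simply the ordered sequence of tokens that a sequence model such as an LSTM would read.

```python
from typing import Dict, Iterable, List, Tuple

PAD, UNK = 0, 1  # reserved indices (an assumed convention, not from the paper)


def build_vocab(items: Iterable[str]) -> Dict[str, int]:
    """Map each distinct item to an integer index, after the reserved ones."""
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for item in items:
        if item not in vocab:
            vocab[item] = len(vocab)
    return vocab


def encode_sentence(
    sentence: str, word_vocab: Dict[str, int], char_vocab: Dict[str, int]
) -> Tuple[List[str], List[int], List[List[int]]]:
    """Return tokens, word-level ids, and per-token character-level ids."""
    tokens = sentence.split()  # naive whitespace tokenizer (assumption)
    word_ids = [word_vocab.get(t.lower(), UNK) for t in tokens]
    char_ids = [[char_vocab.get(c, UNK) for c in t] for t in tokens]
    return tokens, word_ids, char_ids


# Toy corpus standing in for informal software text (made up for illustration).
corpus = [
    "use pandas.DataFrame.apply() to map a function over the series",
    "the apply method of a series is slow",
]
word_vocab = build_vocab(w.lower() for s in corpus for w in s.split())
char_vocab = build_vocab(c for s in corpus for c in s if not c.isspace())

tokens, word_ids, char_ids = encode_sentence("apply() is slow", word_vocab, char_vocab)

# "apply()" never occurred as a whole word, so its word id falls back to UNK,
# while every one of its characters is still known at the character level.
print(tokens)
print(word_ids)
print(char_ids)
```

In this toy run the unseen token `apply()` is unknown at the word level but fully covered at the character level, which hints at why combining both levels can help with API mentions written in unfamiliar forms.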
