Article

Automatically Categorizing Software Technologies

Journal

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING
Volume 46, Issue 1, Pages 20-32

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TSE.2018.2836450

Keywords

Taxonomy; information retrieval; natural language processing; Wikipedia; tagging

Funding

  1. Natural Sciences and Engineering Research Council of Canada (NSERC)

Informal language and the absence of a standard taxonomy for software technologies make it difficult to reliably analyze technology trends on discussion forums and other on-line venues. We propose an automated approach called Witt for the categorization of software technologies (an expanded version of the hypernym discovery problem). Witt takes as input a phrase describing a software technology or concept and returns a general category that describes it (e.g., integrated development environment), along with attributes that further qualify it (commercial, php, etc.). By extension, the approach enables the dynamic creation of lists of all technologies of a given type (e.g., web application frameworks). Our approach relies on Stack Overflow and Wikipedia, and involves numerous original domain adaptations and a new solution to the problem of normalizing automatically detected hypernyms. We compared Witt with six independent taxonomy tools and found that, when applied to software terms, Witt demonstrated better coverage than all evaluated alternative solutions, without a corresponding degradation in false positive rate.
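
To make the input/output shape described in the abstract concrete, the following is a minimal illustrative sketch, not Witt's actual algorithm. It assumes a small hand-written category vocabulary (`KNOWN_CATEGORIES`) and a hypothetical helper `split_hypernym` that separates a raw hypernym phrase into a general category and its qualifying attributes. The real approach draws on Stack Overflow and Wikipedia rather than a fixed list.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical category vocabulary, for illustration only.
KNOWN_CATEGORIES = {
    "integrated development environment",
    "web application framework",
    "framework",
    "library",
}


@dataclass
class Categorization:
    """A general category plus the attributes that qualify it."""
    category: str
    attributes: List[str] = field(default_factory=list)


def split_hypernym(hypernym: str) -> Optional[Categorization]:
    """Split a raw hypernym phrase into (category, attributes).

    Assumption: the category is the longest trailing word sequence found
    in KNOWN_CATEGORIES; any leading words are treated as attributes.
    """
    tokens = hypernym.lower().split()
    # Scan from the longest suffix to the shortest, so that
    # "web application framework" is preferred over "framework".
    for start in range(len(tokens)):
        candidate = " ".join(tokens[start:])
        if candidate in KNOWN_CATEGORIES:
            return Categorization(candidate, tokens[:start])
    return None  # no known category matched
```

Under these assumptions, `split_hypernym("commercial php integrated development environment")` yields the category `integrated development environment` with the attributes `commercial` and `php`. Grouping such results by category would give the per-type technology lists the abstract mentions.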
