Article

Program Characterization Using Runtime Values and Its Application to Software Plagiarism Detection

Journal

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING
Volume 41, Issue 9, Pages 925-943

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TSE.2015.2418777

Keywords

Software plagiarism detection; dynamic code identification

Funding

  1. US National Science Foundation (NSF) [CCF-1320605, NSF CNS-1223710, NSF CNS-0905131, NSF CNS-0916469]
  2. AFOSR (MURI) [FA9550-07-1-0527]
  3. ARO (MURI) [W911NF-09-1-0525, W911NF-13-1-0421]
  4. AFRL [FA8750-08-C-0137]
  5. National Natural Science Foundation of China (NSFC) [61100228]
  6. National High-tech R&D Program of China [2012AA013101]
  7. Strategic Priority Research Program of the Chinese Academy of Sciences [XDA06030601, XDA06010701]
  8. Directorate for Computer & Information Science & Engineering
  9. Division of Computer and Network Systems [1223710] Funding Source: National Science Foundation
  10. Division of Computing and Communication Foundations
  11. Directorate for Computer & Information Science & Engineering [1320605] Funding Source: National Science Foundation


Illegal code reuse has become a serious threat to the software community. Identifying similar or identical code fragments becomes much more challenging in code theft cases where plagiarizers can use various automated code transformation or obfuscation techniques to hide stolen code from detection. Previous work in this field is largely limited in that (i) most approaches cannot handle advanced obfuscation techniques, and (ii) methods based on source code analysis are impractical, since the source code of suspicious programs typically cannot be obtained until strong evidence has been collected. Based on the observation that some critical runtime values of a program are hard to replace or eliminate with semantics-preserving transformation techniques, we introduce a novel approach to the dynamic characterization of executable programs. Leveraging such invariant values, our technique is resilient to various control and data obfuscation techniques. We show how the values can be extracted and refined to expose the critical values, and how this runtime property can be applied to problems in software plagiarism detection. We have implemented a prototype with a dynamic taint analyzer atop a generic processor emulator. Our value-based plagiarism detection method (VaPD) uses similarity-measuring algorithms based on the longest common subsequence (LCS) to check whether two code fragments belong to the same lineage. We evaluate the proposed method against a set of real-world automated obfuscators. Our experimental results show that the value-based method successfully discriminates 34 plagiarisms obfuscated by SandMark, plagiarisms heavily obfuscated by KlassMaster, programs obfuscated by Thicket, and executables obfuscated by Loco/Diablo.
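The LCS-based comparison described in the abstract can be illustrated with a short sketch. This is not VaPD's implementation. It assumes each program has already been reduced to a sequence of critical runtime values, and it does not show how taint analysis extracts or refines those values. It also normalizes the LCS length by the length of the original (plaintiff) program's sequence. That is one plausible choice among several, for example 2*|LCS|/(|A|+|B|). The function names `lcs_length` and `value_sequence_similarity` and the sample values are hypothetical.

```python
from typing import Hashable, Sequence


def lcs_length(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Length of the longest common subsequence of a and b (O(len(a) * len(b)) DP)."""
    # prev[j] = LCS length of the prefix of `a` processed so far and b[:j]
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                # Matching values extend the LCS of both shorter prefixes
                curr[j] = prev[j - 1] + 1
            else:
                # Otherwise drop the last element of one prefix or the other
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def value_sequence_similarity(plaintiff: Sequence[Hashable],
                              suspect: Sequence[Hashable]) -> float:
    """Fraction of the plaintiff's runtime-value sequence found, in order, in the suspect's.

    Assumption: normalizing by the plaintiff's length is an illustrative
    choice, not necessarily the exact formula used by VaPD.
    """
    if not plaintiff:
        return 0.0
    return lcs_length(plaintiff, suspect) / len(plaintiff)


if __name__ == "__main__":
    # Hypothetical value sequences; real ones would come from a dynamic analyzer
    original = [7, 42, 1000, 3, 99, 256]
    obfuscated = [7, 0, 42, 1, 1000, 3, 5, 99, 256]  # junk values interleaved
    unrelated = [1, 2, 3]
    print(value_sequence_similarity(original, obfuscated))  # 1.0
    print(value_sequence_similarity(original, unrelated))
```

In this example, the obfuscated run interleaves extra values but keeps every original value in order, so it scores `1.0`. The unrelated sequence shares only the value `3` and scores about 0.17. Dividing by the plaintiff's length measures how much of the original's value sequence reappears in the suspect. As a result, padding the suspect with injected values does not lower the score.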

