Article; Proceedings Paper

Malware classification using dynamic features and Hidden Markov Model

Journal

JOURNAL OF INTELLIGENT & FUZZY SYSTEMS
Volume 31, Issue 2, Pages 837-847

Publisher

IOS PRESS
DOI: 10.3233/JIFS-169015

Keywords

Malware classification; Hidden Markov Model; sequence classification; machine learning

In recent years, the number of new malware threats has increased significantly, causing billions of dollars in damage globally. To counter this surge, the anti-malware industry must classify malware correctly in order to defend against it. Consequently, malware classification has been an active area of research, and many classification approaches have been proposed in the literature. This paper evaluates two Hidden Markov Model-based methods of sequence classification, namely the maximum likelihood and similarity-based methods, for malware classification using a large and comprehensive dataset. System calls generated by known malware during execution are used as observation sequences to train the Hidden Markov Models. Malware samples are evaluated against the trained models to produce similarity vectors, which are used in the maximum likelihood and similarity-based classification schemes to predict the family of an unknown malware sample. Comparison of the two schemes shows that combining the statistical pattern analysis capability of Hidden Markov Models with discriminative classifiers in the similarity-based method yields significantly better classification performance than the maximum likelihood approach. Furthermore, evaluation of different classifiers in the similarity-based method demonstrates that the Random Forest classifier performs better than the other classifiers on malware similarity vectors.
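
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each malware family has one discrete HMM, that the similarity vector holds the log-likelihood of a system-call sequence under each family model, and that system calls are encoded as integers. The two models below (`family_a`, `family_b`) and their probabilities are hypothetical and hand-written rather than trained; in practice they would be estimated from known samples, for example with Baum-Welch.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: return log P(obs | HMM) for a discrete HMM."""
    n_states = len(start)
    # Forward variable for the first observation.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][obs[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_prob

# Hypothetical system-call alphabet: 0 = open, 1 = read, 2 = write.
# Hypothetical, hand-set parameters: (start, transition, emission) per family.
MODELS = {
    "family_a": ([0.6, 0.4],
                 [[0.7, 0.3], [0.4, 0.6]],
                 [[0.2, 0.7, 0.1], [0.5, 0.4, 0.1]]),  # read-heavy
    "family_b": ([0.5, 0.5],
                 [[0.8, 0.2], [0.3, 0.7]],
                 [[0.2, 0.1, 0.7], [0.5, 0.1, 0.4]]),  # write-heavy
}

def similarity_vector(obs, models):
    """Log-likelihood of obs under every family model, in sorted family order."""
    return [log_likelihood(obs, *models[name]) for name in sorted(models)]

def classify_max_likelihood(obs, models):
    """Maximum likelihood scheme: pick the family whose HMM scores obs highest."""
    return max(sorted(models), key=lambda name: log_likelihood(obs, *models[name]))

sample = [0, 1, 1, 1, 0, 1]
print(similarity_vector(sample, MODELS))
print(classify_max_likelihood(sample, MODELS))
```

In the similarity-based scheme, the vector returned by `similarity_vector` would instead serve as the feature vector for a discriminative classifier such as a Random Forest, trained on vectors from labelled samples. That step is omitted here to keep the sketch free of third-party dependencies.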

Recommended

Article Computer Science, Information Systems

Deep Learning Based Biomedical Literature Classification Using Criteria of Scientific Rigor

Muhammad Afzal, Beom Joo Park, Maqbool Hussain, Sungyoung Lee

ELECTRONICS (2020)

Article Computer Science, Interdisciplinary Applications

Making science computable: Developing code systems for statistics, study design, and risk of bias

Brian S. Alper, Joanne Dehnbostel, Muhammad Afzal, Vignesh Subbian, Andrey Soares, Ilkka Kunnamo, Khalid Shahin, Robert C. McClure

Summary: The COVID-19 crisis has accelerated the development of infrastructure for electronic data exchange, specifically in scientific and informatics fields, to improve the identification, processing, and reporting of scientific findings. The use of new standards and tools, such as the Fast Healthcare Interoperability Resources (FHIR) and the EBMonFHIR project, is overcoming interoperability issues in evidence-based medicine. This effort aims to make scientific communication more efficient and detailed, ultimately reducing costs and improving health outcomes, quality of life, and satisfaction among healthcare professionals and patients.

JOURNAL OF BIOMEDICAL INFORMATICS (2021)

Article Health Care Sciences & Services

COVID-19 Knowledge Resource Categorization and Tracking: Conceptual Framework Study

Muhammad Afzal, Maqbool Hussain, Jamil Hussain, Jaehun Bang, Sungyoung Lee

Summary: This study aims to categorize COVID-19 information resources into a defined structure to facilitate resource identification, track information workflows, and guide contextual dashboard design and development. By organizing resources at primary, secondary, and tertiary levels, a conceptual framework was developed to access global initiatives with enriched metadata and track interactions between different resources. This three-level structure allows for consistent organization and management of existing and future COVID-19 knowledge resources.

JOURNAL OF MEDICAL INTERNET RESEARCH (2021)

Article Computer Science, Interdisciplinary Applications

An NLP-based citation reason analysis using CCRO

Imran Ihsan, M. Abdul Qadir

Summary: In recent scientific advancements, Artificial Intelligence and Natural Language Processing play a key role in classifying documents and extracting information. This research focuses on understanding the reasons behind citations using an ontology-based approach, with an emphasis on sentiment analysis and collaborative meanings. By annotating citation texts and automatically extracting reasons, the study calculates accuracy in both publicly available and manually curated corpora.

SCIENTOMETRICS (2021)

Article Computer Science, Interdisciplinary Applications

Multilayer heuristics based clustering framework (MHCF) for author name disambiguation

Humaira Waqas, Muhammad Abdul Qadir

Summary: Author name ambiguity is a significant challenge for digital libraries and scholarly data search engines, affecting the accuracy of the authorship data they provide. Traditional solutions are complex, feature dependent, and fail to effectively disambiguate authors with similar names but different citation numbers. The proposed multi-layer heuristics-based clustering framework addresses this issue by utilizing global and structure-aware features and incorporating contextual information to group similar publications. Experimental results demonstrate the framework's superior performance compared to existing approaches, achieving an overall pF1 of 93.3% with only three features.

SCIENTOMETRICS (2021)

Article Environmental Sciences

Clinical Concept Extraction with Lexical Semantics to Support Automatic Annotation

Asim Abbas, Muhammad Afzal, Jamil Hussain, Taqdir Ali, Hafiz Syed Muhammad Bilal, Sungyoung Lee, Seokhee Jeon

Summary: The study introduces a comprehensive rule-based system for automatic extraction of clinical concepts from unstructured clinical narrative documents with higher accuracy and transparency. The system's performance comparison showed an average F1-score of 72.94%, significantly outperforming existing baseline systems, especially in terms of problem-related concepts.

INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH (2021)

Article Environmental Sciences

Clinical Decision Support System Based on Hybrid Knowledge Modeling: A Case Study of Chronic Kidney Disease-Mineral and Bone Disorder Treatment

Syed Imran Ali, Su Woong Jung, Hafiz Syed Muhammad Bilal, Sang-Ho Lee, Jamil Hussain, Muhammad Afzal, Maqbool Hussain, Taqdir Ali, Taechoong Chung, Sungyoung Lee

Summary: Clinical decision support systems (CDSSs) are the latest technological transformation in healthcare, assisting clinicians in complex decision-making. This study proposes a CDSS for clinicians managing end-stage renal disease patients, aiming to aid in dosage prescription. The evaluation shows high compliance and positive user experience.

INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH (2022)

Article Multidisciplinary Sciences

CustFRE: An annotated dataset for extraction of family relations from English text

Raabia Mumtaz, Muhammad Abdul Qadir, Asif Saeed

Summary: Meaningful information extraction is a crucial task, and it requires annotated datasets, which are scarce. This manuscript presents a dataset, CustFRE, for extracting family relations from text, which can be used as a benchmark for evaluating and training family relation extraction systems.

DATA IN BRIEF (2022)

Article Computer Science, Artificial Intelligence

CustRE: a rule based system for family relations extraction from english text

Raabia Mumtaz, Muhammad Abdul Qadir

Summary: This paper introduces CustRE, a system for identifying and classifying family relations in English text. Using rules, regular expressions, and co-reference rules, it extracts both explicit and implicit family relations mentioned in the text.

KNOWLEDGE AND INFORMATION SYSTEMS (2022)

Review Chemistry, Multidisciplinary

User Experience Quantification Model from Online User Reviews

Jamil Hussain, Zahra Azhar, Hafiz Farooq Ahmad, Muhammad Afzal, Mukhlis Raza, Sungyoung Lee

Summary: This study proposes a user experience quantification model to understand customer satisfaction from online reviews. The model consists of three steps: selecting relevant reviews, extracting user experience dimensions, and mapping them to a customer satisfaction model. The results show that the proposed method performs well in terms of accuracy and topic coherence.

APPLIED SCIENCES-BASEL (2022)

Article Computer Science, Artificial Intelligence

MF-Storm: a maximum flow-based job scheduler for stream processing engines on computational clusters to increase throughput

Asif Muhammad, Muhammad Abdul Qadir

Summary: The study presents MF-Storm, a job scheduler based on the max-flow min-cut algorithm that aims for a near-optimal schedule maximizing throughput. The scheduler considers processing and communication demands, together with the computation and communication resources available in a heterogeneous cluster, to dynamically schedule streaming applications at minimal scheduling cost.

PEERJ COMPUTER SCIENCE (2022)

Article Computer Science, Information Systems

CNN-Fusion: An effective and lightweight phishing detection method based on multi-variant ConvNet

Musarat Hussain, Chi Cheng, Rui Xu, Muhammad Afzal

Summary: Phishing scams are on the rise and require rapid, precise, and low-cost prevention measures. CNN-Fusion, a character-level convolutional neural network, is proposed as an effective and lightweight method for detecting phishing URLs. It utilizes parallel one-layer CNN variants with different-sized kernels and applies techniques such as SpatialDropout1D and max-over-time pooling to enhance its robustness and feature selection. Evaluation on publicly available datasets and against AI adversarial attacks shows superior performance compared to existing methods, with significantly reduced training time and memory consumption and an average accuracy above 99%.

INFORMATION SCIENCES (2023)

Article Computer Science, Interdisciplinary Applications

Deep learning to refine the identification of high-quality clinical research articles from the biomedical literature: Performance evaluation

Cynthia Lokker, Elham Bagheri, Wael Abdelkader, Rick Parrish, Muhammad Afzal, Tamara Navarro, Chris Cotoi, Federico Germini, Lori Linkins, R. Brian Haynes, Lingyang Chu, Alfonso Iorio

Summary: Deep learning models, especially variants of Bidirectional Encoder Representations from Transformers (BERT), can accurately identify high-quality evidence with high clinical relevance in the biomedical literature. This improves the efficiency of evidence discovery for clinical practice.

JOURNAL OF BIOMEDICAL INFORMATICS (2023)

Article Computer Science, Hardware & Architecture

MOPTIC-SM: Sleep mode-enabled multi-optimized intermittent computing for transiently powered systems

Kashif Javed, Naveed Anwar Bhatti, Mohammad Imran

Summary: The proliferation of IoT devices has increased their usage across various applications. To address the environmental and economic concerns of replacing conventional power supplies, TPESs utilize ambient energy. However, the non-uniform availability of ambient energy leads to frequent system reboots and to excessive energy consumption caused by a high number of checkpoints. This research proposes a novel sleep mode-enabled multi-optimized intermittent computing method that reduces the number of checkpoints by combining data sampling and memoization.

JOURNAL OF SYSTEMS ARCHITECTURE (2023)

Article Computer Science, Information Systems

Identifying Driver Genes Mutations with Clinical Significance in Thyroid Cancer

Hyeong Won Yu, Muhammad Afzal, Maqbool Hussain, Hyungju Kwon, Young Joo Park, June Young Choi, Kyu Eun Lee

Summary: This study aimed to identify mutations in genes that co-exist with mutated BRAF in papillary thyroid carcinoma (PTC) and analyze their frequency and clinical relevance. Results revealed that mutations in ALK, ATM, COL1A1, MSTIR, PRKCA, and WNK1 most commonly coincide with mutated BRAF in PTC.

CMC-COMPUTERS MATERIALS & CONTINUA (2021)
