Article
Chemistry, Multidisciplinary
Anh-Cang Phan, Thuong-Cang Phan, Hung-Phi Cao, Thanh-Ngoan Trieu
Summary: In the era of data deluge, handling skewed data with join operations in MapReduce poses significant challenges. This study aims to evaluate skew-join strategies for large-scale datasets in Spark, providing both theoretical and practical insights through cost models and experiments.
APPLIED SCIENCES-BASEL
(2022)
Article
Multidisciplinary Sciences
Jun-Ha Lee, Hyuk-Yoon Kwon
Summary: In this study, we conduct a large-scale digital forensic investigation on Apache Spark using a Windows registry. By using distributed algorithms and Apache Spark operations, we are able to efficiently analyze large-scale registry data collected from multiple Windows systems to detect suspicious data modifications. Experimental results demonstrate the efficiency and scalability of our method in processing large-scale registry data.
Article
Computer Science, Theory & Methods
Haosong Li, Phillip C-Y Sheu
Summary: This paper proposes a scalable association rule learning algorithm for efficiently learning gene association rules from large-scale microarray datasets. The algorithm ranks the rules based on their importance and outperforms the traditional Apriori algorithm in terms of performance.
JOURNAL OF BIG DATA
(2022)
Article
Computer Science, Theory & Methods
Haosong Li, Phillip C-Y Sheu
Summary: This paper presents a divide-and-conquer heuristic that addresses the scalability issues in association rule learning. Compared to existing algorithms, it shows a significant speedup while producing approximate results close to the exact ones.
JOURNAL OF BIG DATA
(2021)
Article
Environmental Sciences
Ning Wang, Fang Chen, Bo Yu, Lei Wang
Summary: Superpixel segmentation algorithms are widely used in image processing, but processing large-scale images faces challenges due to memory and computational resource limitations. This research proposes a distributed strategy based on Apache Spark to overcome these challenges and achieve higher accuracy and efficiency in superpixel segmentation.
Article
Computer Science, Hardware & Architecture
Gangmin Park, Yong Seok Heo, Kisung Lee, Hyuk-Yoon Kwon
Summary: This paper presents PSLIC-on-Spark and PASLIC-on-Spark parallel algorithms, analyzes the trade-off relationship between processing speed and accuracy, and proposes an improvement to enhance accuracy.
JOURNAL OF SUPERCOMPUTING
(2022)
Article
Computer Science, Information Systems
Mousumi Chaudhury, Amin Karami, Mustansar Ali Ghazanfar
Summary: This research paper analyzes the use of machine learning models supported by Apache Spark to classify music genres in an online music library. The experimental results demonstrate that the random forest classifier outperforms other classifiers, achieving 90% accuracy in music genre classification.
Article
Multidisciplinary Sciences
Paul Stapor, Leonard Schmiester, Christoph Wierling, Simon Merkt, Dilan Pathirana, Bodo M. H. Lange, Daniel Weindl, Jan Hasenauer
Summary: This study applies mini-batch optimization methods to ODE models and benchmarks them on a large-scale cancer signaling model. The results show improved optimization performance compared to established methods and significantly reduced computation.
NATURE COMMUNICATIONS
(2022)
Article
Computer Science, Information Systems
Sabrina De Capitani di Vimercati, Dario Facchinetti, Sara Foresti, Giovanni Livraga, Gianluca Oldani, Stefano Paraboschi, Matthew Rossi, Pierangela Samarati
Summary: k-Anonymity and ℓ-diversity are privacy metrics used to protect the privacy of individuals in a dataset. Existing solutions for enforcing these metrics are limited by their assumption of a centralized scenario. This article proposes a distributed solution that extends Mondrian, enabling k-anonymity and ℓ-diversity on large datasets through parallel computation. The approach efficiently distributes computation among workers, allowing each worker to independently anonymize a portion of the dataset. Experimental evaluation shows scalability without compromising anonymization quality.
IEEE TRANSACTIONS ON BIG DATA
(2023)
Article
Computer Science, Information Systems
Bin Li, Jia Liu, Bo Ji
Summary: With the growth of IoT applications, there is a demand for efficient and low-overhead uplink scheduling algorithms for large-scale IoT applications. By studying sampling constraints and joint sampling and transmission algorithms, it is possible to develop low-overhead scheduling algorithms that reduce throughput loss.
IEEE TRANSACTIONS ON MOBILE COMPUTING
(2021)
Article
Chemistry, Multidisciplinary
Xiang Wu, Yueshun He
Summary: This paper proposes a lightweight distributed data filtering model that uses RoaringBitmap to compress the dimension table keys and distribute them to each node. Experimental results show that this optimization can reduce disk usage, shorten running time, and decrease network I/O and disk I/O for Spark join tasks on massive data.
APPLIED SCIENCES-BASEL
(2023)
Article
Computer Science, Information Systems
Aaron M. Wesley, Timothy C. Matisziw
Summary: Geographic variation in object appearance on Earth can impact machine learning models trained on geo-tagged image datasets. The use of geospatial adaptations of Fréchet Inception Distance and Inception Score methods can help detect and assess geodiversity issues in large remote sensing image datasets. Rigorous testing on simulated datasets demonstrates the stability, sensitivity, and broad applicability of these methods for dataset analysis.
Article
Computer Science, Artificial Intelligence
Linzi Yin, Liyang Qin, Zhaohui Jiang, Xuemei Xu
Summary: This paper proposes a novel parallel attribute reduction algorithm built on the Apache Spark framework. It improves computing efficiency through a core attribute decision strategy and a batch processing strategy, applies three techniques to speed up the algorithm, and achieves significant improvements in experiments.
KNOWLEDGE-BASED SYSTEMS
(2021)
Article
Computer Science, Information Systems
Wen Xiong, Xiaoxuan Wang, Hao Li
Summary: This paper presents a method of parallel compression of GPS trajectory datasets using Spark, and experimental results show that this method can significantly reduce storage cost and compression time.
Article
Computer Science, Information Systems
Chandan Misra, Sourangshu Bhattacharya, Soumya K. Ghosh
Summary: This article introduces Stark, a new fast and highly scalable distributed matrix multiplication algorithm based on Strassen's algorithm, implemented on Apache Spark. By creating a distributed recursion tree and processing the divide and combine steps in parallel, Stark achieves faster execution time. Experimental results show that Stark outperforms existing distributed matrix multiplication implementations for large matrix sizes and exhibits strong scalability.
IEEE TRANSACTIONS ON BIG DATA
(2022)
Article
Computer Science, Information Systems
Krishan Kumar Sethi, Dharavath Ramesh, Munesh Chandra Trivedi
Summary: High-utility itemset (HUI) mining is a data mining technique for discovering profitable patterns. This research proposes new strategies and a distributed algorithm to make it suitable for big data processing. Experimental results demonstrate that the proposed algorithm outperforms existing algorithms.
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS
(2022)
Article
Computer Science, Information Systems
Rahul Mishra, Dharavath Ramesh, Damodar Reddy Edla, Munesh Chandra Trivedi
Summary: Cloud storage offers efficient data management, but security concerns arise. Public auditing models, utilizing third-party auditors, have been developed to address data integrity issues. However, these models are vulnerable to procrastinating auditors. This paper introduces a blockchain-based methodology, employing a certificateless public auditing model, to combat malicious and procrastinating auditors with efficient user revocation.
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS
(2022)
Article
Automation & Control Systems
Pankaj Pal, Rashmi Priya Sharma, Sachin Tripathi, Chiranjeev Kumar, Dharavath Ramesh
Summary: This proposal investigates the impact of grass vegetation elevation and density on path loss in an IoT-enabled wireless sensor network for crop monitoring. Real-time measurements at different node heights and vegetation depths reveal that using a free-space or tree-based path loss model leads to network disconnections due to changes in vegetation density throughout a crop growth cycle. An empirical path loss model is formulated to estimate signal strength during different development phases of medium grass vegetation. The 2.4 GHz RF path loss coefficient is estimated using collected data, and a generic path loss model is developed through multiple regression analysis. The effectiveness of the model is validated through proof of concept experiments.
IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS
(2022)
Article
Automation & Control Systems
Yuwen Liu, Huiping Wu, Khosro Rezaee, Mohammad R. Khosravi, Osamah Ibrahim Khalaf, Arif Ali Khan, Dharavath Ramesh, Lianyong Qi
Summary: In this study, an Interaction-enhanced and Time-aware Graph Convolution Network (ITGCN) is proposed for successive point-of-interest (POI) recommendation. By using an improved graph convolution network and a self-attention aggregator, the dynamic representation of users and POIs can be learned, capturing high-order connectivity. Experimental results show that ITGCN outperforms existing methods.
IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS
(2023)
Article
Computer Science, Information Systems
Rahul Mishra, Dharavath Ramesh, Salil S. Kanhere, Damodar Reddy Edla
Summary: This paper introduces a blockchain-based secure decentralized public auditing model and an efficient deduplication scheme. By using blockchain instead of a centralized third-party auditor, it reduces the waste of computational and storage resources. By employing redactability to address security issues, together with the efficient deduplication scheme, it achieves storage savings and data protection.
ACM TRANSACTIONS ON MANAGEMENT INFORMATION SYSTEMS
(2023)
Article
Agronomy
Sonal Jain, Dharavath Ramesh, Munesh C. Trivedi, Damodar Reddy Edla
Summary: Given the extensive variability in current climate conditions, it is important to plan water resources optimally to efficiently manage socio-economic and environmental requirements. This study introduced a multi-objective model to maximize crop net return and effectively manage water resources. The model was applied to a case study in the Pennar-Palar-Cauvery link canal command in India, and three meta-heuristic approaches were employed to solve the model and evaluate their performance.
AGRICULTURAL WATER MANAGEMENT
(2023)
Article
Automation & Control Systems
Naela Rizvi, Dharavath Ramesh, P. C. Srinivasa Rao, Koushik Mondal
Summary: This study proposes an intelligent fuzzy scheduler that utilizes the salp swarm algorithm to learn and optimize fuzzy task-resource allocation rules. It addresses complex and uncertain computation offloading problems in fog computing. Experimental results demonstrate that the proposed approach outperforms other classical algorithms in workflow scheduling problems.
IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING
(2023)
Article
Engineering, Electrical & Electronic
Pankaj Pal, Rashmi Priya Sharma, Sachin Tripathi, Chiranjeev Kumar, Dharavath Ramesh
Summary: This proposal analyzes the impact of varying vegetation density on the Received Signal Strength (RSS), coverage, and energy consumption of an IoT-assisted Wireless Sensor Network (IoWSN) through a measurement campaign. The study suggests an empirically formulated Path Loss Model (PLM) to estimate excess attenuation and performs a Non-dominated Sorting Genetic Algorithm (NSGA-III) optimization for initial node deployment with a heterogeneous transmission range. Transmitter output power scheduling is used to minimize over-coverage by dynamically adjusting the power based on changes in the captured RSS. The proof of concept validates the improvements in coverage, connectivity, and energy efficiency compared to existing approaches.
IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS
(2023)
Article
Computer Science, Information Systems
Naela Rizvi, Dharavath Ramesh, Lipo Wang, Annappa Basava
Summary: This article introduces an algorithm called MFGA (Modified Fuzzy Adaptive Genetic Algorithm) to minimize the makespan and improve resource utilization of workflows under deadline and budget constraints. The algorithm utilizes a fuzzy logic controller to control crossover and mutation rates and incorporates novel crossover and mutation techniques. Simulation experiments demonstrate that MFGA outperforms other state-of-the-art algorithms.
IEEE TRANSACTIONS ON SERVICES COMPUTING
(2023)
Review
Agronomy
Gabrijel Ondrasek, Jelena Horvatinec, Marina Bubalo Kovacic, Marko Reljic, Marko Vincekovic, Santosha Rathod, Nirmala Bandumula, Ramesh Dharavath, Muhammad Imtiaz Rashid, Olga Panfilova, Kodikara Arachchilage Sunanda Kodikara, Jasmina Defterdarovic, Vedran Krevh, Vilim Filipovic, Lana Filipovic, Tajana Cop, Mario Njavro
Summary: Organic agriculture is an increasingly popular global concept that focuses on sustainable and environmentally-friendly practices. It has the potential to improve ecosystems, reduce pollution, and provide safe and nutritious food. This study reviews the global utilization of land resources in organic agriculture, with a focus on EU countries, and highlights the challenges and opportunities for expanding organic farming.
Article
Computer Science, Cybernetics
Xiao Liu, Shunmei Meng, Qianmu Li, Qiyan Liu, Qiang He, Dharavath Ramesh, Lianyong Qi
Summary: This article proposes a new feature-aware disentangled graph neural network (FDGNN) model for recommendation, aiming to achieve better recommendation performance and model interpretability by learning the relationship between user behavior and important features of items.
IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS
(2023)
Article
Computer Science, Information Systems
Rashmi Priya Sharma, Ramesh Dharavath, Damodar R. Edla
Summary: Advanced farming techniques combined with IoT-compatible crop monitoring and data collection systems can enhance agricultural productivity by understanding environmental conditions, identifying crop diseases, and optimizing planting seasons. By analyzing data collected through an IoT monitoring system, the impact of weather parameters on crop yield and pest breeding conditions can be determined. The proposed fuzzy inference system uses fuzzy rules to find suitable cropping windows and low pest breeding conditions, benefiting farmers in achieving maximum yields.
INTERNET OF THINGS
(2023)
Proceedings Paper
Computer Science, Interdisciplinary Applications
Dharavath Ramesh, Munesh Chandra Trivedi
Summary: Data science has a growing demand for efficient collaboration in analyzing and manipulating large-scale datasets. The current ad-hoc versioning mechanism is no longer sufficient, thus a framework implemented on top of relational databases is proposed to enable efficient management and querying of dataset versions.
COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2022 WORKSHOPS, PART V
(2022)
Article
Computer Science, Interdisciplinary Applications
Rahul Mishra, Dharavath Ramesh, Damodar Reddy Edla, Lianyong Qi
Summary: In recent years, cloud storage service has gained popularity in the healthcare industry. Outsourcing EHRs to the cloud provides scalability, flexibility, low-cost operations, and availability, but also raises security concerns. This study proposes a secure EHR storage model based on a consortium blockchain to ensure confidentiality, integrity, and correctness by integrating EHR outsourcing operations into blockchain transactions.
JOURNAL OF INDUSTRIAL INFORMATION INTEGRATION
(2022)
Article
Computer Science, Hardware & Architecture
Rashmi Priya, Dharavath Ramesh, Venkanna Udutalapally
Summary: Advanced technology in agriculture can increase yield by understanding suitable environmental conditions, soil health status, water and fertilizer requirements, and crop monitoring. This study proposes a rule-based fuzzy classification method for predicting sowing time, optimizing the rule base, and correlating the fuzziness of sowing slots with yield to measure the model's effectiveness.
IEEE TRANSACTIONS ON SUSTAINABLE COMPUTING
(2022)