A Data Quality Control Method for Seafloor Observatories: The Application of Observed Time Series Data in the East China Sea

Journal

SENSORS
Volume 18, Issue 8

Publisher

MDPI
DOI: 10.3390/s18082628

Keywords

seafloor observatory; data quality control; ARIMA; outlier detection; data interpolation

Funding

  1. Science and Technology Commission of Shanghai [15DZ1207104, 15DZ1203100]
  2. Shanghai Oceanic Administration [Huhaike 2016-07]

With the construction and deployment of seafloor observatories around the world, massive amounts of oceanographic measurement data have been gathered and transmitted to data centers. The growing volume of observed data supports marine scientific research, but it also raises the requirements for data quality control, because scientists must ensure that their research outcomes rest on high-quality data. In this paper, we first analyzed and defined the data quality problems occurring in the East China Sea Seafloor Observatory System (ECSSOS). We then proposed a method to detect and repair data quality problems in seafloor observatory data. Combining data statistics with expert knowledge from domain specialists, the proposed method consists of three parts: a general pretest that preprocesses the data and routes it to further processing, outlier detection methods that label suspect data points, and a data interpolation method that fills in missing data and replaces suspect data. The autoregressive integrated moving average (ARIMA) model was improved and applied to seafloor observatory data quality control by using a sliding window and cleaning the data used to fit the model. A quality control flag system was also proposed and applied to describe the quality control results and to record information about the processing procedure. Real observed data from ECSSOS were used to implement and test the proposed method. The results demonstrated that the proposed method was effective at detecting and repairing data quality problems in seafloor observatory data.
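The three-part flow described in the abstract (pretest, outlier detection, interpolation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the flag values, the function names, the thresholds `k` and `min_sigma`, and the window length are all hypothetical, and a simple AR(1) fit on first differences (an ARIMA(1,1,0) model without a constant) stands in for the improved ARIMA model. The sketch also assumes the series begins with a valid reading.

```python
import math

# Hypothetical flag values; the paper's actual flag scheme is not given here.
GOOD, SUSPECT, MISSING = 1, 3, 9

def pretest(values, lo, hi):
    """General pretest: flag missing readings and readings outside [lo, hi]."""
    flags = []
    for v in values:
        if v is None:
            flags.append(MISSING)
        elif not (lo <= v <= hi):
            flags.append(SUSPECT)
        else:
            flags.append(GOOD)
    return flags

def forecast_next(history):
    """One-step forecast from an AR(1) model fitted to first differences.

    Returns the forecast and the RMS of the in-window residuals.
    """
    d = [b - a for a, b in zip(history, history[1:])]
    x, y = d[:-1], d[1:]
    denom = sum(a * a for a in x)
    phi = sum(a * b for a, b in zip(x, y)) / denom if denom > 0 else 0.0
    resid = [b - phi * a for a, b in zip(x, y)]
    sigma = math.sqrt(sum(r * r for r in resid) / len(resid))
    return history[-1] + phi * d[-1], sigma

def quality_control(values, lo, hi, window=10, k=4.0, min_sigma=0.05):
    """Return (cleaned, flags) for a series with None marking missing readings."""
    if window < 3:
        raise ValueError("window must be at least 3")
    flags = pretest(values, lo, hi)
    cleaned = []
    for i, v in enumerate(values):
        if len(cleaned) < window:
            # Warm-up: too little history to fit a model, so carry the
            # last accepted value forward over any bad reading.
            if flags[i] != GOOD and not cleaned:
                raise ValueError("series must start with a valid reading")
            cleaned.append(v if flags[i] == GOOD else cleaned[-1])
            continue
        # Fit only on the cleaned sliding window, so earlier outliers
        # do not contaminate the model.
        pred, sigma = forecast_next(cleaned[-window:])
        if flags[i] == GOOD and abs(v - pred) > k * max(sigma, min_sigma):
            flags[i] = SUSPECT
        # Missing and suspect points are replaced by the forecast.
        cleaned.append(v if flags[i] == GOOD else pred)
    return cleaned, flags
```

For example, calling `quality_control(series, lo=0.0, hi=40.0)` on a slowly warming temperature record with one spike and one gap would flag the spike as suspect and the gap as missing, and fill both from the forecast. Fitting on the cleaned window rather than the raw one reflects the abstract's point about cleaning the modeling input; the choice of threshold is a tuning decision that would normally draw on domain knowledge.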
