Article
Computer Science, Artificial Intelligence
Xiuwen Gong, Dong Yuan, Wei Bao, Fulin Luo
Summary: Learning from partially labeled data is widely used in data science, but ambiguity caused by false-positive labels makes it challenging. The prevailing strategy is to identify the ground-truth labels within the candidate set, but this approach lacks theoretical interpretation. Instead, we propose a novel unifying probabilistic framework that provides a clear formulation and theoretical interpretation for partial label learning (PLL) and partial multi-label learning (PML). The framework also integrates the identifying and embedding methods and accounts for feature and label correlations. Experimental results show the superiority of the derived framework in both PLL and PML scenarios.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
Article
Geography, Physical
Kamyar Hasanzadeh, Nora Fagerholm, Hans Skov-Petersen, Anton Stahl Olafsson
Summary: Public participation geographic information system (PPGIS) is a method that aims to capture individuals' spatial experiences. However, the third dimension, altitude, is often overlooked in PPGIS research and practice because 3D surveys are complex to implement and analytical preparedness is lacking. This study proposes an analytical framework for 3D PPGIS data and suggests the use of geospatial metrics for analysis. The authors argue that 3D-adapted geometric and landscape metrics can help overcome the complexities associated with this emerging participatory data.
INTERNATIONAL JOURNAL OF DIGITAL EARTH
(2023)
Article
Automation & Control Systems
Ray Y. Zhong, Goran D. Putnik, Stephen T. Newman
Summary: This article proposes a heterogeneous data analytics framework for a radio-frequency identification (RFID) enabled factory, validated using RFID-captured data from a real-life company. The framework is used to examine the performance of machining processes, logistics operations, and inspection behavior.
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS
(2021)
Article
Physics, Multidisciplinary
Mianning Hu, Xin Li, Mingfeng Li, Rongchen Zhu, Binzhou Si
Summary: This study constructs a telecom-fraud risk warning and intervention model that uses multivariate heterogeneous data for front-end prevention and management of telecommunication network fraud. The Bayesian network-based model was designed by incorporating existing data, literature, and expert knowledge. The model was improved using City S as an example, and a telecom-fraud analysis and warning framework was proposed. The evaluation shows that age sensitivity to telecom-fraud losses is 13.5%, that anti-fraud propaganda reduces the probability of losses above 300,000 yuan by 2%, and that telecom-fraud losses are more prevalent in summer and less prevalent in autumn, with special time points such as Double 11 being prominent. The model has practical application value, and the warning framework provides decision support for identifying susceptible groups, locations, and temporal environments to combat fraud and prevent losses.
Article
Computer Science, Theory & Methods
Shai Berkovitz, Amit Mazuz, Michael Fire
Summary: Open information about government organizations is important for citizens and researchers to monitor government activities and improve transparency. The Collecting and Analyzing Parliament Data (CAPD) framework allows for efficient collection and analysis of large-scale public governmental data from multiple sources. This study demonstrates the effectiveness of the CAPD framework in identifying anomalous meetings and detecting events that impact parliamentary functionality.
JOURNAL OF BIG DATA
(2023)
Article
Biochemical Research Methods
Methun Kamruzzaman, Ananth Kalyanaraman, Bala Krishnamoorthy, Stefan Hey, Patrick S. Schnable
Summary: Phenomics is a new branch of biology that captures environmental and phenotypic traits using high throughput tools, providing insights into how multiple factors interact and contribute to growth and behavior. Hyppo-X is a new algorithmic approach for exploring complex phenomics data visually and characterizing the role of the environment on phenotypic traits.
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
(2021)
Article
Thermodynamics
George Halkos, Jaime Moll de Alba, Valentin Todorov
Summary: The study examines the importance of manufacturing in economic growth by analyzing the impact of economic growth, energy intensity, and competitiveness index on manufacturing value added. It reveals an inverted U shape in the relative importance of manufacturing in the economy, with different turning points across countries and economic development stages. Energy intensity generally has a negative effect, while the Competitive Industrial Performance score has a positive impact with high magnitudes.
Article
Mathematics
Abdullah Aljumah, Tariq Ahamed Ahanger, Imdad Ullah
Summary: Unmanned aerial vehicles, drones, and IoT devices are increasingly being used for aerial surveying of restricted or inaccessible locations. This study proposes a blockchain-based method to ensure the safety and confidentiality of data collected by virtual circuit-based devices. The suggested technique employs pentatope-based elliptic curve encryption and secure hash algorithm (SHA) for anonymity in data storage.
Article
Engineering, Civil
Philipp Zissner, Paulo H. L. Rettore, Bruno P. Santos, Johannes F. Loevenich, Roberto Rigolin F. Lopes
Summary: This paper introduces DataFITS, an open-source framework that collects and fuses traffic-related data from various sources, creating a comprehensive dataset. The hypothesis that a heterogeneous data fusion framework can enhance information coverage and quality for traffic models was verified through two applications utilizing traffic estimation and incident classification models. DataFITS significantly increased road coverage by 137% and improved information quality for up to 40% of all roads through data fusion. Traffic estimation achieved an R² score of 0.91 using a polynomial regression model, while incident classification achieved 90% accuracy on binary tasks and around 80% on classifying three different types of incidents.
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS
(2023)
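Note: the R² score quoted in the DataFITS summary is the coefficient of determination, R² = 1 - SS_res / SS_tot. The sketch below illustrates how such a score could be computed for a polynomial regression fit. It is not the DataFITS implementation: the synthetic data, the polynomial degree, and the helper names `fit_polynomial` and `r2_score` are assumptions made purely for illustration.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot


def fit_polynomial(x, y, degree=2):
    """Least-squares polynomial fit; coefficients are highest power first."""
    return np.polyfit(x, y, degree)


# Hypothetical synthetic data (not from the paper): a noisy quadratic trend.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 0.5 * x**2 - 2.0 * x + 3.0 + rng.normal(scale=1.0, size=x.size)

coeffs = fit_polynomial(x, y, degree=2)
r2 = r2_score(y, np.polyval(coeffs, x))
print(f"R^2 on the synthetic data: {r2:.3f}")
```

An R² of 1 means the model reproduces the observations exactly, while 0 means it does no better than always predicting the mean.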
Article
Economics
Ryo Okui, Wendun Wang
Summary: This paper introduces a new model and estimation procedure for panel data to identify heterogeneous structural breaks. The hybrid estimation procedure, which combines a fixed-effects approach with adaptive group fused Lasso, consistently identifies the latent group structure and performs well in finite samples.
JOURNAL OF ECONOMETRICS
(2021)
Article
Computer Science, Artificial Intelligence
Yinan Yu, Samuel Scheidegger, Jasmine Elliott, Asa Lofgren
Summary: This paper applies computational linguistics learning methods to the banking industry and climate change fields. It introduces a data-driven framework, climateBUG, that aims to detect latent information about how banks discuss their activities related to climate change using natural language processing (NLP).
EXPERT SYSTEMS WITH APPLICATIONS
(2024)
Article
Environmental Sciences
Alexey Penenko, Vladimir Penenko, Elena Tsvetova, Alexander Gochakov, Elza Pyanova, Viktoriia Konopleva
Summary: Air quality monitoring systems vary in composition and accuracy, and their performance can be assessed by how accurately emission sources are identified from the data they provide. In an inverse modeling approach, the source identification problem is transformed into a quasi-linear operator equation with a sensitivity operator. Numerical experiments in the Baikal region show that projecting to the orthogonal complement of the sensitivity operator's kernel provides the most accurate source identification results. Our contribution lies in developing and testing a set of tools based on sensitivity operators for analyzing diverse air quality monitoring systems.
Article
Ecology
Martin Jung
Summary: Most knowledge about species and habitats is unevenly distributed and biased in space and time, as well as taxonomically and functionally. However, there is a large amount of biodiversity data available. The challenge is to effectively integrate the various sources of data, especially for species distribution models, in order to accurately address global challenges. This paper presents a modeling framework that integrates different data sources and introduces innovative concepts for creating more realistic and constrained spatial predictions. The ibis.iSDM R-package provides convenience functions and supports parameter transformations, data preparation, spatial-temporal projections, and ecological constraints.
ECOLOGICAL INFORMATICS
(2023)
Article
Economics
Jiti Gao, Fei Liu, Bin Peng, Yayi Yan
Summary: In this study, we examine binary response models for heterogeneous panel data with interactive fixed effects, allowing both the cross-sectional dimension and the temporal dimension to diverge. The proposed framework can be applied to practical problems such as predicting corporate failure probability and conducting credit rating analysis. The study establishes a link between maximum likelihood estimation and the least squares approach, provides an information criterion for factor detection, and establishes the corresponding asymptotic theory. Moreover, extensive simulations and empirical analyses of stock return sign prediction and portfolio analysis are conducted to validate the theoretical findings.
JOURNAL OF ECONOMETRICS
(2023)
Article
Construction & Building Technology
Sanam Dabirian, Mostafa M. Saad, Sadam Hussain, Sareh Peyman, Negarsadat Rahimi, U. Pilar Monsalvete Alvarez, Peter Yefi, Ursula Eicker
Summary: Globally, there is increasing interest in understanding and proposing sustainable energy transformations in cities. However, the complexity of urban areas and the lack of a well-structured framework for data modeling make engineering modeling and the prediction of changes challenging. This article presents a framework for developing urban data models connected to energy modeling tools, aiming to assess energy in the city context. The main contribution is an implementation methodology for processing, modeling, and capturing results, as well as populating existing data formats.
ENERGY AND BUILDINGS
(2023)