Identifying Outliers in Astronomical Images with Unsupervised Machine Learning

Journal

RESEARCH IN ASTRONOMY AND ASTROPHYSICS
Volume 22, Issue 8

Publisher

NATL ASTRONOMICAL OBSERVATORIES, CHIN ACAD SCIENCES
DOI: 10.1088/1674-4527/ac7386

Keywords

Galaxy; Physical Data and Processes; Galaxy: fundamental parameters

Funding

  1. China Manned Space Project [CMS-CSST-2021-A01, CMS-CSST-2021-B05]
  2. Jiangsu Key Laboratory of Big Data Security and Intelligent Processing

Studying astronomical outliers is crucial for discovering new knowledge, but mining rare and unexpected targets from vast amounts of data is a significant challenge. In this study, unsupervised machine learning approaches are used to identify outliers in galaxy image data; the best of the three methods compared reaches a recall of 78%.
Astronomical outliers, such as unusual, rare, or unknown types of astronomical objects or phenomena, often lead to genuinely unforeseen discoveries in astronomy. In principle, more such outliers will be uncovered as the coverage and quality of upcoming survey data increase. However, mining rare and unexpected targets from enormous data sets by human inspection is a severe challenge because of the workload involved. Supervised learning is also unsuitable for this purpose, because designing proper training sets for unanticipated signals is unworkable. Motivated by these challenges, we adopt unsupervised machine learning approaches to identify outliers in galaxy image data and to explore paths for detecting astronomical outliers. For comparison, we construct three methods, built respectively upon k-nearest neighbors (KNN), Convolutional Auto-Encoder (CAE) + KNN, and CAE + KNN + Attention Mechanism (attCAE_KNN). Testing sets are created from the Galaxy Zoo image data published online to evaluate the performance of these methods. Results show that attCAE_KNN achieves the best recall (78%), which is 53% higher than the classical KNN method and 22% higher than CAE+KNN. The running time of attCAE_KNN (10 minutes) is also shorter than that of KNN (4 hours) and equal to that of CAE+KNN (10 minutes) for the same task. Thus, we believe that it is feasible to detect astronomical outliers in galaxy image data in an unsupervised manner. Next, we will apply attCAE_KNN to available survey data sets to assess its applicability and reliability.
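
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind KNN-based outlier scoring and the recall metric, not the authors' method. It assumes each image has already been reduced to a feature vector (for example flattened pixels, or the latent code of an auto-encoder), and it scores each vector by its mean distance to its k nearest neighbors. The function names `knn_outlier_scores` and `recall_at_n`, the value of `k`, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def knn_outlier_scores(features, k=5):
    """Mean Euclidean distance from each row to its k nearest other rows.

    A larger score means the row is more isolated, i.e. more outlier-like.
    """
    x = np.asarray(features, dtype=float)
    # Pairwise distances via broadcasting; memory is O(n^2), fine for a sketch.
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Exclude each point from its own neighbour list.
    np.fill_diagonal(dist, np.inf)
    nearest = np.sort(dist, axis=1)[:, :k]
    return nearest.mean(axis=1)


def recall_at_n(scores, is_outlier, n):
    """Fraction of the true outliers that appear among the n highest scores."""
    is_outlier = np.asarray(is_outlier, dtype=bool)
    top = np.argsort(scores)[::-1][:n]
    return np.count_nonzero(is_outlier[top]) / np.count_nonzero(is_outlier)


# Synthetic stand-in for image features (assumption: 8-dimensional vectors).
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(200, 8))
# Five isolated points, each pushed far out along a different axis.
odd = rng.normal(0.0, 1.0, size=(5, 8)) + 15.0 * np.eye(8)[:5]

features = np.vstack([normal, odd])
labels = np.r_[np.zeros(200, dtype=bool), np.ones(5, dtype=bool)]

scores = knn_outlier_scores(features, k=5)
print("recall among top 5 candidates:", recall_at_n(scores, labels, n=5))
```

In this sketch, swapping the raw feature vectors for the latent vectors of a trained auto-encoder would correspond loosely to the CAE + KNN variant described above; the brute-force distance matrix is chosen for clarity and would need a neighbor index for survey-scale data.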