November 27, 2023 to December 1, 2023
Dual node
Europe/Paris timezone

Classifying X-ray sources with Supervised Machine Learning: Challenges and Solutions

Not scheduled

IAP (Paris) & CCA/Flatiron (New York)
IAP, 98bis Boulevard Arago, 75014 Paris, FRANCE
CCA/Flatiron, 5th Avenue, New York (NY), USA
Poster Online Posters


Mr Hui Yang (The George Washington University)


Millions of serendipitous X-ray sources have been discovered by modern X-ray observatories such as Chandra, XMM-Newton, and, recently, eROSITA. The nature of the vast majority of Galactic X-ray sources remains unknown. We have developed a multiwavelength machine-learning (ML) classification pipeline (MUWCLASS) that uses the random forest algorithm to quickly classify large numbers of sources and learn about their astrophysical nature. This approach enables rapid follow-up observations of interesting sources as well as population studies. MUWCLASS has been applied to the Chandra Source Catalog and the XMM-DR13 catalog, both augmented with multiwavelength properties obtained by cross-matching to surveys performed at other wavelengths. In this talk, I will demonstrate and discuss some common obstacles encountered in supervised ML (e.g., biases between training data and unclassified data, imbalanced training data, missing values, high dimensionality) in the context of X-ray source classification. I will also present recent developments we have implemented to address some of those issues (e.g., astrophysically motivated oversampling, accounting for feature uncertainties and absorption/extinction biases, probabilistic cross-matching, and probabilistic class inference).
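
As a generic illustration of the imbalanced-training-data obstacle mentioned above, the sketch below balances a toy training set by naive random oversampling, i.e. duplicating minority-class rows until every class is as large as the largest one. This is not the astrophysically motivated oversampling used in MUWCLASS, whose details are not given here; the function name `random_oversample`, the two made-up features, and the class labels "AGN" and "pulsar" are all assumptions chosen for the example.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Naive random oversampling (hypothetical helper, not MUWCLASS code).

    Duplicates randomly chosen rows of each smaller class until every
    class has as many rows as the largest class.
    """
    rng = random.Random(seed)

    # Group feature rows by their class label.
    by_class = {}
    for row, label in zip(features, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is padded up to the size of the largest class.
    target = max(len(rows) for rows in by_class.values())

    out_features, out_labels = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            out_features.append(row)
            out_labels.append(label)
    return out_features, out_labels


# Hypothetical toy training set: two made-up features per source.
toy_features = [[0.1, 1.2], [0.3, 0.9], [0.2, 1.1], [0.4, 1.0], [2.5, -0.3]]
toy_labels = ["AGN", "AGN", "AGN", "AGN", "pulsar"]

balanced_features, balanced_labels = random_oversample(toy_features, toy_labels)
print(Counter(balanced_labels))
```

Plain duplication adds no new information about the minority class, which is one reason a physically motivated scheme may be preferable in practice.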

Primary author

Mr Hui Yang (The George Washington University)


Dr Oleg Kargaltsev (The George Washington University)
Dr Jeremy Hare (NASA GSFC)
Mr Steven Chen (The George Washington University)
Dr Igor Volkov (Whitespace)
Dr Blagoy Rangelov (Texas State University)
Mr Yichao Lin (The George Washington University)
