November 27, 2023 to December 1, 2023
Dual node
Europe/Paris timezone

Systematic biases in machine learning and their impact on astronomy research

Not scheduled

IAP (Paris) & CCA/Flatiron (New York)
IAP: 98bis Boulevard Arago, 75014 Paris, France
CCA/Flatiron: 5th Avenue, New York (NY), USA
Poster Online Posters


Dr Lior Shamir (Kansas State University)


Machine learning, and in particular deep neural networks (DNNs), has become a primary tool for the automatic annotation and analysis of astronomical data. As astronomy has grown increasingly dependent on Earth-based and space-based digital sky surveys that generate vast pipelines of data, a large number of DNN-based solutions have been proposed and applied. But although DNNs are accurate and effective, they also introduce biases that are difficult to notice, profile, and control. Here I describe simple experiments showing that even properly trained DNNs with no apparent flaws in their design can produce small but consistent biases, which can be mistaken for new discoveries. The experiments show that these biases arise when machine learning algorithms analyze image data, as well as photometry and spectroscopy data. Such biases can produce unusual patterns in catalogs and data products prepared with the involvement of machine learning, and unsuspecting data users do not necessarily expect them. They might therefore lead to incorrect conclusions about astronomy when the patterns are in fact properties of the data annotation algorithms. Catalogs and data products generated with the involvement of DNNs should thus be used with caution, and consumers of such catalogs must be fully aware of the vulnerability of DNNs to complex biases.

Primary author

Dr Lior Shamir (Kansas State University)
