Description
Deep learning is data-hungry; we typically need thousands to millions of labelled examples to train effective supervised models. Gathering these labels in citizen science projects like Galaxy Zoo can take years, delaying the science return of new surveys. In this talk, I’ll describe how we’re combining simple techniques to build better galaxy morphology models with fewer labels.
First [1], we’re using large-scale pretraining with supervised and self-supervised learning to reduce the number of labelled galaxy images needed to train effective models. For example, using self-supervised learning to pretrain on unlabelled Radio Galaxy Zoo images halves our error rate at distinguishing FRI and FRII radio galaxies in a separate dataset.
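The pretrain-then-fine-tune workflow above can be sketched in miniature. This is an illustrative toy, not the authors' actual pipeline: it stands in for self-supervised pretraining with PCA learned from unlabelled data, then fits a nearest-centroid classifier in that learned space using only ten labels. All data here is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 50  # pretend each "galaxy image" is a 50-d feature vector

# two hypothetical morphology classes, separated along one direction plus noise
def make(n, label):
    base = np.zeros(D)
    base[0] = 3.0 * label
    return base + rng.normal(size=(n, D))

unlabelled = np.vstack([make(500, -1), make(500, 1)])  # labels never used here
X_few = np.vstack([make(5, -1), make(5, 1)])           # just 10 labelled examples
y_few = np.array([-1] * 5 + [1] * 5)
X_test = np.vstack([make(100, -1), make(100, 1)])
y_test = np.array([-1] * 100 + [1] * 100)

# "pretraining": learn a low-dimensional representation from unlabelled data only
# (PCA via SVD stands in for a self-supervised encoder)
mu = unlabelled.mean(axis=0)
_, _, Vt = np.linalg.svd(unlabelled - mu, full_matrices=False)
encode = lambda X: (X - mu) @ Vt[:5].T  # keep the top 5 components

# "fine-tuning": nearest-centroid classifier in the learned space
Z = encode(X_few)
c_neg, c_pos = Z[y_few == -1].mean(axis=0), Z[y_few == 1].mean(axis=0)
Zt = encode(X_test)
pred = np.where(np.linalg.norm(Zt - c_pos, axis=1)
                < np.linalg.norm(Zt - c_neg, axis=1), 1, -1)
acc = (pred == y_test).mean()
print(f"accuracy with 10 labels: {acc:.2f}")
```

Because the representation is learned from the large unlabelled pool, the downstream classifier needs far fewer labels than training from raw pixels would.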
Second [2], we’re continually retraining our models to prioritise the most helpful galaxies for volunteers to label. Our probabilistic models filter out galaxies they can confidently classify, freeing volunteers to focus on challenging and interesting galaxies. We used this to measure the morphology of every bright extended galaxy in HSC-Wide in weeks rather than years.
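The core of this active-learning loop is an uncertainty filter. As a minimal sketch (with made-up posterior probabilities and an assumed entropy cut, not the actual Zoobot thresholds), confident predictions are accepted automatically and the most uncertain galaxies are queued for volunteers:

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical posterior class probabilities for 1000 galaxies over 3
# morphology classes, e.g. averaged over an ensemble of probabilistic models
probs = rng.dirichlet(alpha=[0.3, 0.3, 0.3], size=1000)

# predictive entropy as the uncertainty score: low entropy = confident
entropy = -(probs * np.log(probs)).sum(axis=1)

threshold = 0.5  # assumed cut; in practice tuned per task
confident = entropy < threshold                   # classified automatically
for_volunteers = np.argsort(entropy)[::-1][:50]   # 50 most uncertain galaxies

print(f"auto-classified: {confident.sum()} / {len(probs)}")
```

Retraining on the newly labelled uncertain galaxies and repeating this filter is what lets the labelling effort concentrate where the model is weakest.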
Third [3], we’re using natural language processing to capture radio astronomy classes (like “FRI” or “NAT”) through plain English words (like “hourglass”) that volunteers use to discuss galaxies. These words reveal which visual features are shared between astronomical classes, and, when presented as classification options, let volunteers classify complex astronomical classes in an intuitive way.
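One way to picture the word-to-class mapping: count how often each plain-English word co-occurs with each expert class in volunteer discussions, then classify new galaxies by the words volunteers choose. The tags and counts below are invented for illustration and are not the project's real data or method:

```python
from collections import Counter

# hypothetical volunteer discussion words per expert-assigned class
discussions = [
    ("FRI", ["plume", "bent", "diffuse"]),
    ("FRI", ["diffuse", "plume"]),
    ("FRII", ["hourglass", "bright", "edges"]),
    ("FRII", ["hourglass", "lobes"]),
    ("NAT", ["bent", "tail", "plume"]),
]

# count word/class co-occurrences
word_class = {}
for cls, words in discussions:
    for w in words:
        word_class.setdefault(w, Counter())[cls] += 1

# words used for more than one class hint at shared visual features
shared = [w for w, counts in word_class.items() if len(counts) > 1]
print("shared words:", shared)

# classify a new galaxy from the plain-English words volunteers selected
def classify(words):
    votes = Counter()
    for w in words:
        votes.update(word_class.get(w, {}))
    return votes.most_common(1)[0][0]

print(classify(["hourglass", "bright"]))  # → FRII
```

Presenting the words themselves as classification options means volunteers never need to learn the expert taxonomy, yet their answers still map back onto it.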
We are now preparing to apply these three techniques (pretraining, active learning, and natural language labels) to provide day-one galaxy morphology measurements for Euclid DR1.