Extracting optimal information from upcoming cosmological surveys is a pressing task, for which a promising path to success is performing field-level inference with differentiable forward modeling. A key computational challenge in this approach is that it requires sampling a high-dimensional parameter space. In this talk I will present a promising new method to sample such large parameter...
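As a hedged illustration of the sampler family this approach relies on (not the new method of the talk), a minimal Hamiltonian Monte Carlo transition on a toy high-dimensional Gaussian posterior might look as follows; all names and sizes are illustrative, and in a real field-level analysis the gradient would come from autodiff through the forward model.

```python
import numpy as np

# Toy log-posterior: standard Gaussian in D dimensions. In field-level
# inference D would be the number of voxels of the initial density field.
D = 1000

def log_prob(q):
    return -0.5 * np.sum(q**2)

def grad_log_prob(q):
    return -q  # supplied by autodiff in a differentiable forward model

def hmc_step(q, step_size=0.05, n_leapfrog=20, rng=np.random.default_rng()):
    """One HMC transition: leapfrog integration plus Metropolis accept."""
    p = rng.standard_normal(q.shape)                 # resample momenta
    q_new, p_new = q.copy(), p.copy()
    p_new += 0.5 * step_size * grad_log_prob(q_new)  # half kick
    for _ in range(n_leapfrog - 1):
        q_new += step_size * p_new                   # drift
        p_new += step_size * grad_log_prob(q_new)    # full kick
    q_new += step_size * p_new
    p_new += 0.5 * step_size * grad_log_prob(q_new)  # final half kick
    # Metropolis correction on the joint (position, momentum) energy
    h_old = -log_prob(q) + 0.5 * np.sum(p**2)
    h_new = -log_prob(q_new) + 0.5 * np.sum(p_new**2)
    return q_new if rng.uniform() < np.exp(h_old - h_new) else q

q = np.zeros(D)
for _ in range(100):
    q = hmc_step(q)
```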
Large-volume cosmological hydrodynamic simulations have become a primary tool to understand supermassive black holes (SMBHs), galaxies, and the large-scale structure of the Universe. However, current uncertainties in sub-grid models for core physical processes, such as feedback from massive stars and SMBHs, limit their predictive power and their plausible use for extracting information from extragalactic...
The influx of massive amounts of data from current and upcoming cosmological surveys necessitates compression schemes that can efficiently summarize the data with minimal loss of information. We introduce a method that leverages the paradigm of self-supervised machine learning in a novel manner to construct representative summaries of massive datasets using simulation-based augmentations....
Simulations of galaxy clusters that are well-matched to upcoming data sets are a key tool for addressing systematics (e.g., cluster mass inference) that limit current and future cluster-based cosmology constraints. However, most state-of-the-art simulations are too computationally intensive to produce multiple versions of relevant physics systematics. We present DeepSZSim, a lightweight...
Inflation remains one of the enigmas of fundamental physics. While it is difficult to distinguish between different inflation models, the information contained in primordial non-Gaussianity (PNG) offers a route to break the degeneracy. In galaxy surveys, local-type PNG is usually probed by measuring the scale-dependent bias in the power spectrum. We introduce a new approach to measure the local-type...
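For reference, the scale-dependent bias signature of local-type PNG mentioned here takes the standard form (following Dalal et al. 2008; conventions for the growth-factor normalization vary):

$$\Delta b(k, z) = 2 f_{\rm NL} (b - 1)\, \delta_c\, \frac{3\, \Omega_m H_0^2}{2\, c^2 k^2\, T(k)\, D(z)},$$

where $b$ is the Gaussian bias, $\delta_c \approx 1.686$ is the critical collapse threshold, $T(k)$ is the transfer function, and $D(z)$ is the linear growth factor; the $1/k^2$ scaling is what makes the largest scales most informative.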
In recent years, non-Gaussian statistics have been growing in popularity as powerful tools for efficiently extracting cosmological information from current weak lensing data. Their use can improve constraints on cosmological parameters over standard two-point statistics, can additionally help discriminate between general relativity and modified gravity theories, and can help to self-calibrate...
In the era of wide-field surveys and big data in astronomy, the SNAD team (https://snad.space) is exploiting the potential of modern datasets to discover new, unforeseen, or rare astrophysical phenomena. The SNAD pipeline was built under the hypothesis that, although automatic learning algorithms have a crucial role to play in this task, scientific discovery is only fully realized...
We present DE-VAE, a variational autoencoder (VAE) architecture to search for a compressed representation of beyond-ΛCDM models. We train DE-VAE on matter power spectrum boosts generated at wavenumbers k ∈ (0.01 − 2.5) h/Mpc and at four redshift values z ∈ (0.1, 0.48, 0.78, 1.5) for a dynamic dark energy (DE) model with two extra parameters describing an evolving DE equation of state. The...
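A minimal sketch of the kind of VAE involved (layer sizes and latent dimension here are illustrative assumptions, not the actual DE-VAE architecture):

```python
import torch
import torch.nn as nn

class PkVAE(nn.Module):
    """Toy VAE compressing a power-spectrum boost vector to a few latents."""
    def __init__(self, n_k=100, n_latent=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_k, 64), nn.ReLU())
        self.mu = nn.Linear(64, n_latent)
        self.logvar = nn.Linear(64, n_latent)
        self.decoder = nn.Sequential(
            nn.Linear(n_latent, 64), nn.ReLU(), nn.Linear(64, n_k))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparam.
        return self.decoder(z), mu, logvar

def elbo_loss(x_rec, x, mu, logvar):
    """Reconstruction term plus KL divergence to the unit Gaussian prior."""
    rec = ((x_rec - x) ** 2).sum(dim=1).mean()
    kl = -0.5 * (1 + logvar - mu**2 - logvar.exp()).sum(dim=1).mean()
    return rec + kl
```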
Observations of the Cosmic Microwave Background (CMB) radiation have made significant contributions to our understanding of cosmology. While temperature observations of the CMB have greatly advanced our knowledge, the next frontier lies in detecting the elusive B-modes and obtaining precise reconstructions of the CMB's polarized signal in general. In anticipation of proposed and upcoming CMB...
Symbolic Regression is a data-driven method that searches the space of mathematical equations with the goal of finding the best analytical representation of a given dataset. It is a powerful tool that can reveal the underlying behavior governing the data-generation process. Furthermore, in the case of physical equations, obtaining an analytical form adds a layer of...
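As a hedged toy of the search being described, one can score a small library of candidate analytic forms against data; real symbolic-regression codes search operator trees with genetic or neural methods rather than a fixed list:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.1, 5.0, 200)
y = 2.0 * np.sin(x) + 0.1 * rng.standard_normal(x.size)  # hidden law

# Tiny library of candidate expressions, each with one free coefficient.
candidates = {
    "a*x":      lambda x: x,
    "a*sin(x)": lambda x: np.sin(x),
    "a*exp(x)": lambda x: np.exp(x),
    "a*log(x)": lambda x: np.log(x),
}

best = None
for name, f in candidates.items():
    basis = f(x)
    a = (basis @ y) / (basis @ basis)     # least-squares fit of a
    mse = np.mean((y - a * basis) ** 2)
    if best is None or mse < best[2]:
        best = (name, a, mse)

print(f"best form: {best[0]}, a = {best[1]:.3f}, mse = {best[2]:.4f}")
```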
While the benefits of machine learning for data analysis are widely discussed, I will argue that machine learning also has great potential to point us toward interesting directions in new physics. Indeed, the current approach to solving the big questions of cosmology today is to constrain a wide range of cosmological models (such as cosmic inflation or modified gravity models), which is costly....
Symbolic Regression is the study of algorithms that automate the search for analytic expressions that fit data. With new advances in deep learning there has been much renewed interest in such approaches, yet efforts have not been focused on physics, where we have important additional constraints due to the units associated with our data.
I will present Φ-SO, a Physical Symbolic Optimization...
Upcoming photometric surveys such as the Legacy Survey of Space and Time (LSST) will image billions of galaxies, an amount required for extracting the faint weak lensing signal at a large range of cosmological distances. The combination of depth and area coverage of the imagery will be unprecedented ($r \sim 27.5$, $\sim20\,000\,\text{deg}^2$), and processing it will be fraught with many...
Data compression to informative summaries is essential for modern data analysis. Neural regression is a popular simulation-based technique for mapping data to parameters as summaries over a prior, but it is usually agnostic to how the information geometry (the data-summary relationship and its uncertainties) changes over parameter space. We present Fishnets, a general simulation-based, neural compression...
Studying Active Galactic Nuclei (AGN) is crucial to understanding the processes governing the birth and evolution of supermassive black holes and their connection with star formation and galaxy evolution. However, few AGN have been identified in the EoR (z > 6), making it difficult to study their properties. In particular, a very small fraction of these AGN have been detected in the radio. Simulations and...
Machine learning (ML) is having a transformative impact on astrophysics. The field is starting to mature, where we are moving beyond the naive application of off-the-shelf, black-box ML models towards approaches where ML is an integral component in a larger, principled analysis methodology. Furthermore, not only are astrophysical analyses benefiting from the use of ML, but ML models...
The cosmological matter power spectrum, P(k), is of fundamental importance in cosmological analyses, yet solving the Boltzmann equations can be computationally prohibitive if required several thousand times, e.g. in an MCMC. Emulators for P(k) as a function of cosmology have therefore become popular, whether neural-network or Gaussian-process based. Yet one of the oldest emulators we...
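A hedged sketch of a Gaussian-process emulator of this kind, with a stand-in function in place of a Boltzmann code and illustrative parameter ranges:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Training set: a scalar summary of P(k) at sampled cosmologies
# theta = (Omega_m, sigma_8); a real emulator would call CAMB/CLASS
# here and emulate the full k-binned spectrum.
rng = np.random.default_rng(1)
theta = rng.uniform([0.2, 0.6], [0.4, 1.0], size=(50, 2))

def fake_boltzmann(t):  # placeholder for the expensive computation
    return np.log(t[0]) + 2.0 * np.log(t[1]) + 0.1 * t[0] * t[1]

y = np.array([fake_boltzmann(t) for t in theta])

gp = GaussianProcessRegressor(kernel=RBF(length_scale=[0.1, 0.1]),
                              normalize_y=True)
gp.fit(theta, y)
mean, std = gp.predict([[0.3, 0.8]], return_std=True)  # fast surrogate
```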
A significant statement regarding the existence of primordial non-Gaussianity stands as one of the key objectives of next-generation galaxy surveys. However, traditional methods are burdened by a variety of issues, such as the handling of unknown systematic effects, the combination of multiple probes of primordial non-Gaussianity, and the capturing of information beyond the largest scales in...
Simulation-based inference (SBI) building on machine-learnt density estimation and massive data compression has the potential to become the method of choice for analysing large, complex datasets in survey cosmology. I will present recent work that implements every ingredient of the current Kilo-Degree Survey weak lensing analysis into an SBI framework which runs on similar timescales as a...
Normalizing Flows (NF) are generative models which transform a simple prior distribution into the desired target. However, they require the design of an invertible mapping whose Jacobian determinant has to be computable. The recently introduced Neural Hamiltonian Flows (NHF) are Hamiltonian-dynamics-based flows, which are continuous, volume-preserving, and invertible, and thus make for natural...
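The invertible, volume-preserving building block exploited by NHFs is essentially the leapfrog integrator; a hedged numpy sketch of one such step and its exact inverse (in an actual NHF the potential and kinetic gradients are learned networks, not these toy formulas):

```python
import numpy as np

def grad_V(q):   # learned network in an NHF; identity potential here
    return q

def grad_K(p):   # likewise for the kinetic term
    return p

def leapfrog(q, p, eps=0.1):
    """One volume-preserving, exactly invertible flow step."""
    p = p - 0.5 * eps * grad_V(q)
    q = q + eps * grad_K(p)
    p = p - 0.5 * eps * grad_V(q)
    return q, p

def leapfrog_inverse(q, p, eps=0.1):
    """Undo the three shear maps in reverse order."""
    p = p + 0.5 * eps * grad_V(q)
    q = q - eps * grad_K(p)
    p = p + 0.5 * eps * grad_V(q)
    return q, p

q, p = np.ones(3), np.zeros(3)
q2, p2 = leapfrog(q, p)
q3, p3 = leapfrog_inverse(q2, p2)  # recovers (q, p) to machine precision
```

Because each shear map has unit Jacobian determinant, the log-determinant term of the flow vanishes by construction.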
The cosmic web, or Large-Scale Structure (LSS), is the massive spiderweb-like arrangement of galaxy clusters and the dark matter holding them together under gravity. The lumpy, spindly universe we see today evolved from a much smoother, infant universe. How this structure formed, and the information embedded within it, is considered one of the “Holy Grails” of modern cosmology, and might hold the...
Knowledge of the primordial matter density field from which the present non-linear observations formed is of fundamental importance for cosmology, as it contains an immense wealth of information about the physics, evolution, and initial conditions of the universe. Reconstructing this density field from the galaxy survey data is a notoriously difficult task, requiring sophisticated statistical...
Modern cosmological inference typically relies on likelihood expressions and covariance estimations, which can become inaccurate and cumbersome depending on the scales and summary statistics under consideration. Simulation-based inference, in contrast, does not require an analytical form for the likelihood but only a prior distribution and a simulator, thereby naturally circumventing these...
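A hedged sketch of this prior-plus-simulator workflow using the community `sbi` package (the API shown reflects recent versions and may differ; the simulator is a stand-in):

```python
import torch
from sbi.inference import SNPE
from sbi.utils import BoxUniform

# Prior over two toy parameters; a real analysis would use survey-
# appropriate ranges and an N-body or lensing simulator below.
prior = BoxUniform(low=torch.tensor([0.1, 0.6]),
                   high=torch.tensor([0.5, 1.0]))

def simulator(theta):
    return theta + 0.05 * torch.randn_like(theta)

theta = prior.sample((2000,))
x = simulator(theta)

inference = SNPE(prior=prior)
density_estimator = inference.append_simulations(theta, x).train()
posterior = inference.build_posterior(density_estimator)
samples = posterior.sample((1000,), x=torch.tensor([0.3, 0.8]))
```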
Unlocking the full potential of next-generation cosmological data requires navigating the balance between sophisticated physics models and computational demands. We propose a solution by introducing a machine learning-based field-level emulator within the HMC-based Bayesian Origin Reconstruction from Galaxies (BORG) inference algorithm. The emulator, an extension of the first-order Lagrangian...
Model misspecification is a long-standing problem for Bayesian inference: when the model differs from the actual data-generating process, posteriors tend to be biased and/or overly concentrated. This issue is particularly critical for cosmological data analysis in the presence of systematic effects. I will briefly review state-of-the-art approaches based on an explicit field-level likelihood,...
State-of-the-art astronomical simulations have provided datasets which enabled the training of novel deep learning techniques for constraining cosmological parameters. However, differences in sub-grid physics implementations and numerical approximations among simulation suites lead to differences in the simulated datasets, which pose a hard challenge when trying to generalize across diverse data...
Detection, deblending, and parameter inference for large galaxy surveys have been and still are performed with simplified parametric models, such as bulge-disk or single Sersic profiles. The complex structure of galaxies, revealed by higher resolution imaging data, such as those gathered by HST or, in the future, by Euclid and Roman, makes these simplifying assumptions problematic. Biases...
Cosmic voids identified in the spatial distribution of galaxies provide complementary information to two-point statistics. In particular, constraints on the neutrino mass sum, $\sum m_\nu$, promise to benefit from the inclusion of void statistics. We perform inference on the CMASS NGC sample of SDSS-III/BOSS with the aim of constraining $\sum m_\nu$. We utilize the void size function, the...
Cosmic voids are the largest and most underdense structures in the Universe. Their properties have been shown to encode precious information about the laws and constituents of the Universe. We show that machine learning techniques can unlock the information in void features for cosmological parameter inference. Using thousands of void catalogs from the GIGANTES dataset, we explore three...
I present a novel, general-purpose Python-based framework for scalable and efficient statistical inference by means of hierarchical modelling and simulation-based inference.
The framework is built by combining the JAX and NumPyro libraries. The combination of differentiable and probabilistic programming offers the benefits of automatic differentiation, XLA optimization, and the ability to...
Optimal extraction of the non-Gaussian information encoded in the Large-Scale Structure (LSS) of the universe lies at the forefront of modern precision cosmology. We propose achieving this task through the use of the Wavelet Scattering Transform (WST), which subjects an input field to a layer of non-linear transformations that are sensitive to non-Gaussianities through a generated set of WST...
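A hedged, minimal illustration of a first-order scattering coefficient on a 1-D field (real WST analyses use full 2-D or 3-D wavelet banks, e.g. via the kymatio package; the Morlet construction here is deliberately crude):

```python
import numpy as np

def morlet(n, scale):
    """Crude complex Morlet wavelet at a given scale (illustrative)."""
    t = np.arange(-n // 2, n // 2)
    return np.exp(1j * 5.0 * t / scale) * np.exp(-0.5 * (t / scale) ** 2)

def scattering_coeffs(field, scales=(2, 4, 8, 16)):
    """First-order WST: convolve, take modulus, spatially average."""
    coeffs = []
    for s in scales:
        psi_hat = np.fft.fft(morlet(len(field), s))
        conv = np.fft.ifft(np.fft.fft(field) * psi_hat)
        coeffs.append(np.mean(np.abs(conv)))  # non-linearity + average
    return np.array(coeffs)

rng = np.random.default_rng(0)
delta = rng.standard_normal(512)              # stand-in density field
print(scattering_coeffs(delta))
```

Second-order coefficients, which capture further non-Gaussian couplings, are obtained by scattering the modulus fields once more before averaging.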
The Lyman-$\alpha$ forest presents a unique opportunity to study the distribution of matter in the high-redshift universe and extract precise constraints on the nature of dark matter, neutrino masses, and other extensions to the ΛCDM model. However, accurately interpreting this observable requires precise modeling of the thermal and ionization state of the intergalactic medium, which often...
Rapid strides are currently being made in the field of artificial intelligence using Transformer-based models like Large Language Models (LLMs). The potential of these methods for creating a single, large, versatile model in astronomy has not yet been explored, except for some uses of the basic component of the Transformer, the attention mechanism. In this talk, we will present a framework for...
Field-level likelihood-free inference is a brand-new method for extracting cosmological information, surpassing the usual, time-demanding traditional inference methods. In this work we train different machine learning models, without any cut on scale, on a sequence of distinct selections of galaxy catalogs from the CAMELS suite, in order to recover the main challenges of...
Upcoming cosmological weak lensing surveys are expected to constrain cosmological parameters with unprecedented precision. In preparation for these surveys, large simulations with realistic galaxy populations are required to test and validate analysis pipelines. However, these simulations are computationally very costly -- and at the volumes and resolutions demanded by upcoming cosmological...
New large-scale astronomical surveys such as the Vera Rubin Observatory's Legacy Survey of Space and Time (LSST) have the potential to revolutionize transient astronomy, providing opportunities to discover entirely new classes of transients while also enabling a deeper understanding of known supernovae. LSST is expected to observe over 10 million transient alerts every night, over an order of...
Amidst the era of astronomical surveys that collect massive datasets, neural networks have emerged as powerful tools to address the challenge of exploring and mining these enormous volumes of information from our sky. Among the obstacles in the study of these surveys is the identification of exoplanetary signatures in the photometric light curves. In this presentation, we will discuss how...
Weak gravitational lensing is an excellent quantifier of the growth of structure in our universe, as the distortion of galaxy ellipticities measures the spatial fluctuations of the matter density field along a line of sight. Traditional two-point statistical analyses of weak lensing only capture Gaussian features of the observable field, hence losing information from smaller scales where...
Most applications of ML in astronomy pertain to classification, regression, or emulation, however, ML has the potential to address whole new categories of problems in astronomical big data. This presentation uses ML in a statistically principled approach to observing strategy selection, which encompasses the frequency and duration of visits to each portion of the sky and impacts the degree to...
Galaxy mergers are unique in their ability to transform the morphological, kinematic, and intrinsic characteristics of galaxies on short timescales. The redistribution of angular momentum brought on by a merger can revive, enhance, or truncate star formation, trigger or boost the accretion rate of an AGN, and fundamentally alter the evolutionary trajectory of a galaxy.
These effects are well...
Deep learning is data-hungry; we typically need thousands to millions of labelled examples to train effective supervised models. Gathering these labels in citizen science projects like Galaxy Zoo can take years, delaying the science return of new surveys. In this talk, I’ll describe how we’re combining simple techniques to build better galaxy morphology models with fewer labels.
First...
The CDM model is in remarkable agreement with large-scale observations but small-scale evidence remains scarce. Studying substructure through strong gravitational lensing can fill in the gap on small scales. In the upcoming years, we expect the number of observed strong lenses to increase by several orders of magnitude from ongoing and future surveys. Machine learning has the potential to...
In our previous works (e.g., arXiv:2209.10333), deep learning techniques have succeeded in estimating galaxy cluster masses in observations of Sunyaev-Zel'dovich maps, e.g. in the Planck PSZ2 catalog, and mass radial profiles from SZ mock maps. As the next step, we explore inferring 2D mass density maps from mock observations of SZ, X-ray and stars using THE THREE HUNDRED (The300) cosmological...
The halo mass function describes the abundance of dark matter halos as a function of halo mass and depends sensitively on the cosmological model. Accurately modelling the halo mass function for a range of cosmological models will enable forthcoming surveys such as Vera C. Rubin Observatory’s Legacy Survey of Space and Time (LSST) and Euclid to place tight constraints on cosmological...
The new generation of cosmological surveys, such as DESI (DESI Collaboration et al. 2022), have already started taking data, and even more will arrive with Euclid (Laureijs et al. 2011) and the LSST of the Vera Rubin Observatory (Ivezić et al. 2019) starting soon. At the same time, the classical methods of analysing RSD and BAO with 2-point statistics provide less strenuous...
Conventional cosmic shear analyses, relying on two-point functions, do not have access to the non-Gaussian information present at the full field level, thus limiting our ability to constrain cosmological parameters with precision. Full-field inference, in contrast, is an optimal way to extract all available cosmological information, and it can be achieved with two widely different...
Accurately describing the relation between the dark matter over-density and the observable galaxy field is one of the significant challenges to analyzing cosmic structures with next-generation galaxy surveys. Current galaxy bias models are either inaccurate or computationally too expensive to be used for efficient inference of small-scale information.
In this talk, I will present a hybrid...
Whether it's calibrating our analytical predictions on small scales, or devising all new probes beyond standard two-point functions, the road to precision cosmology is paved with numerical simulations. The breadth of the parameter space we must simulate, and the associated computational cost, however, present a serious challenge. Fortunately, emulators based on Gaussian processes and neural...
We created an ML pipeline able to efficiently detect craters in a large dataset of georeferenced images. We used it to create a detailed database of craters on rocky bodies in the solar system including Mars. The Mars crater database was of sufficient detail to enable us to determine the likely origin of a number of meteorites that we have collected on Earth. As a consequence, it is possible...
Photometric redshifts and strong lensing are both integral to stellar physics and cosmological studies with the Rubin Observatory Legacy Survey of Space and Time (LSST), which will provide billions of galaxy images in six filters, including on the order of 100,000 galaxy-scale lenses. To efficiently exploit this huge amount of data, machine learning is a promising technique that leads to an...
A fundamental task of data analysis in many scientific fields is to determine the underlying causal relations between physical properties as well as the quantitative nature of these relations/laws. These laws are the fundamental building blocks of scientific models describing observable phenomena. Historically, causal methods were applied in the field of social sciences and economics (Pearl,...
The interstellar medium (ISM) is an important actor in the evolution of galaxies and provides key diagnostics of their activity, masses, and evolutionary state. However, surveys of the atomic and molecular gas, both in the Milky Way and in external galaxies, produce huge position-position-velocity data cubes over wide fields of view with varying signal-to-noise ratios. Moreover, inferring the...
Strong gravitational lensing has become one of the most important tools for investigating the nature of dark matter (DM). This is because it can be used to detect dark matter subhaloes in the environments of galaxies. The existence of a large number of these subhaloes is a key prediction of the most popular DM model, cold dark matter (CDM). With a technique called gravitational imaging, the...
Convolutional neural networks (CNNs) are now the standard tool for finding strong gravitational lenses in imaging surveys. Upcoming surveys like Euclid will rely completely on CNNs for strong lens finding, but the selection function of these CNNs has not yet been studied. This reflects a large gap in the literature on machine learning applied to astronomy. Biases in...
The Gaia Collaboration's 3rd data release (DR3) provides comprehensive information, including photometry and kinematics, on more than a billion stars across the entire sky up to $G\approx21$, encompassing approximately 220 million stars with supplementary low-resolution spectra ($G<17.6$). These spectra offer valuable derived stellar properties like [Fe/H], $\log g$, and $T_{eff}$, serving as...
We present a novel methodology for hybrid simulation-based inference (HySBI) for large-scale structure analysis in cosmology. Our approach combines perturbative analysis on the large scales, which can be modeled analytically from first principles, with simulation-based implicit inference (SBI) on the small, non-linear scales that cannot be modeled analytically. As a proof of principle, we apply our...
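A hedged sketch of how the two regimes combine: if the large-scale and small-scale summaries are approximately independent given the parameters, their log-likelihoods simply add (all model names here are illustrative placeholders):

```python
import numpy as np

def log_like_large(theta, d_large, cov_inv):
    """Analytic Gaussian likelihood from perturbation theory (toy model)."""
    model = theta[0] * np.ones_like(d_large)  # stand-in PT prediction
    r = d_large - model
    return -0.5 * r @ cov_inv @ r

def log_post(theta, d_large, d_small, cov_inv, neural_log_like, log_prior):
    """Hybrid posterior: analytic large scales plus a learned small-scale
    likelihood (e.g. a normalizing flow trained on simulations)."""
    return (log_prior(theta)
            + log_like_large(theta, d_large, cov_inv)
            + neural_log_like(theta, d_small))
```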
The main goal of cosmology is to perform parameter inference and model selection, from astronomical observations. But, uniquely, it is a field that has to do this limited to a single experiment, the Universe we live in. With compelling existing and upcoming cosmological surveys, we need to leverage state-of-the-art inference techniques to extract as much information as possible from our data....
In this talk I will present the first cosmological constraints from only the observed photometry of galaxies. Villaescusa-Navarro et al. (2022) recently demonstrated that the internal physical properties of a single galaxy contain a significant amount of cosmological information. These physical properties, however, cannot be directly measured from observations. I will present how we can go...
We present cosmological constraints from the Subaru Hyper Suprime-Cam (HSC) first-year weak lensing shear catalogue using convolutional neural networks (CNNs) and conventional summary statistics. We crop 19 $3\times3$deg$^2$ sub-fields from the first-year area, divide the galaxies with redshift $0.3< z< 1.5$ into four equally-spaced redshift bins, and perform tomographic analyses. We develop a...
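A hedged sketch of a CNN of this type (architecture and sizes are illustrative, not the network of the analysis): four input channels for the four tomographic redshift bins, regressing two cosmological parameters.

```python
import torch
import torch.nn as nn

class ShearCNN(nn.Module):
    """Toy CNN mapping tomographic lensing maps to (Omega_m, sigma_8)."""
    def __init__(self, n_bins=4, n_params=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(n_bins, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(32, n_params)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

maps = torch.randn(8, 4, 128, 128)  # batch of mock tomographic maps
params = ShearCNN()(maps)           # -> (8, 2) parameter estimates
```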
In this era of large and complex astronomical survey data, interpreting, validating, and comparing inference techniques becomes increasingly difficult. This is particularly critical for emerging inference methods like Simulation-Based Inference (SBI), which offer significant speedup potential and posterior modeling flexibility, especially when deep learning is incorporated. We present a study...
Visual inspections of the first optical rest-frame images from JWST have indicated a surprisingly high fraction of disk galaxies at high redshifts. Here, we alternatively apply self-supervised machine learning to explore the morphological diversity at $z \geq 3$.
Our proposed data-driven representation scheme of galaxy morphologies, calibrated on mock images from the TNG50 simulation, is...
Likelihood-free inference provides a rigorous way to perform Bayesian analysis using forward simulations only. It allows us to account for complex physical processes and observational effects in the forward simulations. In this work, we use Density-Estimation Likelihood-Free Inference (DELFI) to perform likelihood-free forward modelling for Bayesian cosmological inference, which uses the...
Weak-lensing galaxy cluster masses are an important observable for testing the cosmological standard model and modified gravity models. However, cluster cosmology in optical surveys is challenged by sources of systematics such as photometric redshift errors. We use combinatorial optimization schemes and fast machine-learning-assisted model evaluation to select galaxy source samples that minimize the...
The ΛCDM cosmological model has been very successful, but cosmological data indicate that extensions are still highly motivated. Past explorations of extensions have largely been restricted to adding a small number of parameters to models of fixed mathematical form. Neural networks can account for more flexible model extensions and can capture unknown physics at the level of differential...
Simulations have revealed correlations between the properties of dark matter halos and their environment, made visible by the galaxies which inherit these connections through their host halos. We define a measure of the environment based on the location and observable properties of a galaxy’s nearest neighbors in order to capture the broad information content available in the environment. We...
Machine learning is becoming an essential component of the science operations processing pipelines of modern astronomical surveys. Space missions such as NASA’s Transiting Exoplanet Survey Satellite (TESS) are observing millions of stars each month. In order to select the relevant targets for our science cases from these large numbers of observations, we need highly automated and efficient...
Joint observations in electromagnetic and gravitational waves shed light on the physics of objects and surrounding environments with extreme gravity that are otherwise unreachable via siloed observations in each messenger. However, such detections remain challenging due to the rapid and faint nature of counterparts. Protocols for discovery and inference still rely on human experts manually...
When measuring photon counts from incoming sky fluxes, observatories imprint nuisance effects on the data that must be accurately removed. Some detector effects can be easily inverted, while others, such as the point spread function and shot noise, are not trivially invertible. Using information field theory and Bayes' theorem, we infer the posterior mean and uncertainty of the sky flux. This...
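A hedged toy of the forward model described (PSF convolution plus Poisson shot noise) with a maximum-a-posteriori flux estimate by gradient ascent; information field theory provides far richer priors and full uncertainty quantification than this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
psf = np.exp(-0.5 * (np.arange(n) - n // 2) ** 2 / 4.0)
psf /= psf.sum()
psf_hat = np.fft.rfft(np.fft.ifftshift(psf))  # kernel centered at index 0

def forward(flux):
    """Instrument response: convolve the sky flux with the PSF."""
    return np.fft.irfft(np.fft.rfft(flux) * psf_hat, n)

truth = np.exp(0.3 * rng.standard_normal(n))
counts = rng.poisson(50 * forward(truth))     # shot noise in photon counts

# MAP estimate: Poisson log-likelihood plus a smoothness prior, by
# gradient ascent on s = log(flux) so the flux stays positive.
s = np.zeros(n)
for _ in range(500):
    lam = 50 * forward(np.exp(s)) + 1e-9
    resid = counts / lam - 1.0                # d(logL)/d(lam) direction
    grad_like = np.exp(s) * 50 * np.fft.irfft(
        np.fft.rfft(resid) * np.conj(psf_hat), n)  # adjoint of the PSF
    grad_prior = -(2 * s - np.roll(s, 1) - np.roll(s, -1))
    s += 1e-3 * (grad_like + grad_prior)

flux_map = np.exp(s)                          # posterior-mode sky flux
```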
In this talk, we explore the use of Generative Topographic Mapping (GTM) as an alternative to self-organizing maps (SOM) for deriving accurate mean redshift estimates for cosmic shear surveys. We delve into the advantages of the GTM probabilistic modeling of the complex relationships within the data, enabling robust estimation of redshifts. Through comparative analysis, we showcase the...
During the Epoch of Reionisation, the intergalactic medium is reionised by the UV radiation from the first generation of stars and galaxies. One tracer of the process is the 21 cm line of hydrogen, which will be observed by the Square Kilometre Array (SKA) at low frequencies, thus imaging the distribution of ionised and neutral regions and their evolution.
To prepare for these upcoming...
We present an innovative clustering method, Significance Mode Analysis (SigMA), to extract co-spatial and co-moving stellar populations from large-scale surveys such as ESA Gaia. The method studies the topological properties of the density field in the multidimensional phase space. The set of critical points in the density field gives rise to the cluster tree, a hierarchical structure in...
As upcoming SKA-scale surveys open new regimes of observation, it is expected that some of the objects they detect will be "unknown unknowns": entirely novel classes of sources which are outside of our current understanding of astrophysics. The discovery of these sources has the potential to introduce new fields of study, as it did for e.g. pulsars, but relies upon us being able to identify...
How can we gain physical intuition in real-world datasets using `black-box' machine learning? In this talk, I will discuss how ordered component analyses can be used to separate, identify, and understand physical signals in astronomical datasets. We introduce Information Ordered Bottlenecks (IOBs), a neural layer designed to adaptively compress data into latent variables organized by...
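A hedged sketch of the ordering mechanism, written as a nested-dropout-style mask (the actual IOB layer may differ in detail): during training a random prefix of the latent vector is kept, so earlier latents are forced to carry the most information.

```python
import torch
import torch.nn as nn

class OrderedBottleneck(nn.Module):
    """Keep a random prefix of the latents during training so that the
    leading components learn the most informative signals first."""
    def __init__(self, n_latent=8):
        super().__init__()
        self.n_latent = n_latent

    def forward(self, z):
        if not self.training:
            return z                       # use all latents at test time
        k = torch.randint(1, self.n_latent + 1, (z.shape[0],))
        idx = torch.arange(self.n_latent).expand(z.shape[0], -1)
        mask = (idx < k.unsqueeze(1)).to(z.dtype)
        return z * mask                    # zero out everything past k

# Usage: encoder -> OrderedBottleneck -> decoder; after training,
# truncating z at any length gives a compression ordered by information.
```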
We are on the cusp of the Big Data gravitational wave (GW) era. Pairs of stellar-mass black holes (BHs) or neutron stars (NSs) across our vast Universe occasionally merge, unleashing bursts of gravitational waves, which we have been able to observe on Earth since their first detection in 2015. Over the next few years, the population of detected mergers will rapidly increase from a few hundred to...