Description
A fundamental task of data analysis in many scientific fields is to determine the underlying causal relations between physical properties, as well as the quantitative form of these relations. Such laws are the building blocks of scientific models describing observable phenomena. Historically, causal methods were applied in the social sciences and economics (Pearl, 2000), where causal relations were investigated by means of interventions (manipulating and varying features of a system to see how it reacts). However, because we can observe only a single world and a single Universe, interventions cannot be used to recover the causal models describing our data in disciplines such as astrophysics or climate science. It is therefore necessary to discover causal relations by analyzing the statistical properties of purely observational data, a task known as causal discovery or causal structure learning.
In S. Di Gioia et al., 2023 (in preparation), in collaboration with R. Trotta, V. Acquaviva, F. Bucinca and A. Maller, we perform causal model discovery on simulated galaxy data, to better understand which galaxy and halo properties drive galaxy size, initially at redshift z = 0. In particular, we use a constraint-based structure learning algorithm, called kernel-PC, implemented in a parallel Python code developed by the author. The input data are the simulated galaxy catalog generated with the Santa Cruz semi-analytic model (SC-SAM). The SC-SAM was built on the merger trees extracted from the dark matter-only version of the TNG-100 hydrodynamical simulation, which has been shown to successfully describe the full spectrum of observed galaxy properties from z = 0 to z = 3-4 (Gabrielpillai et al., 2022).
In my talk I will present the main results of this work, together with an overview of the most common algorithms for causal discovery within the framework of Causal Graphical Models, focusing on their potential applicability to upcoming astronomical surveys. Future applications of this method include dimensionality reduction and Bayesian model discovery.
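
As a rough illustration of how a constraint-based method such as the PC family proceeds, the sketch below runs the skeleton-discovery phase on a toy three-variable chain. It is not the kernel-PC code used in this work: the conditional-independence test here is a Fisher-z test on partial correlations, which assumes linear-Gaussian relations, whereas a kernel-based test would take its place in kernel-PC. The function names, the toy variables `x0`, `x1`, `x2`, and the threshold `alpha` are all hypothetical choices made for this example.

```python
import itertools
import math

import numpy as np


def partial_corr_pvalue(corr, n_samples, i, j, cond):
    """Two-sided Fisher-z p-value for 'X_i independent of X_j given X_cond'.

    Assumption: linear-Gaussian data. A kernel-based test would replace this.
    """
    idx = [i, j, *cond]
    sub = np.asarray(corr)[np.ix_(idx, idx)]
    prec = np.linalg.pinv(sub)  # precision matrix of the selected variables
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])  # partial correlation
    r = min(max(r, -0.999999), 0.999999)  # keep atanh finite
    stat = math.sqrt(n_samples - len(cond) - 3) * abs(math.atanh(r))
    return math.erfc(stat / math.sqrt(2.0))  # equals 2 * (1 - Phi(stat))


def pc_skeleton(corr, n_samples, alpha=0.01):
    """Skeleton phase of a PC-style search: start from the complete graph and
    delete an edge as soon as some conditioning set makes its ends independent."""
    n_vars = len(corr)
    adj = {v: set(range(n_vars)) - {v} for v in range(n_vars)}
    level = 0  # size of the conditioning sets tried in this pass
    while any(len(adj[v]) - 1 >= level for v in adj):
        for i in range(n_vars):
            for j in list(adj[i]):
                others = adj[i] - {j}
                if len(others) < level:
                    continue
                for cond in itertools.combinations(sorted(others), level):
                    if partial_corr_pvalue(corr, n_samples, i, j, cond) > alpha:
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
        level += 1
    return {(i, j) for i in adj for j in adj[i] if i < j}


# Toy data: a causal chain x0 -> x1 -> x2 with Gaussian noise.
rng = np.random.default_rng(0)
n = 2000
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
corr = np.corrcoef(np.column_stack([x0, x1, x2]), rowvar=False)
print(sorted(pc_skeleton(corr, n)))
```

For a chain like this one would expect the edges 0-1 and 1-2 to survive and the edge 0-2 to be removed once `x1` is conditioned on, although any finite sample can produce a wrong decision at a rate governed by `alpha`. The skeleton is undirected; orienting its edges is a separate phase that this sketch does not cover.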