Galaxy Legacy Models

SLAC scientists are developing an array of machine-learning models to perform image restoration tasks on high-resolution galaxy images. These models will let them identify and classify the various components of galaxy images much more efficiently.

Galaxy-galaxy lensing (GGL) reveals the statistical properties of the dark matter halos in which lens galaxies reside, which makes it a sensitive test of different models of galaxy formation.
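For reference, the GGL observable is conventionally written as the excess surface mass density inferred from the mean tangential shear around lens galaxies (a standard relation, not something specific to these models):

    \Delta\Sigma(R) = \bar{\Sigma}(<R) - \Sigma(R) = \Sigma_{\mathrm{crit}}\,\gamma_t(R),
    \qquad
    \Sigma_{\mathrm{crit}} = \frac{c^2}{4\pi G}\,\frac{D_s}{D_l D_{ls}}

Here D_l, D_s, and D_ls are the angular diameter distances to the lens, to the source, and between lens and source, respectively.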

Overview of the Galaxy Legacy Models

The Galaxy Legacy Models rely on deep learning to process celestial object images and generate galaxy image representations. They employ a multitask training strategy [36, 37] that automatically adjusts the training proportion of each downstream task according to its performance during training; for instance, a downstream task that performs poorly is relegated to the back of the queue. This enables the models to learn a more accurate image representation and helps reduce the variance in training outcomes across downstream tasks.
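As a rough illustration of performance-based task scheduling (not the authors' exact scheme; the task names, scores, and weighting rule below are placeholders), the share of upcoming training batches can be drawn from the tasks' recent validation scores:

    import random

    # Hypothetical downstream tasks and their latest validation scores (higher is better).
    # These names and numbers are placeholders, not the actual Galaxy Legacy tasks or metrics.
    val_scores = {"morphology": 0.91, "segmentation": 0.74, "photo_z": 0.82}

    def sampling_weights(scores, temperature=2.0):
        """Give better-performing tasks a larger share of upcoming batches, so that
        weaker tasks are sampled less often (pushed to the back of the queue)."""
        raised = {task: score ** temperature for task, score in scores.items()}
        total = sum(raised.values())
        return {task: value / total for task, value in raised.items()}

    def pick_next_task(scores):
        weights = sampling_weights(scores)
        tasks, probs = zip(*weights.items())
        return random.choices(tasks, weights=probs, k=1)[0]

    # Example: choose which downstream task the next training batch comes from.
    next_task = pick_next_task(val_scores)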

The model’s ML classification module performs downstream tasks using the Galaxy Zoo image cutouts from DESI Legacy Imaging Surveys DR9. It is trained to classify galaxies into distinct classes, including ellipticals and spirals, by adding two fully connected layers after the LVM encoder. It is further tasked with identifying specific features of each galaxy type through an additional layer that compares pixel-level differences within each galaxy image.
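A minimal PyTorch-style sketch of such a two-layer classification head is shown below; the embedding size, hidden size, and class list are illustrative assumptions rather than values taken from the models themselves.

    import torch
    import torch.nn as nn

    class GalaxyClassifierHead(nn.Module):
        """Two fully connected layers appended to a (frozen) LVM encoder.
        Embedding size and number of classes are illustrative, not the real values."""
        def __init__(self, embed_dim=1024, hidden_dim=256, n_classes=2):
            super().__init__()
            self.head = nn.Sequential(
                nn.Linear(embed_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, n_classes),  # e.g. elliptical vs. spiral
            )

        def forward(self, embedding):
            return self.head(embedding)  # class logits

    # Example: classify a batch of 8 precomputed image embeddings.
    head = GalaxyClassifierHead()
    logits = head(torch.randn(8, 1024))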

This pixel-by-pixel identification capability is important because it goes beyond the traditional bulge-disc decomposition and may help us understand the evolution of galaxy structure. It can also identify a range of sub-structures, such as dust lanes and strong lensing systems, and provide clues to the formation of galactic material. In this respect, the human-in-the-loop (HITL) approach outperforms competing methods on these specific identification tasks, and it achieves these results with relatively few prompts.

Description of the Galaxy Legacy Models

The Galaxy Legacy Models provide a foundation model for astronomical vision tasks such as galaxy classification, image restoration, object detection, and parameter extraction. These models use a multitask learning approach in which each task is trained with its own loss function, such as cross-entropy for classification or mean squared error (MSE) for regression.
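For example, a classification head and a parameter-regression head can be trained jointly by summing their respective losses; the weighting below is an illustrative placeholder rather than the models' actual configuration.

    import torch
    import torch.nn.functional as F

    def multitask_loss(class_logits, class_labels, param_preds, param_targets,
                       w_class=1.0, w_reg=1.0):
        """Combine a cross-entropy classification loss with an MSE regression loss.
        The weights are placeholders; the real models may balance tasks differently
        (e.g. with the performance-based scheduling described earlier)."""
        ce = F.cross_entropy(class_logits, class_labels)   # e.g. morphology classes
        mse = F.mse_loss(param_preds, param_targets)       # e.g. physical parameters
        return w_class * ce + w_reg * mse

    # Example with dummy tensors: 8 galaxies, 4 classes, 3 regressed parameters.
    loss = multitask_loss(torch.randn(8, 4), torch.randint(0, 4, (8,)),
                          torch.randn(8, 3), torch.randn(8, 3))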

The models are used to classify galaxies from DESI Legacy Imaging Surveys. They also perform several downstream tasks, such as object detection and morphological classification, which are important for understanding how galaxies form and evolve over the Universe’s 13.8 billion-year history.

In addition, the models predict how volunteers in the Galaxy Zoo DESI project would respond to a given question. This information can be useful for assessing the quality of the results and for making informed decisions about future surveys.

This release includes an updated version of the model that incorporates the latest data from the DESI Legacy Surveys. In particular, this version enables the model to better predict the fraction of volunteers selecting each answer to a given question. The retraining procedure has also been modified to account for the fact that the Galaxy Zoo DESI sample contains both active and passive galaxy populations. In the past, these were treated as separate classes, but the new model treats them as part of a single population. This change should yield a more accurate prediction of the observed galaxy number counts, especially at the faint end.
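One simple way to predict per-answer vote fractions is to place a softmax over the possible answers to a question, so the predicted fractions sum to one; the single-question head below is a simplified stand-in for the full Galaxy Zoo decision tree, and its sizes are assumptions.

    import torch
    import torch.nn as nn

    class VoteFractionHead(nn.Module):
        """Predict the fraction of volunteers choosing each answer to one question.
        A softmax guarantees the predicted fractions sum to one. (The actual
        Galaxy Zoo DESI task covers a whole decision tree of questions; this
        single-question head is a simplified stand-in.)"""
        def __init__(self, embed_dim=1024, n_answers=3):  # e.g. smooth / featured / artifact
            super().__init__()
            self.linear = nn.Linear(embed_dim, n_answers)

        def forward(self, embedding):
            return torch.softmax(self.linear(embedding), dim=-1)

    # Example: predicted answer fractions for 8 galaxy embeddings (each row sums to 1).
    head = VoteFractionHead()
    fractions = head(torch.randn(8, 1024))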

Description of the AstroCLIP Model

AstroCLIP is a self-supervised model that trains image and spectrum encoders independently and then aligns them with contrastive learning. This maps the two modalities into a shared latent space, allowing for cross-modal processing, so the model can handle a wide variety of downstream tasks without fine-tuning for each modality.
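The alignment step can be sketched with a generic CLIP-style contrastive (InfoNCE) loss between paired image and spectrum embeddings; this is a standard formulation, not AstroCLIP's exact implementation, and the batch and embedding sizes are arbitrary.

    import torch
    import torch.nn.functional as F

    def contrastive_alignment_loss(img_emb, spec_emb, temperature=0.07):
        """Symmetric InfoNCE loss: matched image/spectrum pairs are pulled together
        and mismatched pairs pushed apart in the shared latent space."""
        img = F.normalize(img_emb, dim=-1)
        spec = F.normalize(spec_emb, dim=-1)
        logits = img @ spec.t() / temperature      # pairwise cosine similarities
        targets = torch.arange(len(img))           # the i-th image matches the i-th spectrum
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

    # Example with a batch of 16 paired (image, spectrum) embeddings of size 512.
    loss = contrastive_alignment_loss(torch.randn(16, 512), torch.randn(16, 512))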

For example, on photometric redshift estimation (inferring galaxy distances from their observed properties), AstroCLIP achieves performance comparable to a supervised baseline using only a single-hidden-layer MLP regressor. It also demonstrates strong performance on physical property prediction, including stellar mass, age, metallicity, and specific star formation rate. It even performs well on galaxy morphology classification (categorizing galaxies as spirals or ellipticals based on their structural characteristics) without any additional task-specific training.
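A single-hidden-layer MLP regressor of the kind described, operating on frozen cross-modal embeddings, might look roughly like this (the embedding and hidden sizes are assumptions):

    import torch
    import torch.nn as nn

    # Single-hidden-layer MLP that regresses redshift from frozen AstroCLIP-style
    # embeddings. Embedding and hidden sizes are illustrative, not the real values.
    photo_z_mlp = nn.Sequential(
        nn.Linear(512, 128),
        nn.ReLU(),
        nn.Linear(128, 1),   # predicted photometric redshift
    )

    embeddings = torch.randn(32, 512)             # frozen embeddings for 32 galaxies
    z_pred = photo_z_mlp(embeddings).squeeze(-1)  # one redshift estimate per galaxy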

In addition, the model shows impressive results on mm-wavelength source number counts from the N2CLS GOODS-N and COSMOS surveys. In particular, it can account for the influence of both instrument noise and astrophysical clustering on the measured fluxes. This enables it to produce robust, repeatable results that are not influenced by the presence of hot or cold dust in the system, as has been suggested in previous studies. This matters because cold dust can produce false positives that bias the inferred source counts at mm wavelengths.

Conclusions

The state of the numerical art has advanced in recent years to self-consistent models that follow galaxy mergers through to the resulting merger remnants (Barnes & Hernquist 1992). These models have shown that the dynamics of galaxies vary during major mergers, with dynamically hot halo components merging more rapidly than the disk and stellar components. The models have also revealed that a fraction of the gas in the merger remnant falls out of extended tidal structures into a small volume near the remnant's center, potentially fueling ultraluminous infrared galaxy (ULIRG) super-starbursts.

However, these models do not provide a large enough sample of galaxies for the statistical analysis of galaxy characteristics that we want to carry out (e.g., determining the number counts of rare galaxy subclasses). They also do not incorporate the cosmological parameters required for the lensing model.

As a result, we are working on a new modeling technique that will allow us to extract the lensing signal from deep galaxy representations. Our approach uses a deep-learning algorithm to compress galaxy images into feature vectors and then reconstruct the original images from those vectors. This is an alternative to manual classification methods, which are laborious and sensitive to noise and other factors. Our first attempts at applying the technique to the DESI Legacy Imaging Surveys DR9 data set have yielded promising results, though the full scope of the method is still under development.
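A minimal sketch of the compress-then-reconstruct idea is a convolutional autoencoder along these lines; the 3x64x64 cutout size, layer widths, and latent dimension are illustrative assumptions, not our actual architecture.

    import torch
    import torch.nn as nn

    class GalaxyAutoencoder(nn.Module):
        """Compress a galaxy cutout into a feature vector and reconstruct it.
        Cutout size (3x64x64) and latent size are illustrative assumptions."""
        def __init__(self, latent_dim=128):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, 4, stride=2, padding=1),  nn.ReLU(),   # 64 -> 32
                nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
                nn.Flatten(),
                nn.Linear(64 * 16 * 16, latent_dim),
            )
            self.decoder = nn.Sequential(
                nn.Linear(latent_dim, 64 * 16 * 16), nn.ReLU(),
                nn.Unflatten(1, (64, 16, 16)),
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 32
                nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),              # 32 -> 64
            )

        def forward(self, x):
            z = self.encoder(x)          # feature vector used for downstream analysis
            return self.decoder(z), z

    # Example: reconstruct a batch of 4 random cutouts and return their feature vectors.
    model = GalaxyAutoencoder()
    recon, features = model(torch.randn(4, 3, 64, 64))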
