CBL - Campus del Baix Llobregat

Defended project

Title: MACHINE LEARNING CLASSIFICATION OF THE GAIA 3rd DATA RELEASE BASED ON A RANDOM FOREST ALGORITHM


Students who have defended this project:


Supervisor: TORRES GIL, SANTIAGO

Department: FIS

Offer start date: 03-02-2021     Offer end date: 03-10-2021



Degree programme to which the project is assigned:
    MU AEROSPACE S&T 15
Type: Individual
 
Place of completion: EETAC
 
Name of second supervisor (UPC): CARLES CANTERO MITJANS
Department of second supervisor:
 
Keywords:
Machine Learning, Random Forest Algorithm, Gaia Space Mission, Simulation, Classification
 
Description of content and activity plan:
Content:
The Gaia space mission has provided an unprecedented wealth of
information for more than 1 billion stars of our Galaxy. As
with other present-day scientific missions, that huge amount of
data must be handled efficiently. In this sense, the use of
artificial intelligence techniques, such as machine learning,
becomes indispensable in such a big-data context. In particular,
the Random Forest algorithm has proven to be a highly effective
tool in the analysis and classification of stellar
populations.


Main objectives:
.- Extract all the available information from the recent Gaia
EDR3 database concerning the white dwarf population.
.- Analyze the most significant variables and parameters of the
sample in order to implement a supervised Random Forest
algorithm.
.- Classify the Gaia EDR3 white dwarf population and assign each
object to one of the different components of our Galaxy.
.- Compare the results with previous classifications.
.- Assess how the performance of the algorithm can be
improved.
.- Study the possibility of applying the algorithm to classify
the sample according to other properties, such as the spectral
type.
.- Summarize the results in a paper, especially showing the
capability of the Random Forest algorithm to extract the maximum
information in the classification of a population.
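
The supervised classification described in the objectives above can be illustrated with a minimal, standard-library-only sketch of a random forest (bootstrap resampling, a random feature subset at each split, majority vote over the trees). Everything in it is an assumption made for illustration: the two numeric features, the synthetic Gaussian training set, the function names and the hyperparameters are hypothetical and are not taken from the project, which would use quantities from the Gaia EDR3 catalogue and most likely an established library.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, n_sub, depth, rng):
    """Grow one tree; each node only looks at n_sub randomly chosen features."""
    features = rng.sample(range(len(rows[0])), n_sub)
    split = best_split(rows, labels, features) if depth > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], n_sub, depth - 1, rng),
            build_tree([r for r, _ in right], [y for _, y in right], n_sub, depth - 1, rng))

def predict_tree(node, row):
    """Walk down the tree until a leaf (a plain label) is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(rows, labels, n_trees=25, n_sub=1, depth=4, seed=0):
    """Train n_trees trees, each on a bootstrap resample of the training set."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], n_sub, depth, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]

# Synthetic, well-separated toy data (NOT Gaia data): one Gaussian blob per
# Galactic component in a hypothetical two-feature space.
data_rng = random.Random(42)
centres = {"thin disk": 0.0, "thick disk": 5.0, "halo": 10.0}
rows, labels = [], []
for name, c in centres.items():
    for _ in range(30):
        rows.append((data_rng.gauss(c, 0.5), data_rng.gauss(c, 0.5)))
        labels.append(name)

forest = fit_forest(rows, labels)
print(predict_forest(forest, (9.8, 10.1)))
```

In a real application the training labels would have to come from a sample whose component membership is already known, for instance a simulated population, and performance would be measured on objects held out from training.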
 
Overview (summary in English):
A new Gaia data release, EDR3, has been available since December 2020, containing
a complete catalogue of nearly 2 billion stars. The huge wealth of information provided by
this space astronomy mission needs to be analyzed and studied in detail.
In particular, we have focused on a specific type of star, the white dwarf, and we have
studied which population of the Galaxy each one belongs to. This knowledge can help us to
understand how our Galaxy was formed and can also help to uncover large-scale Galactic
events that are a matter of debate.
This project focuses on obtaining the best possible classification of the white
dwarf population among the three main components of our Galaxy (thin disk, thick disk,
and halo) by means of artificial intelligence techniques and, more specifically, machine
learning. The algorithm chosen for our purpose is the random forest algorithm,
which, as demonstrated in previous works, has already produced very positive results
when applied to the second Gaia data release.
We reproduce the results previously obtained for the 100 pc Gaia DR2 white dwarf sample,
and extend the analysis to the new EDR3 up to 500 pc. The number of white dwarfs
classified by our algorithm has increased from 10,000 in DR2 to 80,000 in EDR3.
As a result, we have managed to identify nearly 300 halo white dwarf candidates.
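
The 100 pc and 500 pc limits quoted above translate into simple parallax thresholds if distance is estimated by inverting the parallax, d [pc] = 1000 / parallax [mas]. The sketch below assumes this naive inversion purely for illustration; the overview does not state how distances were derived, and the function names are hypothetical.

```python
def distance_pc(parallax_mas):
    """Naive distance estimate in parsecs from a parallax in milliarcseconds."""
    if parallax_mas <= 0:
        raise ValueError("parallax must be positive for a simple inversion")
    return 1000.0 / parallax_mas

def within(parallax_mas, limit_pc):
    """True if the naive distance estimate lies inside the given volume limit."""
    return parallax_mas > 0 and distance_pc(parallax_mas) <= limit_pc

# Under this assumption a 100 pc sample needs parallax >= 10 mas,
# and a 500 pc sample needs parallax >= 2 mas.
for plx in (12.0, 2.5, 1.0):
    print(plx, within(plx, 100.0), within(plx, 500.0))
```

Simple inversion becomes unreliable when the relative parallax uncertainty is large, which is more common for faint, distant objects, so a selection out to 500 pc would normally also need a cut on parallax quality.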

