Defended project
Títol: MACHINE LEARNING CLASSIFICATION OF THE GAIA 3rd DATA RELEASE BASED ON A RANDOM FOREST ALGORITHM
Students who have defended this project:
- LOPEZ MORENO, SERGIO (defense date: 13-09-2021)
Supervisor: TORRES GIL, SANTIAGO
Department: FIS
Offer start date: 03-02-2021 Offer end date: 03-10-2021
Degree programmes to which the project is assigned:
- MU AEROSPACE S&T 15
Type: Individual
Place of work: EETAC
Name of second supervisor (UPC): CARLES CANTERO MITJANS
Second supervisor's department:
Keywords:
Machine Learning, Random Forest Algorithm, Gaia Space Mission, Simulation, Classification
Description of content and activity plan:
Content:
The Gaia space mission has provided an unprecedented wealth of information for nearly 1 billion stars of our Galaxy. As with other current scientific missions, this huge amount of data must be handled efficiently. Artificial intelligence techniques, such as machine learning, therefore become indispensable in this big-data context. In particular, the Random Forest algorithm has been shown to be a highly effective tool for the analysis and classification of stellar populations.
Main objectives:
- Extract all the available information on the white dwarf population from the recent Gaia EDR3 database.
- Analyze the most significant variables and parameters of the sample in order to implement a supervised Random Forest algorithm.
- Classify the Gaia EDR3 white dwarf population and identify its members in the different components of our Galaxy.
- Compare the results with previous classifications.
- Analyze the scope for improving the performance of the algorithm.
- Study the possibility of applying the algorithm to classify the sample according to other properties, such as the spectral type.
- Summarize the results in a paper, showing in particular the capability of the Random Forest algorithm to extract the maximum information in the classification of a population.
Overview (abstract in English):
A new Gaia data release, EDR3, has been available since the end of last year, containing a complete catalogue of nearly 2 billion stars. The wealth of information provided by this space astronomy mission needs to be analyzed and studied in detail. In particular, we have focused on a specific type of star, the white dwarf, and we have studied which population of the Galaxy each of these stars belongs to. This knowledge can help us to understand how our Galaxy was formed and to discover large-scale Galactic events that are still a matter of debate. This project is focused on obtaining the most adequate possible classification of the white dwarf population among the three main components of our Galaxy (thin disk, thick disk and halo) by means of artificial intelligence techniques and, more specifically, machine learning. The algorithm chosen for this purpose is the random forest algorithm, which, as demonstrated in previous works, has already produced very positive results when applied to the second Gaia data release. We reproduce the results previously obtained for the 100 pc white dwarf sample of Gaia DR2, and extend the analysis to the new EDR3 out to 500 pc. The number of white dwarfs finally classified by our algorithm has increased from 10,000 in DR2 to 80,000 in EDR3. As a result, we have identified nearly 300 halo white dwarf candidates.
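As an illustration of the kind of supervised random forest classification described above, the following is a minimal sketch in plain Python. It is not the project's implementation. The three velocity-like features, the toy dispersions in `TOY_KINEMATICS`, the sample sizes and all function names are assumptions made for this example, and the data are synthetic rather than taken from Gaia. Each tree is grown on a bootstrap resample, tries a random subset of features at every split, and the forest classifies by majority vote.

```python
import random
from collections import Counter

LABELS = ("thin disk", "thick disk", "halo")

# ASSUMPTION: illustrative mean V and (U, V, W) dispersions in km/s.
# Toy values chosen for this sketch, not numbers from the project.
TOY_KINEMATICS = {
    "thin disk": (0.0, (35.0, 20.0, 16.0)),
    "thick disk": (-40.0, (65.0, 40.0, 40.0)),
    "halo": (-200.0, (140.0, 100.0, 90.0)),
}


def make_toy_sample(n_per_class, rng):
    """Draw synthetic (U, V, W) rows with known labels (hypothetical data)."""
    rows, labels = [], []
    for label in LABELS:
        mean_v, (su, sv, sw) = TOY_KINEMATICS[label]
        for _ in range(n_per_class):
            rows.append((rng.gauss(0.0, su), rng.gauss(mean_v, sv), rng.gauss(0.0, sw)))
            labels.append(label)
    return rows, labels


def gini(labels):
    """Gini impurity of a non-empty list of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def best_split(rows, labels, features, rng, n_thresholds=10):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best_score, best = None, None
    for f in features:
        values = sorted({r[f] for r in rows})
        # Try a random handful of observed values as candidate thresholds.
        for t in rng.sample(values, min(n_thresholds, len(values))):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best_score is None or score < best_score:
                best_score, best = score, (f, t)
    return best


def build_tree(rows, labels, rng, depth=0, max_depth=4, n_try=2):
    """Grow one tree: a leaf is a label string, a node is a 4-tuple."""
    if depth >= max_depth or len(set(labels)) == 1:
        return majority(labels)
    features = rng.sample(range(len(rows[0])), n_try)  # random feature subset
    split = best_split(rows, labels, features, rng)
    if split is None:
        return majority(labels)
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [y for _, y in left], rng, depth + 1, max_depth, n_try),
        build_tree([r for r, _ in right], [y for _, y in right], rng, depth + 1, max_depth, n_try),
    )


def predict_tree(node, row):
    """Walk down one tree until a leaf label is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(rows, labels, n_trees, rng):
    """Train each tree on a bootstrap resample of the training set."""
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return majority([predict_tree(tree, row) for tree in forest])


rng = random.Random(0)
train_rows, train_labels = make_toy_sample(60, rng)
test_rows, test_labels = make_toy_sample(30, rng)
forest = fit_forest(train_rows, train_labels, n_trees=15, rng=rng)
predictions = [predict_forest(forest, row) for row in test_rows]
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
print(f"toy accuracy: {accuracy:.2f}")
```

In practice a library implementation of random forests would normally replace the hand-written trees, and the labelled training set would come from a simulated or previously classified population rather than from these toy Gaussians.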