CBL - Campus del Baix Llobregat

Defended project

Title: IMPLEMENTATION OF A RANDOM FOREST MACHINE LEARNING ALGORITHM IN THE CONTEXT OF GAIA SPACE ASTROMETRY MISSION


Students who have defended this project:


Supervisor: TORRES GIL, SANTIAGO

Department: FIS

Offer start date: 20-07-2017     Offer end date: 20-03-2018



Degree programs to which the project is assigned:
    MU AEROSPACE S&T 15
Type: Individual
 
Location: EETAC
 
Second supervisor (UPC): GARCÍA-BERRO MONTILLA, ENRIQUE
 
Keywords:
algorithm, classification, simulation, Gaia mission
 
Description of content and activity plan:
Content:
The Gaia space astrometry mission will scan about one billion stars an average of 70 times each over five years. During the mission, repeated astrometric, photometric and spectroscopic observations of the entire sky down to magnitude 20 will be recorded.
The first objective is to become acquainted with the Gaia public-domain database: the already available Data Release 1 (DR1) and the forthcoming releases. The aim is to extract all the information on the particular type of star on which the analysis will focus.
This large amount of astronomical data provided by Gaia must be handled efficiently. Machine learning algorithms and other automatic classification strategies become essential in such a big-data context.

Main objectives:
.- Understand the methodology used for data transmission and the data format of the Gaia information packages.
.- Extract all the available information on the white dwarf population from the Gaia database and cross-match it with other existing catalogs such as Hipparcos or Tycho.
.- Implement and optimize a new supervised Random Forest algorithm.
.- Apply the Random Forest algorithm to the available observed data and to the next Gaia data releases.
.- Classify the white dwarf population and identify its members in the different components of our Galaxy.
.- Identify the possible existence of groups of white dwarfs, such as comoving groups or clusters.
.- Summarize the analysis in a paper presenting our results and the reliability of the new Random Forest algorithm.
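
The catalog cross-matching named in the objectives can be illustrated with a minimal positional match. This is only a sketch under stated assumptions: the listing does not describe the matching method actually used, so the function names, the 1 arcsec radius and the coordinates below are hypothetical, and proper motions and epoch differences between catalogs are ignored.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))

def cross_match(catalog_a, catalog_b, radius_arcsec=1.0):
    """Map each source id in catalog_a to the nearest catalog_b id within the radius, or None.

    Each catalog is a list of (source_id, ra_deg, dec_deg) tuples.
    Brute force, O(len(a) * len(b)): fine for a toy case, too slow for full catalogs.
    """
    radius_deg = radius_arcsec / 3600.0
    matches = {}
    for id_a, ra_a, dec_a in catalog_a:
        best_id, best_sep = None, radius_deg
        for id_b, ra_b, dec_b in catalog_b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= best_sep:
                best_id, best_sep = id_b, sep
        matches[id_a] = best_id
    return matches

# Made-up coordinates for illustration only (not real catalog entries).
gaia_like = [("G1", 10.0000, -5.0000), ("G2", 150.2500, 30.1000)]
tycho_like = [("T1", 10.0001, -5.0001), ("T2", 200.0000, 45.0000)]
matches = cross_match(gaia_like, tycho_like, radius_arcsec=1.0)
print(matches)
```

For catalogs of realistic size, a spatial index or an existing tool such as astropy's SkyCoord.match_to_catalog_sky would be the more practical choice.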
 
Overview (English summary):
The Gaia space astrometry mission will scan about one billion stars an average of 70 times each over five years. During the mission, repeated astrometric, photometric and spectroscopic observations of the entire sky down to magnitude 20 will be recorded. In other words, Gaia will build a complete three-dimensional map of 1 per cent of our Galaxy, storing a huge volume of results for all observed stars at the highest quality ever achieved. This large amount of astronomical data must be handled efficiently, and machine learning algorithms and other automatic classification strategies become essential in such a big-data context. The main objective of this master's thesis is to prepare and test an efficient, automated machine learning algorithm. Five supervised models are considered in this work, of which the Random Forest algorithm shows the best capabilities and performance. We take advantage of a detailed simulator of the white dwarf population, provided by the Astronomy and Astrophysics Group of the Physics Department of the UPC. This simulator provides a detailed synthetic population that mimics the characteristics of the white dwarf population observed by Gaia. The synthetic population is used in the learning stage of the Random Forest algorithm in order to optimize its application to the observed data. Once tested, the algorithm was applied to data extracted from the available Gaia Data Releases in order to classify their content into the different subpopulations of the Galaxy, such as the halo or the disk. The accuracy obtained in the present work by our Random Forest algorithm (85%) represents a substantial improvement over other classical methods (55%).
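
The workflow summarized above (train a Random Forest on a labeled synthetic population, then classify other data) can be sketched in plain Python. Everything specific here is an assumption made for illustration: the listing does not give the thesis's features, hyperparameters or implementation, so `toy_population` is a hypothetical stand-in for the simulator, the two features (a tangential velocity and a color) and their distributions are invented, and the toy accuracy has no relation to the 85% quoted above.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score, n = None, gini(labels), len(rows)
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # midpoint between neighbouring values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, n_sub, rng, depth=0, max_depth=6):
    """Grow one tree; a node is (feature, threshold, left, right), a leaf is a label."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    features = rng.sample(range(len(rows[0])), n_sub)  # random feature subset per node
    split = best_split(rows, labels, features)
    if split is None:
        return majority
    f, t = split
    children = []
    for keep_left in (True, False):
        part = [(r, y) for r, y in zip(rows, labels) if (r[f] <= t) == keep_left]
        children.append(build_tree([r for r, _ in part], [y for _, y in part],
                                   n_sub, rng, depth + 1, max_depth))
    return (f, t, children[0], children[1])

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(rows, labels, n_trees=15, seed=0):
    """Train n_trees trees, each on a bootstrap resample of the training set."""
    rng = random.Random(seed)
    n_sub = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], n_sub, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]

def toy_population(n, rng):
    """Hypothetical stand-in for a simulated sample: (v_tan, color) -> component."""
    rows, labels = [], []
    for _ in range(n):
        if rng.random() < 0.5:
            rows.append((rng.gauss(40.0, 15.0), rng.gauss(0.3, 0.2)))
            labels.append("disk")
        else:
            rows.append((rng.gauss(250.0, 40.0), rng.gauss(0.3, 0.2)))
            labels.append("halo")
    return rows, labels

rng = random.Random(42)
train_rows, train_labels = toy_population(120, rng)  # "learning stage" on synthetic data
test_rows, test_labels = toy_population(60, rng)     # held-out sample to check the model
forest = fit_forest(train_rows, train_labels, n_trees=15, seed=1)
hits = sum(predict_forest(forest, r) == y for r, y in zip(test_rows, test_labels))
accuracy = hits / len(test_rows)
print(f"toy accuracy: {accuracy:.2f}")
```

The sketch keeps the two ingredients that define a Random Forest, bootstrap resampling and a random feature subset at each split, and nothing more. A production analysis would more likely rely on an established implementation such as scikit-learn's RandomForestClassifier.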


© CBLTIC Campus del Baix Llobregat - UPC