CBL - Campus del Baix Llobregat

Defended project

Title: Extending Object Classification Convolutional Neural Networks to custom logo detection

Students who have defended this project:
PEÑA MOLINER, DAVID (defense date: 15-09-2020)

Supervisor: TARRÉS RUIZ, FRANCESC

Department: TSC

Offer start date: 20-06-2019 Offer end date: 20-02-2020

Degree programmes to which the project is assigned:

DG ENG AERO/SIS TEL

DG ENG AERO/TELEMÀT

DG ENG SISTE/TELEMÀT

GR ENG SIS TELECOMUN

GR ENG SIST AEROESP

GR ENG TELEMÀTICA

MU MASTEAM 2015

Type: Individual

Place of execution: EETAC

Keywords:
Convolutional Neural Networks, Object Classification

Description of content and activity plan:
The objective is to take a CNN pre-trained for object classification and adapt its last layers so that it also recognizes additional user-defined objects or logos. The idea is to implement a fully automated system for adding new items to the catalogue of objects that the neural network is able to recognize.
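
One possible way to picture "adapting the last layers" is sketched below in plain Python. It assumes the final layer is a linear classifier stored as one weight row and one bias per class, and it appends freshly initialized rows for the new classes while leaving the existing rows untouched. The function names `extend_head` and `logits`, the initialization scale, and the toy numbers are all illustrative assumptions, not the method used in the project.

```python
import random


def extend_head(weights, biases, num_new_classes, seed=0, scale=0.01):
    """Return a copy of a final linear layer with extra output rows.

    weights: list of rows, one per existing class, each of length feature_dim.
    biases:  list with one bias per existing class.
    New rows get small random weights and zero bias (an assumed choice).
    """
    rng = random.Random(seed)
    feature_dim = len(weights[0])
    new_rows = [
        [rng.gauss(0.0, scale) for _ in range(feature_dim)]
        for _ in range(num_new_classes)
    ]
    extended_weights = [row[:] for row in weights] + new_rows
    extended_biases = list(biases) + [0.0] * num_new_classes
    return extended_weights, extended_biases


def logits(weights, biases, features):
    """Compute one score per class for a single feature vector."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]


# Toy layer with 2 existing classes and 2 features (hypothetical values).
old_w = [[1.0, 2.0], [0.5, -1.0]]
old_b = [0.1, 0.2]
new_w, new_b = extend_head(old_w, old_b, num_new_classes=3)
```

In this sketch the scores for the original classes are unchanged after the extension, so only the new rows would need to be learned from the user-supplied examples.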

Overview (English abstract):
The aim of this project is to automate the calculation of the total time that the logos of MotoGP sponsor brands appear on screen during races. This document explains the steps followed to train an automatic object detection model on a purpose-built database using RetinaNet. It begins with a brief explanation of the main concepts of deep learning, including how convolutional neural networks and their convolution and pooling layers operate. It then reviews the state of the art in classification and object detection systems; RetinaNet was chosen because it is currently among the systems that provide the best results. The main difference between classification and detection is that a detection system returns both the position of the object (a rectangular region called a bounding box) and its class, whereas a classification system returns only its class. A database of images taken from 6 MotoGP videos was created and labeled using the LabelImg software. Labeling an image means drawing the bounding box around each logo and specifying which brand it belongs to. The selected brands are Alpinestar, DHL, Repsol, GoPro, Michelin, RedBull, Monster, Tissot, Motul and BMW; a total of 16 classes were created, since a brand can have several logo variants. Because the database is not large enough to train a model from scratch, the weights of a pre-trained network were used, a technique known as transfer learning. In addition, to avoid overfitting, the layers of the part of the architecture called the backbone, here ResNet50, were frozen. A second model was then trained with data augmentation to improve on the results of the first. Data augmentation is a technique that generates new examples by applying transformations to the images in the database.
With data augmentation, the model achieved an accuracy of 83.3% and a mean Average Precision (mAP) of 65.33%. Finally, an example application called Brand Logo Monitoring in Moto GP was developed which, using the model trained with data augmentation, automatically counts the time each brand appears in a MotoGP Grand Prix and reports the total time per brand.
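
The time-counting step can be illustrated with a minimal sketch. It assumes the detector yields, for every video frame, a list of (class name, confidence score) pairs, that several classes may map to the same brand, and that a brand counts as visible in a frame if at least one of its logos is detected above a threshold. The function name `brand_screen_time`, the class names, the threshold and the frame rate are hypothetical and are not taken from the project.

```python
from collections import defaultdict


def brand_screen_time(frame_detections, fps, score_threshold=0.5,
                      class_to_brand=None):
    """Return seconds on screen per brand.

    frame_detections: iterable with one entry per frame; each entry is a
                      list of (class_name, score) pairs from the detector.
    fps:              frames per second of the analysed video.
    class_to_brand:   optional mapping that merges logo variants into brands.
    """
    class_to_brand = class_to_brand or {}
    frames_seen = defaultdict(int)
    for detections in frame_detections:
        # A set ensures a brand is counted once per frame, even if several
        # of its logos (or logo variants) appear at the same time.
        brands = {
            class_to_brand.get(name, name)
            for name, score in detections
            if score >= score_threshold
        }
        for brand in brands:
            frames_seen[brand] += 1
    return {brand: count / fps for brand, count in frames_seen.items()}


# Hypothetical detections for four frames of a 2 fps clip.
frames = [
    [("repsol_text", 0.9), ("repsol_logo", 0.8)],
    [("repsol_text", 0.4)],  # below threshold, ignored
    [("dhl", 0.7)],
    [],
]
mapping = {"repsol_text": "Repsol", "repsol_logo": "Repsol", "dhl": "DHL"}
result = brand_screen_time(frames, fps=2, class_to_brand=mapping)
```

Counting frames rather than individual detections keeps the total bounded by the video length; a different policy (for example, weighting by logo area) would need a different aggregation.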