Campus del Baix Llobregat
 
 

Defended project

Title: Using Selfsupervised algorithms for video analysis and scene detection

Students who have defended this project:

Director: TARRÉS RUIZ, Francesc

Department: TSC


Offer start date: 07-01-2020     Offer end date: 07-09-2020


Degree programmes assigned to the project:
    MU MASTEAM 2015
Type: Individual
 
Location: EETAC
 
Keywords:
video analysis, scene detection, machine learning, deep learning, python, keras, shot detection, convolutional neural network, self supervised learning
 
Description of content and activity plan:
The main objective of the project is to evaluate the performance of self-supervised
learning models in the context of video analysis and scene classification. The idea is to
develop a neural network architecture that will be trained on a pretext task.

After convergence of the self-supervised algorithm, the architecture will be fine-tuned for
scene classification using a scene database based on TV series and films. The results
will be evaluated on the IBM scene database. The project includes the design of the NN
architecture, the coding in Python/Keras, the training and evaluation of the learning
model on the final objective task, and the generation of some extra data to be added to
the scene database.
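As an illustration of the kind of Python/Keras model the plan refers to, the sketch below builds a small 3D convolutional network for a transition-classification pretext task. The input shape (20 frames of 64x64 RGB), the layer sizes, and the 5 output classes are all illustrative assumptions, not the project's actual architecture:

```python
# Hypothetical sketch of a small 3D-CNN pretext model in Keras.
# Frame count, spatial size, channel widths, and number of transition
# classes are assumptions for illustration only.
from tensorflow.keras import layers, models

def build_pretext_model(frames=20, size=64, n_classes=5):
    return models.Sequential([
        layers.Input(shape=(frames, size, size, 3)),
        layers.Conv3D(16, 3, padding="same", activation="relu"),
        layers.MaxPooling3D(pool_size=2),
        layers.Conv3D(32, 3, padding="same", activation="relu"),
        layers.GlobalAveragePooling3D(),
        layers.Dense(n_classes, activation="softmax"),
    ])

model = build_pretext_model()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

For fine-tuning on scene classification, the softmax head would typically be replaced with one sized to the scene labels while the convolutional backbone is reused.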
 
Overview (English abstract):
With the increasing amount of available audiovisual content, well-ordered and effective management of video is desired; therefore, automatic and accurate solutions for video indexing and retrieval are needed.
Self-supervised learning algorithms with 3D convolutional neural networks are a promising solution for these tasks, thanks to their independence from human annotations and their suitability for identifying spatio-temporal features.
This work presents a self-supervised algorithm for the analysis of video shots, accomplished in a two-stage implementation: (1) an algorithm that generates pseudo-labels for 20-frame samples with different automatically generated shot transitions (hard cuts/crop cuts, dissolves, fades in/out, wipes), and (2) a fully convolutional 3D network trained to an overall accuracy greater than 97% on the test set.
The implemented model is based on [5], improving the detection of long smooth transitions by using a larger temporal context. Detected transitions are centered on the 10th and 11th frames of a 20-frame input window.
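The pseudo-label generation for synthetic transitions can be illustrated with a small NumPy sketch. The function names, the dissolve ramp, and the exact frame positions are illustrative assumptions; the actual generator also covers crop cuts, fades, and wipes:

```python
import numpy as np

WINDOW = 20  # 20-frame input window, as in the text

def make_hard_cut(clip_a, clip_b, cut=10):
    """Synthesize a hard cut: clip_a up to frame `cut`, clip_b after,
    so the transition falls between frames 10 and 11 (the centered
    position described above). Returns (sample, pseudo_label)."""
    sample = np.concatenate([clip_a[:cut], clip_b[cut:]], axis=0)
    return sample, "hardcut"

def make_dissolve(clip_a, clip_b, start=5, end=15):
    """Synthesize a dissolve: linear cross-fade from clip_a to clip_b
    between frames `start` and `end` (illustrative ramp choice)."""
    alpha = np.clip((np.arange(WINDOW) - start) / (end - start), 0.0, 1.0)
    alpha = alpha[:, None, None, None]  # broadcast over H, W, C
    return (1 - alpha) * clip_a + alpha * clip_b, "dissolve"

# Toy usage with random "frames" of shape (WINDOW, H, W, 3)
a = np.random.rand(WINDOW, 8, 8, 3)
b = np.random.rand(WINDOW, 8, 8, 3)
cut_sample, cut_label = make_hard_cut(a, b)
dis_sample, dis_label = make_dissolve(a, b)
```

Samples built this way carry their transition type as a free pseudo-label, which is what lets the 3D network train without any human annotation.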


Generation date: 26/01/2021