CBL - Campus del Baix Llobregat

Defended project

Title: Using Self-supervised Algorithms for Video Analysis and Scene Detection


Students who have defended this project:


Supervisor: TARRÉS RUIZ, FRANCESC

Department: TSC

Title: Using Self-supervised Algorithms for Video Analysis and Scene Detection

Offer start date: 07-01-2020     Offer end date: 07-09-2020



Degree programmes assigned to the project:
    MU MASTEAM 2015
Type: Individual
 
Location: EETAC
 
Keywords:
video analysis, scene detection, machine learning, deep learning, python, keras, shot detection, convolutional neural network, self-supervised learning
 
Description of contents and activity plan:
The main objective of the project is to evaluate the performance of self-supervised
learning models in the context of video analysis and scene classification. The idea is to
develop a neural network architecture that will be trained on a pretext task.

After convergence of the self-supervised algorithm, the architecture will be fine-tuned for
scene classification using a scene database based on TV series and films. The results
will be evaluated on the IBM scene database. The project includes the design of the NN
architecture, the coding in Python/Keras, the training and evaluation of the learning
model on both the pretext and the final objective task, and the generation of some extra
data to be added to the scene database.
 
Overview (abstract in English):
With the increasing amount of available audiovisual content, well-ordered and effective management of video is desired; therefore, automatic and accurate solutions for video indexing and retrieval are needed.
Self-supervised learning algorithms with 3D convolutional neural networks are a promising solution for these tasks, thanks to their independence from human annotations and their suitability for identifying spatio-temporal features.
This work presents a self-supervised algorithm for the analysis of video shots, accomplished in a two-stage implementation: (1) an algorithm that generates pseudo-labels for 20-frame samples containing different automatically generated shot transitions (hard cuts/crop cuts, dissolves, fades in/out, wipes), and (2) a fully convolutional 3D network trained on these samples, achieving an overall accuracy greater than 97% on the test set.
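The pseudo-label generation stage can be illustrated with a minimal NumPy sketch. The function name, clip shapes, and transition length below are illustrative assumptions (only two of the four transition types are shown), not the project's actual implementation:

```python
import numpy as np

def make_transition_sample(clip_a, clip_b, kind="dissolve", trans_len=8):
    """Build a 20-frame training sample with a synthetic shot transition
    roughly centered on frames 10-11, and return it with its pseudo-label.

    clip_a, clip_b: arrays of shape (20, H, W, C) taken from two different shots.
    """
    t = clip_a.shape[0]
    centre = t // 2
    if kind == "hardcut":
        # Abrupt change: first half from clip_a, second half from clip_b.
        sample = np.concatenate([clip_a[:centre], clip_b[centre:]], axis=0)
    elif kind == "dissolve":
        # Linear cross-fade over `trans_len` frames around the centre.
        start = centre - trans_len // 2
        alpha = np.zeros(t, dtype=np.float32)
        alpha[start:start + trans_len] = np.linspace(0.0, 1.0, trans_len)
        alpha[start + trans_len:] = 1.0
        a = alpha[:, None, None, None]  # broadcast over H, W, C
        sample = (1.0 - a) * clip_a + a * clip_b
    else:
        raise ValueError(f"unknown transition kind: {kind}")
    # The transition type serves as the pseudo-label for the pretext task.
    return sample.astype(np.float32), kind
```

Because the labels come from how each sample was synthesized, no human annotation is required for the pretext task.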
The model implemented is based on [5], improving the detection of long smooth transitions by using a larger temporal context. Detected transitions are centered on the 10th and 11th frames of the 20-frame input window.
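A fully convolutional 3D network of this kind can be sketched in Keras as follows. The layer counts, filter sizes, 64x64 input resolution, and five-class output are assumptions for illustration, not the architecture of [5] or of this project:

```python
import numpy as np
from tensorflow.keras import layers, models

def build_shot_transition_net(n_classes=5, in_shape=(20, 64, 64, 3)):
    """Fully convolutional 3D network: no dense layers, so spatial
    pooling is done by a global average over the final feature maps."""
    inp = layers.Input(shape=in_shape)
    x = inp
    for filters in (16, 32, 64):
        # 3D convolutions capture spatio-temporal features across frames.
        x = layers.Conv3D(filters, 3, padding="same", activation="relu")(x)
        # Pool only spatially, preserving the 20-frame temporal axis.
        x = layers.MaxPooling3D(pool_size=(1, 2, 2))(x)
    # 1x1x1 convolution maps features to per-location class scores.
    x = layers.Conv3D(n_classes, 1, padding="same")(x)
    x = layers.GlobalAveragePooling3D()(x)
    out = layers.Activation("softmax")(x)
    return models.Model(inp, out)
```

Replacing dense layers with a 1x1x1 convolution and global average pooling keeps the network fully convolutional, which simplifies fine-tuning the learned backbone for the downstream scene-classification task.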


© CBLTIC Campus del Baix Llobregat - UPC