CBL - Campus del Baix Llobregat

Defended project

Title: Feature extraction and automatic classification of audio signals

Students who have defended this project:
GARETA MANZANO, ALEX (defence date: 14-09-2015)

Supervisor: TARRÉS RUIZ, FRANCESC

Department: TSC

Offer start date: 27-04-2015 Offer end date: 27-12-2015

Degree programmes the project is assigned to:

GR ENG SIS TELECOMUN

Type: Individual

Location: EETAC

Keywords:
Feature extraction, audio signal analysis, digital signal processing, statistical classification

Description of content and activity plan:
The aim of the project is to develop software that automatically analyses the type of content in audio signals. The content is to be classified into one of the following classes: music, speech, speech+music, silence, noise+speech, etc. To build the classification system, several features will be extracted from the audio signals; these aim to describe the signal parametrically and more compactly than the raw audio samples. The features considered will include zero crossings, spectral flux, MFCC coefficients, etc. Once the selected features have been extracted, statistical classification methods will be applied to train an automatic classification system. For testing and evaluating the results, databases created specifically for this work will be combined with other databases available from alternative sources.
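
As an illustration of two of the low-level features named above, the following is a minimal Python/NumPy sketch of a per-frame zero-crossing rate and spectral flux. It is not the project's MATLAB code: the function names, the Hann window, the spectrum normalisation and the frame parameters are all assumptions made for the example.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (assumes len(x) >= frame_len)."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame whose signs differ."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # treat exact zeros as positive (an assumed convention)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def spectral_flux(frames):
    """Euclidean distance between normalised magnitude spectra of consecutive frames."""
    window = np.hanning(frames.shape[1])  # assumed analysis window
    mag = np.abs(np.fft.rfft(frames * window, axis=1))
    mag = mag / (mag.sum(axis=1, keepdims=True) + 1e-12)  # assumed normalisation
    diff = np.diff(mag, axis=0)
    return np.sqrt((diff ** 2).sum(axis=1))

# Usage on synthetic signals (hypothetical parameters: 8 kHz, 50 ms frames, 25 ms hop).
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 100 * t)
noise = np.random.default_rng(0).standard_normal(fs)

tone_frames = frame_signal(tone, 400, 200)
noise_frames = frame_signal(noise, 400, 200)
print("mean ZCR tone :", zero_crossing_rate(tone_frames).mean())
print("mean ZCR noise:", zero_crossing_rate(noise_frames).mean())
print("max flux tone :", spectral_flux(tone_frames).max())
```

A steady tone gives a low zero-crossing rate and almost no spectral flux, whereas noise-like content scores high on both, which is why such features are commonly used to help separate audio classes.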

Overview (abstract in English):
The purpose of this project is to develop an algorithm that extracts features from audio segments for subsequent classification into speech, music, or music-and-speech files. To this end, we analysed the classification algorithms most referenced in the literature, evaluating their characteristics, performance and computational complexity during the training and recognition phases. Based on these studies, we decided to implement a system based on low-level features and a statistical classifier. MATLAB was chosen as the development tool, since the intended applications did not require a real-time training system. The coefficients used to characterize the different signal types are the MFCC (Mel Frequency Cepstral Coefficients). For training, a Gaussian Mixture Model (GMM) is used for each audio class; the number of Gaussians in each model is varied and the best configuration is evaluated. In addition to the MFCC, we also implement the MFCC Deltas and MFCC Delta-Deltas, which provide information about the dynamic properties of the audio, because the MFCC alone only provide information within a window in which the signal is considered stationary. As discussed below, the results obtained are satisfactory enough for the algorithm to be run on a real case. Charts and graphs of the average error rate are reported for the different input configurations tested. The results are expanded on in the annex.
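
The delta and delta-delta features mentioned in the overview can be sketched as follows. This is a hedged Python/NumPy illustration that uses the common regression formula for deltas; the abstract does not state which formula, window width or edge handling the project used, so `width=2`, the edge padding and the function names are assumptions.

```python
import numpy as np

def delta(features, width=2):
    """Regression-based deltas along the time axis.

    features: array of shape (n_frames, n_coeffs), e.g. one MFCC vector per frame.
    d[t] = sum_{n=1..width} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..width} n^2)
    Edge frames are handled by repeating the first/last frame (assumed convention).
    """
    features = np.asarray(features, dtype=float)
    n_frames = features.shape[0]
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + n_frames]
        behind = padded[width - n : width - n + n_frames]
        out += n * (ahead - behind)
    return out / denom

def stack_dynamic(mfcc, width=2):
    """Concatenate static coefficients with their deltas and delta-deltas."""
    d1 = delta(mfcc, width)
    d2 = delta(d1, width)
    return np.hstack([mfcc, d1, d2])

# Usage with a placeholder matrix standing in for 13 MFCCs over 100 frames.
mfcc = np.random.default_rng(0).standard_normal((100, 13))
print(stack_dynamic(mfcc).shape)
```

Each frame's feature vector triples in length, and the added coefficients describe how the spectrum evolves across neighbouring frames rather than within a single stationary window.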