CBL - Campus del Baix Llobregat

Defended project

Title: Speech/music audio classification for publicity insertion and DRM

Students who have defended this project:
GIL MORENO, FAUSTO (defence date: 15-02-2018)

Supervisor: TARRÉS RUIZ, FRANCESC

Department: TSC

Offer start date: 27-07-2017 Offer end date: 27-03-2018

Degree programmes the project is assigned to:

MU MASTEAM 2015

Type: Individual

Location: EETAC

Keywords:
Digital Audio, Signal Processing, Audio Descriptors, Machine Learning

Description of content and activity plan:
This project considers the problem of automatically classifying audio signals as music or speech. This classification is important in applications such as determining the percentage of music in radio programs (DRM) or finding suitable locations for inserting publicity in video. The problem is not trivial in TV or film content, where music and speech are mixed. In this project several algorithmic strategies will be studied and analyzed. From this preliminary analysis we will select one of the most promising methods, implement it in Matlab and test its performance. We expect that during the development of the project and the method we will be able to propose improvements and refinements that increase its performance.

Overview (summary in English):
The goal of this project is to develop, implement and optimize an existing method called Continuous Frequency Activation (CFA). The aim is to solve the problem that arises when advertising is inserted at random points in TV programmes, films, audio podcasts, etc., which can cause discomfort in the viewer. The basic idea is to avoid inserting adverts in the middle of a conversation. The final insertion criteria will be selected taking into account video metadata (shot change, scene change, fade-out, etc.) and audio metadata (voice, music). To do that, we have developed an algorithm capable of discriminating between music and voice. This algorithm has been developed exclusively for this purpose and does not require a training database. Before creating the algorithm, different existing methods for discriminating between music and voice were studied and their pros and cons were analysed. Following this study, the method selected was Continuous Frequency Activation (CFA). CFA is one of the methods with the best statistical results, and it does not need large databases for training. The algorithm has been implemented in MATLAB®. The trials used a database covering five audio categories: classical music, blues, electronic music, jazz and speech. The audio files for each category were edited with the Audacity® software. After performing all the tests, it can be said that the developed algorithm works correctly and is able to distinguish music from voice in a very high percentage of cases (97.55%). Given the results of the trials, this method could be used by companies in the fields of media, television (Antena 3, Telecinco, etc.) and/or audio podcasts. The goal is to automatically insert publicity into audio podcasts at the most appropriate moment.
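
The abstract does not spell out the steps of CFA, so the following Python sketch is only an illustration of the general idea, not the project's MATLAB implementation. It assumes that music tends to produce steady tones that stay active in the same frequency bins over time, while speech and noise do not. All names (`stft_db`, `cfa_like_score`) and all parameter values (window size, 6 dB threshold, block length) are hypothetical choices made for this example.

```python
import numpy as np


def stft_db(x, n_fft=512, hop=256):
    """Magnitude spectrogram in dB, shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return 20.0 * np.log10(mag + 1e-10)


def cfa_like_score(x, n_fft=512, hop=256, half_width=10,
                   thresh_db=6.0, block=50, n_peaks=5):
    """Simplified, CFA-inspired score (assumed steps, not the thesis code).

    Higher values mean that a few frequency bins stay active across
    most frames of a block, which is typical of sustained tones.
    """
    spec = stft_db(x, n_fft, hop)
    if spec.shape[0] < block:
        raise ValueError("signal too short for one analysis block")

    # Emphasise local spectral peaks: subtract a running mean over frequency.
    padded = np.pad(spec, ((0, 0), (half_width, half_width)), mode="edge")
    kernel = np.ones(2 * half_width + 1) / (2 * half_width + 1)
    local_mean = np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="valid"), 1, padded
    )

    # Binarise: a bin is "active" if it stands out from its neighbourhood.
    active = (spec - local_mean) > thresh_db

    scores = []
    for start in range(0, active.shape[0] - block + 1, block):
        # Fraction of frames in the block in which each bin is active.
        activation = active[start : start + block].mean(axis=0)
        # Compare the most persistent bins against the typical bin.
        top = np.sort(activation)[-n_peaks:]
        scores.append(top.mean() - np.median(activation))
    return float(np.mean(scores))
```

A signal made of steady sinusoids should score close to 1, whereas white noise should score close to 0. A real speech/music discriminator would still need a decision threshold tuned on labelled audio, which this sketch does not attempt.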