CBL - Campus del Baix Llobregat

Defended project

Title: Speech/music audio classification for publicity insertion and DRM


Students who have defended this project:


Supervisor: TARRÉS RUIZ, FRANCESC

Department: TSC


Offer start date: 27-07-2017     Offer end date: 27-03-2018



Programme to which the project is assigned:
    MU MASTEAM 2015
Type: Individual
 
Place of completion: EETAC
 
Keywords:
Digital Audio, Signal Processing, Audio Descriptors, Machine Learning
 
Description of content and work plan:
This project considers the problem of automatically classifying audio signals into music
or speech classes. This classification is important in applications such as determining
the percentage of music in radio programmes (DRM) or locating suitable points for
inserting advertising in video. The problem is not trivial in TV or film content, where
music and speech are often mixed. In this project, several algorithmic strategies will be
studied and analysed. Based on this analysis, one of the most promising methods will be
selected, implemented in Matlab and tested. We expect that, during the development of
the project and the method, we will be able to propose improvements and refinements
to its performance.
 
Overview (summary in English):
The goal of this project is to develop, implement and optimize an existing method called Continuous Frequency Activation (CFA). The aim is to address the problem that arises when advertising is inserted at arbitrary points in TV programmes, films, audio podcasts, etc., which can cause discomfort to the viewer.

The basic idea is to avoid inserting adverts in the middle of a conversation. The final criteria will be selected taking into account video metadata (shot changes, scene changes, fade-outs, etc.) and audio metadata (voice, music).
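One hypothetical way to combine the two kinds of metadata is to keep only the video cut times (shot or scene changes, fade-outs) that do not fall inside a segment the audio classifier labelled as speech. The `Segment` type, the `candidate_insertion_points` function and the 0.5 s safety margin below are illustrative assumptions, not the project's selection criteria, which the summary describes as still to be defined.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A stretch of audio with a class label (hypothetical classifier output)."""
    start: float  # seconds
    end: float    # seconds
    label: str    # "speech" or "music"


def candidate_insertion_points(segments, scene_changes, margin=0.5):
    """Return the cut times that are not inside, or within `margin` seconds of,
    a speech segment. `margin` is an assumed safety gap, not a tuned value."""
    candidates = []
    for t in scene_changes:
        in_speech = any(
            seg.label == "speech" and seg.start - margin <= t <= seg.end + margin
            for seg in segments
        )
        if not in_speech:
            candidates.append(t)
    return candidates


segments = [
    Segment(0.0, 12.0, "speech"),
    Segment(12.0, 20.0, "music"),
    Segment(20.0, 35.0, "speech"),
]
print(candidate_insertion_points(segments, [5.0, 12.2, 16.0, 19.8, 28.0]))  # → [16.0]
```

Cuts at 12.2 s and 19.8 s are rejected because they sit within the margin of a speech segment, so only the cut in the middle of the music segment survives.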

To do this, we have developed an algorithm capable of discriminating between music and voice. The algorithm was developed exclusively for this purpose and does not require a training database.

Before creating the algorithm, several existing methods for discriminating between music and voice were studied and their pros and cons analysed. As a result of this study, Continuous Frequency Activation (CFA) was selected: it is one of the methods with the best statistical results, and it does not require large databases for training.
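For orientation, a minimal Python sketch of the general CFA idea, loosely following its published description: music tends to contain sustained tones that appear as horizontal lines in the spectrogram, so frequency bins that stay "active" across many consecutive frames point to music, and the resulting score can be compared against a threshold instead of a trained model. This is not the project's MATLAB implementation. The sample rate (11025 Hz), FFT size (1024), hop (512), moving-average width (21 bins), binarization threshold (0.1 dB), the peak-strength definition and the use of the five strongest peaks are all assumptions, and the whole excerpt is treated as one block, whereas a practical detector would slide over blocks of a few seconds.

```python
import numpy as np


def spectrogram_db(x, n_fft=1024, hop=512):
    """Hann-windowed magnitude spectrogram in dB, shape (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return 20.0 * np.log10(magnitude + 1e-10)  # small offset avoids log(0)


def emphasize_peaks(spec_db, width=21):
    """Subtract a moving average along the frequency axis from every frame."""
    kernel = np.ones(width) / width
    half = width // 2
    out = np.empty_like(spec_db)
    for t, frame in enumerate(spec_db):
        padded = np.pad(frame, half, mode="reflect")
        out[t] = frame - np.convolve(padded, kernel, mode="valid")
    return out


def peak_strengths(activation):
    """For each local maximum, its height above the higher of its two valleys (assumed definition)."""
    n = len(activation)
    strengths = []
    for i in range(1, n - 1):
        if activation[i] > activation[i - 1] and activation[i] >= activation[i + 1]:
            left = i
            while left > 0 and activation[left - 1] <= activation[left]:
                left -= 1
            right = i
            while right < n - 1 and activation[right + 1] <= activation[right]:
                right += 1
            strengths.append(activation[i] - max(activation[left], activation[right]))
    return np.array(strengths)


def cfa_value(x, threshold=0.1, n_peaks=5):
    """Sum of the strongest peaks of the frequency-activation curve."""
    binary = emphasize_peaks(spectrogram_db(x)) > threshold
    activation = binary.mean(axis=0)  # fraction of frames in which each bin is active
    strengths = np.sort(peak_strengths(activation))[::-1]
    return float(strengths[:n_peaks].sum())


sr = 11025
t = np.arange(3 * sr) / sr
rng = np.random.default_rng(0)
bins = [40, 90, 150, 230, 320]  # five sustained tones centred on FFT bins
tones = sum(0.2 * np.sin(2 * np.pi * b * sr / 1024 * t) for b in bins)
music_like = tones + 0.01 * rng.standard_normal(t.size)
noise_like = 0.1 * rng.standard_normal(t.size)
print(round(cfa_value(music_like), 2), round(cfa_value(noise_like), 2))
```

Each peak strength lies between 0 and 1, so the score is bounded by `n_peaks`. The sustained tones produce bins that are active in every frame, surrounded by near-zero valleys, so the tone mixture should score much higher than white noise, whose activation curve is flat apart from random fluctuations.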

The algorithm was implemented in MATLAB®. The trials used a database covering five categories: classical music, blues, electronic music, jazz and speech.

The audio files for each category were edited using Audacity®.

The tests show that the developed algorithm works correctly and discriminates music from voice in a very high percentage of cases (97.55%).

Given these results, the method could be used by companies working in media, television (Antena 3, Telecinco, etc.) and/or audio podcasting. The goal is to insert advertising into audio podcasts automatically at the most appropriate moment.



© CBLTIC Campus del Baix Llobregat - UPC