CBL - Campus del Baix Llobregat

Defended project

Title: Minería de texto mediante NLP en el sector seguros (Text mining with NLP in the insurance sector)

Students who have defended this project:
LUQUE GARCIA, IVAN (defence date: 03-07-2023)

Supervisor: MORA SERRANO, FRANCISCO JAVIER

Department: DECA


Offer start date: 18-07-2022 Offer end date: 18-03-2023

Degree programmes to which the project is assigned:

GR ENG SIS TELECOMUN

GR ENG SIST AEROESP

GR ENG TELEMÀTICA

Type: Individual

Location: EETAC

Name of the second supervisor (UPC): Alberto Burgos (CIMNE-TIC)
Second supervisor's department:

Keywords:
Machine Learning, Churn prediction, text mining

Description of the content and activity plan:
Insurance brokerage management software records all kinds of data derived from the interaction between client and broker. Many of these records are free text, which cannot be exploited directly by techniques that work with tabular data. The goal of this TFG (bachelor's thesis) is to generate features from free text that allow relevant information to be extracted for subsequent analysis and/or modelling with Machine Learning algorithms focused on churn prediction.
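As an illustration of how free text can be turned into tabular features, the following is a minimal bag-of-words sketch using only the Python standard library. The example notes, the function names and the choice of plain token counts are assumptions made for this illustration; they are not taken from the project, which may well have relied on a library vectorizer (for instance scikit-learn's CountVectorizer) instead.

```python
import re
from collections import Counter

# Hypothetical free-text notes, invented for illustration only.
notes = [
    "Cliente cancela la póliza por precio",
    "El cliente vende el coche y cancela la póliza",
    "Precio demasiado alto, cambia de compañía",
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_vocabulary(docs):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def to_count_vector(text, vocabulary):
    """Return one row of token counts, one column per vocabulary entry."""
    counts = Counter(tokenize(text))
    # The dict preserves insertion order, so columns follow the indices.
    return [counts.get(tok, 0) for tok in vocabulary]


vocabulary = build_vocabulary(notes)
feature_matrix = [to_count_vector(note, vocabulary) for note in notes]
```

Each row of `feature_matrix` has one column per vocabulary word, so it can be placed next to existing tabular columns and passed to a model that expects numeric input.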

Overview (English summary):
This end-of-degree project is about classifying the reasons for insurance cancellations from the free text written by insurance brokers. To achieve this goal, it used data provided by brokers who work with the same software, segElevia. The task was defined during the business-understanding phase and the analysis of the data obtained, which revealed that brokers make mistakes when selecting the cancellation-reason label: the label they choose does not always match the content of the free-text field. The following software was used: Jupyter Notebook as the working environment, Python as the development language, and scikit-learn, pandas, Seaborn, spaCy and NumPy as libraries. Several data-processing techniques were explored: word elimination, lemmatization, tokenization, vectorization, zero padding and oversampling; the last was left out of the final implementation because its results were unsatisfactory. A variety of artificial intelligence models were used, such as the Random Forest Classifier and the Perceptron. After analysing the results of every model, the Random Forest Classifier was judged the most satisfactory, with weighted averages of 75% precision, 74% recall and 74% F1-score. Finally, the results could be used to build a predictor that helps brokers by suggesting the label to assign while they write the free-text field, thus reducing the number of times brokers misclassify the reason for an insurance cancellation.
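To make the reported weighted averages concrete, here is a small sketch of how support-weighted precision, recall and F1 are usually computed: each class's score is weighted by the share of true samples belonging to that class. The function name and the toy labels are hypothetical and chosen only for this illustration; the project's own figures presumably came from a library report such as scikit-learn's classification_report, which is an assumption rather than something the summary states.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return support-weighted precision, recall and F1 over all classes."""
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label in sorted(support):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / support[label]
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = support[label] / total  # class share of the true labels
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return precision, recall, f1


# Hypothetical cancellation-reason labels, not data from the project.
y_true = ["price", "price", "price", "sold", "sold", "other"]
y_pred = ["price", "price", "sold", "sold", "sold", "other"]

precision, recall, f1 = weighted_metrics(y_true, y_pred)
```

With this weighting, frequent cancellation reasons dominate the average, and the weighted recall equals the overall accuracy (here 5 correct predictions out of 6).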