CBL - Campus del Baix Llobregat

Defended project

Title: Proof of concept of an automated system for collecting and analysing housing purchase and sale data on Amazon Web Services.


Students who have defended this project:


Supervisor: ROYO VALLÉS, DOLORS

Department: DAC

Offer start date: 13-01-2022     Offer end date: 13-09-2022



Degree programmes the project is assigned to:
    GR ENG SIS TELECOMUN
    GR ENG TELEMÀTICA
Type: Individual
 
Place of completion: EETAC
 
Keywords:
Web, HTML, cloud web services, Python programming
 
Description of the content and activity plan:


This project aims to design and implement an automated data
collection and analysis system integrated into Amazon Web
Services. The system must periodically collect housing purchase
and sale data from web pages using web scraping. The collected
information must then be processed to obtain data relevant to
buying flats. The solution must be designed and implemented so
that it consumes as few Amazon Web Services resources as
possible. A web page will also be designed to present the
post-processed data in the most user-friendly and understandable
way. Finally, a notification system will be implemented for
users who wish to be kept informed about specific flats for
sale.
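
The collect, process and notify steps described above can be sketched
roughly as follows. This is only an illustration using Python's standard
library, not the project's actual Scrapy or AWS implementation. The HTML
structure, the `listing` class, the `data-price` and `data-area`
attributes, the `Listing` type, the helper functions and the sample data
are all hypothetical.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Listing:
    """One flat for sale (hypothetical fields, for illustration only)."""
    title: str
    price_eur: float
    area_m2: float

    @property
    def price_per_m2(self) -> float:
        # A simple derived metric a buyer might want to compare.
        return self.price_eur / self.area_m2


class ListingParser(HTMLParser):
    """Collects listings from assumed markup of the form
    <div class="listing" data-price="..." data-area="...">Title</div>
    """

    def __init__(self):
        super().__init__()
        self.listings = []
        self._current = None  # attributes of the listing being read
        self._text = []       # text fragments inside the current listing

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "listing":
            self._current = attrs
            self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self._current is not None:
            self.listings.append(Listing(
                title="".join(self._text).strip(),
                price_eur=float(self._current["data-price"]),
                area_m2=float(self._current["data-area"]),
            ))
            self._current = None


def parse_listings(html):
    """Extract all listings from a page of the assumed format."""
    parser = ListingParser()
    parser.feed(html)
    return parser.listings


def matching_listings(listings, max_price_eur, min_area_m2):
    """Return the listings a user with these criteria would be notified about."""
    return [
        item for item in listings
        if item.price_eur <= max_price_eur and item.area_m2 >= min_area_m2
    ]


# Made-up sample page standing in for a scraped portal.
SAMPLE_HTML = """
<div class="listing" data-price="180000" data-area="60">Flat A</div>
<div class="listing" data-price="320000" data-area="80">Flat B</div>
"""

listings = parse_listings(SAMPLE_HTML)
matches = matching_listings(listings, max_price_eur=200000, min_area_m2=50)
for item in matches:
    print(f"Notify: {item.title} at {item.price_per_m2:.0f} EUR/m2")
```

In a real deployment the sample string would be replaced by pages fetched
on a schedule, and the `print` call by whatever notification channel the
system uses.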
 
Overview (summary in English):
This work was motivated by my interest in web scraping, a set of techniques for collecting information from web pages. The collected data can then be used to draw conclusions that could not otherwise be drawn. For example, web scraping is used on used car parts portals to track down car parts that are no longer for sale elsewhere. Because these parts are in short supply, they sell quickly. With web scraping it is possible to receive a notification when one of these parts is offered for sale on one of the portals.

The work is a proof of concept of web scraping with Scrapy, a framework written in Python for creating bots that collect information from websites. A bot was built with Scrapy as part of the work. Before the project I did not know that bot traffic on the internet was so high, about 66%. For this reason, many websites filter bot traffic to keep these bots from browsing their pages. The project explains the modifications made to the bot so that it can bypass these website defenses. The project could also be used to help websites improve their defenses against traffic from bots that only want to collect information from their pages.
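
This overview does not list the specific modifications made to the bot. As a general illustration only, the fragment below shows standard Scrapy settings that commonly affect how a crawler's traffic appears to a website. It is not the project's configuration: every value is an assumption chosen for the example, and the user agent string and contact URL are hypothetical.

```python
# settings.py fragment for a Scrapy project (illustrative values only,
# not taken from the project described here).

# Respect the site's robots.txt rules.
ROBOTSTXT_OBEY = True

# Identify the crawler; this name and contact URL are hypothetical.
USER_AGENT = "housing-poc-bot (+https://example.org/contact)"

# Wait between requests to the same site, with some random jitter.
DOWNLOAD_DELAY = 2.0
RANDOMIZE_DOWNLOAD_DELAY = True

# Send at most one request at a time to each domain.
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Let Scrapy adapt the delay to the server's observed response times.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 30.0

# Limit how many times a failed request is retried.
RETRY_TIMES = 2
```

Slower, identified and rate-limited requests are less likely to be filtered, and they also place less load on the target site.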

The application was also deployed on Amazon Web Services, one of the largest web service providers in the world. Many job offers require knowledge of this provider, so I was interested in learning how applications can be deployed on it.


© CBLTIC Campus del Baix Llobregat - UPC