Sistema recomendador de taxis para Big Data

Barata González, Jorge

Identifiers

URI: https://hdl.handle.net/10016/27164

Files

PFC_Jorge_Barata_Gonzalez_2017.pdf (19.69 MB)

Publication date

2017-09-20

Defense date

2017-09-29

Authors

Barata González, Jorge

Advisors

Basanta Val, Pablo

Abstract

This document proposes a system to maximize taxi drivers' income by recommending the areas of the city with the highest trip frequency that are closest to the driver's position at the time of the query. Analyzing 10M trip traces from New York's Yellow Taxis on a Spark cluster, we found correlations between profit and trip time, distance, and number of passengers. We also found that the places with the highest trip frequency change every hour and every day. Using clustering, the system computes the most profitable clusters for each hour and day of the week and assigns each cluster a score based on the correlations found. The system is run several times on a Spark cluster to search for the best configuration. The results are stored in a geospatial database and can be queried through a web application by entering the time, day of the week, and location. The system recommends the ten closest locations, sorted by profit. The system may be of interest to Yellow Taxi drivers in New York as a way to increase profits by directly influencing where they travel, a freedom that competing services such as Uber and Cabify do not offer, since the objectives of their drivers are set by the company.
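The query step described in the abstract (enter a time, day of the week, and location; get the ten closest locations sorted by profit) can be illustrated with a minimal sketch. It is not the thesis implementation: the `Zone` record, the `recommend` function, and the use of the haversine distance over an in-memory list are all assumptions made for illustration, standing in for the geospatial database and the precomputed cluster scores.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Zone:
    """Hypothetical record for one precomputed cluster centre."""
    lat: float
    lon: float
    weekday: int   # 0 = Monday ... 6 = Sunday (assumed convention)
    hour: int      # 0..23
    score: float   # profit score assigned to the cluster


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_km * asin(sqrt(a))


def recommend(zones, lat, lon, weekday, hour, k=10):
    """Return the k zones closest to (lat, lon) for the given weekday
    and hour, ordered from highest to lowest score."""
    # Keep only clusters computed for the requested time slot.
    candidates = [z for z in zones if z.weekday == weekday and z.hour == hour]
    # Take the k nearest candidates first...
    nearest = sorted(
        candidates, key=lambda z: haversine_km(lat, lon, z.lat, z.lon)
    )[:k]
    # ...then order those by profit score, best first.
    return sorted(nearest, key=lambda z: z.score, reverse=True)
```

In this sketch, proximity selects the candidates and the score only orders them, so a very profitable but distant zone is never recommended. A real deployment would presumably delegate the nearest-neighbour search to the geospatial database instead of scanning a list.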

Keywords

Big Data, Cluster, Data mining, Spark, Spatial clustering, Taxi

Collections

Proyectos Fin de Carrera
