Publication:
Log File Analysis in Cloud with Apache Hadoop and Apache Spark

Identifiers
ISBN: 978-84-608-2581-4
Publication date
2015-10
Abstract
Log files are a very important set of data that can yield useful information through proper analysis. Because logs are produced at a high rate and by a large number of devices and software systems, using cloud services for log analysis is almost a necessity. This paper reviews the cloud computing framework Apache™ Hadoop®, highlights the differences and similarities between Hadoop MapReduce and Apache Spark™, and evaluates their performance. Log file analysis applications were developed in both frameworks and used to run SQL-type queries on real Apache Web Server log files. Various measurements were taken for each application and query with different parameters in order to draw reliable conclusions about the performance of the two frameworks.
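As a rough illustration of the kind of SQL-type query the abstract refers to (for example, `SELECT status, COUNT(*) FROM logs GROUP BY status`), the following Python sketch parses lines assumed to be in the Apache Common Log Format and counts requests per HTTP status code using separate map and reduce steps. It is not the authors' implementation. The regular expression, the function names (`parse_status`, `map_phase`, `reduce_phase`, `count_by_status`) and the sample lines are hypothetical and exist only for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical pattern for the Apache Common Log Format:
# host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)'
)


def parse_status(line: str) -> Optional[str]:
    """Return the HTTP status code of a log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    return match.group(6) if match else None


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (status, 1) for every well-formed line, like a MapReduce mapper."""
    for line in lines:
        status = parse_status(line)
        if status is not None:  # malformed lines are skipped
            yield status, 1


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum the values per key, like a reducer after the shuffle step."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def count_by_status(lines: Iterable[str]) -> Dict[str, int]:
    """Equivalent of SELECT status, COUNT(*) ... GROUP BY status."""
    return reduce_phase(map_phase(lines))


if __name__ == "__main__":
    sample = [
        '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
        '127.0.0.1 - - [10/Oct/2000:13:56:01 -0700] "GET /missing HTTP/1.0" 404 209',
        '10.0.0.2 - - [10/Oct/2000:13:57:12 -0700] "GET /index.html HTTP/1.0" 200 512',
        'malformed line',
    ]
    print(count_by_status(sample))  # {'200': 2, '404': 1}
```

In a Hadoop MapReduce job the two functions would correspond to a Mapper and a Reducer, with the framework handling the shuffle between them. In Spark the same query could be expressed as a `map` followed by a `reduceByKey` transformation on an RDD. This sketch runs in a single process and says nothing about the performance differences the paper measures.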
Description
Proceedings of: Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015). Krakow (Poland), September 10-11, 2015.
Keywords
Log analysis, Cloud, Apache Hadoop, Apache Spark, Performance evaluation
Bibliographic citation
Carretero Pérez, Jesús; et al. (eds.). (2015) Proceedings of the Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015): Krakow, Poland. Universidad Carlos III de Madrid, pp. 51-62.