Publication: Log File Analysis in Cloud with Apache Hadoop and Apache Spark
Publication date
2015-10
Abstract
Log files are a very important source of data that, through proper analysis, can yield useful information. Because of the high rate at which logs are produced and the number of devices and software systems that generate them, using cloud services for log analysis is almost a necessity. This paper reviews the cloud computing framework Apache™ Hadoop®, highlights the differences and similarities between Hadoop MapReduce and Apache Spark™, and evaluates their performance. Log file analysis applications were developed in both frameworks to perform SQL-type queries on real Apache Web Server log files. Various measurements were taken for each application and query with different parameters in order to draw reliable conclusions about the performance of the two frameworks.
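The abstract describes log analysis applications that run SQL-type queries over Apache Web Server logs but does not reproduce their code or list the exact queries. As a minimal sketch, assuming the logs use the Apache Common Log Format, the Python below parses each line and answers a hypothetical aggregate query, `SELECT status, COUNT(*), SUM(bytes) FROM logs GROUP BY status`, split into a map step and a reduce step in the style of Hadoop MapReduce. The names `parse_line`, `map_phase`, `reduce_phase` and `status_summary` are illustrative only; this is not the authors' implementation, and it runs on a single machine rather than on a Hadoop or Spark cluster.

```python
import re
from collections import defaultdict
from typing import Iterable, Iterator, Optional

# Assumed input: Apache Common Log Format (CLF), i.e.
# host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line: str) -> Optional[dict]:
    """Parse one CLF line into a dict; return None for malformed lines."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # A size of "-" means no response body was sent; count it as 0 bytes.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Map step: emit a (status, bytes) key/value pair per parseable line."""
    for line in lines:
        record = parse_line(line)
        if record is not None:
            yield record["status"], record["size"]


def reduce_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, tuple[int, int]]:
    """Reduce step: per status code, count requests and sum bytes."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for status, size in pairs:
        totals[status][0] += 1
        totals[status][1] += size
    return {status: (count, total) for status, (count, total) in totals.items()}


def status_summary(lines: Iterable[str]) -> dict[str, tuple[int, int]]:
    """Hypothetical query: SELECT status, COUNT(*), SUM(bytes) GROUP BY status."""
    return reduce_phase(map_phase(lines))


sample = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:13:56:01 -0700] "GET /missing HTTP/1.0" 404 -',
    'not a log line',  # malformed lines are skipped by the map step
]
print(status_summary(sample))  # {'200': (1, 2326), '404': (1, 0)}
```

In Hadoop MapReduce the framework would shuffle the emitted `(status, bytes)` pairs so that all pairs sharing a key reach the same reducer; in Spark the same query could be written as an RDD `map` followed by `reduceByKey`.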
Description
Proceedings of: Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015). Krakow (Poland), September 10-11, 2015.
Keywords
Log analysis, Cloud, Apache Hadoop, Apache Spark, Performance evaluation
Bibliographic citation
Carretero Pérez, Jesús; et al. (eds.). (2015) Proceedings of the Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015): Krakow, Poland. Universidad Carlos III de Madrid, pp. 51-62.