Comparing Frameworks for Distributed Big Data Processing in the Domain of Predictive Maintenance
Big Data analysis is a core component of many modern companies across industries. Due to their high computational and storage requirements, many Big Data applications run on distributed computing environments. Depending on the use case, these applications can become very complex, consisting of multiple frameworks that need to work together. One prominent, high-potential use case is predictive maintenance, where machine learning algorithms are used to predict machine failures, with the goal of minimizing downtime and maintenance costs. In this thesis, a predictive maintenance use case is implemented to serve as a benchmark for testing various Big Data frameworks. Popular frameworks such as Apache Hadoop and Apache Spark are evaluated for performance in different combinations. Additionally, a qualitative comparison of the analyzed frameworks is made.
First, this work gives an overview of the landscape of current Big Data frameworks and identifies the most popular ones. Second, the predictive maintenance use case is described. Third, the use case is implemented on the various framework combinations and its performance is measured. Finally, the results are compared and conclusions are drawn.
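To make the predictive maintenance task concrete, the following is a minimal sketch of the underlying idea: training a classifier to predict machine failures from sensor readings. It uses only the Python standard library; the sensor features (temperature, vibration), the synthetic failure rule, and the logistic-regression model are illustrative assumptions, not the dataset or algorithms actually used in the thesis.

```python
import math
import random

random.seed(42)

# Synthetic sensor readings: (temperature in degrees C, vibration in mm/s RMS).
# Assumed failure rule: machines running hot and vibrating strongly fail.
def make_sample():
    temp = random.uniform(40.0, 100.0)
    vib = random.uniform(0.0, 5.0)
    label = 1 if temp + 20.0 * vib > 130.0 else 0
    return (temp, vib), label

data = [make_sample() for _ in range(500)]

# Logistic regression trained with plain stochastic gradient descent.
w = [0.0, 0.0]
b = 0.0
lr = 0.05

def features(x):
    # Roughly center and scale both sensors to comparable ranges.
    return ((x[0] - 70.0) / 30.0, (x[1] - 2.5) / 2.5)

def predict(x):
    f = features(x)
    z = w[0] * f[0] + w[1] * f[1] + b
    return 1.0 / (1.0 + math.exp(-z))  # failure probability

for _ in range(300):
    for x, y in data:
        f = features(x)
        err = predict(x) - y
        w[0] -= lr * err * f[0]
        w[1] -= lr * err * f[1]
        b -= lr * err

correct = sum((predict(x) > 0.5) == (y == 1) for x, y in data)
accuracy = correct / len(data)
print(f"training accuracy: {accuracy:.2f}")
```

In the thesis itself, a model of this kind would be trained at scale on a distributed framework such as Apache Spark rather than on a single machine; the sketch only illustrates the prediction task being benchmarked.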