Hadoop and MapReduce
Title of the Paper: Association Rules Mining on Map Reduce
Problem Statement:
Smart cities rely on technology to improve health, energy, transportation, education and weather services. These services produce a huge amount of data every minute, and handling that volume while extracting useful information from it is a big challenge. To check the efficiency of the Apriori algorithm, traffic data was obtained from the NIS (National Institute of Statistics). The dataset contains 340,184 traffic accident records described by 572 attributes, and it covers accidents under a wide range of circumstances.
Methods and tools:
Big data tools are used to handle large databases (K Murali Gopal, 2016), to analyze the data and to produce useful information. Association rule mining relies on two measures, support and confidence. The support of an itemset is the fraction of transactions that contain it, and the confidence of a rule A -> B is the support of A and B together divided by the support of A. Mining is done in two steps. In the first step, the Apriori algorithm finds every frequent itemset, that is, every itemset whose support meets a predefined minimum. In the second step, the strong association rules, those that satisfy both the minimum support and the minimum confidence, are generated from the frequent itemsets. Apache Hadoop MapReduce provides the framework for handling large databases and the abstractions on which the implementation is built.
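The two steps described above can be sketched in a few lines of Python. This is a minimal, single-machine illustration and not the implementation used in the paper. The function names frequent_itemsets and strong_rules, the toy accident attributes and the thresholds are all assumptions made for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Step 1: return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Candidate 1-itemsets: every distinct item in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate with an infrequent k-subset (the Apriori property).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def strong_rules(frequent, min_confidence):
    """Step 2: return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already in the dictionary.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append(
                        (antecedent, itemset - antecedent, support, confidence)
                    )
    return rules


# Hypothetical toy records; the attribute names are invented for illustration.
transactions = [
    {"night", "highway", "multi_vehicle"},
    {"night", "highway"},
    {"day", "intersection"},
    {"night", "highway", "multi_vehicle"},
    {"day", "highway"},
]
frequent = frequent_itemsets(transactions, min_support=0.4)
rules = strong_rules(frequent, min_confidence=0.9)
```

With these toy records, the rule night -> highway is kept because every night-time record is also a highway record, while highway -> night is dropped because its confidence is below the threshold.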
The Hadoop ecosystem provides four major functions: data management, data access, data processing and data storage (Carlos Fernandez-Basso, 2016). MapReduce is a distributed software framework that provides distributed computing support for large datasets. A MapReduce job has two steps, map and reduce. The mapper is a function that takes input key-value pairs and produces intermediate key-value pairs (Sarem M. Ammar, 2018). The reducer is a function that takes each intermediate key, together with the set of intermediate values grouped under that key, and merges those values into the output for the key.
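The mapper and reducer roles described above can be illustrated with one counting pass of Apriori. The sketch below simulates the map, shuffle and reduce phases inside a single Python process. It does not use the Hadoop API, and the function names, the min_count parameter and the toy records are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def mapper(transaction, k):
    """Emit (itemset, 1) for every k-item subset of one transaction."""
    for itemset in combinations(sorted(transaction), k):
        yield itemset, 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(itemset, values, min_count):
    """Sum the counts for one itemset and emit it only if it is frequent."""
    total = sum(values)
    if total >= min_count:
        yield itemset, total


def count_frequent(transactions, k, min_count):
    """Run one map -> shuffle -> reduce pass over all transactions."""
    intermediate = (pair for t in transactions for pair in mapper(t, k))
    result = {}
    for itemset, values in shuffle(intermediate).items():
        result.update(reducer(itemset, values, min_count))
    return result


# Hypothetical toy records; the attribute names are invented for illustration.
transactions = [
    {"night", "highway", "multi_vehicle"},
    {"night", "highway"},
    {"day", "intersection"},
    {"night", "highway", "multi_vehicle"},
    {"day", "highway"},
]
pair_counts = count_frequent(transactions, k=2, min_count=2)
```

In a real cluster each mapper would see only its own split of the transactions and the framework would perform the grouping, but the contract between the two functions is the same as in this sketch.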
The experiments mined association rules with a minimum support of 30%, together with a minimum confidence threshold. The authors listed the 10 strongest rules, with their support and confidence values, in a table derived from the accident dataset. The rules covered intersections in colonies near highways, two-wheeler accidents on non-highway roads, multi-vehicle accidents on highways, night-time accidents on highway roads and accidents in forest areas. The parallel Apriori algorithm was used to extract these features from the large dataset. Association rules are a way of discovering knowledge that supports decision making.
The algorithm was run on two datasets to measure its efficiency. The first dataset contains 34,000 transactions and the second contains 150,000 transactions. Hadoop provided the execution environment for this analysis. Both datasets were processed on 6 nodes to reduce the time required to run the algorithm. Acceleration criteria were used to measure the efficiency of the parallel Apriori algorithm based on MapReduce. The experimental results indicated that the algorithm was efficient and that the Hadoop environment gave satisfactory results.
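The summary does not define the acceleration criteria. Assuming they correspond to the usual parallel speedup (single-node running time divided by running time on n nodes), the measure can be sketched as follows. The function names and the timings are hypothetical and are not results from the paper.

```python
def speedup(time_one_node, time_n_nodes):
    """Ratio of single-node running time to running time on n nodes."""
    return time_one_node / time_n_nodes


def parallel_efficiency(time_one_node, time_n_nodes, nodes):
    """Speedup divided by node count; 1.0 would mean perfect scaling."""
    return speedup(time_one_node, time_n_nodes) / nodes


# Made-up timings in seconds, used only to show how the measures are computed.
s = speedup(120.0, 30.0)
e = parallel_efficiency(120.0, 30.0, nodes=6)
```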
References
Carlos Fernandez-Basso, M. Dolores Ruiz & Maria J. Martin-Bautista, 2016. “Extraction of Association Rules Using Big Data Technologies.” Int. J. of Design & Nature and Ecodynamics, Vol. 11, Issue 3, pp. 178-185.
K Murali Gopal & Ranjit Patnaik, 2016. “Performance Analysis of Association Rule Mining Using Hadoop.” International Journal of Computer Science and Information Technologies, Vol. 7, Issue 6, pp. 2442-2444.
Sarem M. Ammar & Fadl M. Ba-Alwi, 2018. “Improved FTWeightedHash Apriori Algorithm for Big Data using Hadoop-MapReduce Model.” Journal of Advances in Mathematics and Computer Science, Vol. 27, Issue 1, pp. 1-11.