Hadoop and MapReduce













Title of the Paper: Association Rules Mining on Map Reduce

Problem Statement: Smart cities are a popular technology that enhances the use of health, energy, transportation, education and weather services. These processes produce a huge amount of data every minute, and handling that high volume of data and extracting useful information from it is a major challenge. To evaluate the efficiency of the Apriori algorithm, traffic information was obtained from the NIS (National Institute of Statistics). The dataset contains 340,184 traffic accident records with 572 attributes, making it a rich source of information covering a wide range of circumstances.

Methods and tools:

Big data tools are used for handling large databases (K Murali Gopal, 2016); various such tools are applied to analyze the data and produce useful information. Association rule theory supplies the support and confidence measures for the data, and mining proceeds as a two-step process. In the first step, the Apriori algorithm finds every frequent itemset that satisfies a predefined minimum support. In the second step, strong association rules are generated, i.e. rules that satisfy both the minimum support and the minimum confidence. The Apache Hadoop MapReduce tool provides the framework for handling large databases and supports the implementation.
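The two-step process above can be sketched in a few lines of Python. This is a brute-force, single-machine illustration rather than the pruned candidate generation of full Apriori or the paper's parallel implementation; the transaction items, the 60% confidence threshold, and the tiny dataset are illustrative assumptions (only the 30% minimum support figure comes from the paper).

```python
from itertools import combinations

# Illustrative transactions; item names are made up for this sketch.
transactions = [
    {"highway", "night", "multi_vehicle"},
    {"highway", "night"},
    {"non_highway", "two_wheeler"},
    {"highway", "multi_vehicle"},
]

MIN_SUPPORT = 0.3     # 30% minimum support, as in the paper's experiments
MIN_CONFIDENCE = 0.6  # illustrative threshold, not from the paper

def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

# Step 1: find frequent itemsets meeting the minimum support.
items = set().union(*transactions)
frequent = []
for k in (1, 2, 3):
    for combo in combinations(sorted(items), k):
        s = frozenset(combo)
        if support(s, transactions) >= MIN_SUPPORT:
            frequent.append(s)

# Step 2: derive strong rules X -> Y whose confidence
# support(X ∪ Y) / support(X) meets the minimum confidence.
rules = []
for itemset in frequent:
    if len(itemset) < 2:
        continue
    for r in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), r):
            x = frozenset(antecedent)
            conf = support(itemset, transactions) / support(x, transactions)
            if conf >= MIN_CONFIDENCE:
                rules.append((x, itemset - x, conf))

for x, y, conf in rules:
    print(sorted(x), "->", sorted(y), f"confidence={conf:.2f}")
```

On this toy data the sketch yields rules such as night -> highway with confidence 1.0, mirroring the kind of accident-pattern rules the paper reports.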

The Hadoop ecosystem provides four major processes: data management, data access, data processing and data storage (Carlos Fernandez-Basso, 2016). MapReduce is a distributed software framework that provides distributed-computing support for large datasets. MapReduce consists of two steps, the mapper and the reducer. The mapper is a function that takes input key-value pairs and emits intermediate keys and values for processing (Sarem M. Ammar, 2018). The reducer is a function that takes those intermediate key-value pairs and groups the intermediate values belonging to the same key.
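The mapper/reducer pair can be sketched in a Hadoop Streaming style. In a real Hadoop job the two functions would run as separate scripts reading from stdin across the cluster; here they are plain Python functions so the key-value flow is easy to follow, and the comma-separated transaction lines are illustrative assumptions.

```python
from itertools import groupby

def mapper(lines):
    """Emit an intermediate (item, 1) pair for each item in a transaction."""
    for line in lines:
        for item in line.strip().split(","):
            yield (item, 1)

def reducer(pairs):
    """Group intermediate pairs by key and sum the values for each key."""
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield (key, sum(v for _, v in group))

# Illustrative input split; item names are made up for this sketch.
transactions = [
    "highway,night,multi_vehicle",
    "highway,night",
    "non_highway,two_wheeler",
]

counts = dict(reducer(mapper(transactions)))
print(counts)
```

The sort between the two stages stands in for Hadoop's shuffle phase, which delivers all values for the same intermediate key to one reducer.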

The experimental results showed that association rules were mined with a minimum support of 30%. For this analysis, strong rules together with their support and confidence values were examined; the authors listed the 10 strongest rules, with support and confidence values taken from the accident dataset and presented in a table. The association rules revealed accidents at intersections in colonies near highways, two-wheeler accidents on non-highway roads, multi-vehicle accidents on highways, night-time accidents on highway roads, and accidents in forest areas. The parallel Apriori algorithm was used to extract these patterns from the vast dataset, and the resulting association rules provide discovered knowledge that supports decision making.

The algorithm was implemented on various datasets to measure its efficiency. Two datasets were considered: the first contains 34,000 transactions and the second contains 150,000 transactions. Hadoop provided the execution environment for this analysis, and both experiments used 6 nodes to decrease the time required to run the algorithm. An acceleration (speedup) criterion was used to measure the efficiency of the parallel Apriori algorithm based on MapReduce. The experimental results highlighted that the Apriori algorithm was efficient and that the Hadoop environment produced satisfactory results for the executed process.
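The acceleration criterion mentioned above is conventionally the ratio of single-node to multi-node runtime. The timing figures below are made up for illustration; the paper does not report these numbers.

```python
def speedup(t_single, t_parallel):
    """Acceleration: ratio of single-node runtime to parallel runtime."""
    return t_single / t_parallel

# Hypothetical example: a job taking 600 s on one node and 120 s on 6 nodes
# achieves a speedup of 5.0, i.e. a parallel efficiency of 5/6 on 6 nodes.
print(speedup(600, 120))
```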




K. Murali Gopal, Ranjit Patnaik, 2016. "Performance Analysis of Association Rule Mining Using Hadoop." International Journal of Computer Science and Information Technologies, Vol. 7, Issue 6, pp. 2442-2444.

Sarem M. Ammar, Fadl M. Ba-Alwi, 2018. "Improved FTWeightedHash Apriori Algorithm for Big Data using Hadoop-MapReduce Model." Journal of Advances in Mathematics and Computer Science, Vol. 27, Issue 1, pp. 1-11.


