Hadoop and MapReduce


The objective of this article is to discuss Efficient Locality and Replica Aware Scheduling (ELRAS), a strategy for scheduling the Reduce tasks that process intermediate data in MapReduce applications running on cloud computing. ELRAS is a scheduling technique that integrates with the ARS to handle data locality and replication in a heterogeneous environment. It is proposed to increase processing speed and throughput across various applications. Hadoop and ELRAS are discussed together with the RSB, which balances the load to give the Hadoop-based scheduler better performance and throughput.


Efficient locality and replica aware scheduling is a strategy for placing reduce tasks: the master node, which creates each reduce task, decides where it runs (Shang, Chen and Yan, 2017). The scheduler determines the speed of every node in order to choose the next suitable node, because that choice affects runtime performance. ELRAS assigns each reduce task to a suitable node that can process the intermediate data at a higher rate. The speed of each node is monitored (Bibal Benifa and Dejey, 2017). A node's processing speed degrades as tasks are assigned to it, and the degradation depends on the Reduce task and on data locality. The scheduler therefore maintains a queue of replication requests, which are identified by the deciding object and the job tracker.
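The idea of weighing node speed against data locality can be illustrated with a minimal sketch. This is not the ELRAS algorithm from the cited papers. The `Node` fields, the cost formula in `estimated_finish_time` and the `network_mb_s` default are all hypothetical and chosen only to make the trade-off visible.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    speed_mb_s: float        # monitored processing speed (hypothetical unit)
    local_mb: float          # intermediate data for this reduce task already on the node
    free_reduce_slots: int   # reduce slots currently unused


def estimated_finish_time(node, total_mb, network_mb_s=50.0):
    """Seconds to fetch the non-local intermediate data and then process all of it.

    The formula is an illustrative assumption, not taken from the source.
    """
    remote_mb = max(total_mb - node.local_mb, 0.0)
    return remote_mb / network_mb_s + total_mb / node.speed_mb_s


def pick_reduce_node(nodes, total_mb):
    """Return the node with a free slot and the lowest estimated finish time, or None."""
    candidates = [n for n in nodes if n.free_reduce_slots > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda n: estimated_finish_time(n, total_mb))


nodes = [
    Node("fast-remote", speed_mb_s=100.0, local_mb=0.0, free_reduce_slots=1),
    Node("mid-local", speed_mb_s=60.0, local_mb=400.0, free_reduce_slots=1),
    Node("busy", speed_mb_s=200.0, local_mb=500.0, free_reduce_slots=0),
]
chosen = pick_reduce_node(nodes, total_mb=500.0)
print(chosen.name)  # mid-local
```

In this example the slower node wins because most of the intermediate data is already local to it, while the fastest node is skipped because it has no free reduce slot.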


The locality and replica awareness strategy is built on a layered mapping of parameters such as CPU utilization, the node's network IP address, rack ID and the data objects stored on the node, all of which are recorded in the node statistics table (NST). The table is created dynamically: when a physical machine joins, the NST algorithm updates the table with an entry for the new node so that it can be used for processing (Gandomi et al, 2019). Data is split into multiple fixed-size blocks that are placed according to the available space. Hadoop is configured with a slot number per node that limits the maximum number of reduce tasks and map tasks that run at once. These settings are used together with data locality, the identification of the data set and the virtual machines that back the computing nodes.

Figure 1: Scheduling Strategy
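A node statistics table holding the parameters listed above (CPU utilization, IP address, rack ID and stored data) could be sketched as follows. The class and method names are hypothetical. The source does not give the layout of the table or the steps of the NST algorithm, so this only shows one plausible shape for such a structure.

```python
from dataclasses import dataclass, field


@dataclass
class NodeStats:
    ip: str
    rack_id: str
    cpu_utilization: float                      # fraction between 0 and 1
    stored_blocks: set = field(default_factory=set)


class NodeStatisticsTable:
    """Hypothetical in-memory table keyed by node IP address."""

    def __init__(self):
        self._rows = {}

    def register(self, ip, rack_id, cpu_utilization=0.0):
        """Add a row when a machine joins, or refresh the row if it already exists."""
        row = self._rows.get(ip)
        if row is None:
            row = NodeStats(ip, rack_id, cpu_utilization)
            self._rows[ip] = row
        else:
            row.rack_id = rack_id
            row.cpu_utilization = cpu_utilization
        return row

    def record_block(self, ip, block_id):
        """Note that the node at `ip` stores a replica of `block_id`."""
        self._rows[ip].stored_blocks.add(block_id)

    def holders(self, block_id):
        """IPs of the nodes that store `block_id`, sorted for stable output."""
        return sorted(ip for ip, row in self._rows.items()
                      if block_id in row.stored_blocks)

    def same_rack(self, ip_a, ip_b):
        """True when both nodes sit in the same rack."""
        return self._rows[ip_a].rack_id == self._rows[ip_b].rack_id


nst = NodeStatisticsTable()
nst.register("10.0.0.1", "rack-1", cpu_utilization=0.30)
nst.register("10.0.0.2", "rack-1", cpu_utilization=0.75)
nst.register("10.0.1.1", "rack-2")
nst.record_block("10.0.0.1", "blk_001")
nst.record_block("10.0.1.1", "blk_001")
print(nst.holders("blk_001"))
print(nst.same_rack("10.0.0.1", "10.0.0.2"))
```

A scheduler could consult `holders` to find nodes that already store a replica, and `same_rack` to prefer a rack-local node over one that needs cross-rack communication.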


The experiment uses a heterogeneous cluster of nodes, with the block size configured to 64 MB (Zeng et al, 2018). This setup is used to evaluate the performance and configuration of the proposed ELRAS approach. The Hadoop scheduler is modified to apply ELRAS as described. In Radix sort, performance benefits even with the overhead penalty of recording the speed at which each suitable node processes the intermediate data. In word count, performance is slightly worse than Hadoop across the configured slot numbers when the number of Map tasks is reduced to 1. The benchmark suite covers word count, Grep, K-means clustering and TeraSort workloads in the Hadoop environment.

Figure 2: Number of Maps
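With the 64 MB block size used in the experiment, the number of fixed-size blocks for an input file follows from simple arithmetic. The sketch below assumes the last block may be partially filled. The file sizes are made-up inputs, not data from the experiment.

```python
import math

BLOCK_SIZE_MB = 64  # block size stated for the experiment


def block_count(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of fixed-size blocks needed for a file; the last block may be partial."""
    if file_size_mb <= 0:
        return 0
    return math.ceil(file_size_mb / block_size_mb)


# Hypothetical input sizes in MB
for size_mb in (64, 100, 1024):
    print(size_mb, block_count(size_mb))
```

Hadoop usually launches about one map task per input split, and splits default to roughly the block size, so this count is only a rough guide to the number of maps for a given input.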


In conclusion, ELRAS is adapted to MapReduce to schedule the reduce tasks that process intermediate data in MapReduce applications. Accounting for data locality and replication decisions at the master of the runtime system is shown to improve throughput. This conclusion rests on the strategies examined: the data placement method, replication, job execution time and cross-rack communication, each assessed in terms of throughput. In future, the ELRAS algorithm could be integrated with auto-scaling applications in cloud computing to further improve MapReduce and Hadoop.


  1. Bibal Benifa, J.V. and Dejey, 2017. Performance Improvement of MapReduce for Heterogeneous Clusters Based on Efficient Locality and Replica Aware Scheduling (ELRAS) Strategy. Wireless Personal Communications, 95(3), pp.2709–2733.
  2. Gandomi, A., Reshadi, M., Movaghar, A. and Khademzadeh, A., 2019. HybSMRP: a hybrid scheduling algorithm in Hadoop MapReduce framework. Journal of Big Data, 6(1).
  3. Shang, F., Chen, X. and Yan, C., 2017. A strategy for scheduling reduce task based on intermediate data locality of the MapReduce. Cluster Computing, 20(4), pp.2821–2831.
  4. Zeng, X., Garg, S.K., Wen, Z., Strazdins, P., Zomaya, A.Y. and Ranjan, R., 2018. Cost efficient scheduling of MapReduce applications on public clouds. Journal of Computational Science, 26, pp.375–388.
