
Hadoop and MapReduce

Rationale

The objective of this article is to examine Efficient Locality and Replica Aware Scheduling (ELRAS), a strategy for scheduling the Reduce tasks that process the intermediate data of MapReduce applications running on cloud computing platforms. ELRAS is a scheduling technique that integrates with ARS to take data locality and replication into account in a heterogeneous environment. It is intended to increase processing speed and throughput across various applications. Hadoop and ELRAS are discussed together, with RSB used to balance the load and so improve the performance and throughput of the Hadoop-based scheduler.

Methods

Efficient Locality and Replica Aware Scheduling (ELRAS) is a strategy for scheduling Reduce tasks in which the master node decides where each Reduce task is created (Shang, Chen and Yan, 2017). The master determines the processing speed of every node so that it can select the most suitable node for the next task, because this choice directly affects runtime performance. ELRAS assigns Reduce tasks to the nodes that can process the intermediate data at the highest rate. To support this, the speed of each node is monitored continuously (Bibal Benifa and Dejey, 2017). A node's speed degrades as Reduce tasks are assigned to it, to a degree that depends on the task and on its data locality. The scheduler therefore maintains a queue of replication requests, which are identified by the decision object and the JobTracker.
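To make the node-selection idea concrete, the Python sketch below greedily places each Reduce task on the node with the lowest estimated completion time, counting both the time to fetch non-local intermediate data and the processing time at the node's monitored speed. It is an illustration under stated assumptions, not the published ELRAS algorithm: the names `Node`, `estimated_time` and `schedule_reduces`, the speed-degradation model (`slowdown`) and the fixed network rate (`net_mb_s`) are all hypothetical. When the chosen node does not already hold all of a task's intermediate data, a request is appended to a replication queue, mirroring the queue of replication requests described above.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    speed_mb_s: float                             # monitored processing speed (hypothetical units)
    local_mb: dict = field(default_factory=dict)  # task id -> MB of that task's intermediate data on this node
    assigned: int = 0                             # Reduce tasks already placed on this node


def effective_speed(node: Node, slowdown: float = 0.25) -> float:
    """Assumed model: speed is divided by (1 + slowdown * tasks already assigned)."""
    return node.speed_mb_s / (1.0 + slowdown * node.assigned)


def estimated_time(node: Node, task_id: str, input_mb: float, net_mb_s: float = 50.0) -> float:
    """Time to fetch the non-local intermediate data plus time to process all of it."""
    remote_mb = input_mb - node.local_mb.get(task_id, 0.0)
    return remote_mb / net_mb_s + input_mb / effective_speed(node)


def schedule_reduces(tasks: dict, nodes: list):
    """Place each Reduce task on the node with the lowest estimated time."""
    plan = {}
    replication_queue = deque()  # pending (task id, node name) requests to copy intermediate data
    for task_id, input_mb in tasks.items():
        best = min(nodes, key=lambda n: estimated_time(n, task_id, input_mb))
        best.assigned += 1  # later estimates for this node see its degraded speed
        plan[task_id] = best.name
        if best.local_mb.get(task_id, 0.0) < input_mb:
            replication_queue.append((task_id, best.name))
    return plan, replication_queue


if __name__ == "__main__":
    nodes = [Node("fast", 100.0, {"r1": 200.0}), Node("slow", 40.0, {"r2": 200.0})]
    plan, queue = schedule_reduces({"r1": 200.0, "r2": 200.0, "r3": 100.0}, nodes)
    print(plan)         # {'r1': 'fast', 'r2': 'slow', 'r3': 'fast'}
    print(list(queue))  # [('r3', 'fast')]
```

In this example r2 goes to the slower node because, once the fast node already has r1, its degraded speed plus the cost of transferring r2's data make it the worse choice. The 100 MB task r3, which has no local data anywhere, goes back to the fast node and produces one replication request. A greedy choice keeps the sketch short; a production scheduler would also re-measure speeds as tasks complete.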

Tools

The locality and replica aware strategy is built on a layered mapping that records, for each node, parameters such as CPU utilization, network IP address, rack ID and the data stored on the node, all kept in a node statistics table (NST). The NST is created dynamically: when a new physical machine joins, the algorithm adds it to the table, so the table stays up to date for scheduling (Gandomi et al, 2019). Data is stored as multiple fixed-size blocks, distributed according to the space available on each node. Hadoop is configured with different slot numbers, which set the maximum number of Reduce and Map tasks that a node can run at once. Together, this information is used for data locality, for identifying data sets, and for identifying the virtual machines that serve as computing nodes.
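The node statistics table can be pictured as a small in-memory structure keyed by node address. The Python sketch below is a hypothetical illustration, not the NST algorithm from the cited work: `NodeStats`, `NodeStatisticsTable` and their methods are invented names, and the fields cover only the parameters listed in the text (CPU utilization, IP address, rack ID and stored data). A row is added when a new machine joins, refreshed with new measurements, and queried to find replica holders and other nodes on the same rack.

```python
from dataclasses import dataclass, field


@dataclass
class NodeStats:
    """One row of a hypothetical node statistics table (NST)."""
    ip: str
    rack_id: str
    cpu_util: float = 0.0                     # fraction between 0.0 and 1.0
    blocks: set = field(default_factory=set)  # ids of data blocks stored on this node


class NodeStatisticsTable:
    def __init__(self):
        self.rows = {}  # ip -> NodeStats

    def register(self, ip: str, rack_id: str) -> None:
        """Add a row when a new physical machine joins; re-registering keeps the existing row."""
        self.rows.setdefault(ip, NodeStats(ip, rack_id))

    def update(self, ip: str, cpu_util: float, blocks) -> None:
        """Refresh a node's CPU utilization and its list of stored blocks."""
        row = self.rows[ip]
        row.cpu_util = cpu_util
        row.blocks = set(blocks)

    def holders(self, block_id: str) -> list:
        """Nodes holding a replica of block_id, least-loaded first."""
        rows = [r for r in self.rows.values() if block_id in r.blocks]
        return sorted(rows, key=lambda r: r.cpu_util)

    def same_rack(self, ip: str) -> list:
        """IP addresses of the other nodes on the same rack as ip."""
        rack = self.rows[ip].rack_id
        return [r.ip for r in self.rows.values() if r.rack_id == rack and r.ip != ip]
```

A scheduler built on such a table would first try a least-loaded holder of the needed block (node-local), then a node returned by `same_rack` (rack-local), and only then a node on another rack.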

Figure 1: Scheduling Strategy

Findings

In the experiment, the nodes in the network are configured as a heterogeneous cluster, and the block size is set to 64 MB (Zeng et al, 2018). The proposed ELRAS approach is evaluated on this configuration, with the Hadoop scheduler modified to implement the approach described above. The benchmark suite consists of word count, Grep, K-means clustering and TeraSort workloads running in the Hadoop environment. For Radix sort, performance improves despite the overhead penalty of recording each node's processing speed for intermediate data and selecting a suitable node. For word count, performance is slightly worse than stock Hadoop with various slot-number configurations when the number of Map tasks is reduced to 1.
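Because the experiment fixes the block size at 64 MB, the number of Map tasks for a job follows from its input size: Hadoop normally runs one Map task per input split, and with default minimum and maximum split sizes FileInputFormat uses the block size as the split size. The Python sketch below reproduces that split-counting rule in simplified form, including the 10% slop that lets a slightly oversized final chunk remain a single split. It assumes non-empty, splittable files, and the function names are hypothetical.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than the split size


def count_splits(file_bytes: int, split_bytes: int = 64 * MB) -> int:
    """Number of input splits (and hence Map tasks) for one non-empty, splittable file."""
    remaining = file_bytes
    splits = 0
    # Cut full-size splits while the remainder exceeds 1.1 split sizes.
    while remaining / split_bytes > SPLIT_SLOP:
        splits += 1
        remaining -= split_bytes
    # Whatever is left (at most 1.1 split sizes) becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


def total_map_tasks(file_sizes, split_bytes: int = 64 * MB) -> int:
    """Sum the splits over all input files of a job."""
    return sum(count_splits(size, split_bytes) for size in file_sizes)


if __name__ == "__main__":
    print(count_splits(1024 * MB))                          # 16
    print(count_splits(70 * MB))                            # 1
    print(total_map_tasks([1024 * MB, 70 * MB, 129 * MB]))  # 19
```

A 1 GB file yields 16 Map tasks, while a 70 MB file yields 1 rather than 2, because 70/64 is about 1.09 and falls within the slop. Changing the block size therefore changes the number of Map tasks a job starts, which is why it is fixed across the experiment.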

Figure 2: Number of Maps

Conclusion

In conclusion, ELRAS adapts MapReduce scheduling so that the Reduce tasks of a MapReduce application are placed according to the locality of their intermediate data. The reviewed work shows that using data locality and replication decisions at the master node of the runtime system improves throughput. This conclusion draws on several strategies: the data placement method, replication, and their effect on job execution time, cross-rack communication and throughput. In future work, the ELRAS algorithm could be integrated with auto-scaling in cloud computing to further improve MapReduce and Hadoop.

References

  1. Bibal Benifa, J.V. and Dejey, 2017. Performance Improvement of MapReduce for Heterogeneous Clusters Based on Efficient Locality and Replica Aware Scheduling (ELRAS) Strategy. Wireless Personal Communications, 95(3), pp.2709–2733.
  2. Gandomi, A., Reshadi, M., Movaghar, A. and Khademzadeh, A., 2019. HybSMRP: a hybrid scheduling algorithm in Hadoop MapReduce framework. Journal of Big Data, 6(1).
  3. Shang, F., Chen, X. and Yan, C., 2017. A strategy for scheduling reduce tasks based on the intermediate data locality of MapReduce. Cluster Computing, 20(4), pp.2821–2831.
  4. Zeng, X., Garg, S.K., Wen, Z., Strazdins, P., Zomaya, A.Y. and Ranjan, R., 2018. Cost-efficient scheduling of MapReduce applications on public clouds. Journal of Computational Science, 26, pp.375–388.
