Dimensionality Reduction and the Strange Case of Categorical Data for Predicting Defective Water Meter Devices

  • Marco Roccetti
  • Luca Casini
  • Giovanni Delnevo
  • Simone Bonfante
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1253)


Following an experiment with a deep learning (DL) model designed to predict whether a water meter device would fail over time, we came across a strange case. It arose when we tried to strengthen the training of our classifier by using, besides the numerical measurements of consumed water, other available contextual information of categorical type. Surprisingly, this additional categorical information did not improve the prediction accuracy, which instead dropped appreciably. Having recognized the problem as an excessive increase in the dimensionality of the data space under observation, with a corresponding loss of statistical significance, we changed the training strategy. Observing that every categorical variable followed a quasi-Pareto distribution, we re-trained our DL models, one categorical variable at a time, only on the fraction of meter devices (and corresponding measurements of consumed water) that exhibited the most frequent qualitative values for that variable. This new strategy yielded a prediction accuracy never reached before, of 87–88% on average.
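The filtering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the "most frequent" values were selected, so the cumulative-coverage threshold (`coverage=0.8`), the function names, the `brand` column, and the toy rows are all assumptions made for illustration only.

```python
from collections import Counter


def top_values_covering(values, coverage=0.8):
    """Return the most frequent values, in descending order of frequency,
    until their cumulative share of all observations reaches `coverage`.

    The 0.8 default is an assumed cut-off, not a figure from the paper.
    """
    counts = Counter(values)
    total = sum(counts.values())
    kept, running = [], 0
    for value, count in counts.most_common():
        if running / total >= coverage:
            break
        kept.append(value)
        running += count
    return kept


def filter_rows(rows, column, coverage=0.8):
    """Keep only the rows whose `column` holds one of the most frequent values."""
    kept = set(top_values_covering([row[column] for row in rows], coverage))
    return [row for row in rows if row[column] in kept]


# Hypothetical toy data: one categorical attribute with a long-tailed
# (roughly Pareto-like) distribution, plus a numerical reading.
rows = (
    [{"brand": "A", "reading": 10.0}] * 6
    + [{"brand": "B", "reading": 12.5}] * 2
    + [{"brand": "C", "reading": 9.1}]
    + [{"brand": "D", "reading": 30.2}]
)

subset = filter_rows(rows, "brand", coverage=0.8)
print(len({r["brand"] for r in rows}), "distinct values before filtering")
print(len({r["brand"] for r in subset}), "distinct values after filtering")
print(len(subset), "of", len(rows), "rows retained")
```

In this sketch, discarding the rare values halves the number of distinct categories (and therefore the width of any one-hot encoding of that variable) while retaining most of the rows, which is the trade-off the abstract relies on.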


Machine learning design; Human-machine-bigdata interaction loop; Human data science; Water metering and consumption



We are indebted to the company that provided the data. To protect its privacy, we keep it anonymous here.



Copyright information

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021

Authors and Affiliations

  • Marco Roccetti (1), email author
  • Luca Casini (1)
  • Giovanni Delnevo (1)
  • Simone Bonfante
  1. Department of Computer Science and Engineering, University of Bologna, Bologna, Italy
