The Joy of Type Providers
- 2.1k Downloads
Let me let you in on the dirty little secret of machine learning: If you were to look solely at the topics publicly discussed, you would think that most of the work revolves around crafting fancy algorithms, the remaining time being spent on designing ways to run these algorithms in a distributed fashion, or some other similarly interesting engineering challenge. The sad truth is that this is a rather marginal part of the job; most of your time will likely be spent on a much more prosaic activity: data janitorial tasks. If you want the machine to learn anything, you need to feed it data, and data has a way of coming from all sorts of different sources, in shapes and formats that are impractical for what you want to do with it. It usually has missing information, and is rarely properly documented. In short, finding and preparing data is both hugely important for machine learning and potentially a source of great pain.