Thursday, October 23, 2014

Facts about Raw Data

Dr. Rufus Pollock, founder and co-Director of the Open Knowledge Foundation, said this about raw data and fancy GUIs:

"one thing I find remarkable about many data projects is how much effort goes into developing a shiny front-end for the material. Now I’m not knocking shiny front-ends, they’re important for providing a way for many users to get at the material ... think what a website designed five years ago looks like today (hello css). Then think about what will happen to that nifty ajax+css work you’ve just done. By contrast ascii text, csv files and plain old sql dumps (at least if done with some respect for the ascii standard) don’t date — they remain forever in style."

Amen to that. 

Nothing will compete with the simplicity of storing raw data as CSV files. I'm amazed at how complex businesses have become, and at how much money people pour into maintaining such complex systems. But what do people actually know about raw data? Are you storing the data from your hosts, sensors, and weather stations somewhere close to you, in a format that lets you easily carry out a simple time series analysis? What on earth is raw data?

What is it?

Raw data is data collected from a source (for example an operating system, an application process, a meteorological weather station, or a sensor) without changes or modification of any kind, numerical or otherwise. This set of data is sometimes called primary data. Raw data can have any format: binary or text, record-oriented or not, time series or not. Usually it is found in a simple text format, CSV, where the data is presented as records, each record made up of comma-separated fields.
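To make the record-and-field idea concrete, here is a minimal sketch in Python. The sample data and its column names (timestamp, host, cpu_pct) are made up for illustration and do not come from any particular recorder. The fields are kept as the strings found in the file, since nothing is converted or modified at this stage.

```python
import csv
import io

# Hypothetical raw sample: one record per line, fields separated by commas.
# The column names (timestamp, host, cpu_pct) are assumptions for illustration.
RAW_CSV = """timestamp,host,cpu_pct
1414022400,web01,12.5
1414022460,web01,14.0
1414022520,web01,13.2
"""

def read_records(text):
    """Parse CSV text into a list of dicts, one dict per record."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)

records = read_records(RAW_CSV)

# Every field is still a plain string, exactly as it was recorded.
for rec in records:
    print(rec["timestamp"], rec["host"], rec["cpu_pct"])
```

In practice the text would come from a file on disk rather than a string, but the parsing step is the same.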

Do I need this?

It is important to collect and store raw data from your hosts or other devices somewhere safe and easy to access. This becomes your single, consistent point of access for all the data recorded by your network of sensors or data-centre hosts. From this central point you can easily inspect, browse, analyse, or visualize the data in any way you like, without being restricted to a particular software application. So yes, you will need access to raw data.
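As a rough illustration of analysis without any particular application, the sketch below loads a small time series from CSV text and computes an overall mean and a moving average using only the Python standard library. The readings and the column names (timestamp, temp_c) are invented for the example.

```python
import csv
import io
from statistics import mean

# Hypothetical hourly temperature readings; values and column names
# are assumptions made up for this example.
RAW_CSV = """timestamp,temp_c
1414022400,11.0
1414026000,12.0
1414029600,13.0
1414033200,14.0
"""

def load_series(text):
    """Return a list of (timestamp, value) pairs parsed from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return [(int(row["timestamp"]), float(row["temp_c"])) for row in reader]

def moving_average(values, window):
    """Average each run of `window` consecutive values."""
    return [
        mean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]

series = load_series(RAW_CSV)
temps = [value for _, value in series]

overall = mean(temps)                 # mean of all readings
smoothed = moving_average(temps, 2)   # two-point moving average

print(overall)
print(smoothed)
```

The same raw file could just as easily be fed to a spreadsheet, a statistics package, or a plotting tool, which is the point of keeping it in a plain format.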

Who else is collecting and using raw data?

Any statistical, numerical, or visualization work requires access to raw data. Fields ranging from finance (raw market data feeds, for example) to medicine, space, aeronautics, and biochemical engineering all rely heavily on raw data.


