
I asked myself: how BIG is BIG DATA, according to the experts? So I searched the Internet and found many definitions: "greater than 1 TB," or "data that is so large that you cannot adequately store and process it with your regular technology," or even "the number of independent data sources, each with the potential to interact."
Here is where I found something interesting: https://www.quora.com/How-much-data-is-Big-Data
The only certain thing is that there is no single definition of BIG DATA. Why? We always try to give standard definitions to things, but with BIG DATA this does not work. So how do I know whether the person selling me a solution to manage BIG DATA is offering me the Solution with a capital S?
I believe BIG DATA is the concise name of a technological challenge for us engineers: today we produce more data than conventional computers can store, and then there is the problem of how to manage it. Here comes the magic word: MANAGE data. I mean, there are three things to do with data: collect it, store it, and analyze it.
It goes without saying that the point at which data becomes BIG is entirely relative and should be measured against the resources available to use it (technical, human and financial). For example, handling 100 GB of data per day is definitely a challenge for my business, but it is not for the Coca-Cola Company.
From my point of view, the raw size is not a true indicator, nor is the type of data (sensor readings rather than user clicks or published tweets); it is the type of analysis you want to perform on the data that makes it BIG or not. Let me explain with an example: if you collect images and video from a network camera just to keep a simple log, then suitable storage and a few summary utilities are enough, and the problem is not BIG. But if the images have to be analyzed and related to each other, then you must implement complex frame analysis, and the problem actually is BIG: BIG for computing resources, BIG for fast access to the data, BIG for the ability to synthesize the analysis results.
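To make the contrast concrete, here is a minimal Python sketch, with random synthetic frames standing in for the real camera feed (no particular camera or library is implied): the logging case keeps a tiny record per frame, while even a toy frame-to-frame comparison has to touch every pixel of every pair of frames.

```python
import numpy as np

# Hypothetical stand-in for a camera feed: 100 random 480x640 grayscale frames.
frames = [np.random.randint(0, 256, (480, 640), dtype=np.uint8) for _ in range(100)]

# Case 1: a simple log. Per frame we keep only a small summary record,
# so storage and processing stay modest no matter how long we record.
log = [{"frame": i, "bytes": f.nbytes, "mean_brightness": float(f.mean())}
       for i, f in enumerate(frames)]

# Case 2: relating frames to each other. Even this toy "motion score"
# (mean absolute difference between consecutive frames) must read every
# pixel of every frame pair; a real analysis pipeline only grows from here.
motion = [float(np.abs(frames[i].astype(np.int16) - frames[i - 1].astype(np.int16)).mean())
          for i in range(1, len(frames))]

print(len(log), "log records,", len(motion), "frame-pair comparisons")
```

The point of the sketch is that the same frames become a BIG problem only once the analysis requires relating them to one another.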
Now let’s look at it from another point of view, the one I hold most dear: that of small businesses whose entrepreneurs want to make the leap in data management. Entrepreneurs understand that the answers to many questions lie in the data they can collect in their companies during their production processes. But does it make sense to talk to them about petabytes, about geographically distributed storage? For them it does not matter whether the data is BIG or SMALL; indeed, SMALL would be much better! The information needed to make decisions is often based on only 1% of the collected data, so if we knew which details are relevant we could avoid managing the remaining 99%. But since we do not always know which data is useful, when in doubt we save everything! An SME that could do everything with Excel instead of ELK would surely be happier!
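As a toy illustration of keeping only that relevant 1%, here is a short Python sketch; the field names and the threshold are invented purely for the example, not taken from any real production system. If the only decision to support is "which machines overheat?", two columns and the out-of-range rows are enough, and the rest never needs to be stored.

```python
import csv
import io

# Hypothetical raw production log (the field names are made up for illustration).
raw = io.StringIO(
    "timestamp,machine_id,temperature,vibration,operator,firmware,note\n"
    "2021-03-01T08:00,M1,71.2,0.03,anna,1.4.2,ok\n"
    "2021-03-01T08:01,M1,94.8,0.41,anna,1.4.2,spike\n"
    "2021-03-01T08:02,M2,70.9,0.02,luca,1.4.2,ok\n"
)

# Keep only the decision-relevant slice: which machine exceeded 90 degrees, and when.
relevant = [
    {"timestamp": row["timestamp"],
     "machine_id": row["machine_id"],
     "temperature": row["temperature"]}
    for row in csv.DictReader(raw)
    if float(row["temperature"]) > 90.0
]
print(relevant)
```

Of course the catch is exactly the one in the paragraph above: you can only filter like this once you know which question the data must answer.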
Now let’s change perspective and talk about GOOD DATA. Good data is the data that is useful for analysis and for making sound decisions from the analysis results. Elementary, my dear Watson, you might say. And yet you usually find what you look for, and if we are oriented toward BIG DATA we will design systems able to swallow up billions of numbers: scalable, redundant, expensive. What if we started from BIG DATA and worked our way to GOOD DATA? How? For example, by removing the dead branches (unproductive, rounded-up data) from the "data tree".
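Here is a small Python sketch of what pruning those dead branches could look like, assuming a pandas DataFrame with invented column names: columns that are constant everywhere and numeric columns that nearly duplicate another one are treated as branches to cut.

```python
import pandas as pd

# A toy table standing in for collected data (column names are invented).
df = pd.DataFrame({
    "order_id":   [1, 2, 3, 4, 5],
    "site":       ["A", "A", "A", "A", "A"],          # constant: carries no information
    "temp_exact": [71.23, 84.91, 71.27, 69.58, 77.40],
    "temp_round": [71.0, 85.0, 71.0, 70.0, 77.0],      # a rounded copy of temp_exact
    "defect":     [0, 0, 1, 0, 1],
})

# Dead branch 1: columns with a single value everywhere.
constant_cols = [c for c in df.columns if df[c].nunique() <= 1]

# Dead branch 2: numeric columns that almost duplicate another numeric column.
numeric = df.select_dtypes("number")
dupes = [b for a in numeric for b in numeric
         if a < b and numeric[a].corr(numeric[b]) > 0.99]

pruned = df.drop(columns=constant_cols + dupes)
print(pruned.columns.tolist())  # the columns that actually carry information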
My perspective is not to dodge BIG DATA; on the contrary, I find it an exciting challenge. Rather, it is to study it with the goal of arriving at GOOD DATA ANALYSIS as well as GOOD DATA VISUALIZATION. I know these may seem trivial terms, but performing these processes requires the most innovative DEEP LEARNING and Business Intelligence technologies.