Big Data – Blog by YY
Introduction to Data
In this weekly blog, I'm going to introduce the characteristics of "data."
Yes, "data": a four-letter word that, ever since the term "BIG DATA" burst into the spotlight, has come to stand for a valuable asset held by individuals and organizations. It is also one of the most amazing and intriguing things to explore once you spend time on it.
The data we process can be divided into the following categories:
– structured,
information with a well-defined length and format and a high degree of organization, such as data stored in a database. It can be computer-, machine-, or human-generated:
- computer-generated: server and web logs
- human-generated: data entered into databases
- machine-generated: sensor data
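To make "well-defined length and format" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `sensor_readings`, its columns, and the readings themselves are all hypothetical examples, not real data. The point is that the schema fixes every column's name and type up front, so every row has exactly the same shape and can be queried directly.

```python
import sqlite3

# In-memory database; the schema fixes each column's name and type up front.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensor_readings ("
    " sensor_id TEXT NOT NULL,"
    " recorded_at TEXT NOT NULL,"
    " temperature_c REAL NOT NULL)"
)

# Hypothetical machine-generated readings: every row has the same three fields.
rows = [
    ("s-01", "2024-01-01T00:00:00", 21.5),
    ("s-01", "2024-01-01T00:01:00", 21.7),
    ("s-02", "2024-01-01T00:00:00", 19.8),
]
conn.executemany("INSERT INTO sensor_readings VALUES (?, ?, ?)", rows)

# Because the structure is known in advance, aggregation is a single query.
avg_by_sensor = conn.execute(
    "SELECT sensor_id, AVG(temperature_c) FROM sensor_readings "
    "GROUP BY sensor_id ORDER BY sensor_id"
).fetchall()
print(avg_by_sensor)
```

This query answers "what is each sensor's average temperature?" without any parsing step. That convenience is exactly what unstructured data lacks.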
– unstructured,
the opposite of structured data, and often cited as the highest-volume type of data in the world.
information with NO well-defined length or format and little internal organization, e.g.:
- video, photo, and audio files
– semi-structured (another form of structured data),
information with some organizational properties (such as tags or keys) that make it feasible to understand and analyze, but that still contains unorganized data, e.g.:
- CSV, XML, and JSON files, etc.
For example, in a CSV file the data type of each field is consistent across rows, except in row #1 (the headers).
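The CSV point can be checked with a minimal sketch using Python's standard `csv` module. The file contents and the `infer_type` helper are hypothetical, written only for illustration. Note that `csv` returns every field as a string, so the type of each column has to be guessed from its values.

```python
import csv
import io

# A small hypothetical CSV file: row #1 holds headers, the rest hold data.
raw = """name,age,score
Alice,30,88.5
Bob,25,92.0
Carol,41,79.25
"""

reader = csv.reader(io.StringIO(raw))
header = next(reader)       # row #1: column names
data_rows = list(reader)    # remaining rows: the actual records


def infer_type(value):
    """Hypothetical helper: guess a field's type (csv yields every field as str)."""
    for cast in (int, float):
        try:
            cast(value)
            return cast.__name__
        except ValueError:
            pass
    return "str"


# The set of types seen in each column across all data rows.
column_types = [{infer_type(row[i]) for row in data_rows} for i in range(len(header))]
# The types of the header fields, for comparison.
header_types = [infer_type(h) for h in header]

print(column_types)  # [{'str'}, {'int'}, {'float'}]
print(header_types)  # ['str', 'str', 'str']
```

Every data column ends up with exactly one type, while the header row is all strings. That is the "consistent except row #1" property described above.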
Sometimes it is hard to tell semi-structured data apart from unstructured data.
To keep things simple, some experts argue that semi-structured data does not exist as a separate category; others disagree.
Extracting elements from raw information is the fundamental task of data analysis. The most common approach is to transform all of the information in the raw data into a structured form, and then run subsequent processing, or even analytics, on that structured data.
The reason is probably that it is easier to find correlations among elements in structured data than in unstructured data.
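As one illustration of "transforming raw data into a structured one," here is a minimal sketch that turns semi-structured JSON records into a uniform table using Python's standard `json` module. The records and the `flatten` helper are hypothetical. This is one simple approach (flatten nested objects, then fix a column set), not the only way to do it.

```python
import json

# Hypothetical semi-structured records: keys describe the data,
# but not every record has the same fields, and some are nested.
raw = """[
  {"id": 1, "name": "Alice", "contact": {"email": "alice@example.com"}},
  {"id": 2, "name": "Bob", "tags": ["new"]},
  {"id": 3, "contact": {"email": "carol@example.com", "phone": "555-0100"}}
]"""


def flatten(record, prefix=""):
    """Hypothetical helper: flatten nested dicts into dotted column names."""
    flat = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{column}."))
        else:
            flat[column] = value  # lists and scalars are kept as-is
    return flat


records = [flatten(r) for r in json.loads(raw)]

# A fixed, sorted column set turns the records into a uniform table;
# any field a record lacks becomes None.
columns = sorted({col for r in records for col in r})
table = [[r.get(col) for col in columns] for r in records]

print(columns)
for row in table:
    print(row)
```

After this step, every row has the same columns in the same order. Comparing fields across records, and looking for correlations between them, becomes straightforward.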