Big Data – Blog by YY
Data is the core factor in the Big Data world. We might wonder whether any criteria existed before everything got started. By "everything getting started" I mean the ETL process, depositing data into databases (DBs) and data warehouses (DWs), and, of course, the subsequent analysis against the sources we currently have.
The answer is "YES," and that is the topic I will introduce: "Data Quality."
Why is it important? Because quality data means useful data: data must be consistent and unambiguous. Data that is not high quality can undergo data cleansing to raise its quality.
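As a minimal sketch of what a cleansing pass might look like (the field names and rules here are hypothetical, not a standard recipe), the snippet below trims whitespace, normalizes casing, and drops records missing a required field:

```python
# Hypothetical cleansing pass: trim whitespace, normalize case,
# and drop rows that are missing the required "email" field.
def cleanse(rows):
    cleaned = []
    for row in rows:
        row = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        if not row.get("email"):
            continue  # incomplete record: discard rather than guess
        row["email"] = row["email"].lower()
        cleaned.append(row)
    return cleaned

raw = [
    {"name": "  Alice ", "email": "Alice@Example.com"},
    {"name": "Bob", "email": ""},  # missing email -> dropped
]
print(cleanse(raw))  # [{'name': 'Alice', 'email': 'alice@example.com'}]
```

Real cleansing pipelines are usually rule-driven and logged, but the idea is the same: detect the violation, then repair or reject.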
Six factors are related to so-called "Data Quality," and they will be touched on in order.
Completeness: whether all requisite information is available from the source and can be utilized afterwards, not just in analysis and data visualization, but also for any mandatory requirements if necessary.
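One simple way to quantify this (the required fields below are a hypothetical schema) is the share of required fields that are actually populated:

```python
# Completeness as the share of required fields that hold a real value.
REQUIRED = ("id", "name", "email")  # hypothetical required schema

def completeness(record):
    filled = sum(1 for f in REQUIRED if record.get(f) not in (None, ""))
    return filled / len(REQUIRED)

print(completeness({"id": 1, "name": "Alice", "email": ""}))  # 2 of 3 fields filled
```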
Consistency: the absence of difference when comparing two or more representations of a thing against a definition.
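In practice this often means reducing each representation to a canonical form before comparing. A sketch, with a hypothetical normalization rule:

```python
# Consistency check: two representations of the same customer should agree
# once both are reduced to a canonical form (lowercase, trimmed).
def canonical(record):
    return {k: str(v).strip().lower() for k, v in record.items()}

crm = {"name": "Alice Smith", "country": "US"}
billing = {"name": " alice smith ", "country": "us"}
print(canonical(crm) == canonical(billing))  # True: no difference after normalization
```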
Conformity: how well data adheres to standards and how well it is represented in the desired formats.
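Format checks are the typical implementation. The simplified ISO-8601 date pattern below is one hypothetical rule, not an exhaustive validator:

```python
import re

# Conformity: does each value match the desired format?
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # simplified ISO-8601 date

def conforms(value):
    return bool(DATE_PATTERN.match(value))

print(conforms("2023-01-15"))  # True
print(conforms("15/01/2023"))  # False: right information, wrong format
```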
Accuracy: the degree to which the data correctly describes the real-world situations or events being described.
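Accuracy can only be measured against a trusted reference. In this sketch the "gold" values and the records are made up for illustration:

```python
# Accuracy: share of records whose value matches a trusted reference.
gold = {"NYC": 8_300_000, "LA": 3_800_000}  # hypothetical trusted values

def accuracy(records, reference):
    matches = sum(1 for key, value in records.items() if reference.get(key) == value)
    return matches / len(records)

observed = {"NYC": 8_300_000, "LA": 4_000_000}
print(accuracy(observed, gold))  # 0.5: one of two values matches the reference
```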
Integrity: the validity of data for the specific time slot during which it is relevant from the source.
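A minimal version of that idea, assuming a hypothetical rule that an event only counts if its date falls inside the slot it claims to describe:

```python
from datetime import date

# Validity within a time slot: the event date must fall inside the window.
def valid_in_slot(event_date, slot_start, slot_end):
    return slot_start <= event_date <= slot_end

print(valid_in_slot(date(2023, 1, 10), date(2023, 1, 1), date(2023, 1, 31)))  # True
print(valid_in_slot(date(2023, 2, 2), date(2023, 1, 1), date(2023, 1, 31)))   # False
```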
Timeliness: the degree to which the data represents reality from that time slot.
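A common proxy is staleness: how old is the record relative to the moment we evaluate it? The 24-hour freshness threshold below is an assumption, not a universal rule:

```python
from datetime import datetime, timedelta

# Freshness check: is the record recent enough to still represent reality?
def is_fresh(recorded_at, now, max_age=timedelta(hours=24)):
    return now - recorded_at <= max_age

now = datetime(2023, 1, 2, 12, 0)
print(is_fresh(datetime(2023, 1, 2, 9, 0), now))    # True: 3 hours old
print(is_fresh(datetime(2022, 12, 30, 9, 0), now))  # False: several days old
```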
As a matter of fact, if we can evaluate the data at hand, we will save tons of time deleting/refactoring/remodeling dirty fields in the raw data, as well as speed up development.
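Checks like the ones above can be rolled into a tiny scorecard run before any modeling work starts. The field names and rules here are hypothetical:

```python
# A tiny data-quality scorecard: run several checks over a record set
# and report the pass rate per dimension.
def score(rows, checks):
    return {name: sum(check(r) for r in rows) / len(rows)
            for name, check in checks.items()}

rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
]
checks = {
    "completeness": lambda r: bool(r.get("email")),
    "conformity": lambda r: "@" in r.get("email", ""),
}
print(score(rows, checks))  # {'completeness': 0.5, 'conformity': 0.5}
```

Low scores flag the dirty fields early, before they cost time downstream.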
As a reminder, the six factors mentioned above are not mandatory; some experts use different terms to describe the traits of Data Quality, such as Completeness, Conformity, Consistency, Accuracy, Duplication, and Integrity.