Data Profiling

(English version only)

Big Data – Blog by YY

Previous weekly posts introduced Data Governance and Data Quality as criteria not only for the early data processing flows, but also for later analysis and for the consistency of data sources in the data warehouse. However, some smaller terms still sit within Data Governance and Data Quality, and one of them is the subject of this post: Data Profiling, a set of statistical criteria applied to raw or processed data.
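As a rough illustration of what "statistical criteria" on a column can look like, here is a minimal sketch in plain Python. The function name `profile_column`, the chosen statistics, and the sample `ages` values are all assumptions made for this example, not something taken from the post itself.

```python
from collections import Counter
from statistics import mean


def profile_column(values):
    """Return simple profiling statistics for one column of raw values.

    Hypothetical helper: the set of statistics here is only one
    possible choice for a basic profile.
    """
    # Treat None as a missing value.
    non_null = [v for v in values if v is not None]
    # Keep real numbers only (bool is a subclass of int, so exclude it).
    numeric = [
        v for v in non_null
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    return {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "min": min(numeric) if numeric else None,
        "max": max(numeric) if numeric else None,
        "mean": mean(numeric) if numeric else None,
        "most_common": Counter(non_null).most_common(1)[0][0] if non_null else None,
    }


# Made-up sample column with two missing values.
ages = [34, 29, None, 41, 29, None, 57]
summary = profile_column(ages)
print(summary)
```

A profile like this can be run on raw data and again on processed data, so the two summaries can be compared.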

Continue reading

Data Governance

(English version only)

Big Data – Blog by YY

Data Quality, the term discussed last week, raises the following question:

is there any set of criteria, acting as standards, that the factors in Data Quality are required to follow?

There is: Data Governance, the overall management of the availability, usability, and security of the data used in an enterprise.

Continue reading

Continuous Integration and Continuous Delivery

(Traditional Chinese version only)

CICD – Blog by John Chang

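This week I set up a CI/CD demo lab using AWS CodePipeline. For the source stage I used my own GitHub, and both build and deploy use AWS CodeBuild and CodeDeploy. On the EC2 side, I created one Ubuntu instance that runs only the Tomcat web container, so that it can show the result once the whole pipeline has finished.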

Continue reading

Data Quality

(English version only)

Big Data – Blog by YY

Data is the core factor in the data world. We might wonder whether any criterion exists before everything gets started. By “everything gets started” I mean the ETL process, the loading of data into databases (DBs) and data warehouses (DWs), and, of course, the later analysis of the sources we currently have.

The answer is “YES”, and that is the topic I will introduce: “Data Quality.”

It is important because “quality data” means “USEFUL DATA”: the data must be consistent and unambiguous. Data that is not of high quality can undergo data cleansing to raise its quality.
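To make "consistent and unambiguous" a bit more concrete, here is a small data cleansing sketch in plain Python. The record layout (`name`, `country`), the `cleanse` function, and the specific rules (trim whitespace, normalize case, drop incomplete and duplicate rows) are assumptions chosen for illustration only; real cleansing rules depend on the data at hand.

```python
def cleanse(records):
    """Return a cleaned copy of `records`.

    Hypothetical rules for this sketch:
    - drop rows with a missing or blank name or country,
    - collapse extra whitespace and normalize letter case,
    - drop rows that become duplicates after normalization.
    """
    seen = set()
    cleaned = []
    for rec in records:
        name = rec.get("name")
        country = rec.get("country")
        # Incomplete rows are ambiguous, so skip them.
        if not name or not name.strip() or not country or not country.strip():
            continue
        # Normalize so that the same entity is always written the same way.
        key = (" ".join(name.split()).title(), country.strip().upper())
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": key[0], "country": key[1]})
    return cleaned


# Made-up sample rows with inconsistent spacing, case, and a missing name.
raw = [
    {"name": "  alice  smith", "country": "tw"},
    {"name": "Alice Smith", "country": "TW "},
    {"name": None, "country": "US"},
    {"name": "bob", "country": " us"},
]
cleaned_rows = cleanse(raw)
print(cleaned_rows)
```

After cleansing, the two spellings of the same person collapse into one row and the incomplete row is gone.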

Continue reading

Kibana – data visualization tool for Elasticsearch

(English version only)

Big Data – Blog by YY

So far, I’ve introduced what ETL and ELT are, the categories of data formats, and why they play an important role in the data world.
In this post, I am gonna walk you through one of the most popular data visualization tools, Kibana, for presenting “data” after processing.

Continue reading