Day 2, R0 13:15–14:00
In today's big data world, the data you need to analyze comes from diverse sources in a variety of formats. Combining and reconciling all that data is difficult. Adopting a manageable ETL tool that fits your needs can make data integration easier.
Apache NiFi is an open source tool built to automate and manage the flow of data between systems. You can use NiFi to build streaming data pipelines between different data-related systems, including Apache Kafka, Apache Spark, various relational databases, and much more!
In this talk, I will start by introducing the concept of ETL and Apache NiFi, the problems NiFi can solve, and how to use Python to extend NiFi's capabilities. Then, a sample demo will help you understand how to build a streaming data pipeline with NiFi.
Speaker: Shuhsi Lin
A data engineer and Python programmer, currently working on various data applications at a manufacturing company.
Research interests: IoT applications, streaming data processing, data analysis, and data visualization.