Google Cloud Dataflow

Course Code: 706

In this course, you will learn how to use Dataflow to extract, transform, and load data from multiple data sources and into Google BigQuery for analysis.

5 Days

Advanced

Learning Objectives
  • Learn how to create and deploy pipelines.


  • Learn how to configure internet access and firewall rules for Cloud Dataflow.

Prerequisites:
  • Completed CP100A: Google Cloud Platform Fundamentals

  • Familiarity with extract, transform, and load (ETL) activities

  • Some programming experience in Java or a similar language

Intended Audience

This course is for IT professionals who want to learn about Google Cloud technologies.

Course Outline:

Module 1: Data Flows


  • What Is a Data Flow? 

  • Batch vs. Streaming 


Module 2: Examples of Data Flows 


  • Moving Data 

  • Moving and Transforming Data 

  • Multiple Inputs 

  • Multiple Outputs 


Module 3: Google Cloud Dataflow

 

  • What Is Google Cloud Dataflow? 

  • Features 

  • Integration with GCP 

  • Types of Input 

  • Types of Output 

  • SDK 

  • Core Types 

  • Activity: Designing Real-World Data Flows 


Dataflow Pipelines 


Module 5: Creating Dataflow Projects 


  • Creating a Project 

  • Initializing gcloud 

  • Dependencies 

  • Creating a Dataflow Project with Maven 

  • Pipelines 

  • Pipeline Options 

  • Creating a PCollection 

  • Input-Transform-Output 

  • Running Pipelines Locally 

  • Exercise 1: Creating a Simple Data Flow 


Module 6: Testing and Debugging Pipelines 


  • Output 

  • Logging 

  • Unit Testing 

  • Outputting Results 

  • Exercise 2: Testing, Logging, and Outputting Pipelines 


Module 7: Running Pipelines in Google Cloud Dataflow 


  • Monitoring Progress 

  • Instances 

  • Viewing Logs 

  • Exercise 3: Running Data Flows in the Cloud 


Module 8: Programming the Dataflow SDK 


Module 9: PCollections 


  • PCollection Characteristics 

  • Generic Types 

  • Creating PCollections from Memory 

  • Read Transforms 

  • TextIO Read Example 

  • Write Transforms 

  • TextIO Write Example 

  • Exercise 4: Using TextIO to Read and Write Data 


Module 10: Basic Transforms 


  • ParDo 

  • Input and Output Types 

  • Side Inputs 

  • Side Outputs 

  • .apply() 

  • Chaining Transforms 

  • Count 

  • Formatting Results 

  • Writing Output 

  • Exercise 5: Chaining Transforms 


Module 11: GroupByKey Transform

 

  • GroupByKey Example 

  • Windowing 

  • Exercise 6: Grouping Output by Key 


Module 12: Composite Transforms 


  • Combining Multiple Transforms 

  • PTransform 

  • Overriding apply() 

  • Exercise 7: Write Composite Transforms 


Module 13: Multiple Inputs 


  • Multiple Collections for the Same Type 

  • PCollectionList 

  • Flatten 

  • Multiple Collections of Different Types 

  • PCollectionTuple 

  • Exercise 8: Handling Multiple Inputs 


Integrating Dataflow with BigQuery 


Module 14: BigQuery 


  • BigQuery Overview 

  • BigQuery Projects 

  • Datasets and Tables 

  • Table Schemas 

  • Table Data Sources 

  • Table Source Format 


Module 15: Writing to BigQuery from Dataflow 


  • Defining Table Schemas 

  • Outputting Dataflow Jobs to BigQuery Tables 

  • BigQueryIO Example 

  • Exercise 9: Outputting to BigQuery 


Streaming Data Flows with Pub/Sub 


Module 16: Pub/Sub 


  • What Is Pub/Sub? 

  • Topics and Subscriptions 

  • Push vs. Pull Subscriptions 


Module 17: Using Pub/Sub with HTTP 


  • Integrating Pub/Sub with HTTP Clients 

  • Scopes and Authentication Tokens 

  • Creating Topics and Subscriptions with HTTP 

  • Publishing and Receiving Messages 

  • Acknowledging Message Receipt 


Module 18: Using Pub/Sub with App Engine 


  • Setting Up Pub/Sub Endpoints in App Engine Apps 

  • Programming Pub/Sub with the Python SDK 

  • Using Web Hooks to Process Pub/Sub Messages 


Module 19: Using Pub/Sub with Dataflow 


  • Reading from Pub/Sub 

  • PubsubIO Read Example 

  • Writing to Pub/Sub 

  • PubsubIO Write Example 

  • Windowing 

  • Running Dataflow Streaming Jobs 

  • Exercise 10: Creating a Streaming Data Flow 



Program Highlights

Highly engaging & interactive sessions

70% Hands On

Quizzes & Assessments

24/7 Support

Contact Us Now

+91 953-537-5027

Why TechEd Trainings?

Handcrafted Content

Professional Trainers

Hands On Labs

Seamless Delivery
