
Google Cloud Dataflow
Course Code: 706
Duration: 5 Days
Level: Advanced

In this course, you will learn how to use Dataflow to extract, transform, and load data from multiple data sources into Google BigQuery for analysis.

Learning Objectives
Learn how to create and deploy Dataflow pipelines.
Learn how to configure internet access and firewall rules for Cloud Dataflow.
Prerequisites:
Completed CP100A: Google Cloud Platform Fundamentals
Familiarity with extract, transform, and load (ETL) activities
Some programming experience in Java or a similar language
Intended Audience:
This course is for IT professionals who want to learn about Google Cloud technologies.
Course Outline:
Module 1: Data Flows
What Is a Data Flow?
Batch vs. Streaming
Module 2: Examples of Data Flows
Moving Data
Moving and Transforming Data
Multiple Inputs
Multiple Outputs
Module 3: Google Cloud Dataflow
What Is Google Cloud Dataflow?
Features
Integration with GCP
Types of Input
Types of Output
SDK
Core Types
Activity: Designing Real-World Data Flows
Dataflow Pipelines
Module 4: Creating Dataflow Projects
Creating a Project
Initializing gcloud
Dependencies
Creating a Dataflow Project with Maven
Pipelines
Pipeline Options
Creating a PCollection
Input-Transform-Output
Running Pipelines Locally
Exercise 1: Creating a Simple Data Flow
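
For reference, here is a minimal input-transform-output pipeline of the kind this module builds, sketched with the Apache Beam Java SDK (the open-source successor to the Dataflow SDK); the file paths are placeholders:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

public class MinimalPipeline {
  public static void main(String[] args) {
    // Parse --key=value command-line arguments into pipeline options.
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
    Pipeline p = Pipeline.create(options);

    p.apply("Read", TextIO.read().from("input.txt"))                       // input
     .apply("Upper", MapElements.into(TypeDescriptors.strings())
                                .via((String line) -> line.toUpperCase())) // transform
     .apply("Write", TextIO.write().to("output"));                        // output

    // With no runner specified, the pipeline runs locally on the DirectRunner.
    p.run().waitUntilFinish();
  }
}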
Module 5: Testing and Debugging Pipelines
Output
Logging
Unit Testing
Outputting Results
Exercise 2: Testing, Logging, and Outputting Pipelines
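
A sketch of the unit-testing pattern this module covers, using Beam's TestPipeline and PAssert with JUnit 4 (the test data and transform are illustrative):

import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.junit.Rule;
import org.junit.Test;

public class UpperCaseTest {
  // TestPipeline runs the pipeline on the local DirectRunner.
  @Rule public final transient TestPipeline p = TestPipeline.create();

  @Test
  public void testUpperCase() {
    PCollection<String> output =
        p.apply(Create.of("a", "b"))  // in-memory test input
         .apply(MapElements.into(TypeDescriptors.strings())
                           .via((String s) -> s.toUpperCase()));

    // Assert on the final contents of the PCollection, ignoring order.
    PAssert.that(output).containsInAnyOrder("A", "B");
    p.run().waitUntilFinish();
  }
}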
Module 6: Running Pipelines in Google Cloud Dataflow
Monitoring Progress
Instances
Viewing Logs
Exercise 3: Running Data Flows in the Cloud
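
Switching from local to cloud execution is largely a matter of pipeline options; a hedged sketch (the project, region, and bucket names are placeholders):

import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class CloudRun {
  public static void main(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);
    options.setRunner(DataflowRunner.class);         // execute on the Dataflow service
    options.setProject("my-project-id");             // placeholder GCP project
    options.setRegion("us-central1");                // placeholder region
    options.setTempLocation("gs://my-bucket/temp");  // GCS location for temporary files

    Pipeline p = Pipeline.create(options);
    // ... apply the same transforms as in the local example ...
    p.run();  // returns immediately; progress, worker instances, and logs appear in the Cloud Console
  }
}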
Module 7: Programming the Dataflow SDK
Module 8: PCollections
PCollection Characteristics
Generic Types
Creating PCollections from Memory
Read Transforms
TextIO Read Example
Write Transforms
TextIO Write Example
Exercise 4: Using TextIO to Read and Write Data
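
A sketch of the patterns in this module: creating a PCollection from in-memory data with Create, and reading and writing text files with TextIO (the paths are placeholders):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

public class PCollectionExamples {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // A PCollection created from in-memory data; the element type is the generic type parameter.
    PCollection<String> fromMemory = p.apply(Create.of("alpha", "beta", "gamma"));

    // A read transform: each line of the matched files becomes one String element.
    PCollection<String> fromFile = p.apply(TextIO.read().from("gs://my-bucket/input/*.txt"));

    // A write transform: elements are written as lines across sharded output files.
    fromMemory.apply(TextIO.write().to("gs://my-bucket/output/memory"));
    fromFile.apply(TextIO.write().to("gs://my-bucket/output/files"));

    p.run().waitUntilFinish();
  }
}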
Module 9: Basic Transforms
ParDo
Input and Output Types
Side Inputs
Side Outputs
.apply()
Chaining Transforms
Count
Formatting Results
Writing Output
Exercise 5: Chaining Transforms
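
A sketch of chaining basic transforms in the spirit of this module: a ParDo that splits lines into words, Count, a formatting step, and a write (the word-count shape is illustrative):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.TypeDescriptors;

public class ChainedTransforms {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply(TextIO.read().from("input.txt"))
     // ParDo: a DoFn may emit zero or more output elements per input element.
     .apply(ParDo.of(new DoFn<String, String>() {
       @ProcessElement
       public void processElement(ProcessContext c) {
         for (String word : c.element().split("\\s+")) {
           if (!word.isEmpty()) {
             c.output(word);
           }
         }
       }
     }))
     .apply(Count.perElement())  // yields KV<String, Long>: word -> occurrences
     // Format each KV as a printable line before writing.
     .apply(MapElements.into(TypeDescriptors.strings())
                       .via((KV<String, Long> kv) -> kv.getKey() + ": " + kv.getValue()))
     .apply(TextIO.write().to("counts"));

    p.run().waitUntilFinish();
  }
}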
Module 10: GroupByKey Transform
GroupByKey Example
Windowing
Exercise 6: Grouping Output by Key
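
A minimal GroupByKey sketch (the keys and values are illustrative); as the module notes, grouping an unbounded source additionally requires windowing:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

public class GroupByKeyExample {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    PCollection<KV<String, Integer>> sales = p.apply(Create.of(
        KV.of("east", 10), KV.of("west", 20), KV.of("east", 5)));

    // GroupByKey collects all values sharing a key into one Iterable:
    // ("east", [10, 5]) and ("west", [20]).
    PCollection<KV<String, Iterable<Integer>>> grouped =
        sales.apply(GroupByKey.<String, Integer>create());

    p.run().waitUntilFinish();
  }
}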
Module 11: Composite Transforms
Combining Multiple Transforms
PTransform
Overriding apply()
Exercise 7: Writing Composite Transforms
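
A sketch of a composite transform; note that in the original Dataflow SDK the method to override was apply(), as listed above, while the Apache Beam SDK renamed it expand():

import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Filter;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

// A composite transform packages several chained transforms behind one name.
public class CountNonEmpty extends PTransform<PCollection<String>, PCollection<KV<String, Long>>> {
  @Override
  public PCollection<KV<String, Long>> expand(PCollection<String> input) {
    return input
        .apply(Filter.by((String s) -> !s.isEmpty()))  // drop empty lines
        .apply(Count.perElement());                    // count the rest
  }
}

// Usage: lines.apply(new CountNonEmpty());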
Module 12: Multiple Inputs
Multiple Collections of the Same Type
PCollectionList
Flatten
Multiple Collections of Different Types
PCollectionTuple
Exercise 8: Handling Multiple Inputs
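
A sketch of merging multiple PCollections of the same type with PCollectionList and Flatten (the inputs are illustrative); PCollectionTuple plays the analogous role when the collections have different element types:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.Flatten;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionList;

public class FlattenExample {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    PCollection<String> first = p.apply("First", Create.of("a", "b"));
    PCollection<String> second = p.apply("Second", Create.of("c", "d"));

    // PCollectionList bundles same-typed collections; Flatten merges them into one.
    PCollection<String> merged =
        PCollectionList.of(first).and(second).apply(Flatten.pCollections());

    p.run().waitUntilFinish();
  }
}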
Integrating Dataflow with BigQuery
Module 13: BigQuery
BigQuery Overview
BigQuery Projects
Datasets and Tables
Table Schemas
Table Data Sources
Table Source Format
Module 14: Writing to BigQuery from Dataflow
Defining Table Schemas
Outputting Dataflow Jobs to BigQuery Tables
BigQueryIO Example
Exercise 9: Outputting to BigQuery
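
A sketch of defining a table schema and writing rows to BigQuery with BigQueryIO (the table reference and field names are placeholders):

import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import java.util.Arrays;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;

public class BigQueryWriteExample {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // The schema the destination table should have (created below if needed).
    TableSchema schema = new TableSchema().setFields(Arrays.asList(
        new TableFieldSchema().setName("word").setType("STRING"),
        new TableFieldSchema().setName("count").setType("INTEGER")));

    p.apply(Create.of(new TableRow().set("word", "hello").set("count", 1))
                  .withCoder(TableRowJsonCoder.of()))
     .apply(BigQueryIO.writeTableRows()
         .to("my-project:my_dataset.word_counts")  // placeholder table reference
         .withSchema(schema)
         .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
         .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

    p.run().waitUntilFinish();
  }
}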
Streaming Data Flows with Pub/Sub
Module 15: Pub/Sub
What Is Pub/Sub?
Topics and Subscriptions
Push vs. Pull Subscriptions
Module 16: Using Pub/Sub with HTTP
Integrating Pub/Sub with HTTP Clients
Scopes and Authentication Tokens
Creating Topics and Subscriptions with HTTP
Publishing and Receiving Messages
Acknowledging Message Receipt
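
A sketch of publishing a message through the Pub/Sub REST API from a plain HTTP client; the OAuth access token (with the Pub/Sub scope) is assumed to be obtained separately, for example via gcloud auth print-access-token, and the project and topic names are placeholders:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PubsubHttpPublish {
  public static void main(String[] args) throws Exception {
    String token = System.getenv("ACCESS_TOKEN");  // OAuth token carrying the Pub/Sub scope
    // Message payloads are base64-encoded inside the JSON request body.
    String data = Base64.getEncoder()
        .encodeToString("hello".getBytes(StandardCharsets.UTF_8));
    String body = "{\"messages\":[{\"data\":\"" + data + "\"}]}";

    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create(
            "https://pubsub.googleapis.com/v1/projects/my-project/topics/my-topic:publish"))
        .header("Authorization", "Bearer " + token)
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();

    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    // On success, the response body lists the server-assigned message IDs.
    System.out.println(response.statusCode() + " " + response.body());
  }
}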
Module 17: Using Pub/Sub with App Engine
Setting up Pub/Sub Endpoints in App Engine Apps
Programming Pub/Sub with the Python SDK
Using Web Hooks to Process Pub/Sub Messages
Module 18: Using Pub/Sub with Dataflow
Reading from Pub/Sub
PubsubIO Read Example
Writing to Pub/Sub
PubsubIO Write Example
Windowing
Running Dataflow Streaming Jobs
Exercise 10: Creating a Streaming Data Flow
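
A sketch of a streaming flow as assembled in this module: read from Pub/Sub, window the unbounded stream, aggregate, and publish the results to another topic (the topic paths are placeholders):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.StreamingOptions;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.joda.time.Duration;

public class StreamingExample {
  public static void main(String[] args) {
    StreamingOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(StreamingOptions.class);
    options.setStreaming(true);  // run as a streaming job
    Pipeline p = Pipeline.create(options);

    p.apply(PubsubIO.readStrings().fromTopic("projects/my-project/topics/input"))
     // Windowing divides the unbounded stream so aggregations can fire per window.
     .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(1))))
     .apply(Count.perElement())  // counts per element, per one-minute window
     .apply(MapElements.into(TypeDescriptors.strings())
                       .via((KV<String, Long> kv) -> kv.getKey() + ": " + kv.getValue()))
     .apply(PubsubIO.writeStrings().to("projects/my-project/topics/output"));

    p.run();
  }
}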
Program Highlights
Highly engaging & interactive sessions
70% Hands-On
Quizzes & Assessments
24x7 Support
Why TechEd Trainings?
Handcrafted Content
Professional Trainers
Hands-On Labs
Seamless Delivery