Getting Started with Apache Beam

This is a 3-2-1-go project showing how to get started with Apache Beam.

Inverted Index

More on this on Medium: https://medium.com/@davide.anastasia/getting-started-with-apache-beam-26bfc5126438

The idea behind this simple batch job is to create an inverted index: given a set of text documents, the job parses them and builds a word -> location mapping for each word. The job is an interesting toy, as it shows how to:

read data together with the file name (slightly different from using TextIO)
filter out common stop words (in a very naive way; more interesting approaches exist)
create a CombineFn to avoid streaming all the data for a single key to a single node
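
The three steps above can be sketched in plain Java, without the Beam dependency, to show the shape of the computation. This is a minimal illustration under stated assumptions, not the project's actual code: the class name `InvertedIndexSketch`, the tiny stop-word list, and the helpers `tokenize`, `indexPartition`, and `mergePartitions` are all hypothetical. The four combiner methods only mirror the lifecycle that Beam's `CombineFn` defines (`createAccumulator`, `addInput`, `mergeAccumulators`, `extractOutput`); here they are ordinary static methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java sketch: it does NOT extend Beam's CombineFn, it only mirrors its four steps.
class InvertedIndexSketch {

    // Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "and", "of", "is"));

    // Split a line into lowercase words and drop stop words (the naive filter).
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                words.add(token);
            }
        }
        return words;
    }

    // Step 1: an empty accumulator, here the sorted set of file names seen for a word.
    static Set<String> createAccumulator() {
        return new TreeSet<>();
    }

    // Step 2: fold a single input (a file name) into an accumulator.
    static Set<String> addInput(Set<String> accumulator, String fileName) {
        accumulator.add(fileName);
        return accumulator;
    }

    // Step 3: merge partial accumulators that were built independently.
    static Set<String> mergeAccumulators(List<Set<String>> accumulators) {
        Set<String> merged = createAccumulator();
        for (Set<String> accumulator : accumulators) {
            merged.addAll(accumulator);
        }
        return merged;
    }

    // Step 4: turn the final accumulator into the output value.
    static List<String> extractOutput(Set<String> accumulator) {
        return new ArrayList<>(accumulator);
    }

    // Pre-aggregate one partition of (file name -> text) pairs, as a single worker might.
    static Map<String, Set<String>> indexPartition(Map<String, String> documents) {
        Map<String, Set<String>> partial = new TreeMap<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            for (String word : tokenize(doc.getValue())) {
                addInput(partial.computeIfAbsent(word, w -> createAccumulator()), doc.getKey());
            }
        }
        return partial;
    }

    // Combine the per-partition results: only compact accumulators are merged per word.
    static Map<String, List<String>> mergePartitions(List<Map<String, Set<String>>> partials) {
        Map<String, List<Set<String>>> grouped = new TreeMap<>();
        for (Map<String, Set<String>> partial : partials) {
            for (Map.Entry<String, Set<String>> entry : partial.entrySet()) {
                grouped.computeIfAbsent(entry.getKey(), w -> new ArrayList<>()).add(entry.getValue());
            }
        }
        Map<String, List<String>> index = new TreeMap<>();
        for (Map.Entry<String, List<Set<String>>> entry : grouped.entrySet()) {
            index.put(entry.getKey(), extractOutput(mergeAccumulators(entry.getValue())));
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> first =
                indexPartition(Collections.singletonMap("a.txt", "The quick fox"));
        Map<String, Set<String>> second =
                indexPartition(Collections.singletonMap("b.txt", "A quick dog and the fox"));
        System.out.println(mergePartitions(Arrays.asList(first, second)));
    }
}
```

The point of the accumulator design is that each partition reduces its own inputs first, so the final merge for a given word handles a few small sets rather than every raw occurrence of that word.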
