Stars
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Example end-to-end data engineering project.
Easy to maintain open source documentation websites.
🔯 Modern, batteries-included Hugo theme for creating beautiful doc, blog and static websites
Hands-on MLOps projects to explore and learn the practical aspects of machine learning engineering for production.
Production ML rental prediction system.
A list of publicly available datasets with real-time data maintained by the team at bytewax.io
A Deep Learning NLP repository that uses TensorFlow to cover everything from text preprocessing to downstream tasks with recent models such as Topic Models, BERT, GPT, and LLMs.
🥪🦘 An open source sandbox project exploring dbt workflows via a fictional sandwich shop's data.
A self-contained dbt project for testing purposes
A streaming ETL pipeline for Realtime Tweet Collection, Analysis and Reporting
Processing TfL data for bike usage with Google Cloud Platform.
This is a PySpark-based data pipeline that fetches weather data for a few cities, performs some basic processing and transformation on the data, and then writes the processed data to a Google Cloud…
A data pipeline with Kafka, Spark Streaming, dbt, Docker, Airflow, and GCP!
docker-compose.yml files for cp-all-in-one, cp-all-in-one-community, cp-all-in-one-cloud, Apache Kafka Confluent Platform
sontivr / data-pipeline-in-k8s
Forked from agdsouza/OnTheSamePage
Deploying Data Pipelines - Kubernetes Way
Apache Pinot - A realtime distributed OLAP datastore
The official home of the Presto distributed SQL query engine for big data
Apache Superset is a Data Visualization and Data Exploration Platform
End-to-end data engineering project with Kafka, Airflow, Spark, Postgres and Docker.
A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more!
Free Data Engineering course!
Streaming Synthetic Sales Data Generator: Streaming sales data generator for Apache Kafka, written in Python