ingestr is a CLI tool for seamlessly copying data between databases with a single command.
Updated Nov 15, 2024 (Python)
A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way 🌰
Squirrel dataset hub
Sample code for the AWS Big Data Blog post "Building a scalable streaming data processor with Amazon Kinesis Data Streams on AWS Fargate".
Enables custom tracing of Python applications in Dynatrace
Scrapes product data from the Walmart Canada website, then cleans it and integrates it with data from a different store.
Python package for seamless data integration from multiple sources such as CSV, Excel, Google Sheets, and MongoDB. It simplifies data loading and transformation through a unified interface and is designed to be extended to more databases and cloud storage services.
End-to-end data engineering processes for the Nigeria Health Facility Registry (HFR). The project uses Selenium, Pandas, PySpark, PostgreSQL, and Airflow.
RKI Metadata Exchange: an API and GUI microservice for distributing metadata items before they are picked up by ETL pipelines for further processing.
Apache Paimon Python: the Python implementation of Apache Paimon.
IDPS-ESCAPE (Intrusion Detection and Prevention Systems for Evading Supply Chain Attacks and Post-compromise Effects), part of the CyFORT project: open-source SOAR system powered by a dedicated ML-based anomaly detection toolbox (ADBox) integrated with open-source software such as Wazuh and Suricata.
Python script that extracts all .csv/.txt files from a specific AWS S3 bucket and generates the .sql scripts to ingest them into an AWS Redshift database.
A real-time data streaming system built on the Hadoop ecosystem that performs data extraction, ingestion, storage, retrieval, transformation, and analysis in real time.
Infer SQL DDL statements from tabular data.
This project involved analyzing AdventureWorks bike sales data to uncover key insights into sales performance by country, customer segments, and products. The findings informed strategies for targeted marketing, market expansion, promotional timing, and product quality improvements.
Data ingestion from Google Sheets to BigQuery.
This repository contains material related to Data Engineering Using AWS.
In this project, you will build a full AI pipeline for an image classification task using Convolutional Neural Networks (CNNs). The project covers data ingestion, preprocessing, model training, deployment, and CI/CD integration using GitHub Actions, Docker, and AWS.