- Freelancer
- India
- https://linktr.ee/divithraju
- https://medium.com/@Divithraju
- in/divithraju
-
datacompy Public
Forked from capitalone/datacompy
Pandas, Polars, and Spark DataFrame comparison for humans and more!
Python Apache License 2.0 Updated Sep 11, 2024
-
pyspark-example-project Public
Forked from AlexIoannides/pyspark-example-project
Implementing best practices for PySpark ETL jobs and applications.
Python Updated Sep 9, 2024
-
awesome-spark Public
Forked from awesome-spark/awesome-spark
A curated list of awesome Apache Spark packages and resources.
-
pyspark-examples Public
Forked from spark-examples/pyspark-examples
PySpark RDD, DataFrame, and Dataset examples in Python
-
divith-raju-Data-Mining Public
This project focuses on customer segmentation using data mining techniques, specifically K-Means clustering, to classify customers into distinct groups based on their purchasing behaviors. The goal…
-
This ETL project was designed to demonstrate the development of a scalable data pipeline for customer sales analysis. It covers all essential steps, from data extraction to transformation and load…
-
This project presents a comprehensive data pipeline designed to predict customer churn using historical customer data. By leveraging Hadoop and PySpark, this pipeline efficiently processes large da…
-
This project demonstrates the creation of a scalable data processing pipeline for handling and analyzing log data from a hypothetical e-commerce platform. Leveraging Hadoop and PySpark, the pipelin…
-
This ETL pipeline project is a practical demonstration of my skills in data engineering and automation using Python and Apache Airflow. By integrating MySQL for data storage and leveraging Airflow …
-
divith-raju-Python Public
This repository highlights my ability to develop and integrate diverse Python solutions, ranging from API creation and data management to cloud service integration. Each project in this repository …
-
divith-raju-Pyspark-work Public
-
divith-raju-PySpark-Projects Public
-
The Spark Memory Configuration Calculator is designed to help data engineers and Spark developers quickly determine the optimal memory and core configurations for their Spark clusters. With this to…
-
divith-raju-big-data-tools
-
1 Updated Mar 31, 2024
-
Big Data Platform on MongoDB Atlas and Heroku PostgreSQL
-
divith-raju-Sweetviz-Package Public
-
1 Updated May 9, 2023
-
divith-raju-OpenMetadata Public
Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
-
A Capstone Project that covers several aspects of Data Engineering (Data Exploration, Cleaning, Modeling, Pipelining, Processing)
-
Playground for PySpark (RDDs, DStreams) and Apache Airflow, based on the example of parsing web server log data, including incorrectly formatted strings
1 Updated Dec 24, 2022
-
search engine optimization
A complete search engine experience built on top of a 75 GB Wikipedia corpus with subsecond search latency. Results contain wiki pages ordered by TF/IDF relevance base…
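
The TF/IDF ordering mentioned above can be illustrated with a minimal in-memory sketch. This is not the repository's implementation: the function names, the whitespace tokenizer, the sample documents, and the plain `tf * log(N / df)` weighting are all assumptions made for illustration, and a 75 GB corpus would need an on-disk inverted index rather than Python dictionaries.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Return per-document term counts and per-term document frequencies."""
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    return term_counts, doc_freq


def search(query, term_counts, doc_freq):
    """Rank documents by the summed TF-IDF of the query terms, best first."""
    n_docs = len(term_counts)
    scores = {}
    for doc_id, counts in term_counts.items():
        total = sum(counts.values())
        score = 0.0
        for term in tokenize(query):
            if counts[term] == 0:
                continue  # term absent from this document
            tf = counts[term] / total               # term frequency
            idf = math.log(n_docs / doc_freq[term])  # inverse document frequency
            score += tf * idf
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical three-page corpus standing in for the Wikipedia dump.
docs = {
    "Apache_Spark": "spark is a cluster computing engine for large scale data processing",
    "Hadoop": "hadoop stores data in hdfs and runs mapreduce jobs",
    "Python": "python is a general purpose programming language",
}
term_counts, doc_freq = build_index(docs)
ranked = search("spark data", term_counts, doc_freq)
print(ranked)
```

Terms that appear in every document get an IDF of zero here, so they contribute nothing to the ranking; production systems usually apply a smoothed variant instead.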