# apache-hadoop

Here are 81 public repositories matching this topic...

mahmoudparsian / data-algorithms-book

MapReduce, Spark, Java, and Scala for Data Algorithms Book

python java machine-learning scala apache-spark distributed-computing design-patterns pyspark mapreduce reducers partitioning hadoop-mapreduce distributed-algorithms mappers data-algorithms apache-hadoop

Updated Oct 14, 2024
Java

mahmoudparsian / big-data-mapreduce-course

Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University

Updated Nov 13, 2024
HTML

tencentyun / hadoop-cos

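hadoop-cos (the CosN file system) provides integration support for big data computing frameworks such as Apache Hadoop, Spark, and Tez, allowing data stored on Tencent Cloud COS to be read and written just like data in HDFS. It can also serve as Deep Storage for query and analysis engines such as Druid.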

alluxio apache-hadoop tencent-cloud-cos hadoop-compatible-filsystem

Updated Nov 4, 2024
Java

s911415 / apache-hadoop-3.1.0-winutils

HADOOP 3.1.0 winutils

hadoop native winutils apache-hadoop

Updated Apr 12, 2018
Batchfile

PBWebMedia / yarn-prometheus-exporter

Export Hadoop YARN (resource-manager) metrics in prometheus format

yarn hadoop metrics exporter apache prometheus resource-manager yarn-hadoop-cluster apache-hadoop

Updated Oct 22, 2024
Go

realtimedatalake / hive-metastore-docker

Containerized Apache Hive Metastore for horizontally scalable Hive Metastore deployments

docker open-source big-data docker-compose postgresql apache-hive apache-hadoop apache-hive-metastore rtdl

Updated Jan 31, 2022
Dockerfile

Guru107 / hadoop-small-files-merger

A Spark application to merge small files on Hadoop

scala apache-spark avro text parquet apache-hadoop

Updated Sep 7, 2020
Scala

kowaalczyk / spark-minimal-algorithms

A Python implementation of Minimal MapReduce Algorithms for Apache Spark

python spark apache-spark algorithms python3 pyspark hadoop-mapreduce apache-hadoop minimal-algorithms

Updated Jun 22, 2020
Python

mohammadtavakoli78 / Cloud-Computing

Projects for a Cloud Computing course

docker kubernetes yarn hadoop docker-compose helm cloud-computing hdfs helm-charts statefulsets cloud-services helm-chart statefulset apache-hadoop

Updated Sep 2, 2022
Python

RBC-DSAI-IITM / DCEIL

A fast, scalable and distributed community detection algorithm based on CEIL scoring function.

apache-spark community-detection apache-hadoop

Updated Jan 1, 2019
Scala

Coursal / Hadoop-Examples

Simple introductory projects based on Apache Hadoop, intended as guides to make the MapReduce model look less weird or boring.

java hadoop examples mapreduce hadoop-mapreduce mapreduce-java hadoop-example apache-hadoop

Updated May 22, 2024
Java

nghoanglongde / spark-cluster-with-docker

An implementation of Apache Spark (combined with PySpark and Jupyter Notebook) on top of a Hadoop cluster using Docker

docker apache-spark apache-hadoop

Updated May 10, 2024
Shell

bdoepf / aws-emr-prometheus

emr aws apache-spark prometheus apache-flink emr-cluster apache-hadoop

Updated Jan 5, 2021
HCL

whoami-anoint / EasyHadoop

Simplified Hadoop Setup and Configuration Automation

data-science big-data hdfs ec2-instance big-data-analytics apache-hadoop big-data-projects hdfs-cluster big-data-essentials

Updated Sep 2, 2023
Shell

haodemon / HadoopStreaming

Set of Input Formats for Hadoop Streaming

hadoop inputformat apache-hadoop

Updated Sep 25, 2024
Java

jagdish4501 / Network-intrusion-Detection

This repository provides a guide to preprocess and analyze the network intrusion data set using NumPy, Pandas, and matplotlib, and implement a random forest classifier machine learning model using Scikit-learn.

numpy scikit-learn pandas matplotlib scapy npcap libcap apache-hadoop

Updated May 8, 2024
Jupyter Notebook

chriskery / hadoop-operator

Kubernetes operator for managing the lifecycle of Apache Hadoop YARN tasks on Kubernetes.

kubernetes hadoop k8s hadoop-cluster kubernetes-operator apache-hadoop

Updated Jan 19, 2024
Go

felidsche / mail-spam-filter

An email spam filter using Apache Spark’s ML library

apache-spark spark-ml apache-hadoop

Updated Apr 14, 2021
Python

Jordan396 / Giraph-1.2.0-Installation

Instructions for Installing Giraph-1.2.0

virtual-machine google-cloud giraph apache-hadoop ubuntu1804

Updated May 1, 2019

sawadogosalif / Big-Data-Technologies

Big data technologies are software tools for analyzing, processing, and extracting data from extremely large and complex data sets that traditional data management tools cannot handle.

apache-spark apache-kafka apache-hive apache-hadoop apache-hbase pysark

Updated Apr 30, 2022
Python