MapReduce, Spark, Java, and Scala for Data Algorithms Book
Updated Oct 14, 2024 (Java)
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
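hadoop-cos (the CosN file system) provides integration support for big data computing frameworks such as Apache Hadoop, Spark, and Tez, letting them read and write data stored on Tencent Cloud COS just as they would access HDFS. It also supports serving as Deep Storage for query and analytics engines such as Druid.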
Export Hadoop YARN ResourceManager metrics in Prometheus format
Containerized Apache Hive Metastore for horizontally scalable Hive Metastore deployments
A Spark application to merge small files on Hadoop
A Python implementation of Minimal MapReduce Algorithms for Apache Spark
Projects for a Cloud Computing course
A fast, scalable, and distributed community detection algorithm based on the CEIL scoring function.
Simple, introductory projects based on Apache Hadoop, intended as guides that make the MapReduce model look less weird or boring.
An implementation of Apache Spark (combined with PySpark and Jupyter Notebook) on top of a Hadoop cluster using Docker
Simplified Hadoop Setup and Configuration Automation
Set of Input Formats for Hadoop Streaming
This repository provides a guide to preprocess and analyze the network intrusion data set using NumPy, Pandas, and matplotlib, and implement a random forest classifier machine learning model using Scikit-learn.
A Kubernetes operator for managing the lifecycle of Apache Hadoop YARN tasks.
An email spam filter using Apache Spark’s ML library
Instructions for installing Giraph 1.2.0
Big data technologies are software tools for analyzing, processing, and extracting data from extremely large and complex data sets that traditional data management tools cannot handle.
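Many of the projects above build on the MapReduce model: a map step emits key/value pairs, a shuffle groups every value by key, and a reduce step combines each group. The following is a minimal in-memory sketch of that flow for the classic word-count example. It is plain Python, not Hadoop's or Spark's API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical illustrations.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    # Mapper (hypothetical helper): emit (word, 1) for every whitespace-separated token.
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle/sort: group all values emitted for the same key, ordered by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Reducer: combine each key's values; for word count, sum the 1s.
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Chain the three phases, as a MapReduce job would across many machines.
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    docs = ["Hadoop stores data", "Spark processes data"]
    print(word_count(docs))
    # → {'data': 2, 'hadoop': 1, 'processes': 1, 'spark': 1, 'stores': 1}
```

In a real cluster, the map and reduce steps run in parallel on different nodes, and the framework performs the shuffle over the network. The sketch only shows the data flow between the phases.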
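Tools that merge small files on Hadoop, such as the Spark application listed above, need to decide which files to combine. A common approach is to pack files into batches that stay under a target size. The following sketch plans such batches with first-fit decreasing bin packing. It is not taken from that project. The `plan_merges` function is hypothetical, and the 128 MiB target is an assumption based on the default HDFS block size (`dfs.blocksize`).

```python
from typing import Dict, List

# Assumed merge target: the default HDFS block size of 128 MiB.
BLOCK_SIZE = 128 * 1024 * 1024


def plan_merges(file_sizes: Dict[str, int], target: int = BLOCK_SIZE) -> List[List[str]]:
    """Group small files into batches whose total size stays <= target.

    Hypothetical helper using first-fit decreasing bin packing. Files whose
    size is already >= target are not considered small and are skipped.
    """
    small = sorted(
        ((name, size) for name, size in file_sizes.items() if size < target),
        key=lambda item: item[1],
        reverse=True,  # place the largest small files first
    )
    batches: List[List[str]] = []
    totals: List[int] = []
    for name, size in small:
        # First fit: put the file into the first batch with enough room.
        for i, total in enumerate(totals):
            if total + size <= target:
                batches[i].append(name)
                totals[i] += size
                break
        else:
            # No batch has room, so open a new one.
            batches.append([name])
            totals.append(size)
    # A batch holding only one file would not reduce the file count.
    return [batch for batch in batches if len(batch) > 1]


if __name__ == "__main__":
    sizes = {"a": 60, "b": 50, "c": 40, "d": 30, "e": 200}
    print(plan_merges(sizes, target=100))
    # → [['a', 'c'], ['b', 'd']]
```

Sorting by size before packing keeps the number of output files low in practice. Each planned batch would then be read and rewritten as a single file by whatever engine performs the merge.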