I installed Hadoop on a virtual machine, and all assignments were performed on Ubuntu. Refer to this repo when completing the Hadoop assignments. A stable internet connection is recommended while working through them.

SAKET-SK/Semester6-SPPU-Data-Analysis-Lab

Semester6-SPPU-Data-Analysis-Lab

------------------------Hadoop Assignments------------------------

  1. Hadoop installation on a. Single Node b. Multiple Nodes
  2. Design a distributed application using MapReduce that processes a system log file. List the users who were logged in to the system for the longest period. Use a simple log file from the Internet and process it in pseudo-distributed mode on the Hadoop platform.
  3. Design and develop a distributed application to find the coolest/hottest year from the available weather data. Use weather data from the Internet and process it using MapReduce.
  4. Write an application using HBase and HiveQL for a flight information system that includes a. Creating, dropping, and altering database tables b. Creating an external Hive table to connect to HBase for the Customer Information table c. Loading a table with data, inserting new values and fields into the table, and joining tables with Hive d. Creating an index on the Flight Information table e. Finding the average departure delay per day in 2008.
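
As a rough illustration of assignment 3, the map, shuffle, and reduce phases can be sketched locally in plain Python before moving to Hadoop. This is only a sketch under assumptions: the `year,temperature` line format, the function names, and the sample readings are all made up for illustration and are not taken from this repo or from any real weather data set.

```python
from collections import defaultdict


def map_record(line):
    """Map step: parse one 'year,temperature' line into a (year, temp) pair.

    The input format is an assumption; real weather files need their own parser.
    """
    year, temp = line.strip().split(",")
    return year, float(temp)


def shuffle(pairs):
    """Shuffle step: group temperatures by year, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for year, temp in pairs:
        groups[year].append(temp)
    return groups


def reduce_year(year, temps):
    """Reduce step: average all temperatures recorded for one year."""
    return year, sum(temps) / len(temps)


def hottest_and_coolest(lines):
    """Run the three phases and return (hottest year, coolest year, averages)."""
    pairs = [map_record(line) for line in lines if line.strip()]
    averages = dict(reduce_year(y, t) for y, t in shuffle(pairs).items())
    hottest = max(averages, key=averages.get)
    coolest = min(averages, key=averages.get)
    return hottest, coolest, averages


if __name__ == "__main__":
    # Made-up readings, only to exercise the functions above.
    sample = ["1949,11.1", "1949,7.8", "1950,22.0", "1950,0.0", "1951,15.5"]
    hottest, coolest, _ = hottest_and_coolest(sample)
    print("hottest:", hottest, "coolest:", coolest)
```

On a real cluster the map and reduce functions would typically live in separate mapper and reducer programs, and Hadoop would take care of the shuffle.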

------------------------R and Python Assignments------------------------

  1. Perform the following operations using R/Python on the Amazon book review and Facebook metrics data sets a. Create data subsets b. Merge Data c. Sort Data d. Transposing Data e. Melting Data to long format f. Casting data to wide format
  2. Perform the following operations using R/Python on the Air quality and Heart Diseases data sets a. Data cleaning b. Data integration c. Data transformation d. Error correcting e. Data model building
  3. Integrate R/Python and Hadoop and perform the following operations on the forest fire dataset a. Text mining in RHadoop b. Data analysis using MapReduce in RHadoop c. Data mining in Hive
  4. Visualize the data using R/Python by plotting the graphs for assignments 2 and 3
  5. Perform the following data visualization operations using Tableau on the Adult and Iris datasets a. 1D (Linear) Data visualization b. 2D (Planar) Data Visualization c. 3D (Volumetric) Data Visualization d. Temporal Data Visualization e. Multidimensional Data Visualization f. Tree/Hierarchical Data visualization g. Network Data visualization
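
For assignment 1 in this list, the six operations map fairly directly onto pandas calls. The sketch below assumes pandas is installed and uses tiny made-up tables; the column names (`book_id`, `rating`, `helpful_votes`, `title`) are hypothetical stand-ins, not the real columns of the Amazon book review or Facebook metrics data sets.

```python
import pandas as pd

# Made-up stand-in rows; replace these with the real data sets.
reviews = pd.DataFrame({
    "book_id": [1, 2, 3],
    "rating": [5, 3, 4],
    "helpful_votes": [10, 2, 7],
})
books = pd.DataFrame({"book_id": [1, 2, 3], "title": ["A", "B", "C"]})

# a. Create a data subset: keep only rows with rating >= 4.
subset = reviews[reviews["rating"] >= 4]

# b. Merge data: join the two tables on their shared key.
merged = reviews.merge(books, on="book_id", how="inner")

# c. Sort data: highest rating first.
sorted_df = merged.sort_values("rating", ascending=False)

# d. Transpose data: rows become columns and vice versa.
transposed = reviews.set_index("book_id").T

# e. Melt to long format: one row per (book_id, metric) pair.
long_df = reviews.melt(id_vars="book_id", var_name="metric", value_name="value")

# f. Cast back to wide format: one column per metric.
wide_df = long_df.pivot(index="book_id", columns="metric", values="value")

if __name__ == "__main__":
    print(long_df)
    print(wide_df)
```

Melting and casting are inverses here: `wide_df` holds the same values as `reviews`, with `book_id` moved into the index.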

Best of luck completing these assignments. Refer to this repo and you are good to go 😊😊✔ Special thanks to this YouTube channel (https://www.youtube.com/channel/UC7t9h_6TMky-P68wdNmFWSQ), which I referred to and with whose help I successfully completed all the Hadoop assignments without any errors.
