This project demonstrates a Real-Time Alert Notification System for monitoring vital health parameters through a streaming data pipeline. It ingests data from IoT devices in hospitals and health centers and triggers real-time alerts when a patient's vital signs (such as body temperature, heart rate, and blood pressure) fall outside predefined thresholds. The solution combines Apache Kafka, Apache Spark, Hive, HBase, Sqoop, and Amazon SNS for end-to-end data processing and alerting.
The rise of IoT in healthcare has enabled continuous, real-time monitoring of patient vitals. This repository contains the implementation of a robust, scalable data pipeline that captures, stores, and processes high-velocity IoT data. The system monitors patient data and alerts medical professionals whenever vitals deviate from normal ranges, enabling timely intervention and improved patient care.
- Real-Time Data Ingestion: Stream patient vital signs from IoT devices using Apache Kafka (a producer sketch follows this list).
- Stream Processing & Analytics: Use Apache Spark (PySpark) to process real-time data and compare it with reference thresholds.
- Reference Data Storage: Store threshold values for vital signs in HBase for quick lookups during stream processing.
- Data Storage: Store processed and raw data in Hive for historical analysis and reporting.
- Alert Notifications: Trigger email notifications via Amazon SNS when vitals fall outside normal ranges.
- Data Import/Export: Use Apache Sqoop to move data between RDBMS and Hadoop/Hive ecosystems.
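As a concrete illustration of the ingestion feature, the sketch below simulates an IoT device publishing a single vitals reading with the kafka-python library. The topic name (`patients-vitals`), field names, and value ranges are illustrative assumptions, not part of this project's fixed interface.

```python
import json
import random
import time

from kafka import KafkaProducer  # pip install kafka-python

# Serialize readings as JSON so downstream consumers can parse them
# against a declared schema.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# One simulated reading from a bedside monitor (field names assumed).
reading = {
    "patient_id": "P001",
    "temperature": round(random.uniform(96.0, 104.0), 1),  # deg F
    "heart_rate": random.randint(50, 130),                 # bpm
    "bp_systolic": random.randint(90, 180),                # mmHg
    "timestamp": int(time.time()),
}

producer.send("patients-vitals", value=reading)
producer.flush()  # block until the broker acknowledges the record
```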
The pipeline operates as follows; a runnable sketch of the streaming and alerting steps appears after the list.

1. IoT Data Ingestion: IoT devices continuously push vital-signs data (e.g., temperature, heart rate, blood pressure) into Kafka topics.
2. Real-Time Stream Processing: Spark Streaming reads data from Kafka, processes the stream, and checks whether vitals cross the thresholds stored in HBase.
3. Alerts: If any vital parameter falls outside its threshold range, the system sends an alert notification to the registered email address via Amazon SNS.
4. Data Storage: Processed records are saved in Hive for long-term storage and batch analysis.
5. Data Management: Sqoop imports/exports patient data between relational databases (RDS/MySQL) and the Hadoop ecosystem.
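Below is a minimal PySpark Structured Streaming sketch of steps 2-4. For readability it checks hard-coded normal ranges instead of looking them up in HBase, and it persists only the breaching records to Hive; the broker address, topic name, message schema, table name, AWS region, and SNS topic ARN are all placeholder assumptions.

```python
import boto3
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import (DoubleType, IntegerType, LongType,
                               StringType, StructField, StructType)

spark = (SparkSession.builder
         .appName("VitalsAlerting")
         .enableHiveSupport()  # needed for the Hive write in step 4
         .getOrCreate())

# Schema of the JSON payload produced by the IoT devices (assumed fields).
schema = StructType([
    StructField("patient_id", StringType()),
    StructField("temperature", DoubleType()),
    StructField("heart_rate", IntegerType()),
    StructField("bp_systolic", IntegerType()),
    StructField("timestamp", LongType()),
])

# Step 2: read the vitals stream from Kafka and parse the JSON values.
vitals = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "patients-vitals")
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("v"))
          .select("v.*"))

# Hard-coded normal ranges; the full pipeline fetches these from HBase.
breaches = vitals.filter(
    (col("temperature") < 95.0) | (col("temperature") > 100.4) |
    (col("heart_rate") < 60)    | (col("heart_rate") > 100)    |
    (col("bp_systolic") < 90)   | (col("bp_systolic") > 140)
)

def handle_batch(batch_df, batch_id):
    # Step 4: persist the breaching records to Hive for batch analysis.
    batch_df.write.mode("append").saveAsTable("vitals_breaches")
    # Step 3: publish one SNS email per out-of-range reading.
    sns = boto3.client("sns", region_name="us-east-1")
    for row in batch_df.collect():
        sns.publish(
            TopicArn="arn:aws:sns:us-east-1:123456789012:vitals-alerts",  # placeholder
            Subject="Patient vitals alert",
            Message=f"Patient {row.patient_id}: abnormal reading {row.asDict()}",
        )

query = breaches.writeStream.foreachBatch(handle_batch).start()
query.awaitTermination()
```

The job needs the matching spark-sql-kafka-0-10 connector package on the classpath (e.g., via spark-submit --packages).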
- Apache Kafka: Distributed streaming platform for real-time data ingestion.
- Apache Spark (PySpark): Real-time data processing and analytics.
- Apache HBase: NoSQL database for storing reference threshold data (a storage sketch follows this list).
- Hive: Data warehouse system for structured data storage and querying.
- Sqoop: Tool for data import/export between RDBMS and Hadoop.
- Amazon SNS: Simple Notification Service for sending email alerts.
- HDFS: Distributed file system for storing large volumes of data.
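To show how the HBase reference data could be laid out and queried, here is a happybase sketch. The table name, column family, and encoding are assumptions, and it requires the HBase Thrift server to be running.

```python
import happybase  # pip install happybase; talks to the HBase Thrift server

conn = happybase.Connection("localhost", port=9090)  # host/port assumed

# One row per vital sign; a single "range" column family holds the bounds.
if b"vitals_thresholds" not in conn.tables():
    conn.create_table("vitals_thresholds", {"range": dict()})

table = conn.table("vitals_thresholds")
table.put(b"temperature", {b"range:low": b"95.0", b"range:high": b"100.4"})
table.put(b"heart_rate",  {b"range:low": b"60",   b"range:high": b"100"})
table.put(b"bp_systolic", {b"range:low": b"90",   b"range:high": b"140"})

# Point lookup by row key, as the streaming job would do per vital sign.
row = table.row(b"heart_rate")
print("heart_rate normal range:", row[b"range:low"], "-", row[b"range:high"])
```

Keying the table by vital-sign name keeps each lookup a single GET, which is what makes HBase suitable for per-record checks in the streaming hot path.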
- Java 8 or higher
- Python 3.x
- Apache Kafka
- Apache Spark
- Hadoop (HDFS)
- Hive
- HBase
- Sqoop
- AWS account (for SNS)
- MySQL (or any RDBMS for patient records)
- Kafka Setup: Install and configure Kafka, then create topics for the vital-signs data.
- HBase Setup: Install HBase and create a table for storing threshold reference data.
- Spark Streaming Setup: Install Spark and set up streaming jobs that read from Kafka and process the data.
- Hive Setup: Set up Hive to store patient vital data.
- Sqoop Setup: Install Sqoop to import/export data between MySQL and Hive.
- SNS Setup: Configure AWS SNS to send alert notifications via email (a one-time setup sketch follows this list).
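For the SNS step, a one-time setup along these lines creates the topic and subscribes the on-call address. The region, topic name, and email address are placeholders, and AWS credentials are assumed to be configured already (e.g., via aws configure).

```python
import boto3  # pip install boto3

sns = boto3.client("sns", region_name="us-east-1")  # region assumed

# create_topic is idempotent: it returns the existing ARN if the topic exists.
topic_arn = sns.create_topic(Name="vitals-alerts")["TopicArn"]

# AWS emails a confirmation link; alerts are delivered only after the
# recipient confirms the subscription.
sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="oncall@example.com")

# Smoke-test the alerting path once the subscription is confirmed.
sns.publish(TopicArn=topic_arn, Subject="Test alert",
            Message="Vitals alerting path is live.")
```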
Contributions are welcome! If you have suggestions for improvements or new features, feel free to create an issue or submit a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.