💳 ETL Credit Card Data Set 💳

using Pentaho Data Integration (PDI)/Kettle ⚙

.: 📄 Dataset taken from Kaggle :.

📃 Table of Contents:

About Project
Objectives
Data Set Description
ETL Process
Preview Output File

🖋 About Project

This repository contains:
- The ETL file built with Pentaho Data Integration (PDI).
- The CSV file produced by the ETL process.
This project will also:
- Clean and transform both data sets (application record and credit record).
- Merge, clean, and transform the data sets into one data set (in CSV format).

📌 Objectives

Perform ETL using PDI for both datasets.
Create time dimension using PDI.
Create fact table using PDI.

🧾 Data Set Description

The dataset description can be seen here.

⚙ ETL Process

👨‍💼 Application Record

▶ CSV file input

Importing the application record CSV file.

▶ Sort rows

Sort data based on ID (in ascending order).

▶ Unique rows

Removing rows with duplicate IDs.
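The Sort rows and Unique rows pair above can be checked outside PDI with a few lines of plain Python. This is only a minimal sketch: the function name `sort_and_dedupe` and the sample rows are hypothetical, and keeping the first row per ID is an assumption. In PDI, Unique rows compares adjacent rows, which is why the data is sorted first.

```python
def sort_and_dedupe(rows, key="ID"):
    """Sort rows by `key` (ascending), then keep the first row for each key value."""
    unique = []
    previous = object()  # sentinel that never equals a real ID
    for row in sorted(rows, key=lambda r: r[key]):
        # After sorting, duplicates are adjacent, so comparing with the
        # previous key is enough to detect them.
        if row[key] != previous:
            unique.append(row)
            previous = row[key]
    return unique


# Hypothetical sample rows, not taken from the data set.
rows = [{"ID": 3, "v": "a"}, {"ID": 1, "v": "b"}, {"ID": 3, "v": "c"}]
print(sort_and_dedupe(rows))
```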

▶ Replace in string

Replacing 'Y' with 1, and 'N' with 0.
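As a rough equivalent of this replacement, the mapping can be written as a small lookup. The name `encode_flag` is hypothetical, and returning integers (rather than the strings '1' and '0') is a convenience chosen for this sketch.

```python
FLAG_MAP = {"Y": 1, "N": 0}


def encode_flag(value):
    """Return 1 for 'Y' and 0 for 'N'; leave any other value unchanged."""
    return FLAG_MAP.get(value, value)


print([encode_flag(v) for v in ["Y", "N", "Y"]])
```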

▶ Add constants

Adding 'Current_Date' column.

▶ Calculator

Calculating the applicant's age and how long the applicant has been working (in years).

▶ Filter rows

Filtering out applicants younger than 21 years old.
Filtering out applicants with null/empty values.
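The Calculator and Filter rows steps can be approximated as below. This is a sketch under assumptions: it presumes a birth date is available as a date value, the field name `AGE` and the helper names are hypothetical, and it reads the filter as removing applicants under 21 and rows with any null/empty value.

```python
from datetime import date


def years_between(start, end):
    """Whole years elapsed from `start` to `end`."""
    years = end.year - start.year
    # Subtract one year if the anniversary has not been reached yet.
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


def keep_applicant(row, min_age=21):
    """Keep a row only if it has no null/empty value and AGE >= min_age."""
    if any(v is None or v == "" for v in row.values()):
        return False
    return row["AGE"] >= min_age


# 'current_date' plays the role of the Current_Date constant column.
current_date = date(2021, 6, 14)
print(years_between(date(2000, 6, 15), current_date))
```

The same `years_between` helper could be reused for the years-worked column, given an employment start date.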

💶 Credit Record

▶ CSV file input

Importing the credit record CSV file.

▶ Sort rows 2

Sort data based on ID (in ascending order).

▶ Add constants 2

Adding 'Current_Date' column.

▶ Calculator 2

Calculating month loan payment.
Creating a copy of the 'STATUS' column.

▶ Replace in string 2

Replace C, X, 0 with 'Good Debt' (C: loan for that month is already paid; X: no loan for that month; 0: loan is 1 to 29 days overdue).
Replace 1, 2, 3, 4, 5 with 'Bad Debt' (1: loan is 30 to 59 days overdue; 2: loan is 60 to 89 days overdue; 3: loan is 90 to 119 days overdue; 4: loan is 120 to 149 days overdue; 5: loan is more than 150 days overdue).
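The status grouping described above amounts to a two-way lookup. A minimal sketch follows; the name `label_status` is hypothetical, and raising an error on an unknown code is a choice made for this example rather than something the transformation is known to do.

```python
GOOD_CODES = {"C", "X", "0"}           # paid, no loan, or 1 to 29 days overdue
BAD_CODES = {"1", "2", "3", "4", "5"}  # 30 or more days overdue


def label_status(status):
    """Map a raw STATUS code to 'Good Debt' or 'Bad Debt'."""
    if status in GOOD_CODES:
        return "Good Debt"
    if status in BAD_CODES:
        return "Bad Debt"
    raise ValueError(f"unexpected STATUS code: {status!r}")


print([label_status(s) for s in ["C", "0", "2"]])
```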

▶ Calculator 3

Creating two copies of the 'STATUS2' column (Good_Debt and Bad_Debt).

▶ Replace in string 3

Good_Debt: 'Good Debt' is changed to 1, while 'Bad Debt' is changed to 0.
Bad_Debt: 'Good Debt' is changed to 0, while 'Bad Debt' is changed to 1.

▶ Group by

Calculating the total of Good Debt and Bad Debt for each applicant (similar to GROUP BY in SQL).
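The 0/1 flag columns followed by the Group by step can be sketched as a per-applicant tally. The function name `count_debts` and the sample records are hypothetical; the output keys reuse the Good_Debt_CNT and Bad_Debt_CNT names that appear later in this README.

```python
from collections import defaultdict


def count_debts(records):
    """Sum Good_Debt and Bad_Debt flags per applicant.

    `records` is an iterable of (applicant_id, label) pairs, where label is
    assumed to be either 'Good Debt' or 'Bad Debt'.
    """
    totals = defaultdict(lambda: {"Good_Debt_CNT": 0, "Bad_Debt_CNT": 0})
    for applicant_id, label in records:
        good = 1 if label == "Good Debt" else 0
        bad = 1 - good  # the two flags are complementary
        totals[applicant_id]["Good_Debt_CNT"] += good
        totals[applicant_id]["Bad_Debt_CNT"] += bad
    return dict(totals)


# Hypothetical sample records, not taken from the data set.
records = [(1, "Good Debt"), (1, "Bad Debt"), (1, "Good Debt"), (2, "Bad Debt")]
print(count_debts(records))
```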

▶ Modified JavaScript value

If the total of Good Debt is higher than the total of Bad Debt, the applicant's status is eligible (1).
If the total of Bad Debt is higher than the total of Good Debt, the applicant's status is not eligible (0).
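The rule above is written in Python here for consistency with the other sketches, although the PDI step itself is scripted in JavaScript. The function name `eligibility` is hypothetical. The README does not say what happens when both totals are equal, so this sketch returns `None` for a tie as an explicit assumption.

```python
def eligibility(good_cnt, bad_cnt):
    """Return 1 (eligible) if good > bad, 0 (not eligible) if bad > good."""
    if good_cnt > bad_cnt:
        return 1
    if bad_cnt > good_cnt:
        return 0
    return None  # tie: not covered by the stated rule (assumption)


print(eligibility(5, 2), eligibility(1, 4))
```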

📥 Output file

▶ Stream lookup

Bad_Debt_CNT, Good_Debt_CNT, and STATUS will be merged based on applicant ID.

▶ Filter rows

Applicants with empty Bad_Debt_CNT, Good_Debt_CNT, and STATUS are removed.
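Taken together, Stream lookup and the following Filter rows behave like a lookup by ID that drops applicants without a match. A minimal sketch, assuming the credit summary is held in a dictionary keyed by applicant ID; the name `lookup_and_filter` and the sample values are hypothetical.

```python
def lookup_and_filter(applications, credit_summary):
    """Attach credit fields to each application row; drop rows with no match."""
    merged = []
    for row in applications:
        summary = credit_summary.get(row["ID"])
        if summary is None:
            continue  # no Bad_Debt_CNT / Good_Debt_CNT / STATUS: row is removed
        merged.append({**row, **summary})
    return merged


# Hypothetical sample data.
applications = [{"ID": 1, "AGE": 30}, {"ID": 2, "AGE": 45}]
credit_summary = {1: {"Good_Debt_CNT": 2, "Bad_Debt_CNT": 1, "STATUS": 1}}
print(lookup_and_filter(applications, credit_summary))
```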

▶ Select values 2

Select the columns that will be extracted.

▶ Text file output

Exporting the cleaned and transformed data set to a CSV file.
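Column selection followed by CSV export can be sketched with Python's `csv` module. The function name `write_selected`, the column list, and the in-memory buffer are all hypothetical choices for this example; the actual columns kept by Select values 2 are defined in `Main.ktr`.

```python
import csv
import io


def write_selected(rows, columns, stream):
    """Write only `columns` of each row to `stream` as CSV, with a header."""
    writer = csv.DictWriter(
        stream,
        fieldnames=columns,
        extrasaction="ignore",  # silently skip fields that were not selected
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical sample rows and column selection.
rows = [{"ID": 1, "AGE": 30, "STATUS": 1}, {"ID": 2, "AGE": 45, "STATUS": 0}]
buffer = io.StringIO()
write_selected(rows, ["ID", "STATUS"], buffer)
print(buffer.getvalue())
```

Passing an open file object instead of the buffer would write the same content to disk.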

👀 Preview Output File

🙌 Support me!

👉 If you find this project useful, please ⭐ this repository 😆!

🎈 Check out my work using AutoML/PyCaret with this processed data set here!

👉 More about myself: here
