In this project, we dug into the popularity of news on the Facebook News platform. Because publication time is an important feature affecting how popular a news item becomes, we developed a real-time system for predicting the popularity of a given piece of news.
Preprocessing and EDA
We first preprocessed the data and then carried out exploratory data analysis (EDA).
ML pipeline
We applied an ML pipeline to the transformed data, fitting several regression models: random forest, gradient boosting, and support vector regression (SVR).
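As a rough illustration of this step, the sketch below wraps each of the three regressors in a scikit-learn `Pipeline` and fits them on the same split. It is not the project's actual code: the synthetic data from `make_regression`, the `StandardScaler` step, the hyperparameter values, and the helper name `build_pipelines` are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the transformed news data (assumption).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def build_pipelines():
    """Return one scaler + regressor pipeline per model (hypothetical helper)."""
    models = {
        "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
        "gradient_boosting": GradientBoostingRegressor(random_state=0),
        "svr": SVR(kernel="rbf", gamma="scale"),
    }
    return {
        name: Pipeline([("scale", StandardScaler()), ("model", model)])
        for name, model in models.items()
    }


pipelines = build_pipelines()
test_r2 = {}
for name, pipe in pipelines.items():
    pipe.fit(X_train, y_train)
    # Pipeline.score delegates to the regressor, which reports R^2.
    test_r2[name] = pipe.score(X_test, y_test)

for name, score in test_r2.items():
    print(f"{name}: test R^2 = {score:.3f}")
```

Keeping the scaler inside the pipeline means it is refit on each training fold during cross-validation, which avoids leaking test-fold statistics into the models. Scaling matters mostly for SVR; the tree-based models are largely insensitive to it.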
Cross-validation and feature importances
For each model, we selected the best hyperparameters by cross-validation and then analyzed global feature importances using permutation importance.
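The two steps can be sketched as follows, shown here for a random forest only. This is an assumed setup rather than the project's code: the data is synthetic, the parameter grid is arbitrary, and `permutation_importances` is a hypothetical helper written by hand, because `sklearn.inspection.permutation_importance` only arrived in scikit-learn 0.22, which is newer than the version pinned for this project.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the project data (assumption).
X, y = make_regression(
    n_samples=200, n_features=4, n_informative=2, noise=5.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hyperparameter selection by 3-fold cross-validation (grid is illustrative).
search = GridSearchCV(
    RandomForestRegressor(n_estimators=30, random_state=0),
    param_grid={"max_depth": [3, None], "min_samples_leaf": [1, 5]},
    cv=3,
    scoring="r2",
)
search.fit(X_train, y_train)


def permutation_importances(model, X, y, n_repeats=5, seed=0):
    """Mean drop in R^2 when each feature column is shuffled in turn."""
    rng = np.random.RandomState(seed)
    baseline = model.score(X, y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling one column breaks its link to the target.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - model.score(X_perm, y))
        importances[j] = np.mean(drops)
    return importances


importances = permutation_importances(search.best_estimator_, X_test, y_test)
print("best params:", search.best_params_)
print("permutation importances:", np.round(importances, 3))
```

Computing the importances on held-out data, as here, measures how much the model relies on each feature for generalization rather than for fitting the training set.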
Conclusions
Finally, we provided a business interpretation and outlook based on our project.
The project requires the following Python and package versions:
Python 3.7
numpy==1.17.1
pandas==0.25.0
matplotlib==3.1.1
scikit-learn==0.21.3
plotly==4.1.1
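One way to check an environment against this list is sketched below. The `EXPECTED` mapping and the `check_versions` helper are hypothetical names for this example; the sketch relies on each package exposing a `__version__` attribute and uses the import name `sklearn` for scikit-learn.

```python
import importlib

# Import name -> pinned version from the list above.
EXPECTED = {
    "numpy": "1.17.1",
    "pandas": "0.25.0",
    "matplotlib": "3.1.1",
    "sklearn": "0.21.3",  # scikit-learn is imported as sklearn
    "plotly": "4.1.1",
}


def check_versions(expected):
    """Return {module: installed_version_or_None} for every mismatch."""
    mismatches = {}
    for module_name, wanted in expected.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            mismatches[module_name] = None  # not installed
            continue
        installed = getattr(module, "__version__", None)
        if installed != wanted:
            mismatches[module_name] = installed
    return mismatches


if __name__ == "__main__":
    for name, found in check_versions(EXPECTED).items():
        print(f"{name}: expected {EXPECTED[name]}, found {found}")
```

An empty result means every listed package matches its pinned version exactly.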