This project is meant to give an understanding of how sentiment analysis can be run on tweets with Python. Sentiment analysis, or opinion analysis, is the “contextual mining of text which identifies and extracts subjective information in source material and helping a business to understand the social sentiment of their brand, product or service while monitoring online conversations.” I will be using the data generated by two presidents of the United States, Donald Trump and Barack Obama, treating them as self-brands. The project will also cover data preparation and cleaning techniques for subsequent analysis.
The code is written in Python 3.5.2 using the Spyder IDE. A basic understanding of the Python programming language is required to understand and reuse the script. For the most part, the script relies only on standard scientific Python tools (numpy/matplotlib/pandas/seaborn). You will additionally need the following Python libraries:
Tweepy: A well-known Python wrapper for the Twitter API.
TextBlob: A simple Python library for processing textual data.
Before beginning, you will need to install tweepy and textblob via pip.
The project will cover the following topics:
- Accessing Twitter Data with Python
- Building dataframes with Pandas
- Building datetime arrays (weekday and month)
- Data types, Nulls and Descriptive analysis
- Python Regex for Data Cleaning (HTML, URLs and @mentions)
- Pandas Time Series and Weekly Analysis
- Graphical Representations of findings
- Sentiment analysis
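As a rough illustration of the regex cleaning step listed above, here is a minimal sketch that strips HTML remnants, URLs and @mentions from a tweet using only the standard library. The patterns, the `clean_tweet` helper and the sample tweet are assumptions made for this example and are not necessarily the expressions used in the project's script.

```python
import html
import re

# Hypothetical patterns for illustration; the project's own
# expressions may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
MENTION_RE = re.compile(r"@\w+")                # @mentions
TAG_RE = re.compile(r"<[^>]+>")                 # leftover HTML tags
SPACE_RE = re.compile(r"\s+")                   # runs of whitespace


def clean_tweet(text):
    """Return the tweet text without HTML, URLs or @mentions."""
    text = html.unescape(text)        # decode entities such as &amp;
    text = TAG_RE.sub(" ", text)      # drop HTML tags
    text = URL_RE.sub(" ", text)      # drop URLs
    text = MENTION_RE.sub(" ", text)  # drop @mentions
    # Collapse the gaps left behind by the removals.
    return SPACE_RE.sub(" ", text).strip()


# Made-up sample tweet, not taken from the project's data.
sample = "Thanks @example_user &amp; friends! <b>Read</b> more: https://example.com/post"
cleaned = clean_tweet(sample)
print(cleaned)
```

The cleaned string is the kind of text that could then be handed to a sentiment library such as TextBlob, since links and user handles carry no sentiment of their own.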
All graphs and results created in this project can be found in an Excel file in the repository, along with the data used for the analysis. You will find a detailed explanation of all the code used on my website: