Details in blog post: https://blog.munhou.com/2020/07/12/Pytorch-Implementation-of-GEE-A-Gradient-based-Explainable-Variational-Autoencoder-for-Network-Anomaly-Detection/
Create a new conda environment
conda create -n gee python=3.7.7
conda activate gee
conda install pyspark=3.0.0 click=7.1.2 jupyterlab=2.1.5 seaborn=0.10.1
conda install pytorch=1.5.1 torchvision=0.6.1 cudatoolkit=10.1 -c pytorch
conda install pytorch-lightning=0.8.4 shap=0.35.0 -c conda-forge
pip install petastorm==0.9.2
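After installing, it can help to confirm that the packages are importable from the `gee` environment. The following is a minimal sketch, not part of the repository: the `find_missing` helper is hypothetical, and the module names are assumptions derived from the package names above (for example, `pytorch` is imported as `torch` and `pytorch-lightning` as `pytorch_lightning`). It only checks that each module can be located and does not verify versions.

```python
import importlib.util

# Assumed import names for the packages installed above.
REQUIRED_MODULES = [
    "pyspark",
    "click",
    "jupyterlab",
    "seaborn",
    "torch",
    "torchvision",
    "pytorch_lightning",
    "shap",
    "petastorm",
]


def find_missing(modules):
    """Return the top-level module names that cannot be located for import."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Run it with the `gee` environment activated; any name it reports points to an install command that may need to be repeated.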
Download the processed data here or perform all the following steps.
- Download raw data march_week3_csv.tar.gz and july_week5_csv.tar.gz.
- Decompress files.
tar -xvf march_week3_csv.tar.gz
tar -xvf july_week5_csv.tar.gz
- Separate files by date.
grep '^2016-03-18' march.week3.csv.uniqblacklistremoved >> 20160318.csv
grep '^2016-03-19' march.week3.csv.uniqblacklistremoved >> 20160319.csv
grep '^2016-03-20' march.week3.csv.uniqblacklistremoved >> 20160320.csv
grep '^2016-07-30' july.week5.csv.uniqblacklistremoved >> 20160730.csv
grep '^2016-07-31' july.week5.csv.uniqblacklistremoved >> 20160731.csv
- Put 20160319.csv and 20160730.csv in the data/train folder, and 20160318.csv, 20160320.csv, and 20160731.csv in the data/test folder.
- Perform feature extraction.
python feature_extraction.py --train data/train --test data/test --target_train feature/train.feature.parquet --target_test feature/test.feature.parquet
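The "separate files by date" and "put files in data/train and data/test" steps above can also be done in one pass. The sketch below is an illustration rather than code from the repository: the `split_by_date` function and its `out_root` and `mapping` parameters are hypothetical. It assumes that every record starts with a date in `YYYY-MM-DD` form, as the `grep` patterns imply. Like `>>`, it appends to existing files, so running it twice duplicates records.

```python
from pathlib import Path

# Which day belongs to which split, taken from the steps above.
SPLIT_BY_DATE = {
    "2016-03-19": "train",
    "2016-07-30": "train",
    "2016-03-18": "test",
    "2016-03-20": "test",
    "2016-07-31": "test",
}


def split_by_date(raw_file, out_root="data", mapping=SPLIT_BY_DATE):
    """Append each matching line to <out_root>/<split>/<YYYYMMDD>.csv.

    Returns a dict mapping each date to the number of lines written.
    """
    out_root = Path(out_root)
    handles = {}
    counts = {}
    try:
        # Binary mode copies the bytes unchanged, as grep does.
        with open(raw_file, "rb") as src:
            for line in src:
                date = line[:10].decode("ascii", errors="ignore")
                split = mapping.get(date)
                if split is None:
                    continue  # a day that is in neither split
                if date not in handles:
                    name = date.replace("-", "") + ".csv"
                    target = out_root / split / name
                    target.parent.mkdir(parents=True, exist_ok=True)
                    handles[date] = open(target, "ab")
                handles[date].write(line)
                counts[date] = counts.get(date, 0) + 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts
```

For example, `split_by_date("march.week3.csv.uniqblacklistremoved")` followed by `split_by_date("july.week5.csv.uniqblacklistremoved")` would fill `data/train` and `data/test` directly, without the intermediate files in the working directory.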
Download the processed model input here or build it with the following step.
python build_model_input.py --train feature/train.feature.parquet --test feature/test.feature.parquet --target_train model_input/train.model_input.parquet --target_test model_input/test.model_input.parquet
Download the pre-trained model here or train it with the following step.
python train_vae.py --data_path model_input/train.model_input.parquet --model_path model/vae.model --gpu True
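The three stages (feature extraction, model input, training) can be chained from a small driver. This is a sketch under assumptions, not part of the repository: `pipeline_commands` and `run_pipeline` are hypothetical helpers, the command lines are copied from the steps above, and a stage is skipped when all of its output paths already exist, which is meant to mirror the "download or perform the steps" choice. An existing but incomplete output would also be skipped, so remove partial results before re-running.

```python
import subprocess
import sys
from pathlib import Path


def pipeline_commands(python=sys.executable):
    """Return (output paths, command) pairs for the three stages, in order."""
    return [
        (
            ["feature/train.feature.parquet", "feature/test.feature.parquet"],
            [python, "feature_extraction.py",
             "--train", "data/train",
             "--test", "data/test",
             "--target_train", "feature/train.feature.parquet",
             "--target_test", "feature/test.feature.parquet"],
        ),
        (
            ["model_input/train.model_input.parquet",
             "model_input/test.model_input.parquet"],
            [python, "build_model_input.py",
             "--train", "feature/train.feature.parquet",
             "--test", "feature/test.feature.parquet",
             "--target_train", "model_input/train.model_input.parquet",
             "--target_test", "model_input/test.model_input.parquet"],
        ),
        (
            ["model/vae.model"],
            [python, "train_vae.py",
             "--data_path", "model_input/train.model_input.parquet",
             "--model_path", "model/vae.model",
             "--gpu", "True"],
        ),
    ]


def run_pipeline(commands, dry_run=False):
    """Run each stage whose outputs are missing; return the selected commands."""
    selected = []
    for outputs, cmd in commands:
        if all(Path(p).exists() for p in outputs):
            continue  # outputs were downloaded or built earlier
        selected.append(cmd)
        if not dry_run:
            # check=True stops the pipeline if a stage fails.
            subprocess.run(cmd, check=True)
    return selected


if __name__ == "__main__":
    # Dry run by default: only print what would be executed.
    for cmd in run_pipeline(pipeline_commands(), dry_run=True):
        print(" ".join(cmd))
```

Calling `run_pipeline(pipeline_commands())` from the repository root would execute the stages that are still missing; with `dry_run=True` it only reports them.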