
# BD4H Final Project README.md

Gatech


## Tech Stack

- Big data analysis and augmentation
- Deep learning

## Datasets

- 1st dataset: Openi
- 2nd dataset: NIH

## File Structure

### 1. Folder: Orientation detector

Code to preprocess the Openi dataset by classifying front vs. side X-ray images, so that the Openi-all dataset can be split into an Openi-front-only dataset.

To run it (12 steps):

  1. Put all .py files and y.csv in the folder that contains the images.
  2. y.csv contains 168 observations; these are labels I coded by hand (0: side, 1: front).
  3. Run image.py first; it takes about 230 s on my PC.
  4. Run data_prep.py.
  5. Run logistics.py (a minimal sketch of this pipeline appears after this list).
  6. The final output is in y_output.csv:
     - the 1st column is an index starting at 1
     - the 2nd column is the image file name, starting at 1.png
     - the 3rd column is the label (0: side, 1: front)
  7. The overall accuracy is around 92%.
  8. Run move_pic.py to separate the front and side pictures.
  9. I then double-checked visually:
     - 331 images labeled front should be side
     - 147 images labeled side should be front
     - accuracy is 93.59%
  10. Run img_list.py to get the file-name lists:
      - front_list.csv contains all front images
      - side_list.csv contains all side images
      - labeled_front_right.csv contains all front images correctly labeled by my detector
      - labeled_side_right.csv contains all side images correctly labeled by my detector
      - should_be_side.csv contains images labeled as front that are actually side
      - should_be_front.csv contains images labeled as side that are actually front
      - noise.csv contains noise images I identified by visual inspection
  11. Copy front_list.csv, side_list.csv, labeled_front_right.csv, labeled_side_right.csv, should_be_side.csv, should_be_front.csv, noise.csv, and img_divide.py into the folder where you want to split front vs. side images.
  12. Run img_divide.py.
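For reference, here is a minimal sketch of the idea behind image.py, data_prep.py, and logistics.py (not the exact code): downsample each gray-scale image to a fixed-size pixel vector and fit a logistic regression on the hand-coded labels in y.csv. Pillow, numpy, pandas, and scikit-learn are assumed; the 32x32 feature size and the 1.png-style file naming are illustrative assumptions.

```python
# Minimal sketch of the front-vs-side detector: pixel features + logistic
# regression. Assumes images are named 1.png, 2.png, ... and that the
# first len(labels) of them correspond to the rows of y.csv.
import glob
import numpy as np
import pandas as pd
from PIL import Image
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def image_features(path, size=(32, 32)):
    """Gray-scale, resize, and flatten one image into a feature vector."""
    img = Image.open(path).convert('L').resize(size)
    return np.asarray(img, dtype=np.float32).ravel() / 255.0

labels = pd.read_csv('y.csv')  # 168 hand-coded labels, 0: side, 1: front
X = np.stack([image_features('%d.png' % i) for i in range(1, len(labels) + 1)])
y = labels.iloc[:, -1].values

clf = LogisticRegression()
print('CV accuracy:', cross_val_score(clf, X, y, cv=5).mean())

# Fit on all coded images, then label every image and write y_output.csv
clf.fit(X, y)
all_pngs = sorted(glob.glob('*.png'))
preds = clf.predict(np.stack([image_features(p) for p in all_pngs]))
pd.DataFrame({'file': all_pngs, 'label': preds}).to_csv('y_output.csv')
```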

### 2. Folder: Descriptive Analysis code results

- Code to compute descriptive statistics of the pixel values of the gray-scale images.
- The Descriptive-Openi and Descriptive-NIH code are similar.

To run them (3 steps):

  1. Set your Spark home as follows: `findspark.init('PUT YOUR SPARK HOME HERE')` (see the sketch after this list).
  2. Set your image directory paths accordingly under "Set directory paths" in the .ipynb files.
  3. barplot.R uses ggplot2 in R; please install ggplot2 in R first.
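For reference, here is a minimal sketch of the descriptive-statistics step, assuming a local PySpark setup via findspark as in the notebooks; the image directory, the per-image statistics (mean and standard deviation of pixel values), and the column names are illustrative assumptions.

```python
# Minimal sketch: per-image gray-scale pixel statistics computed with Spark.
import glob
import numpy as np
from PIL import Image

import findspark
findspark.init('/path/to/your/spark/home')  # PUT YOUR SPARK HOME HERE
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('descriptive').getOrCreate()
sc = spark.sparkContext

image_paths = glob.glob('/path/to/images/*.png')  # set your image directory

def pixel_stats(path):
    """Return (path, mean, std) of one image's gray-scale pixel values."""
    px = np.asarray(Image.open(path).convert('L'), dtype=np.float64)
    return (path, float(px.mean()), float(px.std()))

stats = sc.parallelize(image_paths).map(pixel_stats)
df = spark.createDataFrame(stats, ['path', 'mean', 'std'])
df.describe('mean', 'std').show()  # dataset-level summary statistics
```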

### 3. Folder: Openi code results

#### Prerequisites

#### Sub-folder: GoogLeNet

- This folder does not contain data; please download the data from the Google Drive link.
- This folder contains the code for running GoogLeNet on the Openi_all and Openi_front_only datasets.

#### Sub-folder: AlexNet_transfer_learning

This folder contains the code for running AlexNet on the Openi_all and Openi_front_only datasets.

How to reproduce the results:

  1. Download the corresponding (pickled and cropped) data from Google Drive.
  2. Run Cropped_image_Lenet_more_metric.ipynb.
  3. Run Cropped_image_Lenet_front_only_more_metrics.ipynb.
  4. Run Xray_AlexNet_train_feature_extraction_debug_cutting_point.py for Openi_front.
  5. To change the data loaded from Openi_front to Openi_all, uncomment the following lines:

     ```python
     #normal_img_load = '../dataset/openi/pickled_cropped_img/2697_cropped_normal_imgs_and_labels_90x90.p'
     #abnormal_img_load = '../dataset/openi/pickled_cropped_img/3517_cropped_abnormal_imgs_and_labels_90x90.p'
     #normal_imgs = normal_img_load_90x90['images'].reshape(2697, 90, 90, 1)
     #abnormal_imgs = abnormal_img_load_90x90['images'].reshape(3517, 90, 90, 1)
     ```

  6. To change the cutting point of AlexNet during transfer learning (see the sketch after this list):
     - The default is to cut at the second-to-last layer (fc7).
     - To try another cutting point, comment out the "fc7" section and uncomment the "fc6" or "maxpool5" section, e.g.:
       1. Comment out `# ------------ Cutting the AlexNet at the second final layer (fc7)----------#`
       2. Uncomment `# ------------ Cutting the AlexNet at the third final layer (fc6)----------#`
  7. Outputs:
     - The script outputs 5 figures and 2 .txt files.
     - The 5 figures are the ROC curve, plus Accuracy, F1 score, Recall, and Precision as functions of the training epoch.
     - The 2 .txt files are for regenerating the ROC curves (see the second sketch after this list).
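For step 6, here is a minimal sketch of the cutting-point idea, using torchvision's pre-trained AlexNet as a stand-in for the project's TensorFlow implementation: freeze the network up to the chosen layer and train only a new final layer on top of it. The helper name and the two-class head are assumptions.

```python
# Minimal sketch: cut a pre-trained AlexNet at maxpool5 / fc6 / fc7 and
# attach a new trainable classification layer (torchvision stand-in).
import torch.nn as nn
from torchvision import models

def alexnet_cut(cut='fc7', num_classes=2):
    net = models.alexnet(pretrained=True)
    # net.classifier is [Dropout, fc6, ReLU, Dropout, fc7, ReLU, fc8]
    if cut == 'fc7':
        head_in, keep = 4096, 6        # drop only the original fc8
    elif cut == 'fc6':
        head_in, keep = 4096, 3        # drop fc7 and everything after
    else:                              # 'maxpool5': keep no fc layers
        head_in, keep = 256 * 6 * 6, 0
    for p in net.parameters():
        p.requires_grad = False        # freeze all pre-trained weights
    layers = list(net.classifier.children())[:keep]
    layers.append(nn.Linear(head_in, num_classes))  # new trainable layer
    net.classifier = nn.Sequential(*layers)
    return net
```

For step 7, a minimal sketch of regenerating a ROC curve from the two saved .txt files, assuming they hold the false-positive and true-positive rates (the file names and contents are assumptions):

```python
# Minimal sketch: re-plot the ROC curve from the two saved .txt files.
import numpy as np
import matplotlib.pyplot as plt

fpr = np.loadtxt('fpr.txt')  # hypothetical file names
tpr = np.loadtxt('tpr.txt')
plt.plot(fpr, tpr)
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
plt.show()
```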

### 4. Folder: NIH code results

#### Prerequisites

#### Sub-folder: AlexNet_transfer_learning

This folder holds the code for transfer learning: it starts from a pre-trained AlexNet and uses different cutting points to train on the pickled NIH dataset (a sketch of loading such a pickle appears below). Every Jupyter notebook includes its saved results; when you re-run a notebook, each cell should reproduce the same result.

Example: run jupy_fc6_lr0002.ipynb
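For reference, a minimal sketch of loading one of the pickled datasets, mirroring the commented-out lines in the Openi section; the NIH pickle path, the 'labels' key, and the 90x90 array shape are illustrative assumptions.

```python
# Minimal sketch: load a pickled image set into (N, H, W, 1) arrays.
import pickle

# Hypothetical NIH pickle path; the 'images' key matches the Openi
# loading code, while 'labels' and the 90x90 shape are assumptions.
with open('../dataset/nih/pickled_img/nih_imgs_and_labels_90x90.p', 'rb') as f:
    data = pickle.load(f)

imgs = data['images'].reshape(-1, 90, 90, 1)   # (N, height, width, channels)
labels = data['labels']
```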

#### Sub-folder: GoogLeNet

This folder holds the code for GoogLeNet training, validation, and testing.

Example: run Cropped_image_Lenet_more_metric.ipynb

#### Sub-folder: dataset

### 5. requirements.txt

Output of `pip freeze` listing the required packages; recreate the environment with `pip install -r requirements.txt`.

### 6. README.md

Explains the structure of the code base in detail, with step-by-step instructions to reproduce our results.
