
# BD4H Final Project README.md

Gatech


## Tech Stack

- Big data analysis and augmentation
- Deep learning

## Datasets

- 1st dataset: Openi
- 2nd dataset: NIH

## File Structure

### 1. Folder: Orientation detector

Code to preprocess the Openi dataset by classifying front vs. side X-ray images, so that the Openi-all dataset can be split into an Openi-front-only dataset.

To run it (12 steps):

  1. Put all .py files and y.csv in the folder that contains the images.
  2. y.csv contains 168 observations; these are labels I coded by hand (0: side, 1: front).
  3. Run image.py first; it takes about 230 s on my PC.
  4. Run data_prep.py.
  5. Run logistics.py (a minimal sketch of this pipeline appears after this list).
  6. The final output is in y_output.csv:
     - the 1st column is an index starting at 1
     - the 2nd column is the image file name, starting at 1.png
     - the 3rd column is the label (0: side, 1: front)
  7. The overall accuracy is around 92%.
  8. Run move_pic.py to separate the front and side pictures.
  9. I then double-checked visually:
     - 331 images labeled front should be side
     - 147 images labeled side should be front
     - accuracy is 93.59%
  10. Run img_list.py to get the file-name lists:
      - front_list.csv contains all front images
      - side_list.csv contains all side images
      - labeled_front_right.csv contains all front images correctly labeled by my detector
      - labeled_side_right.csv contains all side images correctly labeled by my detector
      - should_be_side.csv contains images labeled as front that are actually side
      - should_be_front.csv contains images labeled as side that are actually front
      - noise.csv contains noise images I identified by visual inspection
  11. Copy front_list.csv, side_list.csv, labeled_front_right.csv, labeled_side_right.csv, should_be_side.csv, should_be_front.csv, noise.csv, and img_divide.py into the folder where you want to split front vs. side images.
  12. Run img_divide.py.
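For reference, here is a minimal sketch of the idea behind image.py, data_prep.py, and logistics.py (not the exact code): downsample each gray-scale image to a fixed-size pixel vector and fit a logistic regression on the hand-coded labels in y.csv. Pillow, numpy, pandas, and scikit-learn are assumed; the 32x32 feature size and the 1.png-style file naming are illustrative assumptions.

```python
# Minimal sketch of the front-vs-side detector: pixel features + logistic
# regression. Assumes images are named 1.png, 2.png, ... and that the
# first len(labels) of them correspond to the rows of y.csv.
import glob
import numpy as np
import pandas as pd
from PIL import Image
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def image_features(path, size=(32, 32)):
    """Gray-scale, resize, and flatten one image into a feature vector."""
    img = Image.open(path).convert('L').resize(size)
    return np.asarray(img, dtype=np.float32).ravel() / 255.0

labels = pd.read_csv('y.csv')  # 168 hand-coded labels, 0: side, 1: front
X = np.stack([image_features('%d.png' % i) for i in range(1, len(labels) + 1)])
y = labels.iloc[:, -1].values

clf = LogisticRegression()
print('CV accuracy:', cross_val_score(clf, X, y, cv=5).mean())

# Fit on all coded images, then label every image and write y_output.csv
clf.fit(X, y)
all_pngs = sorted(glob.glob('*.png'))
preds = clf.predict(np.stack([image_features(p) for p in all_pngs]))
pd.DataFrame({'file': all_pngs, 'label': preds}).to_csv('y_output.csv')
```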

### 2. Folder: Descriptive Analysis code results

- Code to compute descriptive statistics of the pixel values of the gray-scale images.
- The Descriptive-Openi and Descriptive-NIH code are similar.

To run them (3 steps):

  1. Set your Spark home as follows: `findspark.init('PUT YOUR SPARK HOME HERE')` (see the sketch after this list).
  2. Set your image directory paths accordingly under "Set directory paths" in the .ipynb files.
  3. barplot.R uses ggplot2 in R; please install ggplot2 in R first.
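For reference, here is a minimal sketch of the descriptive-statistics step, assuming a local PySpark setup via findspark as in the notebooks; the image directory, the per-image statistics (mean and standard deviation of pixel values), and the column names are illustrative assumptions.

```python
# Minimal sketch: per-image gray-scale pixel statistics computed with Spark.
import glob
import numpy as np
from PIL import Image

import findspark
findspark.init('/path/to/your/spark/home')  # PUT YOUR SPARK HOME HERE
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('descriptive').getOrCreate()
sc = spark.sparkContext

image_paths = glob.glob('/path/to/images/*.png')  # set your image directory

def pixel_stats(path):
    """Return (path, mean, std) of one image's gray-scale pixel values."""
    px = np.asarray(Image.open(path).convert('L'), dtype=np.float64)
    return (path, float(px.mean()), float(px.std()))

stats = sc.parallelize(image_paths).map(pixel_stats)
df = spark.createDataFrame(stats, ['path', 'mean', 'std'])
df.describe('mean', 'std').show()  # dataset-level summary statistics
```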

### 3. Folder: Openi code results

#### Prerequisites

#### Sub-folder: GoogLeNet

- This folder does not contain data; please download the data from the Google Drive link.
- This folder contains the code for running GoogLeNet on the Openi_all and Openi_front_only datasets.

#### Sub-folder: AlexNet_transfer_learning

This folder contains the code for running AlexNet on the Openi_all and Openi_front_only datasets.

How to reproduce the results:

  1. Download the corresponding (pickled and cropped) data from Google Drive.
  2. Run Cropped_image_Lenet_more_metric.ipynb.
  3. Run Cropped_image_Lenet_front_only_more_metrics.ipynb.
  4. Run Xray_AlexNet_train_feature_extraction_debug_cutting_point.py for Openi_front.
  5. To change the data loaded from Openi_front to Openi_all, uncomment the following lines:

     ```python
     #normal_img_load = '../dataset/openi/pickled_cropped_img/2697_cropped_normal_imgs_and_labels_90x90.p'
     #abnormal_img_load = '../dataset/openi/pickled_cropped_img/3517_cropped_abnormal_imgs_and_labels_90x90.p'
     #normal_imgs = normal_img_load_90x90['images'].reshape(2697, 90, 90, 1)
     #abnormal_imgs = abnormal_img_load_90x90['images'].reshape(3517, 90, 90, 1)
     ```

  6. To change the cutting point of AlexNet during transfer learning (see the sketch after this list):
     - The default is to cut at the second-to-last layer (fc7).
     - To try another cutting point, comment out the "fc7" section and uncomment the "fc6" or "maxpool5" section, e.g.:
       1. Comment out `# ------------ Cutting the AlexNet at the second final layer (fc7)----------#`
       2. Uncomment `# ------------ Cutting the AlexNet at the third final layer (fc6)----------#`
  7. Outputs:
     - The script outputs 5 figures and 2 .txt files.
     - The 5 figures are the ROC curve, plus Accuracy, F1 score, Recall, and Precision as functions of the training epoch.
     - The 2 .txt files are for regenerating the ROC curves (see the second sketch after this list).
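For step 6, here is a minimal sketch of the cutting-point idea, using torchvision's pre-trained AlexNet as a stand-in for the project's TensorFlow implementation: freeze the network up to the chosen layer and train only a new final layer on top of it. The helper name and the two-class head are assumptions.

```python
# Minimal sketch: cut a pre-trained AlexNet at maxpool5 / fc6 / fc7 and
# attach a new trainable classification layer (torchvision stand-in).
import torch.nn as nn
from torchvision import models

def alexnet_cut(cut='fc7', num_classes=2):
    net = models.alexnet(pretrained=True)
    # net.classifier is [Dropout, fc6, ReLU, Dropout, fc7, ReLU, fc8]
    if cut == 'fc7':
        head_in, keep = 4096, 6        # drop only the original fc8
    elif cut == 'fc6':
        head_in, keep = 4096, 3        # drop fc7 and everything after
    else:                              # 'maxpool5': keep no fc layers
        head_in, keep = 256 * 6 * 6, 0
    for p in net.parameters():
        p.requires_grad = False        # freeze all pre-trained weights
    layers = list(net.classifier.children())[:keep]
    layers.append(nn.Linear(head_in, num_classes))  # new trainable layer
    net.classifier = nn.Sequential(*layers)
    return net
```

For step 7, a minimal sketch of regenerating a ROC curve from the two saved .txt files, assuming they hold the false-positive and true-positive rates (the file names and contents are assumptions):

```python
# Minimal sketch: re-plot the ROC curve from the two saved .txt files.
import numpy as np
import matplotlib.pyplot as plt

fpr = np.loadtxt('fpr.txt')  # hypothetical file names
tpr = np.loadtxt('tpr.txt')
plt.plot(fpr, tpr)
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
plt.show()
```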

### 4. Folder: NIH code results

#### Prerequisites

#### Sub-folder: AlexNet_transfer_learning

This folder holds the code for transfer learning: it starts from a pre-trained AlexNet and uses different cutting points to train on the pickled NIH dataset (a sketch of loading such a pickle appears below). Every Jupyter notebook includes its saved results; when you re-run a notebook, each cell should reproduce the same result.

Example: run jupy_fc6_lr0002.ipynb
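For reference, a minimal sketch of loading one of the pickled datasets, mirroring the commented-out lines in the Openi section; the NIH pickle path, the 'labels' key, and the 90x90 array shape are illustrative assumptions.

```python
# Minimal sketch: load a pickled image set into (N, H, W, 1) arrays.
import pickle

# Hypothetical NIH pickle path; the 'images' key matches the Openi
# loading code, while 'labels' and the 90x90 shape are assumptions.
with open('../dataset/nih/pickled_img/nih_imgs_and_labels_90x90.p', 'rb') as f:
    data = pickle.load(f)

imgs = data['images'].reshape(-1, 90, 90, 1)   # (N, height, width, channels)
labels = data['labels']
```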

#### Sub-folder: GoogLeNet

This folder holds the code for GoogLeNet training, validation, and testing.

Example: run Cropped_image_Lenet_more_metric.ipynb

#### Sub-folder: dataset

### 5. requirements.txt

Output of `pip freeze` listing the required packages; recreate the environment with `pip install -r requirements.txt`.

### 6. README.md

Explains the structure of the code base in detail, with step-by-step instructions to reproduce our results.
