• Seb Monzon

PROJECT: Intracranial Hemorrhage Detection

Updated: Nov 19, 2019

Using convolutional neural networks built on TensorFlow

Healthcare is ripe for innovation, with every advancement in machine learning speeding it along. One area in particular, medical imaging, is rapidly evolving with each stepwise improvement in neural network techniques. In this blog post, I will discuss my capstone project for Flatiron School’s Data Science bootcamp, in which I use a convolutional neural network to detect intracranial hemorrhages, or bleeding in and around the brain.

First and foremost, though, a huge thank you to Allunia and her Kaggle notebook, which I drew from as my baseline, as well as all the other Kagglers who helped piece together the functions and solutions to common problems. In a future post, I would like to talk about my initial experiment doing this project blind -- i.e. not looking at anyone else's code -- and how much I learned from that process. However, I learned even more from all the brilliant data scientists putting their code out in public, and for that, I'm very grateful.

1. context

Before delving into the workflow, I should highlight the topic's importance. Building and implementing deep learning techniques for medical imaging problems is currently cutting edge. The UCSF article headline on the left, released less than a month ago, shows two things: (1) the work is cutting edge, and (2) AI could improve the speed and accuracy (or whichever metric you're using to evaluate success) of detection. You can click on the image to be linked to the article.

This is especially important because radiologists are saddled with increasing workloads: every year, more and more medical images are ordered and produced. Building these neural networks is not only helpful for this specific problem, but can also be applied via transfer learning to new models for new problems (e.g. brain tumors).

2. data

The project originated from a Kaggle competition. The data came in DICOM (.dcm) files. DICOM is the Digital Imaging and Communications in Medicine standard for biomedical images. It specifies a data interchange protocol, a digital image format (for CT scans in our case), and file structures.

In other words, each CT scan was not only a large 512x512 pixel image, but a host of metadata as well. Consequently, the full set of files was too large to work with on my local machine. For this project, I decided to use cloud computing, since I could download the Kaggle files there and more efficiently run deep-learning models with the aid of rented GPUs. After comparing pricing and ease of setup, I chose to run my project on Paperspace over AWS, Google Cloud, and others.

I successfully obtained 103,772 files to use for my project. I used the pydicom library to access and manipulate the files. The goal was to identify the probability of each of five subtypes of hemorrhage existing in the image, as well as a catch-all 'any' category. Each image ID corresponds to exactly one DICOM file and 6 observations in the labeled data frame provided through Kaggle.
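To make the "6 observations per image" structure concrete, here is a hedged sketch (not my actual project code) of how the long-format Kaggle labels can be pivoted into one row per image with six label columns. The column names and `ID_<image>_<subtype>` format follow the competition's CSV layout; the file path is illustrative.

```python
import pandas as pd

# One label column per hemorrhage subtype, plus the catch-all 'any'.
SUBTYPES = ["epidural", "intraparenchymal", "intraventricular",
            "subarachnoid", "subdural", "any"]

def load_labels(csv_path):
    """Pivot Kaggle's long-format labels (one row per ID_<image>_<subtype>)
    into one row per image with six label columns."""
    df = pd.read_csv(csv_path)
    # 'ID' looks like 'ID_abc123_epidural'; split off the subtype suffix.
    df[["image_id", "subtype"]] = df["ID"].str.rsplit("_", n=1, expand=True)
    return df.pivot(index="image_id", columns="subtype", values="Label")[SUBTYPES]
```

Each row of the resulting frame lines up with exactly one DICOM file by image ID.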

3. methodology snapshot

A. Preprocessing

  1. building custom functions

  2. error-handling – expect loads of errors

  3. many parameters to change for testing different models

B. Build and run a convolutional neural network

  1. using TensorFlow and Keras modules

  2. multi-class multi-label classification problem

  3. transfer learning (ResNet, VGG)

  4. GPU from cloud computing platform will speed up process greatly
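As a hedged sketch of item B.3 above (illustrative hyperparameters, not my actual project code), transfer learning with a pretrained ResNet50 backbone might look like this in Keras:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_transfer_model(n_classes=6, weights="imagenet"):
    """Transfer learning sketch: a pretrained ResNet50 backbone with a new
    multi-label classification head. (Pass weights=None to skip the download.)"""
    base = keras.applications.ResNet50(
        include_top=False, weights=weights, input_shape=(224, 224, 3))
    base.trainable = False  # freeze the pretrained layers at first
    model = keras.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        # Sigmoid, not softmax: each of the six labels is predicted independently.
        layers.Dense(n_classes, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model
```

Freezing the backbone keeps the ImageNet-learned filters intact while only the new head is trained; the base can be unfrozen later for fine-tuning.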

4. preprocessing

With pydicom's .pixel_array attribute, we can extract the values for each image's 262,144 pixels as a 512x512 matrix.

When we look at the raw values for the first ten images, we see the distribution shown in the first graph below. Each DICOM file stores its pixel data under a linear transformation whose slope and intercept are encoded in the file's header. When we reverse that transformation, we're left with Hounsfield Units (HU)—a quantitative scale for describing radiodensity—the distributions of which are represented in the second graph below.
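Undoing that transformation is a one-liner. Here is a hedged sketch (the slope and intercept would come from the DICOM header fields `RescaleSlope` and `RescaleIntercept`, read via pydicom):

```python
import numpy as np

def to_hounsfield(pixels, slope, intercept):
    """Reverse the DICOM storage transform: HU = pixels * slope + intercept.
    In practice, slope/intercept come from ds.RescaleSlope / ds.RescaleIntercept."""
    hu = pixels.astype(np.float32) * slope + intercept
    # Clamp large negative padding values to -1024, the HU value of air.
    hu[hu < -1024] = -1024
    return hu
```

The clamp handles the out-of-field padding pixels mentioned below.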

The important thing to notice here is the wide range and clustering. Normal grayscale pixels range over [0, 255]. The HUs here span a range more than twenty times wider than that. Needless to say, a human eye cannot make those distinctions. In order to visualize these images, we can correct for the large negative values by setting them to -1024, the value of air, as well as by choosing a 'window', discussed next.

In order to focus on pertinent values, we can 'window' the image appropriately. This means choosing a range of values that will be represented; anything outside this range we can essentially ignore and treat the same. I chose values in the 'hematoma window' as described on Radiopaedia and RadClass.
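A windowing function can be sketched as follows (the center/width values here are illustrative, not the exact hematoma window I used):

```python
import numpy as np

def window_image(hu, center, width):
    """Clip HU values to [center - width/2, center + width/2] and
    rescale the result to [0, 1] for display and model input."""
    low, high = center - width / 2, center + width / 2
    clipped = np.clip(hu, low, high)
    return (clipped - low) / (high - low)
```

Everything below the window floor maps to 0 and everything above the ceiling maps to 1, which is exactly the "treat the same" behavior described above.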

You can see the results of these important preprocessing steps in the three images (different CT scan slices) below.

1. Raw pixels 2. Rescaled pixels 3. Rescaled, doctor's window 4. Rescaled, custom window

Below are two key functions in my preprocessing class.

5. deep learning with convnets

Before discussing the architecture of my convnet, let's first take a look at the abstracted concept in the image below, which we will read left to right.

Each red box: input image (pixel matrix) --> convolved features --> max pooled features || Yellow box: features --> flattened neural net input layer --> hidden layers --> output layer

The very first blue matrix is our pixel values (or in our case, our windowed Hounsfield Units). The shape will be 512x512.

This matrix is convolved with a series of filters. In the image above, there are five filters. Each of these filters scans over the original matrix to create a new matrix (usually of slightly smaller shape), which we call a convolved feature. The convolution returns as many matrices as there are filters. These filters are meant to pick out basic features; in our model they are randomly initialized and then learned during training.

Below, you can see GIFs of convolution with one filter in action. The filters in these examples are 3x3 in shape, but they can be any shape. You can see the explicit values in the rightmost filter. It is the matrix [[1,0,1], [0,1,0], [1,0,1]]; at each position, it is multiplied elementwise with the pixel values beneath it, the products are summed, and then the filter shifts by a certain stride (in this case, just one pixel). Together, these sums form the values and shape of our convolved feature.
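The multiply-sum-shift process in the GIF can be sketched in a few lines of NumPy, using that same filter:

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """'Valid' 2-D convolution (strictly, cross-correlation, as convnets use):
    slide the kernel over the image, multiply elementwise, and sum."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)  # elementwise product, then sum
    return out

# The filter shown in the GIF:
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
```

A 5x5 input with this 3x3 filter and stride 1 yields a 3x3 convolved feature, slightly smaller than the input, as described above.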

The convolved feature then undergoes a large reduction in dimensionality through pooling (often max pooling, as we will show here). In the main image above, our convolved features are shown in green; they are then max pooled into the red matrices.

Max pooling is a simple process of taking the max value within a certain window. This window size, again, determines the shape of your output. In the GIF below, the convolved feature is on the right and the max pooled feature on the left.
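Max pooling is equally short to sketch in NumPy (window and stride of 2 are typical, and are the values used here for illustration):

```python
import numpy as np

def max_pool(feature, size=2, stride=2):
    """Take the maximum over each (size x size) window, stepping by stride."""
    out_h = (feature.shape[0] - size) // stride + 1
    out_w = (feature.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = feature[i*stride:i*stride+size,
                                j*stride:j*stride+size].max()
    return out
```

With non-overlapping 2x2 windows, a 4x4 feature shrinks to 2x2, a four-fold reduction in values.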

A non-linear activation (such as ReLU) is also applied, typically right after each convolution. You can repeat these convolution-and-pooling steps several times for greater dimensionality reduction. After this process, the remaining matrix is flattened and fed in as the input layer for the fully-connected neural network. Every node is connected to every node in the following layer with a weight and a bias, which modulate the effect that node has on the 'activation' of the following layer's node. The number of hidden layers can differ, but eventually, the final hidden layer is fully connected to the output layer, which has as many nodes as there are items you are classifying -- in our case, the number of hemorrhage subtypes plus one for the 'any' category, so six in total.
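Putting the whole pipeline together, here is a hedged Keras sketch of such an architecture (layer sizes are illustrative, not my tuned model):

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_convnet(input_shape=(512, 512, 1), n_classes=6):
    """Sketch of the architecture described above: stacked conv/pool blocks,
    a flattened dense head, and six sigmoid outputs (five subtypes + 'any')."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(16, 3, activation="relu"),   # convolution + ReLU
        layers.MaxPooling2D(2),                    # max pooling
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(),                          # flatten into the dense input
        layers.Dense(64, activation="relu"),       # hidden layer
        # Sigmoid (not softmax): in a multi-label problem the six labels
        # are independent, so each output is its own probability.
        layers.Dense(n_classes, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model
```

Binary cross-entropy on each of the six outputs is what makes this a multi-label rather than single-label classifier.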

My convnet code

1. Stratified Kfolds for train-test-splits
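As a hedged sketch of the splitting step (scikit-learn's StratifiedKFold expects a single label per sample, so one common workaround for a multi-label problem like this is to stratify on the 'any' column, keeping the hemorrhage/no-hemorrhage ratio similar across folds; the data here is a toy example):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy stand-ins: 8 samples, stratified on the binary 'any' label.
y_any = np.array([0, 1, 0, 1, 0, 1, 0, 1])
X = np.arange(len(y_any)).reshape(-1, 1)

skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=42)
folds = [(train_idx, val_idx) for train_idx, val_idx in skf.split(X, y_any)]
```

Each fold's validation set then contains the same proportion of positive scans as the whole dataset.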

Several methods in my model class

The code to run the model itself


Multi-label, multi-class loss over 20 epochs (ResNet50)

You can see the log loss of actual submitted projects here! Thank you for following along on my capstone project!