Skin Cancer Detection using Convolutional Neural Network (CNN)

Dec 28, 2021

Build a deep learning model that classifies a query image into one of 7 classes of skin lesions.

Skin cancer, the most common human malignancy, is primarily diagnosed visually. Diagnosis begins with an initial clinical screening, which may be followed by dermoscopic analysis, a biopsy and histopathological examination. Automated classification of skin lesions from images is challenging because skin lesions vary in appearance in fine-grained ways.

1. Prerequisites:

This post assumes basic familiarity with the following topics: data preprocessing, exploratory data analysis, performance metrics, machine learning, deep learning techniques such as CNNs, and Python syntax. It also assumes you know libraries such as NumPy, Pandas, scikit-learn, Matplotlib, Seaborn, PrettyTable, TensorFlow and Keras.

2. About Data:

2.1 Overview:

This dataset is a more interesting alternative to digit classification. It is useful for getting biology and medicine students excited about machine learning and image processing.

2.2 Original Data Source:

Original Challenge: https://challenge2018.isic-archive.com
Training of neural networks for automated diagnosis of pigmented skin lesions is hampered by the small size and lack of diversity of available datasets of dermatoscopic images.
This is the HAM10000 ("Human Against Machine with 10000 training images") dataset. It consists of 10015 dermatoscopic images, released as a training set for academic machine learning purposes and publicly available through the ISIC archive. This benchmark dataset can be used for machine learning and for comparisons with human experts.
Cases include a representative collection of all important diagnostic categories in the realm of pigmented lesions:
- actinic keratoses and intraepithelial carcinoma / Bowen's disease (akiec)
- basal cell carcinoma (bcc)
- benign keratosis-like lesions (solar lentigines / seborrheic keratoses and lichen-planus like keratoses, bkl)
- dermatofibroma (df)
- melanoma (mel)
- melanocytic nevi (nv)
- vascular lesions (angiomas, angiokeratomas, pyogenic granulomas and hemorrhage, vasc)

It has 7 classes of skin lesions, listed below:

Melanocytic nevi
Melanoma
Benign keratosis-like lesions
Basal cell carcinoma
Actinic keratoses
Vascular lesions
Dermatofibroma

3. Importing Essential Libraries:

import pandas as pd
import numpy as np
import warnings
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Flatten, Dense, MaxPool2D, BatchNormalization, Dropout
import tensorflow as tf

4. Loading data and Making labels:
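The loading code is not shown in the post. The model summary further down expects 28×28×3 inputs, which matches the flattened-pixel CSV `hmnist_28_28_RGB.csv` in the Kaggle HAM10000 dataset. The file name, and the assumption that every non-`label` column holds one pixel value stored row by row in RGB order, are assumptions about that file rather than something the article states. The function below is a minimal sketch that turns such a DataFrame into an image tensor and a label vector:

```python
import numpy as np
import pandas as pd


def frame_to_images(df: pd.DataFrame, size: int = 28):
    """Split a flattened-pixel DataFrame into (images, labels).

    Assumes every column except 'label' is a pixel value stored
    row by row in RGB order (an assumption about hmnist_28_28_RGB.csv).
    """
    labels = df['label'].to_numpy()
    pixels = df.drop(columns=['label']).to_numpy(dtype=np.float32)
    # (N, size*size*3) -> (N, size, size, 3), scaled from 0..255 to 0..1
    images = pixels.reshape(-1, size, size, 3) / 255.0
    return images, labels


# Usage (the path is an assumption about where the Kaggle file lives):
# df = pd.read_csv('hmnist_28_28_RGB.csv')
# x, y = frame_to_images(df)
```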

5. Train Test Split:
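The code below uses `train_set` and `test_set` but does not show how they were created. One reasonable way, assumed here rather than taken from the post, is scikit-learn's `train_test_split` with `stratify`. Stratifying keeps every class in the same proportion in both splits, which matters because the rare classes are small. The 20% test size and the seed are illustrative choices.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_frame(df: pd.DataFrame, test_size: float = 0.2, seed: int = 42):
    """Stratified train/test split: each class keeps its share in both parts."""
    train_set, test_set = train_test_split(
        df,
        test_size=test_size,
        stratify=df['label'],   # preserve class proportions
        random_state=seed,      # reproducible split
    )
    return train_set, test_set
```

Splitting before any oversampling ensures that duplicated training images never end up in the test set.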

df.label.unique()
# Output: array([4, 6, 2, 5, 0, 1, 3])

# reference: https://www.kaggle.com/kmader/skin-cancer-mnist-ham10000/discussion/183083
classes = {
    0: ('akiec', 'actinic keratoses and intraepithelial carcinoma'),
    1: ('bcc', 'basal cell carcinoma'),
    2: ('bkl', 'benign keratosis-like lesions'),
    3: ('df', 'dermatofibroma'),
    4: ('nv', 'melanocytic nevi'),
    5: ('vasc', 'vascular lesions'),
    6: ('mel', 'melanoma'),
}

y_train=train_set['label']

x_train=train_set.drop(columns=['label'])

y_test=test_set['label']

x_test=test_set.drop(columns=['label'])

columns=list(x_train)

# The model is built with TensorFlow/Keras, so check GPU availability through TensorFlow
print(tf.config.list_physical_devices('GPU'))

6. Exploratory data analysis (EDA):

import seaborn as sns

sns.countplot(x=train_set['label'])

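The training labels are heavily imbalanced: melanocytic nevi (label 4) far outnumber every other class. The minority classes are therefore randomly oversampled, and the counts are plotted again:

from imblearn.over_sampling import RandomOverSampler

oversample = RandomOverSampler()

# Duplicate minority-class rows until every class matches the largest one
x_train,y_train = oversample.fit_resample(x_train,y_train)

sns.countplot(x=y_train)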
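The idea behind `RandomOverSampler` with its default strategy can be illustrated in plain NumPy. The function below is an illustrative re-implementation of the idea, not imblearn's source code. It tops up every class to the size of the largest class by drawing existing rows with replacement.

```python
import numpy as np


def random_oversample(x: np.ndarray, y: np.ndarray, seed: int = 0):
    """Top up every class to the size of the largest class by
    re-drawing existing rows of that class with replacement."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    picked = []
    for cls in classes:
        cls_idx = np.flatnonzero(y == cls)
        # keep all original rows, then add random duplicates
        extra = rng.choice(cls_idx, size=target - cls_idx.size, replace=True)
        picked.append(np.concatenate([cls_idx, extra]))
    idx = np.concatenate(picked)
    return x[idx], y[idx]
```

No new images are created. The minority classes gain repetitions rather than variety, which is the limitation the conclusion points to.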

7. CNN Model Architecture:

8. Model Building (CNN):
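Only the model summary survives here, not the code that built it. The Keras sketch below reproduces the layer order, output shapes and the 504,103 parameters listed in the summary. Several details are assumptions, because a summary does not record them:

- ReLU activations in the hidden layers
- a softmax output layer
- `padding='same'` on the first convolution, which keeps the 28×28 size
- a dropout rate of 0.2

```python
import tensorflow as tf
from tensorflow.keras.layers import (BatchNormalization, Conv2D, Dense,
                                     Dropout, Flatten, MaxPool2D)
from tensorflow.keras.models import Sequential


def build_model(num_classes: int = 7) -> Sequential:
    """Layer stack matching the summary's shapes (activations/dropout assumed)."""
    return Sequential([
        tf.keras.Input(shape=(28, 28, 3)),
        Conv2D(16, (3, 3), activation='relu', padding='same'),  # 28x28x16
        MaxPool2D(pool_size=(2, 2)),                            # 14x14x16
        BatchNormalization(),
        Conv2D(32, (3, 3), activation='relu'),                  # 12x12x32
        Conv2D(64, (3, 3), activation='relu'),                  # 10x10x64
        MaxPool2D(pool_size=(2, 2)),                            # 5x5x64
        BatchNormalization(),
        Conv2D(128, (3, 3), activation='relu'),                 # 3x3x128
        Conv2D(256, (3, 3), activation='relu'),                 # 1x1x256
        Flatten(),
        Dropout(0.2),
        Dense(256, activation='relu'),
        BatchNormalization(),
        Dropout(0.2),
        Dense(128, activation='relu'),
        BatchNormalization(),
        Dense(64, activation='relu'),
        BatchNormalization(),
        Dropout(0.2),
        Dense(32, activation='relu'),
        BatchNormalization(),
        Dense(num_classes, activation='softmax'),               # class probabilities
    ])


model = build_model()
model.summary()
```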

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_15 (Conv2D)           (None, 28, 28, 16)        448       
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 14, 14, 16)        0         
_________________________________________________________________
batch_normalization_18 (Batc (None, 14, 14, 16)        64        
_________________________________________________________________
conv2d_16 (Conv2D)           (None, 12, 12, 32)        4640      
_________________________________________________________________
conv2d_17 (Conv2D)           (None, 10, 10, 64)        18496     
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
batch_normalization_19 (Batc (None, 5, 5, 64)          256       
_________________________________________________________________
conv2d_18 (Conv2D)           (None, 3, 3, 128)         73856     
_________________________________________________________________
conv2d_19 (Conv2D)           (None, 1, 1, 256)         295168    
_________________________________________________________________
flatten_3 (Flatten)          (None, 256)               0         
_________________________________________________________________
dropout_9 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_15 (Dense)             (None, 256)               65792     
_________________________________________________________________
batch_normalization_20 (Batc (None, 256)               1024      
_________________________________________________________________
dropout_10 (Dropout)         (None, 256)               0         
_________________________________________________________________
dense_16 (Dense)             (None, 128)               32896     
_________________________________________________________________
batch_normalization_21 (Batc (None, 128)               512       
_________________________________________________________________
dense_17 (Dense)             (None, 64)                8256      
_________________________________________________________________
batch_normalization_22 (Batc (None, 64)                256       
_________________________________________________________________
dropout_11 (Dropout)         (None, 64)                0         
_________________________________________________________________
dense_18 (Dense)             (None, 32)                2080      
_________________________________________________________________
batch_normalization_23 (Batc (None, 32)                128       
_________________________________________________________________
dense_19 (Dense)             (None, 7)                 231       
=================================================================
Total params: 504,103
Trainable params: 502,983
Non-trainable params: 1,120
_________________________________________________________________
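The totals in the summary can be checked by hand using three rules:

- A Conv2D layer has (k·k·c_in + 1)·c_out parameters.
- A Dense layer has (n_in + 1)·n_out parameters.
- A BatchNormalization layer has 4·c parameters. Of these, the 2·c moving mean and variance values are non-trainable.

This short script re-derives the three totals from the layer shapes listed above:

```python
def conv2d_params(k: int, c_in: int, c_out: int) -> int:
    # k*k*c_in weights per filter plus one bias, for c_out filters
    return (k * k * c_in + 1) * c_out


def dense_params(n_in: int, n_out: int) -> int:
    # n_in weights per unit plus one bias
    return (n_in + 1) * n_out


# (kernel, in_channels, out_channels) of the five Conv2D layers
convs = [(3, 3, 16), (3, 16, 32), (3, 32, 64), (3, 64, 128), (3, 128, 256)]
# (inputs, units) of the five Dense layers
denses = [(256, 256), (256, 128), (128, 64), (64, 32), (32, 7)]
# channels normalised by the six BatchNormalization layers
bn_channels = [16, 64, 256, 128, 64, 32]

conv_total = sum(conv2d_params(*c) for c in convs)
dense_total = sum(dense_params(*d) for d in denses)
bn_total = 4 * sum(bn_channels)        # gamma, beta, moving mean, moving variance
non_trainable = 2 * sum(bn_channels)   # moving mean and variance are not trained

total = conv_total + dense_total + bn_total
print(total, total - non_trainable, non_trainable)  # 504103 502983 1120
```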

9. Setting Optimizer & Annealing:
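The optimizer settings are not shown in the post. A common pairing, assumed here, is Adam with Keras' `ReduceLROnPlateau` callback. That callback anneals the learning rate by multiplying it by `factor` whenever the monitored metric has not improved for `patience` epochs. The specific values below are illustrative. The sparse cross-entropy loss assumes integer labels 0 to 6, as in the `classes` mapping; one-hot labels would need `categorical_crossentropy` instead.

```python
import tensorflow as tf


def compile_with_annealing(model: tf.keras.Model, lr: float = 1e-3):
    """Compile with Adam and return a learning-rate annealing callback."""
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
        loss='sparse_categorical_crossentropy',  # integer labels 0..6
        metrics=['accuracy'],
    )
    # Halve the learning rate after 2 epochs without val_accuracy improvement
    return tf.keras.callbacks.ReduceLROnPlateau(
        monitor='val_accuracy', factor=0.5, patience=2, min_lr=1e-5, verbose=1)
```

The returned callback is passed to `model.fit(..., callbacks=[...])` together with validation data, so that `val_accuracy` exists to be monitored.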

10. Training Model:
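The training call is not reproduced in the post either. This minimal sketch assumes that `x_train` and `x_test` still hold flattened 28×28×3 pixel rows. It validates on the untouched test rows rather than using `validation_split`, because a slice of the oversampled training data would contain duplicates of training images and inflate validation accuracy. The epoch count and batch size are placeholders.

```python
import numpy as np
import tensorflow as tf


def to_images(x) -> np.ndarray:
    """Flattened pixel rows (DataFrame or array) -> (N, 28, 28, 3) in [0, 1]."""
    return np.asarray(x, dtype=np.float32).reshape(-1, 28, 28, 3) / 255.0


def train(model: tf.keras.Model, x_train, y_train, x_val, y_val,
          epochs: int = 50, batch_size: int = 128, callbacks=None):
    """Fit on the (oversampled) training rows, validating on held-out rows."""
    return model.fit(
        to_images(x_train), np.asarray(y_train),
        validation_data=(to_images(x_val), np.asarray(y_val)),
        epochs=epochs,
        batch_size=batch_size,
        shuffle=True,  # oversampled duplicates are appended in blocks
        callbacks=callbacks or [],
        verbose=2,
    )
```

Any learning-rate annealing callback can be passed through `callbacks`. The returned `History` object holds per-epoch loss and accuracy for plotting.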

11. Model Evaluation:

Confusion Matrix:
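The evaluation code and the confusion-matrix figure are not reproduced here. Below is a minimal sketch with scikit-learn, assuming the model outputs softmax probabilities per class:

- `argmax` picks the predicted label.
- `confusion_matrix` counts (true, predicted) pairs, with rows for true classes and columns for predictions.
- `classification_report` adds per-class precision, recall and F1. These are more informative than overall accuracy on an imbalanced test set.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix


def evaluate(y_true, probs, class_names):
    """Confusion matrix (rows = true, columns = predicted) plus per-class report."""
    labels = list(range(len(class_names)))
    y_pred = np.argmax(probs, axis=1)   # most probable class per image
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    report = classification_report(
        y_true, y_pred, labels=labels, target_names=class_names, zero_division=0)
    return cm, report
```

For this dataset, `class_names` could be the short codes from the `classes` dictionary, and `probs` the output of `model.predict` on the test images.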

12. Model Deployment:

The model is deployed to Heroku cloud through Git/GitHub.

The required files are:

Of the files above, tester.png, LICENSE, model.png and model_architecture.png are optional.
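The web app's code is not included in the post. Whatever framework serves the app, it has to turn an uploaded file into the 28×28×3 input the model expects. The hypothetical helper below shows that step with Pillow. The function name, the 28-pixel size and the /255 scaling are assumptions, and the scaling must match however the training images were preprocessed.

```python
import io

import numpy as np
from PIL import Image


def preprocess_upload(image_bytes: bytes, size: int = 28) -> np.ndarray:
    """Uploaded image bytes -> (1, size, size, 3) float32 batch in [0, 1]."""
    img = Image.open(io.BytesIO(image_bytes)).convert('RGB')  # drop alpha/greyscale
    img = img.resize((size, size))                             # (width, height)
    arr = np.asarray(img, dtype=np.float32) / 255.0
    return arr[np.newaxis, ...]                                # add batch axis
```

The app would then call `model.predict(preprocess_upload(data))` and map the argmax back to a class name.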

13. Model Deployment Results:

Try uploading skin lesion images to the deployed app:

URL to deployment: https://skin-cancer-detection-cnn.herokuapp.com/

Home page:

Result page:

14. Conclusion:

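This model is not robust to all skin images, because it was not trained on a sufficient and balanced amount of data for each class. Random oversampling only duplicates existing minority-class images rather than adding new information. As a result, the model may still make wrong predictions on some images.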

15. References:

You can reach me at:

GitHub Repository Link: https://github.com/charanhu/Google-Analytics-Customer-Revenue-Prediction

LinkedIn: https://www.linkedin.com/in/charanhu/

GitHub: https://github.com/charanhu