ImageNet needs more Wild Boar Photos
Is your deep convolutional network misclassifying images? You can find out why by overlaying a heatmap of class activation on the misclassified pictures.
A heatmap overlay shows which parts of an image most strongly activate a neural network’s last convolutional layer. In this African elephant picture, the last convolutional layer of the VGG16 architecture reduces the photo to a 14x14 grid, highlighting the blocks with the strongest African_elephant activation:
Original image source: elephants.com - African elephant Flora
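Where does the 14x14 grid come from? You can inspect the layer yourself. A quick check, assuming the model is built as in the R code section below (block5_conv3 is VGG16’s last convolutional layer):

library(keras)
model <- application_vgg16(weights = "imagenet")
get_layer(model, "block5_conv3")$output_shape
# should report (NULL, 14, 14, 512): 512 feature maps, each a 14x14 grid, per image in the batch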
What it’s saying with that yellow-green splotch is “Look! There’s an African elephant here!” The learner returns a score of 46%, quite high for a blink-of-an-eye judgment with 1,000 classes to choose from, and it even locates the object correctly in the picture. Impressive.
imagenet_decode_predictions(preds, top = 3)[[1]]
# class_name class_description score
#1 n02504458 African_elephant 0.46432969
#2 n02437312 Arabian_camel 0.29539737
#3 n01871265 tusker 0.07210348
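Those scores come from a softmax over ImageNet’s 1,000 classes, so they behave like probabilities. A quick sanity check on the prediction object from the R code section below:

dim(preds)      # 1 1000: one image in the batch, one score per class
sum(preds[1, ]) # ~1, since the scores are softmax probabilities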
Shaded parts of the photo have at least some activation for class African_elephant: they show that the elephant’s face and the nearby foliage are what distinguish it from an Indian elephant, and from other classes like a strawberry or an aircraft carrier. Parts of the photo with zero activation on the heatmap are left unshaded, which can be verified by visualizing the activation heatmap itself:
or printing it out as a numeric matrix:
round(heatmap, 2)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
# [1,] 0.00 0.00 0.02 0.02 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
# [2,] 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
# [3,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
# [4,] 0.00 0.00 0.05 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
# [5,] 0.00 0.00 0.09 0.07 0.26 0.00 0.11 0.13 0.07 0.16 0.00 0.00 0.00 0.00
# [6,] 0.00 0.00 0.01 0.04 0.30 0.24 0.69 0.63 0.41 0.12 0.04 0.00 0.00 0.00
# [7,] 0.00 0.00 0.01 0.04 0.14 0.14 0.55 0.72 0.92 0.23 0.06 0.00 0.00 0.00
# [8,] 0.00 0.00 0.00 0.01 0.00 0.03 0.61 0.98 1.00 0.22 0.00 0.00 0.00 0.00
# [9,] 0.00 0.00 0.02 0.01 0.00 0.00 0.30 0.27 0.31 0.00 0.00 0.00 0.02 0.00
#[10,] 0.00 0.00 0.04 0.04 0.01 0.00 0.01 0.00 0.02 0.00 0.00 0.02 0.04 0.01
#[11,] 0.00 0.00 0.10 0.09 0.07 0.06 0.10 0.00 0.00 0.00 0.00 0.03 0.05 0.04
#[12,] 0.01 0.14 0.13 0.13 0.11 0.11 0.10 0.00 0.00 0.08 0.09 0.12 0.08 0.08
#[13,] 0.13 0.13 0.15 0.14 0.12 0.10 0.08 0.00 0.03 0.11 0.11 0.15 0.12 0.11
#[14,] 0.04 0.06 0.06 0.04 0.00 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00
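Since the heatmap is normalized so its maximum is 1, the strongest block can be looked up directly; in the matrix above it is the 1.00 at row 8, column 9:

which(heatmap == max(heatmap), arr.ind = TRUE)
#      row col
# [1,]   8   9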
Detecting sources of errors
Here is another African elephant, huge ears and all, but this time the learner has misclassified it as a tusker, with a score of 55% versus 17% for African elephant. Tusker isn’t a terrible judgment: it’s a more generic group that also includes wild boars, but it is less precise than African elephant. What threw the model off from making the more precise call? Let’s see.
Original image source: By Komar.de - Non-woven photomural Elephant
Looks like the top of the head and the back; surprisingly, not the tusks. If we take a sample of ImageNet’s tusker training images, it quickly becomes obvious that most of them are elephants. Of the first 25 tusker examples shown here, none look like wild boars.
So the cause of the misclassification is understandable: it is a training-set limitation. A good first recourse would be to add other kinds of tuskers to ImageNet to better train that class.
imagenet_decode_predictions(preds, top = 3)[[1]]
# class_name class_description score
#1 n01871265 tusker 0.5496630
#2 n02504013 Indian_elephant 0.2749955
#3 n02504458 African_elephant 0.1732897
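If you want one class’s score regardless of its rank (here African_elephant, synset n02504458), you can decode all 1,000 predictions and subset the resulting data frame. A small sketch, assuming preds holds the predictions for this second image:

all_preds <- imagenet_decode_predictions(preds, top = 1000)[[1]]
all_preds[all_preds$class_name == "n02504458", ] # African_elephant, whatever its rank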
R Code
library(keras)
library(magick)
library(viridis)
model <- application_vgg16(weights = "imagenet") # keep the top classification layers
model # print the summary; the model expects input pictures of size 224 x 224
img_path <- "images/African_elephant_1.jpg"
img <- image_load(img_path, target_size = c(224, 224)) %>%
  image_to_array() %>%
  array_reshape(dim = c(1, 224, 224, 3)) %>% # reshape into a batch of one image
  imagenet_preprocess_input()                # channel-wise color normalization
preds <- model %>% predict(img)
imagenet_decode_predictions(preds, top = 3)[[1]]
# get least likely classes for fun
tail(imagenet_decode_predictions(preds, top = 1000)[[1]])
max_class_nbr <- which.max(preds[1, ]) # index of the top-scoring class
# to visualize the second most likely class instead, find its index
second_class_nbr <- which.max((preds[1, ])[-max_class_nbr])
# shift by +1 if the runner-up sits at or beyond the removed top index
second_class_nbr <- ifelse(second_class_nbr >= max_class_nbr,
                           second_class_nbr + 1,
                           second_class_nbr)
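# note: an equivalent one-liner for the runner-up index, if you prefer:
# second_class_nbr <- order(preds[1, ], decreasing = TRUE)[2]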
# visualize which parts of the image most activate the predicted class using Grad-CAM
elephant_output <- model$output[, max_class_nbr]
# elephant_output <- model$output[, second_class_nbr] # swap in for the runner-up class
last_conv_layer <- model %>% get_layer("block5_conv3")
grads <- k_gradients(elephant_output, last_conv_layer$output)[[1]]
pooled_grads <- k_mean(grads, axis = c(1, 2, 3)) # mean gradient per feature map
iterate <- k_function(list(model$input),
                      list(pooled_grads, last_conv_layer$output[1, , , ]))
c(pooled_grads_value, conv_layer_output_value) %<-% iterate(list(img))
# weight each feature map by its importance to the class
for (i in 1:dim(conv_layer_output_value)[3]) {
  conv_layer_output_value[, , i] <-
    conv_layer_output_value[, , i] * pooled_grads_value[[i]]
}
heatmap <- apply(conv_layer_output_value, c(1, 2), mean)
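# this weighted channel-wise mean is the Grad-CAM map before its ReLU:
# each feature map A_k was scaled above by its pooled gradient alpha_k,
# and the pmax() below plays the role of the ReLU, keeping positive evidence only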
# normalize heatmap between 0 and 1
heatmap <- pmax(heatmap, 0)
heatmap <- heatmap / max(heatmap)
round(heatmap, 2)
write_heatmap <- function(heatmap, filename, width = 224, height = 224,
                          bg = "white", col = terrain.colors(12)) {
  png(filename, width = width, height = height, bg = bg)
  op <- par(mar = c(0, 0, 0, 0))
  on.exit({par(op); dev.off()}, add = TRUE)
  rotate <- function(x) t(apply(x, 2, rev)) # orient the matrix like the image
  image(rotate(heatmap), axes = FALSE, asp = 1, col = col)
}
write_heatmap(heatmap, paste0(substr(img_path, 1, nchar(img_path) - 4), "_heatmap.png"))
image <- image_read(img_path)
info <- image_info(image)
geometry <- sprintf("%dx%d!", info$width, info$height)
pal <- col2rgb(viridis(20), alpha = TRUE)       # translucent viridis palette
alpha <- floor(seq(0, 255, length = ncol(pal))) # ramp transparency with intensity
pal_col <- rgb(t(pal), alpha = alpha, maxColorValue = 255)
write_heatmap(heatmap, "elephant_overlay.png",
              width = dim(heatmap)[1], height = dim(heatmap)[2],
              bg = NA, col = pal_col)
image_read("elephant_overlay.png") %>%
image_resize(geometry, filter = "quadratic") %>%
image_composite(image, operator = "blend", compose_args = "20") %>%
plot()
# then save output
image_read("elephant_overlay.png") %>%
image_resize(geometry, filter = "quadratic") %>%
image_composite(image, operator = "blend", compose_args = "20") %>%
image_scale("x480") %>%
image_convert(format = "jpg") %>%
image_write(paste0(substr(img_path, 1, nchar(img_path) - 4), "_overlay.jpg"))
# reset the path to the second elephant image
img_path <- "images/African_elephant_2.jpg"
# then rerun everything above starting from the img <- line