Emotion Recognition in Ruby

Welcome to another installment of deep learning in Ruby. Today, we’ll look at FER+, a deep convolutional neural network for emotion recognition developed at Microsoft. The project is open source, and there’s a pretrained model in the ONNX Model Zoo that we can get running quickly in Ruby.

First, download the model and this photo of a park ranger.

Park Ranger

Photo from Yellowstone National Park

We’ll use MiniMagick to prepare the image and the ONNX Runtime gem to run the model.

gem "mini_magick"
gem "onnxruntime"

For the image, we need to zoom in on her face, resize it to 64x64, and convert it to grayscale. Typically, we’d use a face detection model to find the bounding box and use that information to crop the image, but for simplicity, we’ll just do it manually.

img = MiniMagick::Image.open("ranger.jpg")
img.crop "100x100+60+20", "-gravity", "center" # manual crop
img.resize "64x64^", "-gravity", "center", "-extent", "64x64"
img.colorspace "Gray"
img.write("resized.jpg")
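If a face detector were supplying the bounding box instead, the box would need to be turned into an ImageMagick geometry string of the form WxH+X+Y before cropping. Here’s a minimal sketch of that step. The Box struct, the square_crop_geometry helper, and the sample numbers are all hypothetical and not part of MiniMagick or any detector’s API.

```ruby
# Hypothetical bounding box, as a face detector might report it
# (pixels, origin at the top-left corner of the image).
Box = Struct.new(:x, :y, :width, :height, keyword_init: true)

# Grow the box into a square around its center so the later resize to
# 64x64 does not distort the face, then clamp it to the image bounds.
def square_crop_geometry(box, image_width:, image_height:)
  side = [[box.width, box.height].max, image_width, image_height].min
  center_x = box.x + box.width / 2.0
  center_y = box.y + box.height / 2.0
  left = (center_x - side / 2.0).round.clamp(0, image_width - side)
  top = (center_y - side / 2.0).round.clamp(0, image_height - side)
  "#{side}x#{side}+#{left}+#{top}"
end

# Made-up detection result on a made-up 300x200 image
box = Box.new(x: 70, y: 30, width: 80, height: 100)
geometry = square_crop_geometry(box, image_width: 300, image_height: 200)
puts geometry
```

The resulting string could then be passed to img.crop in place of the hand-picked one.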

Here’s a blown-up version:

Park Ranger

Finally, create a 64x64 matrix of the grayscale intensities.

# in a grayscale image, every channel of a pixel holds the same value, so just take the first
pixels = img.get_pixels.flat_map { |r| r.map(&:first) }
input = OnnxRuntime::Utils.reshape(pixels, [1, 1, 64, 64])
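To make the reshape step concrete, here’s a plain-Ruby sketch of nesting a flat list of 4,096 intensities into a 1x1x64x64 array (reading the dimensions as batch, channel, height, and width). The reshape_sketch helper is a hypothetical stand-in for illustration and is not necessarily how OnnxRuntime::Utils.reshape is implemented. The pixel values are made up.

```ruby
# Nest a flat array into the given shape, innermost dimension last.
def reshape_sketch(flat, shape)
  expected = shape.reduce(1, :*)
  unless flat.size == expected
    raise ArgumentError, "expected #{expected} values, got #{flat.size}"
  end

  # Slice from the innermost dimension outward. The outermost dimension
  # is implied by whatever remains after the other slices.
  shape.drop(1).reverse.reduce(flat) { |arr, dim| arr.each_slice(dim).to_a }
end

# Made-up intensities standing in for the real pixel data
pixels = Array.new(64 * 64) { |i| i % 256 }
input = reshape_sketch(pixels, [1, 1, 64, 64])

# batch, channel, rows, columns
p [input.size, input[0].size, input[0][0].size, input[0][0][0].size]
```

Each row of the image ends up as one innermost array, so input[0][0][1][0] is the first pixel of the second row.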

Now that the input is prepared, we can load and run the model.

model = OnnxRuntime::Model.new("model.onnx")
output = model.predict("Input3" => input)

We use softmax to convert the model output into probabilities.

def softmax(x)
  max = x.max # shift by the max so Math.exp can’t overflow
  exp = x.map { |v| Math.exp(v - max) }
  sum = exp.sum
  exp.map { |v| v / sum }
end

probabilities = softmax(output["Plus692_Output_0"].first)
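The softmax above subtracts x.max before exponentiating. That shift doesn’t change the result mathematically, but it keeps Math.exp from overflowing on large scores. Here’s a small sketch comparing the two approaches. The function names and the scores are made up for illustration.

```ruby
# Softmax without the shift: overflows for large inputs
def naive_softmax(x)
  exp = x.map { |v| Math.exp(v) }
  total = exp.sum
  exp.map { |v| v / total }
end

# Softmax with the shift, equivalent to the version above
def stable_softmax(x)
  max = x.max
  exp = x.map { |v| Math.exp(v - max) }
  total = exp.sum
  exp.map { |v| v / total }
end

scores = [1000.0, 999.0, 998.0] # made-up, deliberately large
naive = naive_softmax(scores)   # Math.exp(1000.0) is Infinity, so every entry is NaN
stable = stable_softmax(scores) # finite values that sum to 1

p naive
p stable
```

For small scores, both versions agree to within floating-point error.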

Then map the labels and sort by highest probability.

emotion_labels = [
  "neutral", "happiness", "surprise", "sadness",
  "anger", "disgust", "fear", "contempt"
]
pp emotion_labels.zip(probabilities).sort_by { |_, v| -v }.to_h

And the results are in:

{
  "happiness" => 0.9999839207138284,
  "surprise"  => 1.0569785479062501e-05,
  "neutral"   => 4.826811128840592e-06,
  "anger"     => 4.63037778140089e-07,
  "sadness"   => 9.574742925740587e-08,
  "contempt"  => 7.941520916580971e-08,
  "fear"      => 2.8803367665891773e-08,
  "disgust"   => 1.568577943664937e-08
}

The model is more than 99.99% confident she looks happy in the photo. Not bad!
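If you only want the single best label rather than the full ranking, a small helper can pick it out and decline to answer when the model isn’t confident. This is a sketch: top_emotion is a hypothetical name, the 0.5 threshold is an arbitrary assumption, and the probabilities are rounded from the results above.

```ruby
EMOTION_LABELS = [
  "neutral", "happiness", "surprise", "sadness",
  "anger", "disgust", "fear", "contempt"
].freeze

# Return [label, probability] for the most likely emotion, or nil when
# the best probability falls below the (arbitrary) threshold.
def top_emotion(labels, probabilities, threshold: 0.5)
  label, probability = labels.zip(probabilities).max_by { |_, prob| prob }
  probability >= threshold ? [label, probability] : nil
end

# Rounded from the output above, in label order
probabilities = [4.8e-06, 0.99998, 1.06e-05, 9.6e-08, 4.6e-07, 1.6e-08, 2.9e-08, 7.9e-08]
best = top_emotion(EMOTION_LABELS, probabilities)
p best

# A made-up, evenly spread distribution stays below the threshold
uncertain = top_emotion(EMOTION_LABELS, Array.new(8, 0.125))
p uncertain
```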

Here’s the complete code. Now go out and try it with your own images!

Published September 13, 2019


All code examples are public domain.
Use them however you’d like (licensed under CC0).