Emotion Recognition in Ruby
Welcome to another installment of deep learning in Ruby. Today, we’ll look at FER+, a deep convolutional neural network for emotion recognition developed at Microsoft. The project is open source, and there’s a pretrained model in the ONNX Model Zoo that we can get running quickly in Ruby.
First, download the model and this photo of a park ranger.
Photo from Yellowstone National Park
We’ll use MiniMagick to prepare the image and the ONNX Runtime gem to run the model.
gem "mini_magick"
gem "onnxruntime"
For the image, we need to zoom in on her face, resize it to 64x64, and convert it to grayscale. Typically, we’d use a face detection model to find the bounding box and use that information to crop the image, but for simplicity, we’ll just do it manually.
img = MiniMagick::Image.open("ranger.jpg")
img.crop "100x100+60+20", "-gravity", "center" # manual crop
img.resize "64x64^", "-gravity", "center", "-extent", "64x64"
img.colorspace "Gray"
img.write "resized.jpg"
Here’s a blown-up version:
Finally, create a 64x64 matrix of the grayscale intensities.
# all channels of a pixel are the same for grayscale, so just get the first one
pixels = img.get_pixels.flat_map { |r| r.map(&:first) }
input = OnnxRuntime::Utils.reshape(pixels, [1, 1, 64, 64])
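To make the target layout concrete, here is a minimal pure-Ruby sketch of what a `[1, 1, 64, 64]` shape means: one image in the batch, one channel, and 64 rows of 64 values. The `pixels` array below is a hypothetical stand-in for the real intensities, and the nesting is written by hand to illustrate the shape rather than to mirror how the gem implements `reshape`.

```ruby
# Hypothetical stand-in for the 4096 grayscale intensities (0-255)
pixels = Array.new(64 * 64) { |i| i % 256 }

# Group the flat list into 64 rows of 64 values (height x width),
# then wrap it twice: once for the channel and once for the batch
rows = pixels.each_slice(64).to_a
input = [[rows]]

puts input.size          # batch size
puts input[0].size       # channels
puts input[0][0].size    # height
puts input[0][0][0].size # width
```

Flattening `input` gives back the original `pixels` in the same order, so no values are moved, only grouped.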
Now that the input is prepared, we can load and run the model.
model = OnnxRuntime::Model.new("model.onnx")
output = model.predict("Input3" => input)
We use softmax to convert the model output into probabilities.
def softmax(x)
  max = x.max
  exp = x.map { |v| Math.exp(v - max) }
  sum = exp.sum
  exp.map { |v| v / sum }
end
probabilities = softmax(output["Plus692_Output_0"].first)
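Subtracting the maximum before exponentiating does not change the result, but it keeps `Math.exp` from overflowing on large scores. The sketch below restates `softmax` so it runs on its own and feeds it hypothetical scores (not model output) that would overflow without the shift.

```ruby
def softmax(x)
  max = x.max
  exp = x.map { |v| Math.exp(v - max) }
  sum = exp.sum
  exp.map { |v| v / sum }
end

# Hypothetical scores, chosen large enough that Math.exp alone overflows
scores = [1000.0, 1001.0, 1002.0]
probs = softmax(scores)

p Math.exp(1000.0) # Infinity, which is why the shift is needed
p probs            # finite values, largest score gets the largest share
p probs.sum        # approximately 1.0
```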
Then map the labels and sort by highest probability.
emotion_labels = [
  "neutral", "happiness", "surprise", "sadness",
  "anger", "disgust", "fear", "contempt"
]
pp emotion_labels.zip(probabilities).sort_by { |_, v| -v }.to_h
And the results are in:
{
  "happiness" => 0.9999839207138284,
  "surprise" => 1.0569785479062501e-05,
  "neutral" => 4.826811128840592e-06,
  "anger" => 4.63037778140089e-07,
  "sadness" => 9.574742925740587e-08,
  "contempt" => 7.941520916580971e-08,
  "fear" => 2.8803367665891773e-08,
  "disgust" => 1.568577943664937e-08
}
The model gives a better than 99.99% probability that she looks happy in the photo. Not bad!
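If only the top prediction is needed, `max_by` avoids sorting the whole list. This is a small sketch using the same label order; the `probabilities` values are made up for illustration and are not output from the model.

```ruby
emotion_labels = [
  "neutral", "happiness", "surprise", "sadness",
  "anger", "disgust", "fear", "contempt"
]

# Hypothetical probabilities for illustration only, not model output
probabilities = [0.10, 0.70, 0.05, 0.05, 0.04, 0.03, 0.02, 0.01]

# Pair each label with its probability and keep the most likely pair
label, confidence = emotion_labels.zip(probabilities).max_by { |_, v| v }
puts "#{label} (#{(confidence * 100).round(1)}%)"
```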
Here’s the complete code. Now go out and try it with your own images!