Numo: NumPy for Ruby

NumPy is an extremely popular library for machine learning in Python. It provides an efficient way to work with large, multi-dimensional arrays. What you may not know is that Ruby has a library with similar functionality. It’s called Numo, and in this post, we’ll look at what you can do with it.

Basic Operations

Numo’s core data structure is the multi-dimensional array, which has methods for mathematical operations. These operations are written in C, so they’re much faster than performing the same operations in Ruby.

Let’s start by creating a Numo array from a Ruby array.

require "numo/narray"
x = Numo::DFloat.cast([[1, 2, 3], [4, 5, 6]])

Each array has a shape. We created a 2x3 2D array, but arrays can be 1D, 3D, or higher-dimensional.

x.shape # [2, 3]

Read a row or column with:

x[0, true] # 1st row - [1, 2, 3]
x[true, 2] # 3rd column - [3, 6]

We can add a constant value:

x + 2 # [[3, 4, 5], [6, 7, 8]]

Or add arrays:

x + x # [[2, 4, 6], [8, 10, 12]]

Some operations, such as mean and sum, can be run along a specific axis.

x.sum(0)  # sum of each column - [5, 7, 9]
x.mean(1) # mean of each row - [2, 5]
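
To make the axis argument concrete, here is a plain-Ruby sketch of what the two reductions above compute. It uses nested Ruby arrays instead of Numo so it runs with the standard library alone, and `sum_over_axis0` and `mean_over_axis1` are hypothetical helper names, not part of Numo's API.

```ruby
# Plain-Ruby illustration of axis reductions on a 2x3 array.
# Helper names are hypothetical; Numo performs the same arithmetic in C.

# Reducing over axis 0 collapses the rows, leaving one value per column.
def sum_over_axis0(rows)
  rows.transpose.map(&:sum)
end

# Reducing over axis 1 collapses the columns, leaving one value per row.
def mean_over_axis1(rows)
  rows.map { |row| row.sum / row.size.to_f }
end

rows = [[1, 2, 3], [4, 5, 6]]
p sum_over_axis0(rows)  # [5, 7, 9]
p mean_over_axis1(rows) # [2.0, 5.0]
```

The axis number names the dimension that disappears: axis 0 has length 2 here, so reducing over it leaves 3 values.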

We can also change an array’s shape, which is useful when preparing data for models.

x.reshape(3, 2) # [[1, 2], [3, 4], [5, 6]]
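
Judging by the output above, reshaping keeps the elements in the same row-major order and only changes how they are grouped. A rough plain-Ruby sketch of that idea, with a hypothetical `reshape_2d` helper standing in for Numo's `reshape`:

```ruby
# Hypothetical helper: regroup a nested array into rows x cols,
# reading and writing elements in row-major order.
def reshape_2d(nested, rows, cols)
  flat = nested.flatten
  unless flat.size == rows * cols
    raise ArgumentError, "cannot reshape #{flat.size} elements into #{rows}x#{cols}"
  end
  flat.each_slice(cols).to_a
end

p reshape_2d([[1, 2, 3], [4, 5, 6]], 3, 2) # [[1, 2], [3, 4], [5, 6]]
```

The element count has to stay the same: 2x3 and 3x2 both hold six values.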

If you’re familiar with NumPy operations, there are side-by-side examples and a table showing how the functions map.

Building Models

Rumale is a machine learning library similar to Python’s Scikit-learn. It uses Numo for inputs and outputs. Here’s a basic example of linear regression.

require "rumale"

# generate data: y = 1 + 2(x0) + 3(x1)
x = Numo::DFloat.asarray([[0, 1], [1, 0], [1, 2]])
y = 1 + 2 * x[true, 0] + 3 * x[true, 1]

# train
model = Rumale::LinearModel::LinearRegression.new(
          fit_bias: true, max_iter: 10000)
model.fit(x, y)

# predict
model.predict(x)
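
Because the data comes from a known formula, we know what a well-fit model should learn: a bias of 1 and weights of 2 and 3, giving predictions close to [4, 3, 9]. The sketch below checks this in plain Ruby by solving the three equations exactly. It is only a sanity check under that assumption, not how Rumale fits the model, and `solve_linear` is a hypothetical helper.

```ruby
# Hypothetical helper: solve a * coeffs = b by Gauss-Jordan elimination,
# using exact Rational arithmetic on the augmented matrix [a | b].
def solve_linear(a, b)
  n = a.size
  m = a.each_with_index.map { |row, i| row.map { |v| Rational(v) } + [Rational(b[i])] }
  n.times do |col|
    pivot = (col...n).find { |r| m[r][col] != 0 }
    raise ArgumentError, "matrix is singular" if pivot.nil?
    m[col], m[pivot] = m[pivot], m[col]
    pivot_value = m[col][col]
    m[col] = m[col].map { |v| v / pivot_value }
    n.times do |r|
      next if r == col
      factor = m[r][col]
      m[r] = m[r].each_with_index.map { |v, c| v - factor * m[col][c] }
    end
  end
  m.map(&:last)
end

# Same data as above, as nested Ruby arrays.
x = [[0, 1], [1, 0], [1, 2]]
y = x.map { |x0, x1| 1 + 2 * x0 + 3 * x1 }

# Prepend a column of ones so the first coefficient acts as the bias.
design = x.map { |row| [1, *row] }

bias, w0, w1 = solve_linear(design, y)
predictions = x.map { |x0, x1| bias + w0 * x0 + w1 * x1 }
puts predictions == y # true
```

With three points and three unknowns the fit is exact. An iteratively trained model may land only near these values.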

Rumale has many, many models and other useful tools for:

Regression: linear, ridge, lasso, support vector machines
Classification: logistic regression, naive Bayes, K-nearest neighbors, support vector machines
Clustering: K-means, Gaussian mixture model
Dimensionality reduction: principal component analysis

Scikit-learn has a great cheat-sheet to help you decide what to use:

Image from Scikit-learn (BSD License)

Storing Data

Numo arrays can be marshaled just like other Ruby objects. This allows you to save your work and resume it at a later time.

# save
File.binwrite("x.dump", Marshal.dump(x))

# load
x = Marshal.load(File.binread("x.dump"))
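
As a small sketch of the round trip, the two calls above can be wrapped in helpers and checked by comparing the restored object with the original. The helper names `save_object` and `load_object` are hypothetical, and a nested Ruby array stands in for the Numo array so the example runs with the standard library alone.

```ruby
require "tmpdir"

# Hypothetical helper wrapping the save call shown above.
def save_object(path, object)
  File.binwrite(path, Marshal.dump(object))
end

# Marshal.load can instantiate arbitrary objects, so only load trusted files.
def load_object(path)
  Marshal.load(File.binread(path))
end

# A nested Ruby array stands in for a Numo array here.
data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

Dir.mktmpdir do |dir|
  path = File.join(dir, "x.dump")
  save_object(path, data)
  restored = load_object(path)
  puts restored == data # true
end
```

Marshal output is Ruby-specific, so these files are best suited to passing data between your own Ruby processes.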

Npy lets you save and load arrays in the same .npy format that NumPy uses. This is more performant than marshaling.

require "npy"

# save
Npy.save("x.npy", x)

# load
x = Npy.load("x.npy")

It also makes it easy to load datasets like MNIST.

mnist = Npy.load_npz("mnist.npz")

Summary

You now have a basic introduction to Numo and know how to:

perform basic operations
build a model
store data

Consider Numo for your next machine learning project.

All code examples are public domain.
Use them however you’d like (licensed under CC0).
