Daru: Pandas for Ruby


Photo by Bruce Hong

NumPy and Pandas are two extremely popular libraries for machine learning in Python. Last post, we looked at Numo, a Ruby library similar to NumPy. As luck would have it, there’s a library similar to Pandas as well. It’s called Daru, and it’s the focus of this post.

2020 Update: Since writing this article, I created a data frame library called Rover that’s designed for data exploration and machine learning. Check it out as well.


Daru is a data analysis library. Its core data structure is the data frame, which is similar to an in-memory database table. Data frames have rows and columns, and each column has a specific data type. Let’s create a data frame with the most populous countries:

require "daru"

df = Daru::DataFrame.new(
  country: ["China", "India", "USA"],
  population: [1433, 1366, 329] # in millions
)

Population data from the United Nations, 2019

Here’s what it looks like:

     country population
0      China       1433
1      India       1366
2        USA        329

You can get specific columns with:

df[:country, :population]

Or specific rows with:

df.first(2)  # first 2 rows
df.last(2)   # last 2 rows
df.row[1]    # 2nd row
df.row[1..2] # 2nd and 3rd rows
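
To make the column and row accessors concrete, here is a minimal plain-Ruby sketch of a column-oriented table. It is only a mental model under simplifying assumptions, not how Daru is implemented, and the constant and helper names are hypothetical.

```ruby
# Plain-Ruby mental model of a data frame: one array per column.
# Illustration only; this is not how Daru is implemented.
SKETCH_COLUMNS = {
  country: ["China", "India", "USA"],
  population: [1433, 1366, 329] # in millions
}.freeze

# Selecting columns keeps whole arrays, similar to df[:country].
def select_columns(columns, *names)
  columns.slice(*names)
end

# A row is the i-th value of every column, similar to df.row[i].
def row_at(columns, i)
  columns.transform_values { |values| values[i] }
end

# The first n rows, similar to df.first(n).
def first_rows(columns, n)
  (0...n).map { |i| row_at(columns, i) }
end
```

Here `row_at(SKETCH_COLUMNS, 1)` gives the India row as a hash, which is the same information `df.row[1]` holds.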

Filtering, Sorting, and Grouping

Select countries with over 1 billion people. Population is stored in millions, so the threshold is 1000.

df.where(df[:population] > 1000)

For equality, use eq to match a single value or in to match any value in a list.

df.where(df[:country].in(["USA", "India"]))

Negate a condition with !.

df.where(!df[:country].eq("USA"))

Combine operators with & (and) and | (or).

df.where(df[:country].eq("USA") | (df[:population] < 1400))
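
These conditions work because comparing a column produces an array of booleans, one per row, and where keeps the rows marked true. That is also why conditions are combined with & and | rather than && and ||. A plain-Ruby sketch of the idea follows; the helper names are hypothetical and this is not Daru's own code.

```ruby
MASK_COUNTRIES = ["China", "India", "USA"].freeze
MASK_POPULATIONS = [1433, 1366, 329].freeze # in millions

# Hypothetical helper: keep the values whose mask entry is true,
# which is roughly what where does with a boolean array.
def keep_where(values, mask)
  values.zip(mask).select { |_, keep| keep }.map(&:first)
end

# Elementwise versions of !, & and | on boolean arrays.
def mask_not(mask)
  mask.map { |flag| !flag }
end

def mask_and(left, right)
  left.zip(right).map { |a, b| a && b }
end

def mask_or(left, right)
  left.zip(right).map { |a, b| a || b }
end

is_usa = MASK_COUNTRIES.map { |name| name == "USA" }  # like df[:country].eq("USA")
under_1400 = MASK_POPULATIONS.map { |pop| pop < 1400 } # like df[:population] < 1400

keep_where(MASK_COUNTRIES, mask_or(is_usa, under_1400)) # => ["India", "USA"]
keep_where(MASK_COUNTRIES, mask_not(is_usa))            # => ["China", "India"]
```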

Sort the data frame by a column with:

df.sort([:country], ascending: [false])

You can also group data and perform aggregations.

cities = Daru::DataFrame.new(
  country: ["China", "China", "India"],
  city: ["Shanghai", "Beijing", "Mumbai"]
)
cities.group_by([:country]).count # number of cities per country
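
Grouping splits the rows by a key and then reduces each group to a single value. As a rough plain-Ruby analogy, not Daru's implementation, counting cities per country looks like this. The names CITY_ROWS and count_by are hypothetical.

```ruby
CITY_ROWS = [
  { country: "China", city: "Shanghai" },
  { country: "China", city: "Beijing" },
  { country: "India", city: "Mumbai" }
].freeze

# Split rows into groups by a key, then aggregate each group with a count.
def count_by(rows, key)
  rows.group_by { |row| row[key] }
      .transform_values(&:count)
end

count_by(CITY_ROWS, :country) # => { "China" => 2, "India" => 1 }
```

Other aggregations such as sums or means follow the same split-then-reduce shape, with a different reduction per group.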

Combining Data Frames

There are a number of ways to combine data frames. You can add rows:

countries = Daru::DataFrame.new(
  country: ["Indonesia", "Pakistan"],
  population: [271, 217] # in millions
)
df.concat(countries)

Or add columns:

locations = Daru::DataFrame.new(
  continent: ["Asia", "Asia", "North America"],
  planet: ["Earth", "Earth", "Earth"]
)
df.merge(locations)
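
Adding rows and adding columns differ in shape: new rows stack underneath the existing columns, while new columns sit side by side and need the same number of rows. Daru's concat and merge methods are the likely counterparts, though that is worth checking against your installed version. The plain-Ruby sketch below only illustrates the two shapes, using hypothetical helper names.

```ruby
# Column-oriented tables as hashes of arrays (illustration only).
BIG_THREE = { country: ["China", "India", "USA"], population: [1433, 1366, 329] }.freeze
NEXT_TWO = { country: ["Indonesia", "Pakistan"], population: [271, 217] }.freeze
PLACES = { continent: ["Asia", "Asia", "North America"], planet: ["Earth", "Earth", "Earth"] }.freeze

# Adding rows: append each column of the bottom table to the same column on top.
def stack_rows(top, bottom)
  top.to_h { |name, values| [name, values + bottom.fetch(name)] }
end

# Adding columns: place the tables side by side; row counts must match.
def add_columns(left, right)
  left_rows = left.values.first.length
  right_rows = right.values.first.length
  raise ArgumentError, "row counts differ" unless left_rows == right_rows

  left.merge(right)
end

stack_rows(BIG_THREE, NEXT_TWO)[:population] # => [1433, 1366, 329, 271, 217]
add_columns(BIG_THREE, PLACES).keys          # => [:country, :population, :continent, :planet]
```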

You can also perform joins like in SQL.

cities = Daru::DataFrame.new(
  country: ["China", "China", "India"],
  city: ["Shanghai", "Beijing", "Mumbai"]
)
df.join(cities, how: :inner, on: [:country])
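
An inner join keeps only the combinations of rows whose key matches in both tables, so a country with two cities appears twice and a country with no cities disappears. Here is a plain-Ruby sketch of that behavior, with hypothetical names and no claim about how Daru implements it.

```ruby
JOIN_COUNTRIES = [
  { country: "China", population: 1433 },
  { country: "India", population: 1366 },
  { country: "USA", population: 329 }
].freeze

JOIN_CITIES = [
  { country: "China", city: "Shanghai" },
  { country: "China", city: "Beijing" },
  { country: "India", city: "Mumbai" }
].freeze

# Inner join on a single key: index the right table by key,
# then emit one merged row per matching pair.
def inner_join(left, right, key)
  right_by_key = right.group_by { |row| row[key] }
  left.flat_map do |left_row|
    right_by_key.fetch(left_row[key], []).map { |right_row| left_row.merge(right_row) }
  end
end

inner_join(JOIN_COUNTRIES, JOIN_CITIES, :country).map { |row| row[:city] }
# => ["Shanghai", "Beijing", "Mumbai"] (the USA has no matching city, so it is dropped)
```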

Reading and Writing Data

Daru makes it easy to load data from a CSV file.


After manipulating the data, you can save it back to a CSV file.
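
In Daru, reading is typically done with Daru::DataFrame.from_csv(path) and writing with df.write_csv(path). Treat both as assumptions to verify against the version you have installed. To show the file shape involved without depending on Daru, here is a round trip using Ruby's standard csv library. The file name and helper names are placeholders.

```ruby
require "csv"
require "tmpdir"

# Write rows (an array of hashes) to a CSV file with a header line.
def save_rows(path, rows)
  CSV.open(path, "w") do |csv|
    csv << rows.first.keys
    rows.each { |row| csv << row.values }
  end
end

# Read the file back; numeric-looking fields are converted to numbers.
def load_rows(path)
  CSV.read(path, headers: true, converters: :numeric).map(&:to_h)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "countries.csv") # placeholder file name
  save_rows(path, [
    { "country" => "China", "population" => 1433 },
    { "country" => "India", "population" => 1366 }
  ])
  puts File.read(path)
  p load_rows(path)
end
```

The file written here has a header line followed by one line per row, which is the layout a CSV reader with header support expects.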


You can also load data directly from Active Record.

relation = Country.where("population > 100")
Daru::DataFrame.from_activerecord(relation)


For plotting, use a Jupyter notebook with IRuby. Create a plot with:

df.plot type: :bar, x: :country, y: :population do |plot, diagram|
  plot.x_label "Country"
  plot.y_label "Population (millions)"
end

You can also create line charts, scatter plots, box plots, and histograms.


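You’ve now seen how to use Daru to create data frames; filter, sort, and group data; combine data frames; read and write data; and create plots.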

Try out Daru for your next analysis.

Published September 18, 2019

All code examples are public domain.
Use them however you’d like (licensed under CC0).