Daru: Pandas for Ruby



2020 Update: Since writing this article, I created a data frame library called Rover that’s designed for data exploration and machine learning. Check it out as well.

NumPy and Pandas are two extremely popular libraries for machine learning in Python. Last post, we looked at Numo, a Ruby library similar to NumPy. As luck would have it, there’s a library similar to Pandas as well. It’s called Daru, and it’s the focus of this post.


Daru is a data analysis library. Its core data structure is the data frame, which is similar to an in-memory database table. Data frames have rows and columns, and each column has a specific data type. Let’s create a data frame with the most populous countries:

df = Daru::DataFrame.new(
  country: ["China", "India", "USA"],
  population: [1433, 1366, 329] # in millions
)

Population data from the United Nations, 2019

Here’s what it looks like:

     country population
0      China       1433
1      India       1366
2        USA        329

You can get specific columns with:

df[:country, :population]

Or specific rows with:

df.first(2)  # first 2 rows
df.last(2)   # last 2 rows
df.row[1]    # 2nd row
df.row[1..2] # 2nd and 3rd row
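
To picture what these selectors return, it can help to model a data frame as a hash of equal-length column arrays. The sketch below is plain Ruby rather than Daru's implementation, and `FRAME`, `pick_columns`, and `pick_rows` are hypothetical names used only for illustration.

```ruby
# Plain-Ruby model of a data frame: a hash of equal-length column arrays.
FRAME = {
  country: ["China", "India", "USA"],
  population: [1433, 1366, 329] # in millions
}.freeze

# Pick whole columns by name (compare df[:country, :population]).
def pick_columns(frame, *names)
  frame.slice(*names)
end

# Pick rows by index or range (compare df.row[1] and df.row[1..2]).
def pick_rows(frame, index)
  frame.transform_values { |values| values[index] }
end

p pick_columns(FRAME, :country)
p pick_rows(FRAME, 1)
p pick_rows(FRAME, 1..2)
```

Selecting columns keeps whole arrays, while selecting rows takes the same positions from every column.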

Filtering, Sorting, and Grouping

Select countries with over 1 billion people.

df.where(df[:population] > 1000)

For equality, use eq or in.

df.where(df[:country].in(["USA", "India"]))

Negate a condition with !.

df.where(!df[:country].eq("USA"))

Combine operators with & (and) and | (or).

df.where(df[:country].eq("USA") | (df[:population] < 1400))
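
Each of these conditions evaluates to one true/false value per row, and where keeps the rows marked true. Here's a minimal plain-Ruby sketch of that idea. It isn't how Daru implements it, and the helper names (`build_mask`, `negate`, `both`, `either`, `keep_rows`) are hypothetical.

```ruby
MASK_COUNTRIES  = ["China", "India", "USA"].freeze
MASK_POPULATION = [1433, 1366, 329].freeze # in millions

# A condition becomes a mask: one true/false value per row.
def build_mask(values, &condition)
  values.map(&condition)
end

# Element-wise versions of !, & and |.
def negate(mask)
  mask.map(&:!)
end

def both(left, right)
  left.zip(right).map { |a, b| a && b }
end

def either(left, right)
  left.zip(right).map { |a, b| a || b }
end

# Keep only the rows whose mask entry is true (the job `where` does).
def keep_rows(values, mask)
  values.zip(mask).select { |_value, keep| keep }.map(&:first)
end

is_usa     = build_mask(MASK_COUNTRIES) { |name| name == "USA" }
under_1400 = build_mask(MASK_POPULATION) { |pop| pop < 1400 }

p keep_rows(MASK_COUNTRIES, negate(is_usa))             # ["China", "India"]
p keep_rows(MASK_COUNTRIES, either(is_usa, under_1400)) # ["India", "USA"]
```

Because the masks are built first and combined afterwards, each comparison needs its own parentheses when mixed with & or |, as in the example above.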

Sort the data frame by a column with:

df.sort([:country], ascending: [false])

You can also group data and perform aggregations.

cities = Daru::DataFrame.new(
  country: ["China", "China", "India"],
  city: ["Shanghai", "Beijing", "Mumbai"]
)
cities.group_by([:country]).count
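
Grouping splits the rows by a key, and an aggregation reduces each group to a single value. As a rough plain-Ruby picture of counting cities per country (not Daru's implementation; `CITY_ROWS` and `count_by` are hypothetical names):

```ruby
CITY_ROWS = [
  { country: "China", city: "Shanghai" },
  { country: "China", city: "Beijing" },
  { country: "India", city: "Mumbai" }
].freeze

# Group rows by one key, then reduce each group to its row count.
def count_by(rows, key)
  rows.group_by { |row| row[key] }.transform_values(&:size)
end

p count_by(CITY_ROWS, :country)
```

Here China maps to 2 and India to 1. Swapping `size` for a sum or an average over a numeric column gives other aggregations.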

Combining Data Frames

There are a number of ways to combine data frames. You can add rows:

countries = Daru::DataFrame.new(
  country: ["Indonesia", "Pakistan"],
  population: [271, 217] # in millions
)
df.concat(countries)

Or add columns:

locations = Daru::DataFrame.new(
  continent: ["Asia", "Asia", "North America"],
  planet: ["Earth", "Earth", "Earth"]
)
df.merge(locations)
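
Adding rows and adding columns can both be pictured on a hash of column arrays: the first makes every column longer, the second adds new keys. This is a plain-Ruby sketch under that assumption, not Daru's code, and `append_rows` and `add_columns` are hypothetical helpers.

```ruby
BASE = {
  country: ["China", "India", "USA"],
  population: [1433, 1366, 329] # in millions
}.freeze

MORE_ROWS = {
  country: ["Indonesia", "Pakistan"],
  population: [271, 217]
}.freeze

MORE_COLUMNS = {
  continent: ["Asia", "Asia", "North America"],
  planet: ["Earth", "Earth", "Earth"]
}.freeze

# Adding rows: both frames need the same columns; each column grows.
def append_rows(top, bottom)
  raise ArgumentError, "column names differ" unless top.keys.sort == bottom.keys.sort

  top.to_h { |name, values| [name, values + bottom[name]] }
end

# Adding columns: both frames need the same row count; the key set grows.
# On a name collision this sketch simply keeps the right-hand column.
def add_columns(left, right)
  row_counts = (left.values + right.values).map(&:size).uniq
  raise ArgumentError, "row counts differ" if row_counts.size > 1

  left.merge(right)
end

p append_rows(BASE, MORE_ROWS)
p add_columns(BASE, MORE_COLUMNS)
```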

You can also perform joins like in SQL.

cities = Daru::DataFrame.new(
  country: ["China", "China", "India"],
  city: ["Shanghai", "Beijing", "Mumbai"]
)
df.join(cities, how: :inner, on: [:country])
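
An inner join keeps only the keys present on both sides and emits one combined row per match. Here's a small plain-Ruby sketch of that behavior. It makes no claim about how Daru orders or computes the result, and `inner_join` plus the two constants are hypothetical names.

```ruby
JOIN_COUNTRIES = [
  { country: "China", population: 1433 },
  { country: "India", population: 1366 },
  { country: "USA",   population: 329 }
].freeze

JOIN_CITIES = [
  { country: "China", city: "Shanghai" },
  { country: "China", city: "Beijing" },
  { country: "India", city: "Mumbai" }
].freeze

# Inner join: index the right side by key, then emit one merged row per match.
def inner_join(left, right, key)
  index = right.group_by { |row| row[key] }
  left.flat_map do |left_row|
    index.fetch(left_row[key], []).map { |right_row| left_row.merge(right_row) }
  end
end

inner_join(JOIN_COUNTRIES, JOIN_CITIES, :country).each { |row| p row }
```

China appears twice because it matches two cities, and the USA drops out because it has no matching city.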

Reading and Writing Data

Daru makes it easy to load data from a CSV file with Daru::DataFrame.from_csv.


After manipulating the data, you can save it back to a CSV file with write_csv.
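
To show what a CSV round trip involves without requiring the gem, here's a sketch using Ruby's standard CSV library. The file name and the `read_columns` and `write_columns` helpers are hypothetical. With Daru you'd call Daru::DataFrame.from_csv and write_csv on the data frame instead.

```ruby
require "csv"
require "tmpdir"

# Read a CSV file into a hash of column arrays, converting numeric fields.
def read_columns(path)
  table = CSV.read(path, headers: true, converters: :numeric)
  table.headers.to_h { |name| [name.to_sym, table[name]] }
end

# Write a hash of column arrays back out: a header line, then one line per row.
def write_columns(path, columns)
  CSV.open(path, "w") do |csv|
    csv << columns.keys.map(&:to_s)
    columns.values.transpose.each { |row| csv << row }
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "countries.csv") # hypothetical file name
  write_columns(path, { country: ["China", "India"], population: [1433, 1366] })
  p read_columns(path)
end
```

The numeric converter is what turns the population column back into integers. Without it, every field would be read as a string.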


You can also load data directly from Active Record.

relation = Country.where("population > 100")
Daru::DataFrame.from_activerecord(relation)


Plotting

For plotting, use a Jupyter notebook with IRuby. Create a plot with:

df.plot type: :bar, x: :country, y: :population do |plot, diagram|
  plot.x_label "Country"
  plot.y_label "Population (millions)"
end

[Figure: Daru plot, a bar chart of population by country]

You can also create line charts, scatter plots, box plots, and histograms.


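You’ve now seen how to use Daru to create data frames; filter, sort, and group them; combine them; read and write data; and create plots.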

Try out Daru for your next analysis.

Published September 18, 2019


All code examples are public domain.
Use them however you’d like (licensed under CC0).