Rails, Meet Data Science

Organizations today have more data than ever. Predictive modeling is a powerful way to use this data to solve problems and create better experiences for customers. For instance, you can keep items in stock more reliably by predicting demand, or lower costs by predicting fraud. If you use Ruby on Rails, it can be tough to know how to incorporate predictive models into your app.

We’ll go over four patterns you can use for prediction with Rails. We used all four successfully during my time at Instacart. They work whether you have no data scientists (as was the case when I started) or a strong data science team.

Patterns

With predictive modeling, you first train a model and then use it to predict. The patterns can be grouped by the language used for each task:

Pattern	Train	Predict
1	3rd Party	3rd Party
2	Ruby	Ruby
3	Another Language	Ruby
4	Another Language	Another Language

Two popular languages for data science are Python and R.

You can decide which pattern to use for each model you build. We’ll walk through the approaches and discuss the pros and cons of each.

Pattern 1: Use a 3rd Party

Before building a model in-house, it’s good to see what already exists. There are a number of external services you can use for specific problems. Here are a few:

Fraud - Sift Science
Recommendations - Tamber
Anomaly Detection & Forecasting - Trend
NLP - Amazon Comprehend and Google Cloud Natural Language
Vision - AWS Rekognition and Google Cloud Vision

Pros

Get domain knowledge from the company
Fast to implement and easy to maintain

Cons

Not easy to iterate if it doesn’t fit your needs
Vendor lock-in

Pattern 2: Train and Predict in Ruby

Ruby has a number of libraries for building simple models. Simple models can perform very well since a large part of model building is feature engineering. This is a great option if there are no data scientists in your company or on your team. A developer can own the model end-to-end, which is great for speed and iteration.
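
To give a feel for how little code a simple model needs, here is a minimal sketch of a one-feature least-squares regression in plain Ruby. It does not use any of the libraries listed below; the class name `SimpleLinearModel` and the sample data are made up for illustration.

```ruby
# Hypothetical one-feature linear model fit with ordinary least squares.
class SimpleLinearModel
  attr_reader :slope, :intercept

  # xs and ys are arrays of numbers with the same length
  def initialize(xs, ys)
    raise ArgumentError, "xs and ys must have the same length" unless xs.size == ys.size
    raise ArgumentError, "need at least two points" if xs.size < 2

    n = xs.size.to_f
    mean_x = xs.sum / n
    mean_y = ys.sum / n

    # slope = covariance(x, y) / variance(x)
    covariance = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
    variance = xs.sum { |x| (x - mean_x)**2 }
    raise ArgumentError, "xs must not all be equal" if variance.zero?

    @slope = covariance / variance
    @intercept = mean_y - @slope * mean_x
  end

  def predict(x)
    @intercept + @slope * x
  end
end

# Made-up data: units sold by day number
days = [1, 2, 3, 4, 5]
units = [12, 14, 16, 18, 20]

model = SimpleLinearModel.new(days, units)
puts model.predict(6)
```

In practice, a library handles multiple features, categorical values, and evaluation for you, so treat this only as an illustration of the train-then-predict flow.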

Here are a few libraries for building models in Ruby:

Eps - good for beginners
Rumale - good for advanced users
XGBoost
LightGBM
And many more

Once a model is trained, you’ll need to store it. You can use methods provided by the library, or Ruby’s Marshal module if none exist. You can store the models as files or in the database.
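
When a library has no save and load methods of its own, storing a model with Ruby’s built-in Marshal might look like the sketch below. The `LinearModel` struct, its coefficients, and the file name are hypothetical stand-ins for whatever object your library returns.

```ruby
require "tmpdir"

# Hypothetical model: a plain Ruby object holding learned coefficients.
LinearModel = Struct.new(:intercept, :coefficients) do
  def predict(features)
    intercept + coefficients.sum { |name, weight| weight * features.fetch(name) }
  end
end

model = LinearModel.new(10.0, { day: 2.0 })

# Serialize the model and write it to a file in binary mode
path = File.join(Dir.tmpdir, "demand_model.bin")
File.binwrite(path, Marshal.dump(model))

# Later (for example, at app boot), read the file and rebuild the object.
# Only call Marshal.load on data you trust, since loading untrusted data
# can lead to code execution.
loaded = Marshal.load(File.binread(path))
puts loaded.predict({ day: 6 })
```

The same bytes could go in a binary database column instead of a file. Note that Marshal output is tied to the class definition, so the class must be loaded before calling `Marshal.load`.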

Be sure to commit the code used to train models so you can update them with newer data in the future. The Rails console is a decent place to create them, or use a Jupyter notebook running IRuby for better visualizations (see setup instructions for Rails).

Pros

Simple models can perform well
No need to introduce a new language

Cons

Limited tools for building models
Limited model selection
Many people who have experience building models don’t know Ruby

Pattern 3: Train in Another Language, Predict in Ruby

Ruby is getting better for data science thanks to SciRuby. However, languages like R and Python currently have much better tools. Also, many people who have experience building models don’t know Ruby.

Luckily, you can build models in another language and predict in Ruby. This way, you can use more advanced tools for visualization, validation, and tuning without adding complexity to your production stack. If you don’t have data scientists, you can use this pattern to contract with one.

Here are models that can currently predict in Ruby:

Eps - Linear Regression, Naive Bayes
Scoruby - Random Forest, GBM, Decision Tree, Naive Bayes
XGBoost - Gradient Boosting
LightGBM - Gradient Boosting

For this to work, models need to be stored in a shared format that both languages understand. PMML and PFA are two interchange formats. PFA is newer but has less adoption than PMML. Andrey Melentyev has a great post on the topic.

Once again, it’s important that models are reproducible. This allows you to update them with newer data in the future. Be sure to follow software engineering best practices like:

Use source control (create a new repo or add to your existing repo)
Use a package manager for a reproducible environment
Keep credentials out of source control (use .env or .Renviron)

Here are some tools you can use:

Function	Python	R
Package management	Pipenv	Jetpack
Database access	SQLAlchemy	dbx
PMML export	sklearn2pmml	pmml

One place to be careful is implementing the features in Ruby. The Ruby implementation must be consistent with how the features were implemented in training. To ensure this, verify it programmatically: create a CSV file with ids and predictions from the original model and confirm the Ruby predictions match. Here’s some example code.
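
As a separate illustration (not the linked example), the check described above could be sketched like this. The `ruby_prediction` method, the column names `id` and `prediction`, and the tolerance are assumptions; in a real app the method would build features from the record and call the loaded model.

```ruby
require "csv"

# Hypothetical Ruby-side prediction. In a real app this would compute
# features from the record and call the loaded model.
def ruby_prediction(id)
  10.0 + 2.0 * id
end

# Returns the rows where the Ruby prediction differs from the original
# prediction by more than the tolerance
def prediction_mismatches(csv_text, tolerance: 1e-6)
  CSV.parse(csv_text, headers: true).filter_map do |row|
    id = Integer(row["id"])
    expected = Float(row["prediction"])
    actual = ruby_prediction(id)
    { id: id, expected: expected, actual: actual } if (expected - actual).abs > tolerance
  end
end

# Made-up export from the training environment (normally read from a file)
csv_text = <<~CSV
  id,prediction
  1,12.0
  2,14.0
  3,16.5
CSV

mismatches = prediction_mismatches(csv_text)
puts "#{mismatches.size} mismatch(es)"
mismatches.each { |m| puts m.inspect }
```

Comparing with a small tolerance rather than strict equality avoids false alarms from floating-point differences between languages. A check like this fits well in a test suite so it runs whenever the feature code changes.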

Pros

Better tools for model building
No need to operate a new language in production

Cons

Need to introduce a new language in development
Limited model selection
Need to create features in two languages

Pattern 4: Train and Predict in Another Language

The last option we’ll cover is doing both training and prediction outside Ruby. This is great if you have a team of data scientists who specialize in another language. This pattern allows data scientists to own models end-to-end.

It also gives you access to models that are not available in Ruby. For instance, there are forecasting libraries like Prophet and deep learning libraries like TensorFlow.

The implementation depends on how predictions are generated. Two common ways are batch and real-time.

Batch Predictions

Batch predictions are generated asynchronously and are typically run on a regular interval. This can be every minute or once a week. An example is a daily job that updates demand forecasts for the following weeks. Predictions can be stored and later used by the Rails app as needed.

Don’t be afraid to read and write directly to the database. While microservice design patterns caution against using the database as an API, we didn’t have many issues with it. When updating records, it’s also a good idea to write audits to see how predictions change over time.
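
The audit idea can be sketched in plain Ruby as follows. The hash and array are in-memory stand-ins for two hypothetical tables (`forecasts` and `forecast_audits`), and the field names are made up; in practice, the batch job would perform the equivalent database writes, ideally in one transaction.

```ruby
# In-memory stand-ins for two hypothetical tables
forecasts = { 1 => 120.0, 2 => 80.0 } # item_id => current prediction
audits = []

# Applies new predictions and records an audit row for each value that changed
def apply_predictions(forecasts, audits, new_predictions, at: Time.now)
  new_predictions.each do |item_id, value|
    previous = forecasts[item_id]
    next if previous == value

    audits << { item_id: item_id, previous: previous, current: value, recorded_at: at }
    forecasts[item_id] = value
  end
end

# Made-up output from a batch job: one unchanged, one changed, one new
apply_predictions(forecasts, audits, { 1 => 120.0, 2 => 95.0, 3 => 40.0 })
audits.each { |a| puts a.inspect }
```

Skipping unchanged values keeps the audit table small while still letting you reconstruct how each prediction moved over time.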

Jobs can be scheduled with cron, or ideally a distributed scheduler like Mani for high availability. If you need to let the Rails app know a job has completed, you can do this through your messaging system. HTTP works great if you don’t have one.

Real-Time Predictions

Real-time predictions are generated synchronously and are triggered by calls from the Rails app. An example is recommending items to a user at checkout based on what’s in their cart.

HTTP is a common choice for retrieving predictions, but you can use a messaging system or even pipes. Great tools for HTTP are Django and Flask for Python and Plumber for R.
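
On the Rails side, a real-time call can be sketched with Ruby’s built-in Net::HTTP. The endpoint URL, the JSON request and response shapes, and the timeout values below are all assumptions for illustration. The main idea is to set short timeouts and fall back gracefully so that a slow or unavailable prediction service doesn’t hold up the request.

```ruby
require "json"
require "net/http"
require "uri"

# Requests a prediction from a hypothetical HTTP service.
# Returns the parsed JSON response, or nil if the service is slow,
# unreachable, or returns an error.
def fetch_prediction(url, features, open_timeout: 1, read_timeout: 2)
  uri = URI(url)
  request = Net::HTTP::Post.new(uri, "Content-Type" => "application/json")
  request.body = JSON.generate(features)

  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == "https",
                             open_timeout: open_timeout,
                             read_timeout: read_timeout) do |http|
    http.request(request)
  end

  return nil unless response.is_a?(Net::HTTPSuccess)

  JSON.parse(response.body)
rescue Net::OpenTimeout, Net::ReadTimeout, SocketError, SystemCallError, JSON::ParserError
  # Fall back to nil so the caller can show a default experience
  nil
end

# Example call (hypothetical URL and payload):
# fetch_prediction("http://localhost:5000/recommendations", { cart_item_ids: [1, 2] })
```

Returning nil on failure lets the caller decide what to show instead, such as popular items in place of personalized recommendations.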

As with the other patterns, follow engineering best practices. In addition to the ones previously mentioned:

Use a framework, or at the very least a consistent project structure
Keep code DRY

Don’t be afraid to use Rails to manage the database schema. It’s easy enough for data scientists to learn to create and run migrations. Otherwise, you need to support another system for schema changes.

To store models, you most likely won’t use an interchange format, since libraries can’t load them. Instead, use serialization specific to the language, like pickle in Python and serialize in R.

If you’re deciding between Python and R, Python has more general-purpose libraries, so it’s easier to run in production.

Pros

Larger selection of models available
Data scientists can own models end-to-end

Cons

Need to run multiple languages in production

Conclusion

You’ve now seen four great patterns for bringing predictive models to Rails. Each has different trade-offs, so we recommend taking the simplest approach that works for you. No matter which you choose, make sure your models are reproducible.

Happy modeling!

Updates

May 2019: Added Rumale
August 2019: Added XGBoost and LightGBM

All code examples are public domain.
Use them however you’d like (licensed under CC0).
