Rover

Simple, powerful data frames for Ruby

⛰️ Designed for data exploration and machine learning, and powered by Numo

🌲 Uses Vega for visualization

Installation

Add this line to your application’s Gemfile:

gem 'rover-df'

Intro

A data frame is an in-memory table. It’s a useful data structure for data analysis and machine learning. It uses columnar storage for fast operations on columns.
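To make the columnar idea concrete, here is a small plain-Ruby sketch. It does not use Rover and is not how Rover stores data internally (Rover is backed by Numo arrays); the variable names are illustrative only. It shows the same table held as rows and as columns, and why column operations are simpler on the second layout.

```ruby
# Row-oriented: one hash per row
rows = [
  {a: 1, b: "one"},
  {a: 2, b: "two"},
  {a: 3, b: "three"}
]

# Column-oriented: one array per column
columns = {
  a: [1, 2, 3],
  b: ["one", "two", "three"]
}

# Summing a column from rows has to visit every row hash
sum_from_rows = rows.sum { |r| r[:a] }

# With columns, the values already sit together in one array
sum_from_columns = columns[:a].sum
```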

Creating Data Frames

From an array

Rover::DataFrame.new([
  {a: 1, b: "one"},
  {a: 2, b: "two"},
  {a: 3, b: "three"}
])

From a hash

Rover::DataFrame.new({
  a: [1, 2, 3],
  b: ["one", "two", "three"]
})

From Active Record

Rover::DataFrame.new(User.all)

From a CSV

Rover.read_csv("file.csv")
# or
Rover.parse_csv("CSV,data,string")

Attributes

Get number of rows

df.count

Get column names

df.keys

Check if a column exists

df.include?(name)

Selecting Data

Select a column

df[:a]

Note that strings and symbols are different keys, just as they are in a Ruby hash
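The hash analogy, sketched in plain Ruby (no Rover involved): a symbol key is not found by the equivalent string, so the key you select with should match the key type the column was created with.

```ruby
h = {a: [1, 2, 3]}

symbol_found = h.key?(:a)   # true
string_found = h.key?("a")  # false: the string "a" is a different key

# Converting the string to a symbol makes the lookup match
converted_found = h.key?("a".to_sym)
```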

Select multiple columns

df[[:a, :b]]

Select first rows

df.head
# or
df.first(5)

Select last rows

df.tail
# or
df.last(5)

Select rows by index

df[1]
# or
df[1..3]
# or
df[[1, 4, 5]]

Filtering

Filter on a condition

df[df[:a] == 100]
df[df[:a] != 100]
df[df[:a] > 100]
df[df[:a] >= 100]
df[df[:a] < 100]
df[df[:a] <= 100]

In

df[df[:a].in?([1, 2, 3])]
df[df[:a].in?(1..3)]
df[df[:a].in?(["a", "b", "c"])]

Not in

df[!df[:a].in?([1, 2, 3])]

And, or, and exclusive or

df[(df[:a] > 100) & (df[:b] == "one")] # and
df[(df[:a] > 100) | (df[:b] == "one")] # or
df[(df[:a] > 100) ^ (df[:b] == "one")] # xor
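The parentheses around each comparison matter: in Ruby, `&`, `|`, and `^` bind more tightly than `>` and `==`, so leaving them out changes how the expression is grouped. The sketch below shows what the three operators compute element by element, using plain arrays of booleans as stand-ins for the masks. The arrays are illustrative and are not Rover objects.

```ruby
a = [50, 150, 200]
b = ["one", "one", "two"]

# One boolean per row for each condition
mask_a = a.map { |v| v > 100 }     # [false, true, true]
mask_b = b.map { |v| v == "one" }  # [true, true, false]

# Combine the masks element by element
and_mask = mask_a.zip(mask_b).map { |x, y| x & y }
or_mask  = mask_a.zip(mask_b).map { |x, y| x | y }
xor_mask = mask_a.zip(mask_b).map { |x, y| x ^ y }

# Keep only the values whose mask entry is true
selected = a.zip(and_mask).select { |_, keep| keep }.map(&:first)
```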

Operations

Basic operations

df[:a] + 5
df[:a] - 5
df[:a] * 5
df[:a] / 5
df[:a] % 5
df[:a] ** 2

Summary statistics

df[:a].count
df[:a].sum
df[:a].mean
df[:a].median
df[:a].percentile(90)
df[:a].min
df[:a].max

Count occurrences

df[:a].tally

Cross tabulation

df[:a].crosstab(df[:b])
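As a rough illustration of what counting occurrences and cross tabulation compute, here is a plain-Ruby sketch on two arrays. It uses only core `Array` methods; the shape of the result Rover returns may differ from these hashes.

```ruby
a = ["x", "x", "y", "y", "y"]
b = ["one", "two", "one", "one", "two"]

# Count how often each value occurs in a single column
a_counts = a.tally

# Cross tabulation: count how often each (a, b) pair occurs
pair_counts = a.zip(b).tally
```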

Grouping

Group

df.group(:a).count

Works with all summary statistics

df.group(:a).max(:b)

Multiple groups

df.group([:a, :b]).count
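Conceptually, grouping splits the rows by the value of one or more columns and then applies a summary statistic to each group. A minimal plain-Ruby sketch of that idea follows; the sample rows are made up, and Rover's actual result type may differ from the hashes shown here.

```ruby
rows = [
  {a: "x", b: 1},
  {a: "y", b: 5},
  {a: "x", b: 3}
]

# Split rows by the value of column :a
groups = rows.group_by { |r| r[:a] }

# Count rows per group
counts = groups.transform_values(&:size)

# Maximum of column :b per group
max_b = groups.transform_values { |rs| rs.map { |r| r[:b] }.max }

# Grouping by several columns uses a composite key
multi_counts = rows.group_by { |r| [r[:a], r[:b]] }.transform_values(&:size)
```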

Visualization

Add Vega to your application’s Gemfile:

gem 'vega'

And use:

df.plot(:a, :b)

Specify the chart type (line, pie, column, bar, area, or scatter)

df.plot(:a, :b, type: "pie")

Updating Data

Add a new column

df[:a] = 1
# or
df[:a] = [1, 2, 3]

Update a single element

df[:a][0] = 100

Update multiple elements

df[:a][0..2] = 1
# or
df[:a][0..2] = [1, 2, 3]

Update elements matching a condition

df[:a][df[:a] > 100] = 0

Clamp

df[:a].clamp!(0, 100)
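Clamping limits every value to a range: values below the lower bound become the lower bound, and values above the upper bound become the upper bound. The same idea in plain Ruby, using `Comparable#clamp` on a sample array:

```ruby
values = [-5, 50, 150]

# Each value is limited to the range 0..100
clamped = values.map { |v| v.clamp(0, 100) }
```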

Delete columns

df.delete(:a)
# or
df.except!(:a, :b)

Rename a column

df[:new_a] = df.delete(:a)

Sort rows

df.sort_by! { |r| r[:a] }

Clear all data

df.clear

Combining Data Frames

Add rows

df.concat(other_df)

Add columns

df.merge!(other_df)

Inner join

df.inner_join(other_df)
# or
df.inner_join(other_df, on: :a)
# or
df.inner_join(other_df, on: [:a, :b])
# or
df.inner_join(other_df, on: {df_col: :other_df_col})

Left join

df.left_join(other_df)
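The difference between the two joins can be sketched in plain Ruby on arrays of hashes. This is an illustration of the semantics only, joining on a shared `:a` key; the sample data is invented, and how Rover represents missing values in a left join may differ from the `nil` used here.

```ruby
left = [
  {a: 1, b: "one"},
  {a: 2, b: "two"},
  {a: 3, b: "three"}
]
right = [
  {a: 1, c: "x"},
  {a: 3, c: "y"},
  {a: 4, c: "z"}
]

# Index the right-hand rows by the join key
right_by_key = right.group_by { |r| r[:a] }

# Inner join: keep only left rows that have a match on :a
inner = left.flat_map do |l|
  (right_by_key[l[:a]] || []).map { |r| l.merge(r) }
end

# Left join: keep every left row, filling unmatched columns with nil
left_joined = left.flat_map do |l|
  matches = right_by_key[l[:a]]
  matches ? matches.map { |r| l.merge(r) } : [l.merge(c: nil)]
end
```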

Encoding

One-hot encoding

df.one_hot

Drop a variable in each category to avoid the dummy variable trap

df.one_hot(drop: true)
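To see why dropping a category helps, consider this plain-Ruby sketch of one-hot encoding. The column naming (`color_red` and so on) and the choice of which category to drop are assumptions made for illustration, not Rover's documented behavior. With every category kept, the indicator columns always sum to 1 in each row, which makes them redundant alongside an intercept term; that redundancy is the dummy variable trap.

```ruby
colors = ["red", "green", "blue", "green"]
categories = colors.uniq

# One 0/1 indicator column per category
one_hot = categories.to_h do |cat|
  ["color_#{cat}", colors.map { |c| c == cat ? 1 : 0 }]
end

# With all categories present, every row sums to exactly 1
row_sums = one_hot.values.transpose.map(&:sum)

# Dropping one category (here, the first) removes the redundancy;
# a row of all zeros now stands for the dropped category
dropped = one_hot.reject { |name, _| name == "color_#{categories.first}" }
```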

Conversion

Array of hashes

df.to_a

Hash of arrays

df.to_h
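The two shapes carry the same information and can be converted into each other. A plain-Ruby sketch of that round trip, independent of Rover and using made-up data:

```ruby
# Array of hashes: one hash per row
rows = [
  {a: 1, b: "one"},
  {a: 2, b: "two"}
]

# Hash of arrays: one array per column
columns = rows.first.keys.to_h { |k| [k, rows.map { |r| r[k] }] }

# And back again: transpose the column arrays into rows
back = columns.values.transpose.map { |vals| columns.keys.zip(vals).to_h }
```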

Numo array

df.to_numo

CSV

df.to_csv

Types

You can specify column types when creating a data frame

Rover::DataFrame.new(data, types: {"a" => :int, "b" => :float})

Or

Rover.read_csv("data.csv", types: {"a" => :int, "b" => :float})

Supported types are:

boolean - bool
float - float, float32
integer - int, int32, int16, int8
unsigned integer - uint, uint32, uint16, uint8
object - object

Get column types

df.types

For a specific column

df[:a].type

Change the type of a column

df[:a] = df[:a].to(:int)

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

Report bugs
Fix bugs and submit pull requests
Write, clarify, or fix documentation
Suggest or add new features

To get started with development:

git clone https://github.com/ankane/rover.git
cd rover
bundle install
bundle exec rake test
