Add support for raw_import flag #375

Merged 1 commit on Apr 26, 2016
36 changes: 36 additions & 0 deletions README.md
@@ -409,6 +409,42 @@ Obviously not every type of definition might be compiled. There are some restric

However, it is quite possible that your type definition will be supported by Witchcraft™ technology out of the box in most cases.

### Raw Import

Another way to reduce import time is raw import. This feature is available only in the ActiveRecord adapter. ActiveRecord model instantiation is often what consumes most of the CPU and RAM during an import: time is wasted converting, say, timestamps from strings into time objects and then serializing them back to strings. Chewy can instead operate on raw hashes of data obtained directly from the database. All you need to provide is a way to convert each hash into a lightweight object that mimics the behaviour of the normal ActiveRecord object.

```ruby
class LightweightProduct
  def initialize(attributes)
    @attributes = attributes
  end

  # Depending on the database, `created_at` might
  # be in different formats. In PostgreSQL, for example,
  # you might see the following format:
  #   "2016-03-22 16:23:22"
  #
  # Taking into account that Elastic expects something different,
  # one might do something like the following, just to avoid
  # unnecessary String -> DateTime -> String conversion.
  #
  #   "2016-03-22 16:23:22" -> "2016-03-22T16:23:22Z"
  def created_at
    @attributes['created_at'].tr(' ', 'T') << 'Z'
  end
end

define_type Product do
  default_import_options raw_import: ->(hash) {
    LightweightProduct.new(hash)
  }

  field :created_at, 'datetime'
end
```

You can also pass the `:raw_import` option to the `import` method explicitly.
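As a rough illustration of what the converter does, the sketch below applies a lambda to simulated rows in plain Ruby, without loading Chewy or a database. `LightweightRow`, the sample rows, and the `ProductsIndex` name in the final comment are assumptions made for this example, and it assumes the database driver yields string-keyed hashes.

```ruby
# Hypothetical lightweight wrapper around one raw database row.
class LightweightRow
  def initialize(attributes)
    @attributes = attributes
  end

  def id
    @attributes['id']
  end

  # "2016-03-22 16:23:22" -> "2016-03-22T16:23:22Z", with no DateTime parsing.
  def created_at
    "#{@attributes['created_at'].tr(' ', 'T')}Z"
  end
end

# Any callable that accepts one raw hash can act as the converter.
converter = ->(hash) { LightweightRow.new(hash) }

# Simulated rows; assumed to look like the string-keyed hashes a driver returns.
raw_rows = [
  { 'id' => 1, 'created_at' => '2016-03-22 16:23:22' },
  { 'id' => 2, 'created_at' => '2016-03-23 09:00:00' }
]

objects = raw_rows.map(&converter)
timestamps = objects.map(&:created_at)

# With Chewy loaded, the same lambda would be handed to `import`, e.g.
#   ProductsIndex::Product.import(raw_import: converter)  # index name is an assumption
```

Keeping the converter as a plain callable means the same lambda can be reused for `default_import_options` and for a one-off `import` call.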

### Types access

You can access index-defined types with the following API:
2 changes: 1 addition & 1 deletion lib/chewy/type.rb
@@ -13,7 +13,7 @@

 module Chewy
   class Type
-    IMPORT_OPTIONS_KEYS = [:batch_size, :bulk_size, :refresh, :consistency, :replication]
+    IMPORT_OPTIONS_KEYS = [:batch_size, :bulk_size, :refresh, :consistency, :replication, :raw_import]

     include Search
     include Mapping
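The spec in this pull request expects `default_import_options(invalid_option: "Yeah!")` to raise `ArgumentError`, so adding `:raw_import` to the list of known keys is what allows the new option through. The following is a minimal plain-Ruby sketch of that kind of key whitelisting. It is not Chewy's implementation, and `validate_import_options` is a hypothetical helper.

```ruby
# Hypothetical whitelist mirroring the constant in the diff above.
IMPORT_OPTIONS_KEYS = %i[batch_size bulk_size refresh consistency replication raw_import].freeze

# Raises ArgumentError for any key outside the whitelist; otherwise returns the options.
def validate_import_options(options)
  unknown = options.keys - IMPORT_OPTIONS_KEYS
  raise ArgumentError, "Unknown import option(s): #{unknown.join(', ')}" unless unknown.empty?

  options
end

converter = ->(hash) { hash }
accepted = validate_import_options(batch_size: 500, raw_import: converter)

# An unknown key is rejected.
rejected =
  begin
    validate_import_options(invalid_option: 'Yeah!')
    false
  rescue ArgumentError
    true
  end
```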
13 changes: 12 additions & 1 deletion lib/chewy/type/adapter/active_record.rb
@@ -29,7 +29,13 @@ def import_scope(scope, options)
   result = true

   while ids.present?
-    result &= yield grouped_objects(default_scope_where_ids_in(ids))
+    objects =
+      if options[:raw_import]
+        raw_default_scope_where_ids_in(ids, options[:raw_import])
+      else
+        default_scope_where_ids_in(ids)
+      end
+    result &= yield grouped_objects(objects)
     break if ids.size < options[:batch_size]
     ids = pluck_ids(scope.where(target_id.gt(ids.last)))
   end
@@ -49,6 +55,11 @@ def scope_where_ids_in(scope, ids)
   scope.where(target_id.in(Array.wrap(ids)))
 end

+def raw_default_scope_where_ids_in(ids, converter)
+  sql = default_scope_where_ids_in(ids).to_sql
+  object_class.connection.execute(sql).map(&converter)
+end
+
 def relation_class
   ::ActiveRecord::Relation
 end
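The two hunks above combine an id-ordered batching loop with an optional converter applied to each raw row. The sketch below reproduces that control flow in plain Ruby against an in-memory array, as an illustration only. `TABLE`, `pluck_ids`, `load_batch`, and `each_batch` are hypothetical stand-ins for the database and adapter methods, and the rows are assumed to be string-keyed hashes, which may not hold for every database driver.

```ruby
# Hypothetical in-memory stand-in for a table: rows sorted by id, string keys.
TABLE = (1..5).map { |i| { 'id' => i, 'name' => "city-#{i}" } }.freeze

# Returns up to `limit` ids greater than `after_id`, in ascending order.
def pluck_ids(after_id, limit)
  TABLE.map { |row| row['id'] }.select { |id| id > after_id }.first(limit)
end

# Loads the rows for one batch and applies the converter when one is given.
# (The real adapter returns ActiveRecord models when no converter is set.)
def load_batch(ids, converter)
  rows = TABLE.select { |row| ids.include?(row['id']) }
  converter ? rows.map(&converter) : rows
end

# Walks the table in id order and yields one batch at a time.
def each_batch(batch_size:, raw_import: nil)
  ids = pluck_ids(0, batch_size)
  result = true

  while ids.any?
    result &= yield load_batch(ids, raw_import)
    break if ids.size < batch_size
    ids = pluck_ids(ids.last, batch_size)
  end

  result
end

batches = []
ok = each_batch(batch_size: 2, raw_import: ->(hash) { hash['name'].upcase }) do |objects|
  batches << objects
  true
end
```

Because each batch starts after the last id of the previous one, the loop never needs an offset, and it stops as soon as a batch comes back smaller than `batch_size`.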
21 changes: 21 additions & 0 deletions spec/chewy/type/adapter/active_record_spec.rb
@@ -104,6 +104,27 @@ def import(*args)

   specify { expect(import(cities.first, nil)).to eq([{index: [cities.first]}]) }
   specify { expect(import(cities.first.id, nil)).to eq([{index: [cities.first]}]) }
+
+  context 'raw_import' do
+    let(:probe) { double }
+    let(:converter) { ->(raw_hash) { probe.call(raw_hash) } }
+    let(:moscow) { OpenStruct.new(id: 1, name: 'Moscow') }
+    let(:warsaw) { OpenStruct.new(id: 2, name: 'Warsaw') }
+    let(:madrid) { OpenStruct.new(id: 3, name: 'Madrid') }
+    before do
+      @one, @two, @three = City.all.to_a
+    end
+
+    it 'uses the raw import converter to make objects out of raw hashes from the database' do
+      expect(City).not_to receive(:new)
+
+      expect(probe).to receive(:call).with(a_hash_including('id' => @one.id, 'name' => @one.name)).and_return(moscow)
+      expect(probe).to receive(:call).with(a_hash_including('id' => @two.id, 'name' => @two.name)).and_return(warsaw)
+      expect(probe).to receive(:call).with(a_hash_including('id' => @three.id, 'name' => @three.name)).and_return(madrid)
+
+      expect(import(City.where(nil), raw_import: converter)).to eq([{index: [moscow, warsaw, madrid]}])
+    end
+  end
 end

context 'additional delete conditions' do
5 changes: 3 additions & 2 deletions spec/chewy/type_spec.rb
@@ -25,9 +25,10 @@ def self.by_name
     specify { expect { PlacesIndex::City.default_import_options(invalid_option: "Yeah!") }.to raise_error(ArgumentError) }

     context 'default_import_options is set' do
-      before { PlacesIndex::City.default_import_options(batch_size: 500) }
+      let(:converter) { -> {} }
+      before { PlacesIndex::City.default_import_options(batch_size: 500, raw_import: converter) }

-      specify { expect(PlacesIndex::City._default_import_options).to eq(batch_size: 500) }
+      specify { expect(PlacesIndex::City._default_import_options).to eq(batch_size: 500, raw_import: converter) }
     end
   end
 end