Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSVImporter file upload bug fix #1541

Merged
merged 5 commits into from
Aug 22, 2018
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion .rubocop_todo.yml
Original file line number Diff line number Diff line change
Expand Up @@ -114,4 +114,7 @@ Performance/RegexpMatch:
Exclude:
- 'lib/importer/csv_parser.rb'
- 'app/services/solr_config_uploader.rb'
- 'app/models/account.rb'
- 'app/models/account.rb'

ClassLength:
Max: 120
Binary file added lib/assets/batch-upload-test/example3.tiff
Binary file not shown.
Binary file added lib/assets/batch-upload-test/sample.docx
Binary file not shown.
3 changes: 3 additions & 0 deletions lib/assets/csv_test.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
id,type,title,description,resource_type,contributor,contributor,date_created,file
11111111-2465-1111-5468-000123456789,ETD,Hammock Tacos,,image,Tracy S Gertler,Cynthia V Stack,2015-03-29T22:12:12.100363+00:00,example3.tiff
22222222-0971-2222-1353-000123456789,ETD,Freegan Intelligentsia,,text,Tracy S Gertler,Cynthia V Stack,2015-03-29T22:12:12.100363+00:00,sample.docx
1 change: 0 additions & 1 deletion lib/importer/csv_parser.rb
Original file line number Diff line number Diff line change
@@ -1,5 +1,4 @@
module Importer
# rubocop:disable Metrics/ClassLength
class CSVParser
include Enumerable

Expand Down
2 changes: 1 addition & 1 deletion lib/importer/factory/image_factory.rb
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ module Factory
class ImageFactory < ObjectFactory
include WithAssociatedCollection

self.klass = GenericWork
self.klass = Image
# A way to identify objects that are not Hydra minted identifiers
self.system_identifier_field = :identifier

Expand Down
37 changes: 31 additions & 6 deletions lib/importer/factory/object_factory.rb
Original file line number Diff line number Diff line change
Expand Up @@ -25,11 +25,16 @@ def run
object
end

## FOR CONSIDERATION: handle a row (i.e. Work) with more than one file:
## currently the file_set is replaced on update
def update
raise "Object doesn't exist" unless object
@attr = update_attributes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have you chosen to use an instance variable here? Why not a local?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is called in attach_file_to_work (line 137).

run_callbacks(:save) do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are all the callbacks removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand their use! Everything seems to work without. With them the method's Assignment Branch Condition size throws a rubocop error - would you prefer me to change the rubocop threshold for this too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, adjust the thresholds as needed. If the callbacks aren't used, then make a separate PR to "remove unused callbacks" It's easer to understand PRs if they constrain themselves to a single change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok thanks! I just restored the callbacks - that didn't cause a rubocop failure anymore. I think I was getting that on my initial solution before extracting logic in new methods.

work_actor.update(environment(update_attributes))
work_actor.update(environment(@attr))
end
destroy_existing_file_set if object.file_sets.present?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure you want to destroy an existing file set? Why not update it? If you replace one fileset with another, people who have cited an existing URI are going to have their references broken. What's the benefit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question; I am not sure that is the best way forward. This is how I envision the use of the csv importer:

  • If I want to update a work with an added file, I would add columns to my csv. I could keep the 1st file or reorder file columns to pick a different one (the thumbnail will be rendered from the 1st).
  • If I delete a file from the csv I would expect not to have it appear in the work anymore.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that's universal though. What if I only want to update titles. Reuploading all the files is going to be very slow. And I don't want to have to do that just to update some metadata.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! Apologies for a late reply. Good point.
The method actually destroys the file set only if the file itself needs to be changed - the check is line 142 f.destroy if attributes[:file] != f.title. I will rename this method for clarity.
But the file is re-uploaded; which is inefficient for sure. I'll make changes.

attach_file_to_work
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attaching a file to a work is the responsibility of the work actor. Why are you adding it here in duplicate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The work_actor doesn't do this - the feature is broken which is the reason for this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is done in the actor stack here: https://github.com/samvera/hyrax/blob/f43e7048bf3961f48812809897b36d84b6577f68/app/actors/hyrax/actors/create_with_files_actor.rb#L10 now. So you may need to upload the file and then set the uploaded file id in the :uploaded_files parameter before calling the actor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Will work on this.

log_updated(object)
end

Expand Down Expand Up @@ -62,13 +67,14 @@ def search_by_identifier
# https://github.com/projecthydra/active_fedora/issues/874
# 2+ years later, still open!
def create
attrs = create_attributes
@attr = create_attributes
@object = klass.new
run_callbacks :save do
run_callbacks :create do
klass == Collection ? create_collection(attrs) : work_actor.create(environment(attrs))
klass == Collection ? create_collection(@attr) : work_actor.create(environment(@attr))
end
end
attach_file_to_work
log_created(object)
end

Expand Down Expand Up @@ -107,14 +113,33 @@ def transform_attributes
.merge(file_attributes)
end

# NOTE: This approach is probably broken since the actor that handled `:files` attribute was removed:
# https://github.com/samvera/hyrax/commit/3f1b58195d4381c51fde8b9149016c5b09f0c9b4
def file_attributes
files_directory.present? && files.present? ? { files: file_paths } : {}
end

def file_paths
files.map { |file_name| File.join(files_directory, file_name) }
attributes[:file].map { |file_name| File.join(files_directory, file_name) } if attributes[:file]
end

def import_file(path)
u = Hyrax::UploadedFile.new
u.user_id = User.find_by_user_key(User.batch_user_key).id if User.find_by_user_key(User.batch_user_key)
u.file = CarrierWave::SanitizedFile.new(path)
u.save
u
end

## If no file name is provided in the CSV file, `attach_file_to_work` is not performed
## TO DO: handle invalid file in CSV
## currently the importer stops if no file corresponding to a given file_name is found
def attach_file_to_work
imported_file = [import_file(file_paths.first)] if file_paths
AttachFilesToWorkJob.new.perform(object, imported_file, @attr) if imported_file
end

def destroy_existing_file_set
f = object.file_sets.first
f.destroy if attributes[:file] != f.title
end

# Regardless of what the MODS Parser gives us, these are the properties we are prepared to accept.
Expand Down
1 change: 1 addition & 0 deletions lib/importer/factory/with_associated_collection.rb
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ def create_attributes
# Strip out the :collection key, and add the member_of_collection_ids,
# which is used by Hyrax::Actors::AddAsMemberOfCollectionsActor
def update_attributes
return super if attributes[:collection].nil?
super.except(:collection).merge(member_of_collection_ids: [collection.id])
end

Expand Down
82 changes: 41 additions & 41 deletions spec/lib/importer/factory/etd_factory_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -2,52 +2,50 @@

RSpec.describe Importer::Factory::ETDFactory, :clean do
let(:factory) { described_class.new(attributes) }
let(:actor) { double }
before do
allow(Hyrax::CurationConcern).to receive(:actor).and_return(actor)
end
let(:files) { [] }
let(:attributes) do
{
collection: { id: coll.id },
files: files,
identifier: ['123'],
title: ['Test image'],
read_groups: ['public'],
depositor: 'bob',
edit_users: ['bob']
}
end

context 'with files' do
let(:factory) { described_class.new(attributes, 'tmp/files', files) }
let(:files) { ['img.png'] }
let(:coll) { create(:collection) }

context "for a new image" do
it 'calls the actor with the files' do
expect(actor).to receive(:create).with(Hyrax::Actors::Environment) do |k|
expect(k.attributes).to include(member_of_collection_ids: [coll.id], files: ['tmp/files/img.png'])
end
factory.run
end
end
let(:work) { GenericWork }

context "for an existing image without files" do
let(:work) { create(:generic_work) }
let(:factory) { described_class.new(attributes.merge(id: work.id), 'tmp/files', files) }

it 'creates file sets' do
expect(actor).to receive(:update).with(Hyrax::Actors::Environment) do |k|
expect(k.attributes).to include(member_of_collection_ids: [coll.id], files: ['tmp/files/img.png'])
end
factory.run
end
end
end
# context 'with files' do
# let(:factory) { described_class.new(attributes, 'tmp/files', files) }
# let(:files) { ['img.png'] }
# let(:coll) { create(:collection) }
#
# context "for a new image" do
# it 'calls the actor with the files' do
# expect(actor).to receive(:create).with(Hyrax::Actors::Environment) do |k|
# expect(k.attributes).to include(member_of_collection_ids: [coll.id], files: ['tmp/files/img.png'])
# end
# factory.run
# end
# end
#
# context "for an existing image without files" do
# let(:work) { create(:generic_work) }
# let(:factory) { described_class.new(attributes.merge(id: work.id), 'tmp/files', files) }
#
# it 'creates file sets' do
# expect(actor).to receive(:update).with(Hyrax::Actors::Environment) do |k|
# expect(k.attributes).to include(member_of_collection_ids: [coll.id], files: ['tmp/files/img.png'])
# end
# factory.run
# end
# end
# end

context 'when a collection already exists' do
let!(:coll) { create(:collection) }
let(:attributes) do
{
collection: { id: coll.id },
files: files,
identifier: ['123'],
title: ['Test image'],
read_groups: ['public'],
depositor: 'bob',
edit_users: ['bob']
}
end
let(:actor) { Hyrax::CurationConcern.actor }

it 'does not create a new collection' do
expect(actor).to receive(:create).with(Hyrax::Actors::Environment) do |k|
Expand All @@ -58,4 +56,6 @@
end.to change(Collection, :count).by(0)
end
end

include_examples("csv_importer")
end
82 changes: 41 additions & 41 deletions spec/lib/importer/factory/image_factory_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -2,52 +2,50 @@

RSpec.describe Importer::Factory::ImageFactory, :clean do
let(:factory) { described_class.new(attributes) }
let(:actor) { double }
before do
allow(Hyrax::CurationConcern).to receive(:actor).and_return(actor)
end
let(:files) { [] }
let(:attributes) do
{
collection: { id: coll.id },
files: files,
identifier: ['123'],
title: ['Test image'],
read_groups: ['public'],
depositor: 'bob',
edit_users: ['bob']
}
end

context 'with files' do
let(:factory) { described_class.new(attributes, 'tmp/files', files) }
let(:files) { ['img.png'] }
let!(:coll) { create(:collection) }

context "for a new image" do
it 'creates file sets with access controls' do
expect(actor).to receive(:create).with(Hyrax::Actors::Environment) do |k|
expect(k.attributes).to include(member_of_collection_ids: [coll.id], files: ['tmp/files/img.png'])
end
factory.run
end
end
let(:work) { Image }

context "for an existing image without files" do
let(:work) { create(:generic_work) }
let(:factory) { described_class.new(attributes.merge(id: work.id), 'tmp/files', files) }

it 'creates file sets' do
expect(actor).to receive(:update).with(Hyrax::Actors::Environment) do |k|
expect(k.attributes).to include(member_of_collection_ids: [coll.id], files: ['tmp/files/img.png'])
end
factory.run
end
end
end
# context 'with files' do
# let(:factory) { described_class.new(attributes, 'tmp/files', files) }
# let(:files) { ['img.png'] }
# let!(:coll) { create(:collection) }
#
# context "for a new image" do
# it 'creates file sets with access controls' do
# expect(actor).to receive(:create).with(Hyrax::Actors::Environment) do |k|
# expect(k.attributes).to include(member_of_collection_ids: [coll.id], files: ['tmp/files/img.png'])
# end
# factory.run
# end
# end
#
# context "for an existing image without files" do
# let(:work) { create(:generic_work) }
# let(:factory) { described_class.new(attributes.merge(id: work.id), 'tmp/files', files) }
#
# it 'creates file sets' do
# expect(actor).to receive(:update).with(Hyrax::Actors::Environment) do |k|
# expect(k.attributes).to include(member_of_collection_ids: [coll.id], files: ['tmp/files/img.png'])
# end
# factory.run
# end
# end
# end

context 'when a collection already exists' do
let!(:coll) { create(:collection) }
let(:attributes) do
{
collection: { id: coll.id },
files: files,
identifier: ['123'],
title: ['Test image'],
read_groups: ['public'],
depositor: 'bob',
edit_users: ['bob']
}
end
let(:actor) { Hyrax::CurationConcern.actor }

it 'does not create a new collection' do
expect(actor).to receive(:create).with(Hyrax::Actors::Environment) do |k|
Expand All @@ -58,4 +56,6 @@
end.to change(Collection, :count).by(0)
end
end

include_examples("csv_importer")
end
70 changes: 70 additions & 0 deletions spec/support/shared_examples_for_csv_importer_spec.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
RSpec.shared_examples "csv_importer" do
context "with a file" do
let(:attributes) do
{
id: "123",
title: ["Gluten-free umami"],
file: ["world.png"]
}
end
let(:factory) { described_class.new(attributes, 'spec/fixtures/images') }

before { factory.run }

describe "#run" do
it "uploads the content of the file" do
expect(Hyrax::UploadedFile.last[:file]).to eq("world.png")
end

it "attaches the uploaded file to a work" do
expect(Hyrax::UploadedFile.last[:file_set_uri]).to eq(work.find("123").file_sets.first.uri)
end
end

# describe "when a work with the same id already exists" do
# let(:new_attr) do
# {
# id: "123",
# title: ["Squid tofu banjo"],
# file: ["nypl-hydra-of-lerna.jpg"]
# }
# end
#
# # Ldp::NotFound Error for the update method in 2 tests below - how to stub??
#
# it "updates metadata" do
# new_factory = described_class.new(new_attr, 'spec/fixtures/images')
# new_factory.run
# expect(work.last.title).to eq(["Squid tofu banjo"])
# end
#
# # current behavior - may want to handle this differently?
# it "replaces its file_set" do
# new_factory = described_class.new(new_attr, 'spec/fixtures/images')
# new_factory.run
# expect(Hyrax::UploadedFile.last[:file]).to eq("nypl-hydra-of-lerna.jpg")
# end
# end
end

context "without a file" do
## the csv_importer still creates a Work when no file is provided.
## TO DO: handle invalid file in CSV; current behavior:
## the importer stops if no file corresponding to a given file_name is found
let(:attributes) { { id: "345", title: ["Artisan succulents"] } }
let(:factory) { described_class.new(attributes) }

before { factory.run }

it "creates a Work with supplied metadata" do
expect(work.find("345").title).to eq(["Artisan succulents"])
end

it "updates a Work with supplied metadata" do
new_attr = { id: "345", title: ["Retro humblebrag"] }
new_factory = described_class.new(new_attr)
new_factory.run
expect(work.find("345").title).to eq(["Retro humblebrag"])
end
end
end