Skip to content

feat: ebook record dedup #2

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 21, 2018
Merged

Conversation

BadrAly
Copy link

@BadrAly BadrAly commented Aug 20, 2018

How to test:

  • git PR
  • run_test.sh
  • if this a new instance make sure to run bootstrap and setup
  • run the following command to update existing records
    dojson -l marcxml -i data/ebooks.xml do cantookmarc21 | invenio records create_else_update --vendor cantook

Signed-off-by: Aly Badr aly.badr@rero.ch
Co-authored-by: Gianni Pante gianni.pante@rero.ch

@BadrAly BadrAly force-pushed the baa-#623-records-dedup branch 2 times, most recently from 4993e33 to 34a2e9f Compare August 21, 2018 07:40
@BadrAly BadrAly requested review from jma, iGormilhit and rerowep August 21, 2018 08:39
):
"""Create a new ebook record."""
assert cls.minter
if delete_pid and data.get('pid'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not assert not data.get('pid')?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

accepted your proposition, done

assert cls.minter
if delete_pid and data.get('pid'):
del (data['pid'])
if not id_:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

problem copy and paste from reroils_data, fixed

except PIDDoesNotExistError:
return None

@staticmethod
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative implementation (to be checked):

import copy
    
    @staticmethod
    def _merge(new_record, record):
        """Merge new and old electronic_location_and_access fields."""
        field = 'electronic_location_and_access'
        new_e_r = new_record.get(field)
        old_e_r = record.get(field)
        for e_r in old_e_r:
            # check if already exists!
            if e_r not in new_e_r:
                new_e_r.append(copy.deepcopy(e_r))
                new_record['__order__'].insert(
                    new_record['__order__'].index(field), field
                )
        return new_record

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

accepted your proposition, done


@staticmethod
def _merge(new_record, record):
"""Merge new and old records."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No realy merge record but only electronic_location_and_access



@pytest.yield_fixture()
def cdf_record():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does CDF means?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cdf = la-chaux-de-fonds, comments added to the tests


@pytest.yield_fixture()
def mv_record():
"""CDF record."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MV record instead? What is MV?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MV: mediatheque-valais, comments added to the tests

assert merged_record_status == 'updated'
ela = merged_record['electronic_location_and_access']
assert len(ela) == 2
first_856 = ela[0]['uniform_resource_identifier'][0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do not use 856 here!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

used uri instead of 856, done

Signed-off-by: Aly Badr <aly.badr@rero.ch>
@BadrAly BadrAly force-pushed the baa-#623-records-dedup branch from 34a2e9f to d711ece Compare August 21, 2018 09:42
@jma jma merged commit 2a4b94e into rero:master Aug 21, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants