Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

agents: add MEF synchro. #3170

Merged
merged 1 commit into from
Feb 1, 2023
Merged

agents: add MEF synchro. #3170

merged 1 commit into from
Feb 1, 2023

Conversation

jma
Copy link
Contributor

@jma jma commented Nov 30, 2022

Co-Authored-by: Johnny Mariéthoz Johnny.Mariethoz@rero.ch

Why are you opening this PR?

  • Which task/US does it implement?
  • Which issue does it fix?

Dependencies

My PR depends on the following rero-ils-ui's PR(s):

  • rero/rero-ils-ui#

How to test?

  • What command should I have to run to test your PR?
  • What should I test through the UI?

@github-actions github-actions bot added the f: data About data model, importation, transformation, exportation of data, specific for bibliographic data label Nov 30, 2022
@jma jma changed the base branch from staging to US-sync-mef November 30, 2022 09:59
@jma jma changed the base branch from US-sync-mef to staging November 30, 2022 10:01
@jma jma changed the base branch from staging to US-sync-mef November 30, 2022 10:02
@jma jma force-pushed the maj-sync-mef-cli branch 4 times, most recently from 74b4952 to 4bf357e Compare December 7, 2022 06:22
@jma jma force-pushed the US-sync-mef branch 2 times, most recently from 10ed911 to 3d29866 Compare December 20, 2022 15:09
@jma jma force-pushed the maj-sync-mef-cli branch from 4bf357e to 1b17993 Compare January 11, 2023 12:42
@github-actions github-actions bot added developers From a developer point of view. f: circulation Concerns the circulation interface or backend translations Translations labels Jan 11, 2023
@jma jma force-pushed the maj-sync-mef-cli branch from 1b17993 to 29eab62 Compare January 11, 2023 15:29
@github-actions github-actions bot removed translations Translations f: circulation Concerns the circulation interface or backend developers From a developer point of view. labels Jan 11, 2023
@jma jma force-pushed the maj-sync-mef-cli branch from 29eab62 to e05e75e Compare January 19, 2023 16:04
@github-actions github-actions bot added the developers From a developer point of view. label Jan 19, 2023
@jma jma force-pushed the maj-sync-mef-cli branch 5 times, most recently from 189083d to 8af6691 Compare January 26, 2023 07:53
@jma jma marked this pull request as ready for review January 26, 2023 10:06
@jma jma requested review from lauren-d, rerowep and zannkukai January 26, 2023 10:06
Copy link
Contributor

@lauren-d lauren-d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greats work !
I just have a general question. Could we use sync with long terms queries?
Maybe, it's better to write method with pids list in argument.
The idea is to launch multiple sync in the same times using the new generic task from rero-invenio-base 😁

if ref_type == 'viaf':
query = ContributionsSearch() \
.filter('term', viaf_pid=ref_pid)
query = query.filter('term', viaf_pid=ref_pid)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could use &= statement 😝
query &= Q('term', viaf_pid=ref_pid)

if not os.path.exists(log_dir):
os.mkdir(log_dir)
verbose_level = ['ERROR', 'INFO', 'DEBUG']
logging_config = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it should be part of a conf/logging.py file into this module ? just a suggestion


def _get_last_date(self):
"""Get the date of the last execution of the synchronization."""
self.from_date = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be moved into __init__ method

f'MEF(pid: {agent.pid}) content has been '
f'updated')
if not self.dry_run:
if (old_mef_pid == new_mef_pid):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary parentheses

Comment on lines 410 to 415
doc_is_updated = self.update_agents_in_document(
doc_pid=doc_pid,
pids_to_replace=pids_to_replace
)
if doc_is_updated:
doc_updated.add(doc_pid)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.update_agents_in_document can only return True (maybe it could raise an exception) so :

for pid in docs_pids:
    self.update_agents_in_document(doc_pid=pid, pids_to_replace=pids_to_replace)
....
return doc_pids, updated, error

@jma
Copy link
Contributor Author

jma commented Jan 30, 2023

Greats work ! I just have a general question. Could we use sync with long terms queries? Maybe, it's better to write method with pids list in argument. The idea is to launch multiple sync in the same times using the new generic task from rero-invenio-base 😁

Thanks. Unfortunately I do not think that it is safe to do it in parallel as two MEF updates can modify the same document. Perhaps it works for the db as we use transactions, but what about the indexing?

@jma jma force-pushed the maj-sync-mef-cli branch 2 times, most recently from 28ba272 to 63176aa Compare January 30, 2023 09:25
@github-actions github-actions bot added the dev: documentation About changelog, release notes, installation procedures, gh templates... label Jan 30, 2023
@jma jma force-pushed the maj-sync-mef-cli branch 2 times, most recently from d8b0469 to a22986b Compare January 31, 2023 06:55
@jma jma requested a review from zannkukai January 31, 2023 09:02
* Adds a celery tasks to perform synchronization and contributions
  cleaning.
* Adds a new `RERO_ILS_MEF_SYNC_LOG_DIR` configuration to specify the
  logs location.
* Fixes MEF url configuration for the dojson transformation.
* Returns the most recent contribution in the `get_contribution` method.
* Adds a new parameter for the contribution `update_online` to avoid
  documents indexing.
* Adds a command line interface to perform manual synchronization.
* Refactors the contribution and subject dumpers for document indexing.
* Raises an exception if the contribution $ref cannot be resolved in
  document.
* Updates `rero-invenio-base` to be able to use custom migration scripts
  on workers.

Co-Authored-by: Johnny Mariéthoz <Johnny.Mariethoz@rero.ch>
@jma jma force-pushed the maj-sync-mef-cli branch from a22986b to bf9363a Compare February 1, 2023 07:03
@jma jma merged commit b5fe714 into rero:US-sync-mef Feb 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dev: documentation About changelog, release notes, installation procedures, gh templates... developers From a developer point of view. f: data About data model, importation, transformation, exportation of data, specific for bibliographic data
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants