-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
agents: add MEF synchro. #3170
agents: add MEF synchro. #3170
Conversation
74b4952
to
4bf357e
Compare
10ed911
to
3d29866
Compare
189083d
to
8af6691
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greats work !
I just have a general question. Could we use sync with long terms queries?
Maybe, it's better to write method with pids list in argument.
The idea is to launch multiple sync in the same times using the new generic task from rero-invenio-base 😁
if ref_type == 'viaf': | ||
query = ContributionsSearch() \ | ||
.filter('term', viaf_pid=ref_pid) | ||
query = query.filter('term', viaf_pid=ref_pid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could use &=
statement 😝
query &= Q('term', viaf_pid=ref_pid)
if not os.path.exists(log_dir): | ||
os.mkdir(log_dir) | ||
verbose_level = ['ERROR', 'INFO', 'DEBUG'] | ||
logging_config = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it should be part of a conf/logging.py
file into this module ? just a suggestion
|
||
def _get_last_date(self): | ||
"""Get the date of the last execution of the synchronization.""" | ||
self.from_date = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be moved into __init__
method
f'MEF(pid: {agent.pid}) content has been ' | ||
f'updated') | ||
if not self.dry_run: | ||
if (old_mef_pid == new_mef_pid): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unnecessary parentheses
doc_is_updated = self.update_agents_in_document( | ||
doc_pid=doc_pid, | ||
pids_to_replace=pids_to_replace | ||
) | ||
if doc_is_updated: | ||
doc_updated.add(doc_pid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.update_agents_in_document
can only return True
(maybe it could raise an exception) so :
for pid in docs_pids:
self.update_agents_in_document(doc_pid=pid, pids_to_replace=pids_to_replace)
....
return doc_pids, updated, error
Thanks. Unfortunately I do not think that it is safe to do it in parallel as two MEF updates can modify the same document. Perhaps it works for the db as we use transactions, but what about the indexing? |
28ba272
to
63176aa
Compare
d8b0469
to
a22986b
Compare
* Adds a celery tasks to perform synchronization and contributions cleaning. * Adds a new `RERO_ILS_MEF_SYNC_LOG_DIR` configuration to specify the logs location. * Fixes MEF url configuration for the dojson transformation. * Returns the most recent contribution in the `get_contribution` method. * Adds a new parameter for the contribution `update_online` to avoid documents indexing. * Adds a command line interface to perform manual synchronization. * Refactors the contribution and subject dumpers for document indexing. * Raises an exception if the contribution $ref cannot be resolved in document. * Updates `rero-invenio-base` to be able to use custom migration scripts on workers. Co-Authored-by: Johnny Mariéthoz <Johnny.Mariethoz@rero.ch>
Co-Authored-by: Johnny Mariéthoz Johnny.Mariethoz@rero.ch
Why are you opening this PR?
Dependencies
My PR depends on the following
rero-ils-ui
's PR(s):How to test?