-
Notifications
You must be signed in to change notification settings - Fork 1.2k
[Fixes #7945] Ingest harvested layer data to geonode #7947
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Merge from upsteram
Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.26.2 to 1.26.3. - [Release notes](https://github.com/urllib3/urllib3/releases) - [Changelog](https://github.com/urllib3/urllib3/blob/1.26.3/CHANGES.rst) - [Commits](urllib3/urllib3@1.26.2...1.26.3) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Toni <toni.schoenbuchner@csgis.de>
…6881) * [Fixes GeoNode#6880] Circle CI upload tests fail irregulary * CircleCI test fix: sometimes expires due to upload timeout in the test environment * - Avoid infinite loop on upload testing * Revert "CircleCI test fix: sometimes expires due to upload timeout in the test environment" This reverts commit 66139fd. Co-authored-by: Alessio Fabiani <alessio.fabiani@geo-solutions.it> Co-authored-by: afabiani <alessio.fabiani@gmail.com>
…de#6911) * get meaningful document filenames on download * - Strip extension from document title before slugify it (e.g.: image.jpg instead of imagejpg.jpg) Co-authored-by: afabiani <alessio.fabiani@gmail.com> Co-authored-by: Alessio Fabiani <alessio.fabiani@geo-solutions.it>
…loop on "wait_for_progress"
…ng slash at the end of GEOSERVER_LOCATION (GeoNode#6913) * [Fixes GeoNode#6916] gsimporter.api.NotFound caused by missing trailing slash at the end of GEOSERVER_LOCATION * [Fixes GeoNode#6916] unit test for GEOSERVER_LOCATION
Bumps [django-cors-headers](https://github.com/adamchainz/django-cors-headers) from 3.6.0 to 3.7.0. - [Release notes](https://github.com/adamchainz/django-cors-headers/releases) - [Changelog](https://github.com/adamchainz/django-cors-headers/blob/master/HISTORY.rst) - [Commits](adamchainz/django-cors-headers@3.6.0...3.7.0) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [amqp](https://github.com/celery/py-amqp) from 5.0.3 to 5.0.5. - [Release notes](https://github.com/celery/py-amqp/releases) - [Changelog](https://github.com/celery/py-amqp/blob/master/Changelog) - [Commits](celery/py-amqp@v5.0.3...v5.0.5) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [pip](https://github.com/pypa/pip) from 21.0 to 21.0.1. - [Release notes](https://github.com/pypa/pip/releases) - [Changelog](https://github.com/pypa/pip/blob/master/NEWS.rst) - [Commits](pypa/pip@21.0...21.0.1) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [coverage](https://github.com/nedbat/coveragepy) from 5.3.1 to 5.4. - [Release notes](https://github.com/nedbat/coveragepy/releases) - [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst) - [Commits](coveragepy/coveragepy@coverage-5.3.1...coverage-5.4) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [pytest](https://github.com/pytest-dev/pytest) from 6.2.1 to 6.2.2. - [Release notes](https://github.com/pytest-dev/pytest/releases) - [Changelog](https://github.com/pytest-dev/pytest/blob/master/CHANGELOG.rst) - [Commits](pytest-dev/pytest@6.2.1...6.2.2) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [djangorestframework-gis](https://github.com/openwisp/django-rest-framework-gis) from 0.16 to 0.17. - [Release notes](https://github.com/openwisp/django-rest-framework-gis/releases) - [Changelog](https://github.com/openwisp/django-rest-framework-gis/blob/master/CHANGES.rst) - [Commits](openwisp/django-rest-framework-gis@v0.16.0...v0.17.0) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
… it has… (GeoNode#6923) * [Fixes GeoNode#6922][REST API v2] Expose the curated thumbnail URL if it has been uploaded * - Add REST APIs test suite to CircleCI
* [Cleanup and Refactor] Remove QGIS server backend dependencies * [Cleanup and Refactor] Remove QGIS server backend dependencies * - Fix LGTM issues
…_auth_middleware Feature#650 basic auth middleware
Upstream master
…olutions_master
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far it almost works fine but...
ISSUE 1
I noticed that in the case I leave the option
{
"harvest_datasets": true,
"copy_datasets": false
}The layer won't be populated not accessible.
E.g.: by using the demo instance https://development.demo.geonode.org/ as a source I get the following
The dataset is empty.
ISSUE 2
Trying to harvest a Map fails with an exception, the session and the harvester are then blocker forever.
Internal Server Error: /en/admin/harvesting/harvestableresource/
Traceback (most recent call last):
File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/django/core/handlers/exception.py", line 47, in inner
response = get_response(request)
File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/django/core/handlers/base.py", line 181, in _get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/django/contrib/admin/options.py", line 616, in wrapper
return self.admin_site.admin_view(view)(*args, **kwargs)
File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/django/utils/decorators.py", line 130, in _wrapped_view
response = view_func(request, *args, **kwargs)
File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/django/views/decorators/cache.py", line 44, in _wrapped_view_func
response = view_func(request, *args, **kwargs)
File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/django/contrib/admin/sites.py", line 232, in inner
return view(request, *args, **kwargs)
File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/django/utils/decorators.py", line 43, in _wrapper
return bound_method(*args, **kwargs)
File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/django/utils/decorators.py", line 130, in _wrapped_view
response = view_func(request, *args, **kwargs)
File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/django/contrib/admin/options.py", line 1723, in changelist_view
response = self.response_action(request, queryset=cl.get_queryset(request))
File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/django/contrib/admin/options.py", line 1408, in response_action
response = func(self, request, queryset)
File "/mnt/c/Work/geonode/geonode/harvesting/admin.py", line 365, in harvest_selected_resources
tasks.harvest_resources.apply_async(args=(harvestable_resource_ids, harvesting_session.pk))
File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/celery/app/task.py", line 569, in apply_async
return self.apply(args, kwargs, task_id=task_id or uuid(),
File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/celery/app/task.py", line 790, in apply
ret = tracer(task_id, args, kwargs, request)
File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/celery/app/trace.py", line 467, in trace_task
I, R, state, retval = on_error(task_request, exc, uuid)
File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/celery/app/trace.py", line 450, in trace_task
R = retval = fun(*args, **kwargs)
File "/mnt/c/Work/geonode/geonode/harvesting/tasks.py", line 147, in harvest_resources
harvesting_workflow.apply_async()
File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/celery/canvas.py", line 1408, in apply_async
return self.apply(args, kwargs,
File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/celery/canvas.py", line 1427, in apply
args=(tasks.apply(args, kwargs).get(propagate=propagate),),
File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/celery/canvas.py", line 1117, in apply
return app.GroupResult(group_id, [
File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/celery/canvas.py", line 1118, in <listcomp>
sig.apply(args=args, kwargs=kwargs, **options) for sig, _, _ in tasks
File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/celery/canvas.py", line 186, in apply
return self.type.apply(args, kwargs, **options)
File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/celery/app/task.py", line 790, in apply
ret = tracer(task_id, args, kwargs, request)
File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/celery/app/trace.py", line 467, in trace_task
I, R, state, retval = on_error(task_request, exc, uuid)
File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/celery/app/trace.py", line 450, in trace_task
R = retval = fun(*args, **kwargs)
File "/mnt/c/Work/geonode/geonode/harvesting/tasks.py", line 174, in _harvest_resource
harvested_resource_info = worker.get_resource(harvestable_resource, harvesting_session_id)
File "/mnt/c/Work/geonode/geonode/harvesting/harvesters/geonodeharvester.py", line 299, in get_resource
resource_descriptor = self._get_resource_details(
File "/mnt/c/Work/geonode/geonode/harvesting/harvesters/geonodeharvester.py", line 444, in _get_resource_details
result = self._get_resource_descriptor(
File "/mnt/c/Work/geonode/geonode/harvesting/harvesters/geonodeharvester.py", line 547, in _get_resource_descriptor
additional_params_handler = {
KeyError: 'maps'
ISSUE 3
Whenever an harvest session fails, there's no way to recover the harvester, which remains busy forever.
|
Ragarding ISSUE 3 this is a long standing request that was already raised in the past @ricardogsilva. If a job breaks there's no way to recover it but going into the DB and manually changing the status back to |
|
@afabiani I will make the necessary adjustments. Some clarification wrt the issues you raise: for Issue 1, the current For issue 2, that is a clear bug, which I will fix ASAP. The current harvester worker does not claim to support maps, so it should just fail in an orderly fashion. For issue 3, that is a by-product of the bug described above probably. Generally speaking, the code is trying to catch known exceptions in the code so that it avoids entering that invalid state. The fact that it is still happening is a sign of some error in the code which is not being properly caught. |
|
@ricardogsilva regarding ISSUE 1, I expect the legacy harvester to work even on 3.3.x (the development demo) and master, since nothing from the old API and CSW has been dropped so far. |
|
@ricardogsilva just to be more clear a bit, notice that about ISSUE #1, if I enable |
|
btw, any chance to fix also this one while looking for the other issues? |
|
This pull request introduces 2 alerts when merging 45fc7c2 into bf23880 - view on LGTM.com new alerts:
|
|
My latest commit should fix the issues you reported. I fixed some missing metadata fields not being correctly passed to the resource manager. I also disabled harvesting of maps for the GeoNode harvester. The reason for this is that API does not really provide enough information for us to be able to replicate a map locally - there is no info about the map's layers, etc. To the best of my knowledge documents and datasets are now correctly harvested, regardless of being of type I cannot guarantee 100% that this works for all GeoNode remote deployments, as I'm not sure how stable the old API had been between releases. Seems to work OK with those deployments that I am using as a test bed. As for the failing test, I don't seem to get it locally. I'm waiting to let the CI build be done to check it output. |
…cted harvestable resources
Hoping to make lgtm happy
45fc7c2 to
a9ec56a
Compare
|
This pull request introduces 2 alerts when merging a9ec56a into bf23880 - view on LGTM.com new alerts:
|
|
@ricardogsilva there still something not fully working with the remote resources and in particular with the WMS harvester, but I'll postpone those fixes to #8149 which will include this PR too. |



References: #7945
This PR implements copying of remote layers and documents in the
GeonodeLegacyHarvester. It also includes a number of bug fixes.Copying remote data is configured by means of editing a
Harvester'sharvester type specific configurationparameter. The following additional configuration parameters are now available:copy_documentscopy_datasetsThey both default to
False, which means that by default a harvester will not copy remote data. Setting them toTruewill cause GeoNode to copy the remote data when performing harvesting.The video below shows the proposed implementation in action:
2021-09-21.17-35-34_18.6.12.mp4
In the video I am demonstrating harvesting with two harvesters simultaneously. One of the harvesters (
copys pcrafi) performs a copy of the remote data onto the local GeoNode, while the other (links pcrafi) merely copies the metadata. The video also demonstrates an error with the copying of a remote raster (the error is intended here) and how this is recorded in the corresponding harvesting session. The harvested resources appear on the GeoNode frontend.Improvements
HarvestingSessionmodel is now more useful, as it records the progress of harvesting operations as they are on-goingFixes
Metadata fields for resource's thumbnail, URL extension (for documents) and subtype (for datasets) are now correctly provided.
When harvesting is merely referencing the remote resources (i.e. we are not copying the actual data, just the metadata):
Notes
In the above video we can see that for remote documents, the GeoNode frontend offers a download of the file in the main page. This seems like a bug of the frontend, so I'm leaving it out of the current PR's scope (which is already quite big).
fixes #7945