Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Publish API/button needs to send message(s) to external endpoint on publication #3561

Closed
pdurbin opened this issue Jan 6, 2017 · 46 comments
Closed
Assignees
Labels
Feature: Publishing & Versions Type: Feature a feature request UX & UI: Design This issue needs input on the design of the UI and from the product owner

Comments

@pdurbin
Copy link
Member

pdurbin commented Jan 6, 2017

As discussed with @pameyer @djbrooke and @scolapasta yesterday the rsync solution we are working on in #3145 requires potentially time intensive movement of files from one filesystem called "hold" to another filesystem called "public" just before the dataset is published in Dataverse. In the doc we reviewed we called this "Publish API/button needs to send message(s) to external endpoint on publication" and the external endpoint in question is currently known as "The Backend" but will eventually be given an name and will expose a JSON-based REST endpoint. The communication between Dataverse and The Backend will go something like this:

  • Dataverse tells The Backend to move files from "hold" to "public"
    • Note that there are UI/UX considerations for this workflow in the GUI. Perhaps the user still clicks "Publish" but an indication needs to be shown that publishing will take a long time, perhaps overnight, due to the size of the dataset or simply because files were uploaded via an rsync script. See also File Upload w/ Rsync UI/Workflow changes #3348.
  • The Backend replies immediately with some sort of job id confirming that the message was recieved.
  • Time passes. Files are moved by The Backend.
  • The Backend tells Dataverse that file movement is complete.
  • Dataverse publishes the dataset.

We will iterate on this description and provide more details as needed.

@pdurbin pdurbin added Feature: API Feature: Publishing & Versions UX & UI: Design This issue needs input on the design of the UI and from the product owner SBGrid labels Jan 6, 2017
@pdurbin pdurbin added the ready label Jan 6, 2017
@pdurbin pdurbin removed their assignment Jan 13, 2017
@pameyer
Copy link
Contributor

pameyer commented Jan 17, 2017

"the backend" endpoint for Dataverse to communicate with needs to be defined/implemented. I'll set up another repo (analogous to how we approached the DCM) for that component; (initially with mocks) so that there's something to develop against.

@pameyer
Copy link
Contributor

pameyer commented Jan 23, 2017

Until renamed (and to avoid namespace confusion), "the backend" is RSAL.

Very preliminary API docs there, mocks to follow.

@pameyer
Copy link
Contributor

pameyer commented Jan 24, 2017

and mocks are in that repo as well

@djbrooke
Copy link
Contributor

We discussed this in sprint planning today, and @pameyer mentioned that @michbarsinai currently has what he needs to move this forward and will continue to provide things as needed.

@pdurbin
Copy link
Member Author

pdurbin commented Feb 7, 2017

It has been requested that the "Design Options for Remotely Managed Datasets" doc by @michbarsinai be reviewed by all interested parties in time to discuss it during the regular Tuesday morning call next week (Feb 14): https://docs.google.com/document/d/1_m4wHYsocNGDj-uBmBCXjwh0kVPyS5GyThjKnigS1zg/edit?usp=sharing

@pameyer
Copy link
Contributor

pameyer commented Mar 2, 2017

"Design Options for Remotely Managed Datasets" not currently applicable (to this issue), since @michbarsinai 's redesign (and subsequent meetings/conversations about it).

Some of these discussions touched on the need to prevent users from changing a dataset (or initiating other state transitions) at various points of the process; this intersects with the idea of ensuring that datasets aren't changeable while the dataset is being reviewed (#3172).

@pdurbin
Copy link
Member Author

pdurbin commented Mar 2, 2017

@pameyer good point. Here's a link to the new "Design Document for Publication Workflows" doc that we've switch to: https://docs.google.com/document/d/1GFrCh23TdYc87dBi4xg5Pl2_Gf1usQqrqaik2-i7V7A/edit?usp=sharing

@mheppler
Copy link
Contributor

mheppler commented Mar 2, 2017

After reviewing the aforementioned design doc in our mtg yesterday, we agreed that for now, we'll be able to utilize the UX example of dataset locking as it exists for tabular data file ingest.

Taking that example of disabling action buttons, we'll also use the message block at the top of the page to display a warning, with more details, as well as User Guide and Contact Support links. There is also the example of "Draft", "Unpublished" and "In Review" publication status labels next to the dataset title, where we could add new labels for "Reviewed" or any other state we may need to display.

mockup-dataset-lock

@pdurbin
Copy link
Member Author

pdurbin commented May 16, 2017

I just mentioned this issue to @donsizemore at http://irclog.iq.harvard.edu/dataverse/2017-05-16#i_52743 because his problem reminds of the reason we are introducing this "workflows" concept. The operation his user is attempting is slow:

thu-mai prepares to publish a dataset with 224 files in it. gears whir, white-gloved hands with knives start spinning. asynchronously, backgrounded. in the browser, thu-mai gets the spinning wheel of work

she waits a while, tries to publish again, gets an error. i'm at lunch. ingest and indexing can take a while with large datasets, but any way to get good/meaningful feedback to the user until it completes? publishing-in-progress semaphore or some such?

Also, I'd like to mention that @michbarsinai has been working away on a branch at https://github.com/IQSS/dataverse/tree/3561-publication-workflows that's ready for some code review.

@pdurbin
Copy link
Member Author

pdurbin commented May 18, 2017

Yesterday I asked @djbrooke about the status of https://github.com/IQSS/dataverse/tree/3561-publication-workflows and he indicated that it is not ready for code review. @michbarsinai is still working away on it. My bad.

I just left a related comment on this effort at #3725 (comment) that might be of interest.

@pameyer
Copy link
Contributor

pameyer commented May 18, 2017

at least according to my notes; @michbarsinai mentioned this was ready for an initial (non-UI) code review 5/2.

@pdurbin
Copy link
Member Author

pdurbin commented May 18, 2017

@pameyer I dunno. Yesterday @michbarsinai mentioned ServiceLoader and NoClassFoundException's so I think he's still plugging away. Sounds like SPI stuff (#3657).

@michbarsinai
Copy link
Member

michbarsinai commented May 18, 2017 via email

@michbarsinai
Copy link
Member

Some SPI pointers in @pdurbin 's comment:
#3657 (comment)

@pameyer
Copy link
Contributor

pameyer commented Jun 21, 2017

Possibly worth mentioning in the documentation that calls to api/workflows/$invokation_id need to have the content-type header set (aka - this info should live somewhere other than my notes).

@pdurbin pdurbin added the Type: Feature a feature request label Jun 28, 2017
@landreev
Copy link
Contributor

landreev commented Sep 5, 2017

(pushed a fix for this; but would like to review - see below). The "unpublished" tag immediately disappears, even when there is a workflow in progress.

(the fix I pushed should also take care of the improperly rendered "export metadata" button)

@pameyer
Copy link
Contributor

pameyer commented Sep 5, 2017

failure case when workflow uses an external system, and the external system is down. results in an unpublished dataset w\ the published button greyed out; probably should report an error and clear the workflow lock.

@landreev
Copy link
Contributor

landreev commented Sep 5, 2017

I added an info message for when the dataset is locked in the publish workflow; the actual message, dataset.publish.workflow.inprogress in the bundle is just a placeholder:

Publish workflow in progress

please replace it with something sensible

@landreev
Copy link
Contributor

landreev commented Sep 5, 2017

I fixed the missing "unpublished" tag by moving the setPublicationDate() line from the PublishDatasetCommand to FinalizePublishDatasetCommand (publication date being non null is the definition used by dvobject.isReleased(), used in turn by the dataset page to determine if the dataset is published).
I'm not 100% sure if this isn't going to break something somewhere else. And I'd like to quickly review the overall functionality split between PublishDatasetCommand and FinalizePublishDatasetCommand with somebody else... @scolapasta ?

@landreev
Copy link
Contributor

landreev commented Sep 5, 2017

If the fixes I made today are legit, this should be ready for QA.
but let's review tomorrow.

@michbarsinai
Copy link
Member

michbarsinai commented Sep 6, 2017 via email

@pameyer
Copy link
Contributor

pameyer commented Sep 6, 2017

@michbarsinai I'll re-test; but it was looking like there was a lock that wasn't getting cleared on rollback. I'll update with more definite information once I've done some investigation.

@pameyer
Copy link
Contributor

pameyer commented Sep 6, 2017

More investigation:

  • create external workflow with http/sr stepType for pre-publish
  • create dataset
  • submit for review
  • turn off the system listening at the url for the http/sr step.
  • admin user published dataset (through UI)
  • server log shows workflow engine rolling back workflow; dataset page shows (new) message about publication workflow in progress, de-activated publish button, no return to author button.

Same behavior seen for publishing without a submit for review step; also seen with publishing through API.

split into #4124

@pameyer
Copy link
Contributor

pameyer commented Sep 6, 2017

From discussion at stand-up and some investigation, this is probably better off in its own issue (source of the problems is system misconfiguration/failure, and there's a relatively straightforward way to get out of the stuck state (for a sysadmin).

@landreev
Copy link
Contributor

landreev commented Sep 6, 2017

@michbarsinai
@scolapasta and I have just reviewed the PublishDatasetCommand vs. FinalizeDatasetPublicationCommand functionality split and we have the following question:

What there a specific reason why the following steps were left in PublishDatasetCommand:

             theDataset.setPublicationDate(new Timestamp(new Date().getTime()));
             theDataset.getEditVersion().setVersionNumber(new Long(1)); 
            theDataset.getEditVersion().setMinorVersionNumber(new Long(0));
 (for a previously unpublished dataset)
            ...
            Timestamp updateTime = new Timestamp(new Date().getTime());
            theDataset.getEditVersion().setReleaseTime(updateTime);
            theDataset.getEditVersion().setLastUpdateTime(updateTime);
            theDataset.setModificationTime(updateTime);
             ....

as opposed to moving all this stuff to the Finalize command?
This way, if there is a workflow, we are creating this half-published version, that's still a draft, but has the version number, modification dates and other attributes of a published one... Presumably it all needs to be reversed if anything goes wrong with the workflow process. I've already had to move the setPublicationDate() to the Finalize command (since the publication date being non null was the test inside dvObject.isReleased(), so it was confusing the rendering on DatasetPage.

So far we are inclined to just move all that code to the Finalize command. Would that cause any kind of problem we are not aware of?

@kcondon
Copy link
Contributor

kcondon commented Sep 6, 2017

At Leonid's request, here is the info for configuring the RSAL Mock:
https://github.com/sbgrid/rsal/tree/master/mocks

Also, to configure Dataverse to use the new workflow you must do the following:

  1. Configure the RSAL URL:
    curl -X PUT -d 'http://<myipaddr>:5050' http://localhost:8080/api/admin/settings/:RepositoryStorageAbstractionLayerUrl
  2. Update workflow json with correct URL information:
    edit internal-httpSR-workflow.json and replace url and rollbackUrl to be the url of your RSAL mock.
  3. Create the workflow:
    curl http://localhost:8080/api/admin/workflows -X POST --data-binary @internal-httpSR-workflow.json -H "Content-type: application/json"
  4. List available workflows:
    curl http://localhost:8080/api/admin/workflows
  5. Set the workflow (id) as the default workflow for the appropriate trigger:
    curl http://localhost:8080/api/admin/workflows/default/PrePublishDataset -X PUT -d 2
  6. Check that the trigger has the appropriate default workflow set:
    curl http://localhost:8080/api/admin/workflows/default/PrePublishDataset
  7. Add RSAL to whitelist:
  8. When finished testing, unset the workflow:
    curl -X DELETE http://localhost:8080/api/admin/workflows/default/PrePublishDataset

@kcondon
Copy link
Contributor

kcondon commented Sep 6, 2017

Issues observed:
Doc:

  1. Unset workflow missing $ from triggertype:
    DELETE http://$SERVER/api/admin/workflows/default/triggerType
  2. Whitelist describes body as list of ip but how separated/formed? Pete says ;
  3. Might want a single paragraph or couple sentences describing what workflow is, how configure, current support status.

Behavior:

  1. Returning dataset to author sends two notifications and emails to author.
  2. Two info messages are listed at top of a dataset submitted for review, 1 green success, submitted for review, 1 brownish yellow saying dataset is now in review.
  3. When submitted dataset is in the process of being published, in review tag disappears but on the author's view, submit for review button is still visible and is once again enabled to be clicked, though it does say publish in progress at top of dataset.
  4. Rollback does not complete cleanly, created as a separate issue: workflow steps don't roll back cleanly #4124
    If for some reason the publish cannot complete, such as RSAL down or RSAL server not in whitelist, the rollback happens but does not fully complete since it leaves the dataset in a locked state.

@landreev landreev assigned kcondon and unassigned landreev and sekmiller Sep 6, 2017
pdurbin added a commit that referenced this issue Sep 7, 2017
landreev added a commit that referenced this issue Sep 7, 2017
@landreev
Copy link
Contributor

landreev commented Sep 7, 2017

Just pushed in a fix for the 3. (submit for review button needs to go away).
Thanks @pdurbin for the doc change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature: Publishing & Versions Type: Feature a feature request UX & UI: Design This issue needs input on the design of the UI and from the product owner
Projects
None yet
Development

No branches or pull requests