Upload displays FTP files while the transfer is in progress - no warnings, can be imported as partial datasets #103

Closed
jennaj opened this issue Mar 16, 2018 · 7 comments
Labels
bug fixathon 06/18 Fixathon June 2018 functionality usegalaxy.org tool/dependency/function fix usegalaxy.org test/retest-fail failed retest wontfix

Comments

@jennaj
Member

jennaj commented Mar 16, 2018

Not sure how we can trap an aborted/resumed transfer and keep it out of the Upload tool, but it seems like we could catch those transfers in progress (even if resumed) and not display them until done. This might be a bug.

These are semi-large data (~4 GB each BAM, 14 GB for the SAM). I have had to accept cert/resume a few times. Many users load much larger datasets. Resuming is likely common.

A few more details are included here: #102 (comment) and here galaxyproject/tools-iuc#1774 (comment)

Thoughts? @guerler @natefoo

Example

UI/Upload tool versus Filezilla, screenshots taken a few mins apart:

screen shot 2018-03-15 at 7 25 43 pm

screen shot 2018-03-15 at 7 25 59 pm

Then the transfer. Note that all four have started up, but only three remain in the Upload tool at this point. Results are in the tickets linked above, including testing results for the final datatype assignment and related comments.

screen shot 2018-03-15 at 7 48 48 pm

screen shot 2018-03-15 at 7 49 05 pm

screen shot 2018-03-15 at 7 50 12 pm

@jennaj jennaj added the question scope or action decision needed label Mar 16, 2018
@jennaj jennaj mentioned this issue Mar 16, 2018
4 tasks
@jennaj
Member Author

jennaj commented Mar 16, 2018

After some datasets fully moved into the history, the FTP area updates to reflect those datasets still in an active/resumed transfer mode.

This is good, because datasets sticking around in the FTP area after being loaded into the history was a prior bug, fixed by @guerler (thank you!!).

But it is also not so good, because end users should not access those transferring files or load them into a history yet: they are incomplete.

screen shot 2018-03-15 at 9 15 15 pm

That's it! - we can follow up from here and decide what to do.

@jennaj jennaj added bug install usegalaxy.org tool install usegalaxy.org requested test/retest-do active tests and removed question scope or action decision needed labels May 16, 2018
@martenson martenson removed the install usegalaxy.org tool install usegalaxy.org requested label May 21, 2018
@jennaj jennaj added the functionality usegalaxy.org tool/dependency/function fix usegalaxy.org label May 21, 2018
@martenson martenson added the fixathon 06/18 Fixathon June 2018 label Jun 4, 2018
@martenson martenson self-assigned this Jun 4, 2018
@natefoo
Member

natefoo commented Jun 4, 2018

Because the upload process is external to Galaxy there is really no way for it to know that the upload is incomplete. Verifying that the upload succeeded is up to the user. In addition to the file size on the info page, users can use the hash tool to verify that the file uploaded matches their local file.
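As an illustration of the hash check described above, here is a minimal sketch of computing a checksum for the local copy of a file. The function name `file_checksum` and the default `sha256` algorithm are assumptions made for this example; the digest only means something if the same algorithm is selected when hashing the uploaded dataset in Galaxy.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks.

    Reading in chunks keeps memory use flat, which matters for
    multi-gigabyte BAM/SAM files.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the local digest and the digest of the uploaded dataset differ, the upload is incomplete or corrupted and should be repeated.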

We could add size and modification time columns to the FTP file selection window, which would allow for a quick visual indicator, but to be sure, users should be paying attention to their FTP client.
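To make the "size and modification time" idea concrete, here is a rough sketch of collecting both values for each file in a directory. It is not Galaxy's actual FTP listing code; the function name `list_ftp_files` and the tuple layout are assumptions for illustration only.

```python
import os
from datetime import datetime, timezone


def list_ftp_files(directory):
    """Return sorted (name, size_in_bytes, modified_utc) tuples for regular files."""
    rows = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                info = entry.stat()
                modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
                rows.append((entry.name, info.st_size, modified))
    return sorted(rows)
```

A very recent modification time, or a size smaller than the local file, would hint that a transfer is still running, but neither value proves that a file is complete.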

@martenson
Member

> We could add size and modification time columns to the FTP file selection window

It already has the size column.

I agree that the users are in charge here, they need to make sure to understand handling of their own files.

@natefoo
Member

natefoo commented Jun 4, 2018

Oh... right... it's right there in the screenshot, ha.

@hexylena
Member

hexylena commented Jun 5, 2018

Maybe error prone but could decrease complaints?: when listing FTP files could we time.sleep(1) and compare filesizes? Ignoring those that have changed?
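The sleep-and-compare idea could look roughly like the sketch below. This is only an illustration of the suggestion, not anything implemented in Galaxy; `stable_files`, `wait_seconds`, and the injectable `sleep` parameter are names invented for this example.

```python
import os
import time


def stable_files(directory, wait_seconds=1.0, sleep=time.sleep):
    """Return names of files whose size did not change across a short wait."""

    def snapshot():
        # Map each regular file name to its current size in bytes.
        sizes = {}
        with os.scandir(directory) as entries:
            for entry in entries:
                if entry.is_file():
                    sizes[entry.name] = entry.stat().st_size
        return sizes

    before = snapshot()
    sleep(wait_seconds)
    after = snapshot()
    # Keep only files present in both snapshots with an unchanged size.
    return sorted(name for name, size in after.items() if before.get(name) == size)
```

The check is a heuristic: a transfer that stalls for longer than the wait leaves the file size unchanged, so a partial file would still be reported as stable.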

@natefoo
Member

natefoo commented Jun 5, 2018

I considered this but worried it'd result in many false positives for bursty connections.

@jennaj
Member Author

jennaj commented Jun 5, 2018

Ok, it sounds like we can't keep actively transferring files out of the Upload/FTP listing. That was my primary question -- if there was some way to do that.

Closing, but please re-open if a working solution comes up. Partial FTP uploads are the root cause of many downstream tool errors and are difficult to diagnose. I suspect some users don't even know it is going on (incomplete results from downstream tools) and so never report a problem; they just think the results are odd when using Galaxy. What happens depends on the datatype and tool(s) used.

Agreed, resuming and tracking the transfer is something users should do, but I don't think the common (or new!) science user expects the datasets to show up in the GUI before the transfer is completed. A dataset could still be partial if they didn't resume after the connection was interrupted, but there is nothing we can do about that case on our side.

I'll add some usage-warning blurb to the FTP upload FAQ in the hub and link to the Troubleshooting FAQ. Might help, or at least gives a handle/link to explain proper usage when sending bug/biostars replies.

@jennaj jennaj closed this as completed Jun 5, 2018
@jennaj jennaj added wontfix test/retest-fail failed retest and removed test/retest-do active tests labels Jun 5, 2018
@jennaj jennaj mentioned this issue Jun 5, 2018
57 tasks