GEMINI load 0.18.1.0 fails for missing yml configuration file and database index #165
Was I assigned this ticket to run the data manager or to fix the tool wrapper?
I would imagine not, right? That would break compatibility. It would be better to augment the data manager to add another column describing a schema version, and have the tool filter the data table on compatible schema versions. I don't know if we can add a new column to an existing data table definition, though; we might need to update the tools and data manager to use a new data table. We could also embed the Gemini version in the name of the tables produced. Maybe that would be enough to indicate what users should do?
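To make the schema-version idea concrete, here is a minimal Python sketch of the filtering logic. It assumes a hypothetical tab-separated `.loc` layout with an extra `schema_version` column (value, name, schema_version, path). The column, the entry names and the paths are illustrative only and do not exist in the real Gemini data table. In a real Galaxy tool this filtering would live in the tool's parameter options rather than in standalone Python.

```python
# Hypothetical .loc layout (tab-separated): value, name, schema_version, path.
# The schema_version column is the extra column proposed above; it is NOT
# part of the real Gemini data table.
LOC_TEXT = (
    "gemini_2016\tGEMINI annotations (2016)\t1\t/data/gemini/2016\n"
    "gemini_2018\tGEMINI annotations (2018)\t2\t/data/gemini/2018\n"
)


def compatible_entries(loc_text, supported_versions):
    """Return (value, name, path) for rows whose schema_version the tool can read."""
    rows = []
    for line in loc_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        value, name, schema_version, path = line.split("\t")
        if schema_version in supported_versions:
            rows.append((value, name, path))
    return rows


print(compatible_entries(LOC_TEXT, {"2"}))
# [('gemini_2018', 'GEMINI annotations (2018)', '/data/gemini/2018')]
```

With `{"2"}` as the supported set, only the 2018 entry would be offered on the tool form; a tool version able to read both schemas would pass `{"1", "2"}`.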
@jmchilton Is the plan to fix the tool and then reinstall that version? If that will take a while, we should get test-datamanager's data tables reset to the cvmfs/main content in the meantime. The server needs to be prepped before any new indexes can be created (for this wrapper, or any others). @natefoo Is this something John or I can do, or something that only you can do? Regarding version info: users will still get confused if we just list the dates -- we don't include this for other tools, and it assumes too much technical understanding from users. I would suggest filtering for valid indexes based on this info and including it in the display/dropdown. Data versions matter for reproducibility overall. I think that adding a versioned loc/table for all existing data tables/locs would be a big step forward. We've needed that for a while and will definitely find it useful later on, once indexes are shared across usegalaxy.* servers (and shared with users). We could add it for this tool first, as an example for GUI enhancement development to display this to users and to model other uses against (to determine the best universal metadata format/content). For older indexes, we could probably capture this info from the cvmfs index file creation dates.
I see nothing in the above script that particularly ties the files being created to the version of Gemini being used to install them. And it sounds like, with some more recent versions of Gemini, the reference data structures have stabilized across versions a bit. Since 98% of people are going to use the latest version of the Gemini tool, and 98% of those uses are going to want to select the latest database, I'd recommend just running the latest data manager and seeing if the usage problems go away. If not, the next step would probably be to manually change those data manager entries so the latest version is on top and the text gives some indication that older indices have problems with newer versions of the tool. @natefoo is it possible to manually change the data manager generated table texts - not the keys but the display value? If, after these two changes, there are still ongoing problems, we could consider introducing some sort of versioning to the data entries - but we would have to maintain that ourselves, since Gemini doesn't seem to track this at all or have any concept of it. This is challenging because none of us is really expert enough to maintain that, and I think it would require a lot more expertise to run the data manager. Also, Gemini itself doesn't seem to be actively developed at a rapid pace, so this is a lot of overhead to handle future versions that may never materialize, or may only materialize very slowly over many years. With any luck, though, the above small changes would reduce the usage problems to zero and be enough to work around things for now.
Other tools have well-behaved, reproducible index generation; this tool does not. We're pulling a bunch of random files down from S3. Anything other than this is a misrepresentation of what is happening, and we shouldn't do that just to simplify things. The date simplifies things as much as we can, I think - and while it does require some technical understanding from users, they should understand that this is what is happening and when their index data was generated. If the goal is to hide these details from users, we should eliminate the selection altogether and always use the latest data we have available - and while that is good for usability, it is really not good for reproducibility or transparency - hence the dates.
@jmchilton Thanks, I agree with all of this. I'd like to make the Gemini index, use the same date format for the naming, and keep all indexes in the pull-down. In order for me to create the new index (and others), the server where we create data indexes needs to have its data tables refreshed to reflect the content of cvmfs (the same data as available on main). In the test tables, there are currently duplicates and failed indexing jobs that left partial data. The server is test-datamanager, and I believe it has the same tables as the test server does. @davebx thinks this is possible and @natefoo has done it in the past. In short: go back to the original list of to-do items above, and drop the last two. The first step is to reset the test tables back to canonical content. Could we do that this week? The tool is not usable with the currently available indexes (at org -- eu already has an updated index).
Gemini data creation is failing. The data is downloaded and appears to be intact, but the YAML could not be written. This causes Gemini load to still fail. The Gemini DM ends in the history with a "green" dataset but has this in the stdout: https://gist.github.com/jennaj/0a4866aa58e082bb7ae352fde2b7ad4b @bgruening Does this look familiar, or do you know the best way to fix it? I'm guessing we'll need to remove the indexes already downloaded, plus remove that run from the Gemini loc file/data table, then fix some permissions issue (??), and after that run the DM again from scratch. Or can we just fix the permissions for the YAML write and rerun? My concern is creating duplicates or leaving an index on the tool form's "Choose a gemini annotation database" menu that is not usable.
OK -- we are upgrading the Gemini load tool on Test. A version mismatch between the tool and the index may be the problem. I'm not quite sure how/when the YAML is used; it might be extra (for Ephemeris?). After upgrading I'll run it again and see what happens.
@bgruening The indexes are still problematic for us, even when using the updated load tool. It looks like the YAML file is needed. Could you help us configure this correctly? Not sure if it will be @jmchilton or @natefoo doing that, or possibly @davebx. Guessing we need to wipe everything we have now, get the config correct, then rerun...?
Since we cannot figure out how to configure the indexes, and since Gemini only supports hg19, we decided to drop it from main. Moved checklist to card: https://github.com/galaxyproject/usegalaxy-playbook/projects/2#card-15351010
This may come a bit late, but I started working on updating the gemini tool suite, and on a fix for the data manager as part of this. Based on some first experiments, it looks as if the data index / load tool issue is relatively easy to fix, so, if it hasn't happened yet, you might want to reconsider removing the tools from main.
OK, I'll get some feedback. These are really useful tools (imo) and we do have Gemini users. If they can be fixed up, then it seems totally reasonable to install the updated versions. We were not sure if these were still supported for future updates. We only have one version of the suite now, two older indexes that are listed in the locs but are not actually in cvmfs (so they won't work), and one newer index (which is buggy) created from the DM version that matches the suite version, but it is only on our test server, not published to main/cvmfs. So there is no way to do the first Gemini step (load) with any combination of tool version/index version at main. The older tool/index versions don't work at EU either; only the newest tool with the newest index does. If curious, I ran tests of all combos available at both servers in the test history above. We could decide (for main):
The first step would need to be done at some point anyway; keeping the old tools/indexes just takes up space and creates an opportunity for job failures that we can predict (and prevent by just not making them available). But users might wonder where they went, or whether they are being fixed. Maybe leaving the old ones up until they are replaced (if not too long!) would be better. That is more in line with what we do with other tools that have bugs we expect to be fixed. However, most tool bugs are not entirely fatal (e.g., just problems with specific functions, sometimes working in earlier versions, etc.), so there is some workaround to offer. These tools have no workaround except to go use them at EU. (And we all know that downloading/transferring histories is tricky... collection issues and all that... but it's being worked on :) ... still, it is another thing to explain that is not quite working. It piles up when two or more issues are involved. People tune out, as you have if you stopped reading by now, lol.) @wm75 Is there an estimate yet on when you'd be ready to publish the update to the MTS? Just a general ballpark. If it's over 2-3 months, we might want to strip them now. When tools don't work (at all, with no workaround), confidence in the rest of the site drops. People don't always bother to find out why things are not working, especially if they are new users (of Galaxy entirely, or of main specifically). I fear they'll just move on, frustrated, and not look back... We really appreciate your help and feedback with all of this - thank you!
I agree, but luckily it won't be that long. After 2 days of working on this, I think I have a PR almost ready (I will reference it here when it's pushed to tools-iuc).
So the relevant PR is galaxyproject/tools-iuc#2204. The corresponding new versions of the data manager and the gemini tools are also available from the TTS and appear to work well (though about half of the gemini tools still attempt to install the old data manager due to some metadata issue).
@jmchilton @nekrut How do you feel about adding tools + DM from the TTS to main? ^^
@jmchilton removed from ticket. Plan: Remove everything Gemini in the GUI/data at main for now: tools, DM, data.
@jennaj don't add anything from test. These tools and DM will appear soon on the MTS.
@bgruening Thanks for the advice! We'll remove the older version/DM for now, and once they are in the MTS again we can install the tools and build indexes fresh using the updated version/DM. :)
Wrappers for GEMINI 0.20.1 are now available through the MTS. Since the majority of tools in the suite have received major updates, and some tools have been merged into one, I'd recommend uninstalling the old tools, then reinstalling only the latest version.
@jennaj, @davebx For fully functional gemini query and gemini actionable_mutations tools, you will have to patch a broken hyperlink in the gemini source. It's a single line of code, as mentioned here: Björn patched it on the EU server, but that's the only modification we made to the whole suite of tools.
The current list of to-do items is in this card. We are waiting for Test to get stable before doing the next steps (reinstalling tools + DM, etc.): https://github.com/galaxyproject/usegalaxy-playbook/projects/2#card-15351010
Test should be stable enough to attempt tool installs.
Didn't install correctly. @davebx made a PR to fix the issue -- it should go into 19.05. Once it is on main, we can try again.
OK - tools installed and DM running!
That's a Python 3 error.
Yes, that's right. The DM opens the file in binary mode, but tries to write a string to it. It must have escaped my attention when I updated the DM; otherwise I would have fixed it back then.
Updated: Asked him if he wants a ticket... or do you, @wm75? Against the IUC repo? This impacts everyone now with 19.05 coming out soon, yes?
adding @nsoranzo
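The failure described here is easy to reproduce outside the data manager. The minimal Python 3 sketch below is not the DM's actual code: the function names and the config contents are hypothetical. It shows that writing a `str` to a file opened with `"wb"` raises `TypeError`, and that opening the file in text mode with an explicit encoding (or encoding the string to bytes first) avoids it.

```python
import os
import tempfile


def write_config_buggy(path, text):
    # Binary mode ("wb") only accepts bytes; on Python 3 a str raises
    # TypeError. On Python 2, str was a byte string, so this used to work.
    with open(path, "wb") as fh:
        fh.write(text)


def write_config_fixed(path, text):
    # Text mode with an explicit encoding accepts str on Python 3.
    # (Alternatively, keep "wb" and write text.encode("utf-8").)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)


# Hypothetical config contents, for illustration only.
config_text = "annotation_dir: /data/gemini\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "gemini-config.yaml")
    try:
        write_config_buggy(path, config_text)
    except TypeError as exc:
        print("buggy version failed:", exc)
    write_config_fixed(path, config_text)
    with open(path, encoding="utf-8") as fh:
        print("fixed version wrote:", fh.read().strip())
```

Either fix works on Python 3; text mode is the simpler choice when the content is already a string, such as generated YAML.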
There's already a WIP PR fixing these from @mvdbeek: galaxyproject/tools-iuc#2032
@jennaj No, unless someone installs Galaxy using Python 3 (not the default yet).
Excellent, thanks!
Hmm, OK, then we are stuck on adding the index (or any new indexes created by impacted DMs) for cvmfs content that flows from Test > Main/CVMFS until the Python 3 fixes are made. We've already upgraded :/ But good to know the root problem! cc @natefoo @davebx @jmchilton Unless we can make our test-datamanager server work as a clone of main instead of test? Or... I don't know... some other ideas? Could/should I run this directly on main?
@natefoo made a ticket for the Gemini DM fix here: galaxyproject/tools-iuc#2408. @jmchilton could consider that for a weekly project unless someone else picks it up first; it seems like we'll need it before anyone else.
ORG was recently updated to v 0.18.1.0. It needs to have the updated 2018 annotation indexes added. Choosing either 2014/2015 or 2016 annotation causes failures.
Workaround for users: Use the tool at https://usegalaxy.eu. It will fail at https://usegalaxy.org.
Current list of tasks: https://github.com/galaxyproject/usegalaxy-playbook/projects/2#card-15351010
OLD, keeping for tracking history
To fix the problem:
to get the 2018 indexes installed (DM already installed at Test DM server)
- [ ] Update all current DMs
- [ ] Make list of missing DMs and add
- [ ] Skip this: Clean up data tables so that only the most current Gemini indexes are listed. If users report problems, inform them to use the most current index by date.
- [ ] Retest tool on Test to see if all install/dependency issues are resolved
__dbkeys__ table(s)
- [ ] Push the new indexes to CVMFS so they are available at ORG.
- [ ] Test indexes "On Main"
- [ ] Skip this: Consider updating the Gemini wrapper so that it better handles which databases to list. It seems like tool version 0.18.1.0 is dependent on a specific version of the annotation created by the updated data manager version 0.18.1.0. If this is expected to continue (tool and index versions dependent on each other), should the DM remove all prior indexes and replace them with the new one?
- [ ] Simplify this: Next time we update Gemini, updating the DM and creating indexes should be part of the install process. When the tool is updated, ping to have the existing "most current" index tested and, if it fails, create a new one using the matching, newer DM version.

The v 0.18.1.0 wrapper fails at both ORG and EU when the 2014/5 or 2016 indexes are selected, yet they are listed as being available on the tool form. I can't think of another tool that works this way; normally, if an index is listed, it is compatible with the tool wrapper version.
ORG test history: https://usegalaxy.org:/u/jen/h/test-history-gemini
EU test history: https://usegalaxy.eu/u/jenj/h/test-history-gemini-tutorial
ping @davebx @natefoo @bgruening