SRA test creating a huge local file/cache #442
BTW, it's the first test in the data provider that causes the problem. If I run just the second test (or even all the tests in the SRA package except that first one), the cache is still created but is much smaller.
@a-nikitiuk Could you have a look at this?
Thanks for the information provided. We are working on a fix.
The actual fix for this will be to supply a sparse-file implementation for HFS. That implementation resides in the C library, which will either need to be downloaded automatically or require a manual download step by the user. We'll let you know when that's available. In the meantime, we can quickly disable the first test, if that helps.
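A minimal Java sketch of why a cache file can occupy far more disk than was actually downloaded. This is an illustration only, not code from the NCBI C library: seeking past the end of a file and writing a single byte gives the file a large logical length. On file systems with sparse-file support (ext4, for example) the unwritten gap is typically left unallocated; HFS+ has no sparse-file support, so the same gap is backed by real zero-filled blocks. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

final class SparseFileDemo {
    /**
     * Hypothetical helper: gives {@code path} a logical length of {@code logicalSize}
     * bytes by writing one byte at the last offset. Whether the skipped region is
     * allocated on disk depends on the file system's sparse-file support.
     */
    static long createSparse(Path path, long logicalSize) throws IOException {
        if (logicalSize < 1) {
            throw new IllegalArgumentException("logicalSize must be at least 1");
        }
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw")) {
            raf.seek(logicalSize - 1); // jump over a region that is never written
            raf.write(0);              // a single real byte at the end
        }
        return Files.size(path);       // logical size, independent of allocation
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sparse-demo", ".cache");
        try {
            long size = createSparse(tmp, 16L * 1024 * 1024); // 16 MiB logical length
            System.out.println("logical size in bytes: " + size);
        } finally {
            Files.deleteIfExists(tmp); // clean up after ourselves
        }
    }
}
```

Comparing `ls -l` (logical length) with `du` (allocated blocks) on the resulting file shows the difference between the two kinds of file system.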
In general, tests should not leave any files lying around. Please have tests delete any files they create. A lot of people would be unpleasantly surprised to have ~/ncbi show up in their home directory without explanation, even if it only takes up 16 MB. Is there a way for users to specify where the SRA cache goes?
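One way a test can avoid leaving files behind is to do its work in a scratch directory under the JVM's temporary directory and delete that directory recursively when done. The sketch below uses only standard `java.nio.file` calls; the `TempCacheDir` class is hypothetical, and how the SRA/NGS library would be told to use such a directory is not shown, since that depends on NCBI's own configuration mechanism.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: a per-test scratch directory that removes itself on close. */
final class TempCacheDir implements AutoCloseable {
    private final Path dir;

    TempCacheDir(String prefix) throws IOException {
        // Created under java.io.tmpdir rather than the user's home directory.
        this.dir = Files.createTempDirectory(prefix);
    }

    Path path() {
        return dir;
    }

    @Override
    public void close() throws IOException {
        if (!Files.exists(dir)) {
            return; // already cleaned up
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse lexicographic order visits children before their parent directory.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.deleteIfExists(p);
        }
    }
}
```

Used with try-with-resources, e.g. `try (TempCacheDir cache = new TempCacheDir("sra-test-")) { ... }`, the directory is removed even when the test body throws.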
I've been considering how to address this one... It seems to me that we might want to be more clear about requirements, especially those that are in conflict with one another.
I haven't yet seen any evidence of a lack of robustness of SRA tests, but as I mentioned earlier we will disable tests that access real-world data if the issue is disk-space usage. As far as simultaneously providing a robust test suite that does not utilize the network or disk store, all while behaving as if we didn't exist as far as the user is concerned - these appear to be contradictory requirements.
@kwrodarmer I didn't mean to attack the test suite in general. We're trying to make it so that everyone can live in harmony and happiness, and fixing problems as we find them is a necessary part of that.
5/6) It's totally fine to set up a temporary cache area for the tests. It's also fine to download necessary files from the internet in order to test that functionality. We have mechanisms for creating temporary files and directories for just this sort of case. It's problematic if the files being downloaded are large, though. If we're having to stream 40 GB files from somewhere, then we should change the tests so they use artificially small data files. It sounded like the actual download is much smaller, though, and the giant cache files are a result of a deficiency in the file-handling code?
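To catch runaway cache growth early, a test could check the size of its temporary cache area after running and fail with a clear message instead of silently filling the disk. This is a minimal sketch under stated assumptions: `CacheSizeGuard` is a hypothetical helper, and it sums logical file sizes via `Files.size`. On HFS+ the logical size matches disk usage; on sparse-capable file systems it can overstate usage, so the check errs on the strict side.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical helper for bounding the size of a test's cache directory. */
final class CacheSizeGuard {
    /** Sums the logical sizes of all regular files under {@code root}. */
    static long logicalSize(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                       .mapToLong(p -> {
                           try {
                               return Files.size(p);
                           } catch (IOException e) {
                               throw new UncheckedIOException(e);
                           }
                       })
                       .sum();
        }
    }

    /** Throws AssertionError if the cache under {@code root} exceeds {@code limitBytes}. */
    static void assertWithinLimit(Path root, long limitBytes) throws IOException {
        long size = logicalSize(root);
        if (size > limitBytes) {
            throw new AssertionError("cache " + root + " is " + size
                    + " bytes; limit is " + limitBytes);
        }
    }
}
```

A test would call, for example, `CacheSizeGuard.assertWithinLimit(cacheDir, 64L * 1024 * 1024)`; the 64 MiB limit is an arbitrary assumption to be tuned to the artificially small test data.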
I think we're in agreement about having adequate tests. What about dividing the tests into +SRA and -SRA, where the former would imply the user's agreement to use SRA facilities (probably including network access, downloads of native code, establishing cache areas, etc.) and the latter would exclude SRA from the testing? Or perhaps divide them into network-enabled and network-disabled? One of the things you seem to be struggling with is that the NGS Java code is nothing more than a language binding. Most of the real work happens in C within the shared library. Enabling SRA without having previously installed this library, without network access, and/or without the ability to download and refresh SRA code might not make much sense. Incidentally, we are going to try to provide better stack traces from the C code across the Java boundary, because otherwise it is difficult to see what really happens. So taking your points in order:
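The proposed +SRA/-SRA split could be driven by an explicit opt-in switch, so that SRA tests (native code, network, cache) run only when the user asks for them. A minimal sketch, assuming a JVM system property; the property name `sra.tests.enabled` and the `SraTestGate` class are hypothetical, not existing GATK or NCBI settings. In a real suite the check would feed the test framework's own skip or assumption mechanism.

```java
/** Hypothetical opt-in gate for tests that need SRA facilities. */
final class SraTestGate {
    static final String PROPERTY = "sra.tests.enabled";

    /**
     * True only when the user explicitly passes -Dsra.tests.enabled=true;
     * Boolean.getBoolean returns false when the property is unset.
     */
    static boolean sraTestsEnabled() {
        return Boolean.getBoolean(PROPERTY);
    }

    public static void main(String[] args) {
        if (!sraTestsEnabled()) {
            System.out.println("Skipping SRA tests; rerun with -D" + PROPERTY + "=true to opt in");
            return;
        }
        System.out.println("Running SRA tests (network access, native library, local cache)");
    }
}
```

Defaulting to "off" matches the concern raised earlier in the thread: users who never opt in get no downloads and no ~/ncbi directory.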
I'm able to run these tests now. I think this was fixed in #638 and can be closed?
@a-nikitiuk Is there anything else to be done here?
Nope, it can be closed.
Closed via #638.
Running code pulled from master (up to commit 96a5571), the SRA tests fill up the hard drive on my MacBook running Yosemite. The same code/build/test (roughly) running on my Ubuntu VM on a Windows host leaves only a 16M file. In both cases, the tests all pass. I'm fairly sure it's not downloading that much data, since I'm behind a relatively slow T1 line and the test runs much too quickly. If I delete the ncbi folder and rerun the test, the same thing reappears:
~$ du -h ncbi
193G ncbi/public/sra
193G ncbi/public
193G ncbi
~$ ls -l ncbi/public/sra
total 403865600
-rw-r--r-- 1 cmn staff 206779187200 Jan 20 14:47 SRR822962.sra.cache
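Assuming the `total` line of `ls -l` is counted in 512-byte blocks (the default for BSD/macOS `ls`), the listing can be checked directly. The blocks allocated on disk account for every byte of the file's logical length, so the cache file is fully allocated rather than sparse, which is consistent with HFS+ lacking sparse-file support:

```latex
\begin{aligned}
403\,865\,600 \times 512\ \text{B} &= 206\,779\,187\,200\ \text{B} \quad (\text{the file size shown by } \texttt{ls -l})\\
206\,779\,187\,200\ \text{B} \,/\, 2^{30} &\approx 192.6\ \text{GiB} \quad (\text{reported as 193G by } \texttt{du -h})
\end{aligned}
```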