Adding a Dockerfile using VEP 85, after the 1.6.7 release #64

Merged (3 commits), Aug 11, 2016


mattions
Contributor

Hello,
this adds a Dockerfile to vcf2maf so it can be run from a container.

The commit sits just after version 1.6.7, so it basically Dockerizes that release; it should still land cleanly in master with no conflicts.

@ckandoth
Collaborator

Nice work. Looks like INSTALL.pl doesn't download a local cache. vcf2maf expects an annotation cache it can use offline. I never added support for online mode, where VEP can contact the Ensembl database. Did you test this out?

@mattions
Contributor Author

Hi @ckandoth,
usually a local cache is not added to the Docker image because it is too big.

We still expect the user to download the cache manually, just as they do now.

I didn't try using it without a locally downloaded cache.

@ckandoth
Collaborator

Thanks. Merging this in.

ckandoth merged commit 3f757be into mskcc:master on Aug 11, 2016
@sambrightman

I have a few questions and suggestions about this Dockerfile:

  • cache could be populated on first docker run, either by using CMD and running the container as a daemon (subsequently using docker exec to run vcf2maf.pl) or by using ENTRYPOINT to get an executable image and combining that with a system similar to the data initialisation code used by MySQL
    • especially if intending to use the container as a daemon, install the tools in a more standard location (/usr/local/bin) and make them executable (they already have perl shebangs)
  • several cleanups are possible (see https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/): for example, there is no need to copy the Dockerfile twice, or to store archive files only to delete them afterwards
  • since the default parameters refer to ExAC.r0.3.sites.minus_somatic.vcf.gz perhaps that should also be downloaded by default?
  • more generally, the interface to configuration and output could be better defined. It might be nice, for example, to:
    • use default VEP locations (~/.vep, installed in ~/vep)
    • allow use of VEP's --config option or use vep.ini for all options (including the default setup)
    • run as daemon container, setting up cache when docker run is executed (one-time run) so cache is available to users but not bloating the image
    • users can now bind-mount their VEP configuration and cache into ~/.vep (currently we add a separate VEP step, then run vcf2maf.pl - allowing custom configuration would avoid having to do this)
    • they can also manually update cache or install extra plugins
    • can bind-mount an input file, referencing the name in the vep.ini (same applies to default vep.ini)
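
The first suggestion (populate the cache on the first docker run, in the spirit of the MySQL image's data initialisation) could look roughly like the sketch below. It is a hypothetical entrypoint helper, not part of vcf2maf: the names VEP_CACHE_DIR, VEP_INSTALLER, cache_is_empty, ensure_vep_cache and docker_entrypoint are illustrative, the installer path assumes VEP is installed in ~/vep, and the INSTALL.pl options (--AUTO c, --SPECIES, --ASSEMBLY, --CACHEDIR) should be checked against the VEP version actually in the image. If a host cache is bind-mounted into ~/.vep, the download step is skipped.

```shell
#!/bin/sh
# Hypothetical first-run cache initialiser for a vcf2maf/VEP image.
# All names and paths here are assumptions for illustration.

# VEP's default cache directory; a bind-mounted host cache would appear here.
VEP_CACHE_DIR="${VEP_CACHE_DIR:-$HOME/.vep}"
# Assumed location of VEP's installer inside the image (hypothetical).
VEP_INSTALLER="${VEP_INSTALLER:-$HOME/vep/INSTALL.pl}"

# Succeeds when the directory is missing or contains no entries.
cache_is_empty() {
    [ ! -d "$1" ] || [ -z "$(ls -A "$1")" ]
}

# Download the cache only if nothing is there yet (e.g. no bind-mount given).
# INSTALL.pl options are assumed; verify them for the installed VEP release.
ensure_vep_cache() {
    if cache_is_empty "$VEP_CACHE_DIR"; then
        mkdir -p "$VEP_CACHE_DIR" &&
        perl "$VEP_INSTALLER" --AUTO c --SPECIES homo_sapiens \
            --ASSEMBLY GRCh37 --CACHEDIR "$VEP_CACHE_DIR"
    fi
}

# Entrypoint body: initialise once, then replace this shell with the
# requested command, e.g. vcf2maf.pl --input-vcf /data/input.vcf
docker_entrypoint() {
    ensure_vep_cache && exec "$@"
}
```

A real entrypoint file would end with a call such as docker_entrypoint "$@", with the Dockerfile's ENTRYPOINT pointing at that file. Because the download happens when the container is created rather than when the image is built, the image itself stays small, which addresses the concern that the cache is too big to ship.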

Essentially I'm imagining a situation where you use docker run once to set up your vcf2maf container with your standard VEP configuration and a populated cache. You then run docker exec vcf2maf vcf2maf.pl --input-vcf /data/input.vcf > output.maf and get a result, which seems quite succinct.
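
For this "set up once, then docker exec" workflow, a small host-side wrapper could hide the repeated parameters. This is a sketch under assumptions: a long-running container (default name vcf2maf, overridable through the hypothetical VCF2MAF_CONTAINER variable) has already been started, vcf2maf.pl is on the container's PATH, and the input path is the one seen inside the container (for example a bind-mounted /data). It relies on vcf2maf.pl's --input-vcf option and, as in the command above, assumes the MAF is written to stdout when no output file is given; any extra arguments are passed through unchanged.

```shell
#!/bin/sh
# Hypothetical host-side wrapper for the "run once, exec many times" workflow.
# Assumes the container was started once, for example with:
#   docker run -d --name vcf2maf -v "$HOME/.vep:/root/.vep" -v "$PWD:/data" IMAGE sleep infinity
# (IMAGE is a placeholder; the /root/.vep target assumes the container runs
# as root, and "sleep infinity" assumes GNU coreutils in the image.)

# Container to talk to; override it to target a container with a different cache.
VCF2MAF_CONTAINER="${VCF2MAF_CONTAINER:-vcf2maf}"

# Usage: vcf2maf INPUT_VCF [extra vcf2maf.pl options...]
# INPUT_VCF is a path inside the container; redirect stdout to save the MAF,
# e.g. vcf2maf /data/input.vcf > output.maf
vcf2maf() {
    input_vcf="$1"
    shift
    docker exec "$VCF2MAF_CONTAINER" vcf2maf.pl --input-vcf "$input_vcf" "$@"
}
```

Running one container per cache and switching VCF2MAF_CONTAINER between them would give the "one service per cache" setup described later in the thread.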

Some of these points are clear cleanups, but for others it needs to be decided what the intended workflow is. I'm fairly new to Docker, so I don't know if this is an idiomatic workflow. Are you open to pull requests for this? I would need to try it out first.

@mattions
Contributor Author

Hi @sambrightman,

given that I wrote the first Dockerfile, I'd like to add my two cents.

To the best of my knowledge, if you do not add the Dockerfile to the image, there is currently no easy way to get it back out; it is only copied once.
The other minor cleanups you mention could be applied, though.

As for where to install the tool, /opt is not a bad place, but neither is /usr/local/bin. Arguments can be made for both, so I guess it is @ckandoth's decision in the end.

On the cache, it depends on what you need to do. For example, the location of my cache changes all the time because it lives in the cloud, and I do not want the tool to download a new cache each time that happens. I still think this could be handled with clever configuration, but I'm not sure it would cover all the edge cases.

I'm not so sure about running the container as a daemon. This is not a database or a persistent service, so I do not think it makes sense.

On VEP configuration, TBH, anything goes. The current setup is one way of doing it, and it works.

I agree that we could put the vcf2maf invocation in the ENTRYPOINT or the CMD.

My two cents.

@sambrightman

sambrightman commented Aug 25, 2016

How are you currently using the container? docker run -it --rm vcf2maf perl /opt/vcf2maf/vcf2maf.pl args? It doesn't seem problematic to me to run a container as an "environment daemon" which is set up only once to provide a tool-as-a-service in the desired environment and configuration. I would like to effectively treat a vcf2maf container as a service in this way, running it multiple times with single, simple commands instead of repeatedly passing in the same parameters.

I'm not exactly following what you mean by a cloud-based cache. I'm suggesting that the container itself has a single idea of where the cache lives (probably the default location). The content of the cache can be fetched via INSTALL.pl when you create the container (not the image). However, if you docker run with a bind-mount, then it comes from a directory on your host which can contain whatever you want, obtained however you want. You could even run multiple containers, each acting as a service for a different cache.

@mattions
Contributor Author

We run it on the Seven Bridges platform via CWL. Instances are started and shut down as needed, so, at least in our use case, a daemon container does not work.

@sambrightman

I'm still not really following why it does not work - having the container run as a daemon does not prevent you from starting and stopping it as you wish (or even destroying it and setting it up again).

Do you build the cache inside the container or mount it?

@mattions
Contributor Author

It's mounted on the fly, every time.
