John Stanton-Geddes
2014-09-10 13:07:46 UTC
Hi Carl and rOpenSci,
Apologies for jumping in late here (and let me know if this should be asked
elsewhere or a new topic) but I've also recently discovered and become
intrigued by Docker for facilitating reproducible research.
My question: what's the advantage of Docker over an Amazon EC2 machine
image?
I've moved my analyses to EC2, which works better for me than my local
university cluster.
Doesn't my machine image achieve Carl's acid test of allowing others to
build and extend on work? What do I gain by making a Dockerfile on my
already existing EC2 image? Being new to all this, the only clear
advantage I see is that a Dockerfile is much smaller than a machine
image, but this seems like a rather trivial concern in comparison to the
100s of gigs of sequence data associated with my project.
thanks,
John
Yeah, looks like DO doesn't have it yet. I'm happy to leave EC2 to support
the little guy. But as with anything, there is a huge diversity of AMIs and
greater discoverability on EC2, at least for now.
Hmm, looks like DO is planning on it, but not possible yet. Do go upvote
this feature
https://digitalocean.uservoice.com/forums/136585-digitalocean/suggestions/3249642-share-an-image-w-another-account
Nice, we could get this working privately, then when sharing is
available, boom.
You received this message because you are subscribed to the Google Groups
"ropensci-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an
For more options, visit https://groups.google.com/d/optout.
Great idea. Yeah, should be possible. Does the DO API support a way to
launch a job on the instance, or otherwise a way to share a custom machine
image publicly? (e.g. the way Amazon EC2 lets you make an AMI public from
an S3 bucket?)
I suspect we can just droplets_new() with the ubuntu_docker image they
have, but that we would then need a wrapper to ssh into the DO machine and
execute the single command needed to bring up the RStudio instance in the
browser.
Carl Boettiger
UC Santa Cruz
http://carlboettiger.info/
Carl,
Awesome, nice work.
Thoughts on whether we could wrap the docker workflow into my Digital
Ocean client so that a user never needs to leave R?
https://github.com/sckott/analogsea
Scott
Hi folks,
Just thought I'd share an update on this thread -- I've gotten RStudio
Server working in the ropensci-docker
<https://github.com/ropensci/docker-ubuntu-r/blob/master/add-r-ropensci/Dockerfile>
image.
docker run -d -p 8787:8787 cboettig/ropensci-docker
will make an RStudio server instance available to you in your browser
at localhost:8787. (Change the first number after the -p to have a
different address). You can log in with username:pw rstudio:rstudio and
have fun.
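
For anyone who wants a different host port, here is a minimal shell sketch. It assumes the container is started with `docker run`; the helper name `rstudio_cmd` is hypothetical, and it only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print the docker command that maps a chosen host
# port to RStudio Server's port 8787 inside the container.
rstudio_cmd() {
  host_port="${1:-8787}"   # first argument, defaulting to 8787
  echo "docker run -d -p ${host_port}:8787 cboettig/ropensci-docker"
}

# Print the command for host port 8080; pipe the output to sh to run it.
rstudio_cmd 8080
```

With host port 8080, RStudio would then be reachable at localhost:8080 instead of localhost:8787.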
One thing I like about this is the ease with which I can now get an
RStudio server up and running in the cloud (e.g. I took this for a sail on
DigitalOcean.com today). This means in a few minutes and for 1 penny you have a
URL that you and any collaborators could use to interact with R using the
familiar RStudio interface, already provisioned with your data and
dependencies in place.
To keep this brief-ish, I've restricted further commentary to my blog
http://www.carlboettiger.info/lab-notebook.html
Cheers,
Carl
Thanks Rich! Some further thoughts / questions below.
Hi Carl,
Thanks for this!
I think that docker is always going to be for the "crazies", at least
in its current form. It requires running on Linux for starters -
I've got it running on a virtual machine on OSX via virtualbox, but
the amount of faffing about there is pretty intimidating. I believe
it's possible to get it running via vagrant (which is in theory going
to be easier to distribute) but at that point it's all getting a bit
silly. It's enlightening to ask a random ecologist to go to the
website for docker (or heroku or vagrant or chef or any of these
newfangled tools) and ask them to guess what they do. We're down a
rabbit hole here.
Completely agree here. Anything that cannot be installed by downloading
and clicking on something is dead in the water. It looks like Docker is
just download and click on Macs or Windows. (Haven't tested, I have only
linux boxes handy.) So I'm not sure that the regular user needs to know
that it's running a linux virtual machine under the hood when they
aren't on a linux box.
So I'm optimistic and think the installation faffing will largely go
away, if it hasn't yet. I'm more worried about the faffing after it is
installed.
I've been getting drone (https://github.com/drone/drone) up and
running here for one of our currently-closed projects. It uses docker
as a way of insulating the build/tests from the rest of the system,
but it's still far from ready to recommend for general use. The
advantages I see there are: our test suite can run for several hours
without worrying about running up against allowed times, and working
for projects that are not yet open source. It also simplifies getting
things off the container, but I think there are a bunch of ways of
doing that easily enough. However, I'm on the lookout for something
much simpler to set up, especially for local use and/or behind NAT. I
can post the dockerfile at some point (it's apparently not on this
computer!) but it's similarly simple to yours.
Very cool! Yeah, I think there's great promise that we'll see more
easy-to-use tools being built on docker. Is Drone ubuntu-only at
the moment then?
As I see it, the great advantage of all these types of approaches,
independent of the technology, is the recipe-based approach to
documenting dependencies. With travis, drone, docker, etc, you
document your dependencies and if it works for you it will probably
work for someone else.
Definitely. I guess this is the heart of the "DevOps" approach (at least
according to the BCE paper I linked -- they have nice examples that use
these tools, but also include case studies of big collaborative science
projects that do more-or-less the same thing with Makefiles).
I think the devil is still in the details though. One thing I like
about Docker is the versioned images. If you re-run my build scripts even
5 days from now, you'll get a different image due to ubuntu repo updates,
etc. But it's easy to pull any of the earlier images and compare.
Contrast this to other approaches, where you're stuck with locking in
particular versions in the build script itself (a la packrat) or
just hoping the most recent version is good enough (a la CRAN).
I'm OK with this being nerd only for a bit, because (like travis etc)
it's going to be useful enough without having to be generally
accessible. But there will be ideas here that will carry over into
less nerdy activities. One that appeals to me would be to take
advantage of the fancy way that Docker does incremental builds to work
with large data sets that are tedious to download: pull the raw data
as one RUN command, wrangle as another. Then a separate wrangle step
will reuse the intermediate container (I believe). This is sort of a
different way of doing the types of things that Ethan's "eco data
retriever" aims to do. There's some overlap here with make, but in a
way that would let you jump in at a point in the analysis in a fresh
environment.
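
The incremental-build idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the data URL, the file paths, the helper name `write_dockerfile`, and the availability of `wget` in the base image are all hypothetical. The script writes a Dockerfile in which the download and the wrangling are separate RUN steps, so that Docker's build cache can reuse the download layer when only the wrangling line changes.

```shell
#!/bin/sh
# Write a hypothetical Dockerfile in which download and wrangling are
# separate RUN steps, so the download layer can be served from cache.
write_dockerfile() {
  cat > "$1" <<'EOF'
FROM cboettig/ropensci-docker
# Step 1: fetch the raw data (URL is a placeholder; assumes wget exists)
RUN wget -O /raw.csv http://example.org/raw.csv
# Step 2: wrangle; editing only this line leaves step 1 cached
RUN Rscript -e 'd <- read.csv("/raw.csv"); saveRDS(d, "/clean.rds")'
EOF
}

write_dockerfile Dockerfile.sketch
# Count the RUN steps that were written
grep -c '^RUN' Dockerfile.sketch
```

Building with `docker build` a second time after editing only the second RUN line should skip the download, since the first step's layer is unchanged.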
Great point, hadn't thought about that.
I don't think that people will jump to using virtual environments for
the sake of it - there has to be some pay off. Isolating the build
from the rest of your machine or digging into a 5 year old project
probably does not have widespread appeal to non-desk types either!
Definitely agree with that. I'd like to hear more about your perspective
on CI tools though -- of course we love them, but do you think that CI
has a larger appeal to the average ecologist than other potential
'benefits'? I think the tangible payoffs are (cribbing heavily from that
Berkeley paper):
1) For instructors: having students in a consistent and optimized
environment with little effort. That environment can become a resource
maintained and enhanced by a larger community.
2) For researchers: easier to scale to the cloud (assuming the tool
is as easy to use on the desktop as whatever they currently do -- clearly
we're not there yet).
3) Easier to get collaborators / readers to use & re-use. (I think
that only happens if lots of people are performing research and/or
teaching using these environments -- just like sharing code written in Go
just isn't that useful among ecologists. Clearly we may never get here.)
I think that the biggest potential draws are the CI-type tools, but
there are probably other tools that require isolation/virtualisation
that will appeal broadly. Then people will accidentally end up with
reproducible work :)
Cheers,
Rich
Hi rOpenSci list + friends [^1],
Yay, the ropensci-discuss list is revived!
Some of you might recall a discussion about reproducible research in the
comments of Rich et al's recent post on the rOpenSci blog where quite a
few people mentioned the potential for Docker as a way to facilitate
this.
I've only just started playing around with Docker, and though I'm quite
impressed, I'm still rather skeptical that non-crazies would ever use it
productively. Nevertheless, I've worked up some Dockerfiles to explore
how one might use this approach to transparently document and manage a
computational environment, and I was hoping to get some feedback from
all of you.
For those of you who are already much more familiar with Docker than me
(or are looking for an excuse to explore!), I'd love to get your
feedback on some of the particulars. For everyone, I'd be curious what
you think about the general concept.
So far I've created a dockerfile and image.
If you have docker up and running, perhaps you can give it a test:
docker run -it cboettig/ropensci-docker /bin/bash
You should find R installed with some common packages. This image builds
on Dirk Eddelbuettel's R docker images and serves as a starting point to
test individual R packages or projects.
For instance, my RNeXML manuscript draft is a bit more of a bear than
usual to run, since it needs rJava (requires external libs), Sxslt (only
available on Omegahat and requires extra libs) and latest phytools (a
tar.gz file from Liam's website), along with the usual mess of
pandoc/latex environment to compile the manuscript itself. By building
on ropensci-docker, we need a ...
docker run -it cboettig/rnexml /bin/bash
Once in bash, launch R and run rmarkdown::render("manuscript.Rmd"). This
will recompile the manuscript from cache and leave you to interactively
explore any of the R code shown.
Advantages / Goals
Being able to download a precompiled image means a user can run the code
without dependency hell (often not as much an R problem as it is in
Python, but nevertheless one that I hit frequently, particularly as my
projects age), and also without altering their personal R environment.
Third (in principle) this makes it easy to run the code on a cloud
server, scaling the computing resources appropriately.
I think the real acid test for this is not merely that it recreates the
results, but that others can build and extend on the work (with fewer
rather than more barriers than usual). I believe most of that has
nothing to do with this whole software image thing -- providing the
methods you use as general-purpose functions in an R package, or
publishing the raw (& processed) data to Dryad with good documentation
will always make work more modular and easier to re-use than cracking
open someone's virtual machine. But that is really a separate issue.
In this context, we look for an easy way to package up whatever a
researcher or group is already doing into something portable and
extensible.
So, is this really portable and extensible?
This presupposes someone can run docker on their OS -- and from the
command line at that. Perhaps that's the biggest barrier to entry right
now (though given docker's virulent popularity, maybe something smart
people with big money might soon solve).
The only way to interact with the thing is through a bash shell running
on the container. An RStudio server might be much nicer, but I haven't
been able to get that running. Anyone know how to run RStudio server
from docker? (https://github.com/mingfang/docker-druid/issues/2)
I don't see how users can move local files on and off the docker
container. In some ways this is a great virtue -- forcing all code to
use fully resolved paths like pulling data from Dryad instead of their
hard-drive, and pushing results to a (possibly private) online site to
view them. But obviously a
barrier to entry. Is there a better way to do this?
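
One possible answer to the file question is docker's `-v host:container` option, which bind-mounts a host directory into the container. A minimal sketch follows; the helper name `mount_cmd` and the `/data` mount point are assumptions made for illustration, and the script only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print a docker command that bind-mounts a host
# directory into the container using the -v host:container option.
mount_cmd() {
  host_dir="${1:-$PWD}"    # host directory to share (default: current dir)
  mount_point="/data"      # assumed location inside the container
  echo "docker run -it -v ${host_dir}:${mount_point} cboettig/ropensci-docker /bin/bash"
}

# Print the command for a hypothetical project directory
mount_cmd /home/alice/project
```

Files written under the mount point inside the container would then appear in the host directory, at the cost of tying the run to a local path.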
Alternative strategies
1) Docker is just one of many ways to do this (particularly if you're
not concerned about maximum performance speed), and quite probably not
the easiest. Our friends at Berkeley D-Lab opted for a GUI-driven
virtual machine instead, built with Packer and run in Virtualbox, after
their experience proved that students were much more comfortable with
the mouse-driven installation and a pixel-identical environment to the
instructor's (see their excellent paper on this).
2) Will/should researchers be willing to work and develop in virtual
environments? In some cases, the virtual environment can be closely
coupled to the native one -- you use your own editors etc to do all the
writing, and then execute in the virtual environment (seems this is
easier in the docker/vagrant approach than in the BCE).
[^1]: friends cc'd above: We're reviving this ropensci-discuss list to
chat about various issues related to our packages, our goals, and more
broad scientific workflow issues. I'd encourage you to sign up for the
list: https://groups.google.com/forum/#!forum/ropensci-discuss
--
Carl Boettiger
UC Santa Cruz
http://carlboettiger.info/
--
You received this message because you are subscribed to the Google Groups "ropensci-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ropensci-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
For more options, visit https://groups.google.com/d/optout.