Carsten Behring
2014-10-06 21:47:21 UTC
Dear all,
I have a question for people who do reproducible research in practice. I
do not do so myself: I have an IT background, but I am very interested in
the subject.
I took a data science course on Coursera (very good, by the way) in which
we did our exercises with reproducible research in mind. At some point I
asked myself whether Docker could be the single distribution format for all
types of research, even the smallest publication (if written in Rmd format).
So, does it make sense to distribute even a single Rmd file as a Docker
image (containing the OS plus the RStudio environment that renders the Rmd
to PDF or HTML)?
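To make the idea concrete, here is a minimal sketch in Python that generates
the Dockerfile text for such a single-paper image. The base image name
rocker/rstudio, the /paper directory and the helper name paper_dockerfile
are assumptions for illustration only; the sketch also assumes the base
image already provides R, rmarkdown and pandoc.

```python
from textwrap import dedent


def paper_dockerfile(rmd_name: str, base_image: str = "rocker/rstudio") -> str:
    """Return Dockerfile text that adds one Rmd file on top of a shared base image.

    Assumption: the base image already ships R, rmarkdown and pandoc, so the
    only paper-specific layer is the copied Rmd file itself.
    """
    return dedent(f"""\
        FROM {base_image}
        COPY {rmd_name} /paper/{rmd_name}
        WORKDIR /paper
        CMD ["Rscript", "-e", "rmarkdown::render('{rmd_name}')"]
        """)


if __name__ == "__main__":
    # analysis.Rmd is a hypothetical file name.
    print(paper_dockerfile("analysis.Rmd"))
```

Everything above the COPY line comes from the shared base image, so the
paper itself would contribute only one small layer.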
It might sound crazy, as a Docker image including the OS and RStudio is
hundreds of megabytes in size.
But Docker has a built-in, very smart storage and distribution system for
its images. If everybody used the same base image (or one of a small number
of base images), then every reader of scientific papers distributed as
Docker images would need to download the base images only once.
Later downloads of other papers (again as Docker images) by the same person
would fetch only the layer containing the document (very small, comparable
to a Word or PDF file), because the base image layers are cached by the
Docker client.
(Classical virtual machine images have no such feature.)
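The caching argument can be illustrated with a small simulation. This is
only a toy model, not how the Docker client is implemented: the layer names
and sizes are made up, and the sketch assumes that a layer already present
in the local cache is never downloaded again.

```python
def pull(image_layers, cache, sizes):
    """Simulate pulling an image: transfer only layers missing from the cache.

    image_layers: ordered list of layer names making up the image
    cache:        set of layer names already stored locally (updated in place)
    sizes:        mapping of layer name -> size in MB
    Returns the number of MB transferred for this pull.
    """
    downloaded = 0
    for layer in image_layers:
        if layer not in cache:
            downloaded += sizes[layer]
            cache.add(layer)
    return downloaded


# Hypothetical layer sizes in MB: two shared base layers, two small paper layers.
sizes = {"os": 120, "r-rstudio": 400, "paper-a": 1, "paper-b": 2}
paper_a = ["os", "r-rstudio", "paper-a"]
paper_b = ["os", "r-rstudio", "paper-b"]

cache = set()
first = pull(paper_a, cache, sizes)   # base layers + paper A
second = pull(paper_b, cache, sizes)  # only the paper B layer
print(first, second)  # prints "521 2"
```

With these made-up numbers the first paper costs 521 MB and the second
only 2 MB, which is the effect described above.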
The same is true for storage of those "dockerized" papers: only the
difference from the base image needs to be stored each time.
This also means that it is feasible for a university or institute to run
its own private Docker registry (the code is open source) and host all its
research papers as Docker images.
Such registries could hold Docker images built on whatever technology is
needed (R, Python, Java, Julia, anything that runs on Linux). Everything
could be distributed in the same format.
Distributing those images is then just a matter of telling people the URL
of the Docker image, for example:
http://registry.my-host.my-domain/user-name/paper-name
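One detail worth noting: docker pull expects an image reference of the form
host/user/name[:tag], without the http:// scheme. A minimal sketch of
turning a published URL into the corresponding pull command follows; the
helper name pull_command and the default tag "latest" are assumptions, and
the command is only built here, not executed.

```python
from typing import List
from urllib.parse import urlparse


def pull_command(url: str, tag: str = "latest") -> List[str]:
    """Build the argv for `docker pull` from a published image URL.

    The scheme (http/https) is dropped, because Docker image references
    consist of registry host, repository path and an optional tag.
    """
    parts = urlparse(url)
    reference = f"{parts.netloc}{parts.path.rstrip('/')}:{tag}"
    return ["docker", "pull", reference]


if __name__ == "__main__":
    cmd = pull_command("http://registry.my-host.my-domain/user-name/paper-name")
    print(" ".join(cmd))
```

The resulting list could be passed to subprocess.run on a machine where
Docker is installed.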
Everybody with access to a Docker installation (local PC or cloud) could
then use those images and reproduce the analysis or paper in the "same"
environment as the original author.
I would welcome any comments you might have.
Carsten
--
You received this message because you are subscribed to the Google Groups "ropensci-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ropensci-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
For more options, visit https://groups.google.com/d/optout.