Comments on Reproducible Scientific Computing: Introduction: Portability is Reproducibility
Douglas Thain (http://www.blogger.com/profile/10046446527813216338)
Updated 2015-08-10T11:33:19.185-07:00

Douglas Thain (https://www.blogger.com/profile/10046446527813216338), 2015-05-26T06:02:55.843-07:00

There are certainly cases in CS where the datasets are small, and so dumping everything into one VM is a perfectly good solution. And you are quite right about incentives.

But if you look at the real sciences, the data situation is different. In bioinformatics, you frequently have thousands of analyses performed on a common dataset of tens to hundreds of gigabytes. You can squeeze 100GB into a VM, but you wouldn't want to duplicate it once for each analysis code, so you need a way to archive the data independently of the code, be able to refer to it, mount it at runtime, and verify that you have the right thing. Same thing in astronomy at larger scales (100s of TB) and high energy physics (100s of PBs) and many other fields as well.

Robert Zeh (https://www.blogger.com/profile/12647206624571931529), 2015-05-25T14:47:43.573-07:00

I've had some experience with this in a variety of contexts.
I don't think the problem is tooling, because apt and conda cover a wide variety of situations for deploying software.
Sometimes data is a problem, but for most computer science papers I read the data sets aren't that large.
For many of the papers (and some textbooks) I've read, reproducibility could be solved with a virtual machine snapshot. The fact that this isn't done very often is telling.
The real problem is the lack of incentive to make work reproducible.