August 4, 2020 From rOpenSci (https://deploy-preview-121--ropensci.netlify.app/blog/2020/08/04/osf/). Except where otherwise noted, content on this site is licensed under the CC-BY license.
osfr provides a (hopefully) convenient R interface to OSF (Open Science Framework, https://www.osf.io), a free service for managing research developed by the Center for Open Science (COS).
osfr completed its rOpenSci peer review earlier this year and has been available on CRAN since February. Throughout its development and since its release I’ve had numerous conversations with members of the R community about OSF (and osfr), and through these interactions a couple of recurring patterns emerged. First, it seems that many R users have heard of OSF but relatively few have first-hand experience with it. Second, I’m often asked how OSF compares to GitHub and whether it would even be useful for someone who already uses GitHub to manage their research.
In a future post I’ll highlight some features of osfr and demonstrate how it can help form the basis of efficient and inclusive research workflows. However, before you can extract any value from osfr, you need to be an OSF user first. And so, I wanted to take this opportunity to provide a little background about OSF, what it offers, how it differs from something like GitHub, and where it might fit into your workflow as an R/GitHub user.
Before diving in, I want to acknowledge that OSF is a multi-faceted product that includes a number of services under its umbrella, such as pre-print servers and a research registration repository. These are incredibly cool and noteworthy, but they fall outside the scope of this post, which focuses on project management.
It’s not a perfect analogy as the two services differ in important ways that I’ll discuss below, but if you’re already familiar with GitHub it certainly provides a useful mental model to get a sense of what OSF offers. For example, like GitHub, OSF is a hosted project management service organized around project repositories that can be private, shared with select collaborators, or made publicly accessible. OSF projects also include cloud storage for uploading project files, automatic file versioning, markdown-based wikis for documentation and/or shared task lists, traffic analytics to assess impact and engagement, and they can even be “forked” by other users.
Ultimately, the two services are different because they were designed for different purposes: facilitating software development vs facilitating scientific research. And while the line separating these endeavors is often blurry, there are some fundamental differences. For example, supporting research requires accommodating files of all shapes and sizes, whereas software (generally) comes in one flavor: plain text. Furthermore, GitHub is intended for software developers, an inherently technical user base, whereas OSF is designed to accommodate users with a much wider range of computational expertise. And so, while OSF does share a number of features in common with GitHub, they’re often implemented quite differently to better serve OSF’s research-centric goals and users.
One notable example is how OSF handles file versions. While OSF automatically stores all previous versions of files as new ones are uploaded, it’s not a true version control system in the sense of something like git, which intelligently stores only the differences between file versions. On OSF, the latest version of a file is simply the last uploaded version, regardless of how much it actually changed. However, what OSF’s versioning feature lacks in power, it makes up for in accessibility. Creating a new version on OSF simply requires dragging and dropping an updated file onto the project’s file browser, a much more manageable procedure for users who may not be familiar with the command line, let alone git.
Next, let’s look at some of the key features unique to OSF that are purpose-built for the management and dissemination of research.
First, I want to point out that adopting OSF does not require upending your current workflows or abandoning services you’re already happy with because OSF directly integrates with many popular services through its add-ons system. So although OSF provides its own free cloud storage, if you’re already using Dropbox, Google Drive, S3, GitHub, etc, you can simply add the relevant folder/bucket/repo from that service to your OSF project. Each of these linked locations will show up in your OSF project’s file browser right alongside OSF’s cloud storage bucket. You can continue using the services just as you did before (syncing files with Dropbox, editing docs on Google Drive, etc). Adding these services to OSF provides the benefit of a unified portal to all of your materials, allowing you (and any OSF collaborator with proper permissions) to manage them and track their activity in one place.
OSF provides a generous amount of cloud storage for projects, limiting only the size of individual files, which must be under 5 GB. This allows you to store all of your project materials on OSF, even large data files. Furthermore, COS takes data preservation seriously, guaranteeing that, in the unlikely event that COS ceases to exist, your files will remain accessible for at least another 50 years from the day it closes its doors.
OSF projects are designed to be easily citable. Every OSF project is assigned a conveniently short URL that comprises a globally unique identifier (GUID); for example, the Psychology Reproducibility Project’s OSF page is reachable at https://osf.io/ezcuj, where ezcuj is its GUID. The GUIDs are immutable and permanent, so you can feel confident sharing an OSF URL, or even including it in a publication, knowing that, as long as the project exists and is publicly accessible, it won’t become another tragic dead link. You can also generate a DOI for any public project, with no third-party service needed. OSF also strives to make it as convenient as possible for others to cite your work by prominently placing a citation widget on every project page that auto-generates citations in any CSL-supported format.
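To make the URL structure concrete, here is a minimal sketch in base R that pulls the GUID out of an OSF URL. The helper name `extract_osf_guid` is hypothetical (it is not part of osfr), and the pattern assumes the GUID is the only path segment after `osf.io` and consists of lowercase letters and digits, as in the `ezcuj` example.

```r
# Hypothetical helper, not part of osfr: extract the GUID from an OSF URL.
# Assumption: the GUID is the single path segment after osf.io and is made
# of lowercase letters and digits (as in https://osf.io/ezcuj).
extract_osf_guid <- function(url) {
  pattern <- "^https?://(www\\.)?osf\\.io/([a-z0-9]+)/?$"
  # Return the captured GUID for matching URLs, NA for anything else
  ifelse(grepl(pattern, url), sub(pattern, "\\2", url), NA_character_)
}

extract_osf_guid("https://osf.io/ezcuj")
```

URLs that do not match the assumed shape come back as `NA` rather than raising an error, which keeps the helper safe to use on a whole column of links.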
OSF also provides a registration service to create snapshots of a project’s current state. From OSF’s docs:
A registration is a frozen, time-stamped copy of an OSF project. Registrations cannot be edited or deleted.
This way, if you decide to include an OSF project in a publication to share materials and/or data, using the registered project’s URL ensures the link always points to the exact set of files exactly as they were at the time of publication.
Perhaps OSF’s greatest super-power is that all of its functionality is available through a user-friendly point and click interface with a relatively shallow learning curve. As mentioned earlier, OSF is designed for a much wider audience of users that aren’t necessarily computationally expert. By significantly lowering the barrier to entry, OSF makes it possible for researchers of all technical abilities to improve their research workflows and follow best practices for reproducibility.
As someone who uses R (and possibly GitHub), chances are you already have efficient workflows in place for common tasks like sharing and versioning files. But even the best workflows are only as efficient as your least compliant collaborator. As much as we’d all love to write our manuscripts in version controlled RMarkdown documents and receive edits from collaborators via pull requests, the reality is, in a world still dominated by email and Microsoft Office, that’s rarely a realistic option.
Fortunately, it’s in exactly these types of collaborations that OSF really shines, as a compromise between pure git-managed-plain-text Nirvana and whatever the opposite of that might be (I’m picturing a swirling vortex of Word documents with tweetstorm-length filenames). In my experience, after providing short 10–15 minute tutorials, colleagues have been very willing to adopt OSF as a project hub. And if you do run into holdouts, short emails along the lines of
“I posted the latest results on OSF. You can see them here: https://osf.io/...”
go a long way towards gently nudging them in the right direction. Generally, I’ve found that they really appreciate the peace of mind that comes with knowing their data is in a safe place and the convenience of a one-stop-shop for all project-related content. As for me, I appreciate being able to simply osfr::osf_download() the latest version of a dataset rather than digging through my email; it’s almost like having an API for your collaborators 🙃.
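As a rough sketch of what that looks like in practice, the wrapper below retrieves a project, lists files whose names match a pattern, and downloads them. The function name `download_latest` and its arguments are hypothetical and chosen for illustration; the `osfr::` calls are only resolved when the function is actually run, which requires the osfr package, network access, and (for private projects) prior authentication with `osfr::osf_auth()`.

```r
# Illustrative wrapper (hypothetical name and arguments, not part of osfr).
# `project_guid` and `pattern` are placeholders that you supply.
download_latest <- function(project_guid, pattern, dest = tempdir()) {
  # Look up the project by its GUID
  project <- osfr::osf_retrieve_node(project_guid)
  # List files in the project's OSF storage whose names match `pattern`
  files <- osfr::osf_ls_files(project, pattern = pattern)
  # Download the matches, replacing any stale local copies
  osfr::osf_download(files, path = dest, conflicts = "overwrite")
}
```

Because OSF treats the last uploaded file as the current version, re-running a call like this after a collaborator uploads an update should fetch their newest copy. Setting `conflicts = "overwrite"` is a design choice for this sketch; osfr's default is to stop with an error if the local file already exists.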
If your interest is piqued and you’d like to try out OSF yourself, the first thing to do is sign up for an account. From there, you may want to look over a few of OSF’s guides, especially the project management section for help getting started. I also recommend Ian Sullivan’s OSF 101 video, which provides a comprehensive, step-by-step tutorial for getting up and running. Finally, if you’d like to check out a few OSF projects to get a sense of some real-world applications, I’ve put together a short list of nice examples that represent a few different use cases:
#ScanAllFish uses OSF as an open access data archive that provides a large collection of high-quality 3D CT scans for a wide range of […]
I’d like to thank Tim York, Tim Errington, Claire Riss, and Bernice Wolen for their helpful comments and suggestions to improve this post, Stefanie Butland for providing the opportunity, and Steffi LaZerte for her thorough review and helpful feedback.