intro
intro.Rmd
Introduction
TL;DR
Extreme version
The projr
workflow:
-
Initialisation: Run
projr::projr_init()
to set up the project with metadata,projr
-specific configuration and a Git repo connected to GitHub (if you so choose). -
Building the project: Run
projr::projr_build_output()
to build the project, version the project and its components, and upload the components to GitHub, OSF and/or a local folder.
Project directory structure (e.g. that raw data are kept in the
data-raw
folder) and on-build actions (e.g. uploading to
GitHub) are specified in the _projr.yml
file.
There are only three functions to remember:
projr_init()
, projr_build_output()
and
projr_path_get()
. The projr
configuration
file, _projr.yml
, specifies project defaults that are
appropriate for the majority of projects, but can be customised to your
needs.
Less extreme version
projr
is a package that helps you to set up and maintain
a project in a way that is reproducible, shareable and versioned.
-
Initialisation: An interactive initialisation
function,
projr::projr_init()
.- This sets up the project with metadata,
projr
-specific configuration and a Git repo connected to GitHub (if you so choose).
- This sets up the project with metadata,
-
Project and component versioning: The project as a
whole is given a version (e.g.
v1.2.1
), and the components of the project (e.g. code, data, figures, documents) versioned individually and linked to this version.- This means that you can always find the exact inputs that were used to create the outputs.
-
Test builds: To test build the project, run
projr::projr_build_dev()
.- This renders the literate programming documents in your project and saves the outputs to a temporary directory (in your workspace).
-
Builds: To build the project, run
projr::projr_build_output()
.- This saves the outputs to their final destination (e.g. the
_output
folder), as these are regarded as the latest state of the project (rather than a development version). - In addition, a component of the pverall project version is bumped
(e.g.
v1.2.1
changes tov2.0.0
if the major version is bumped), and the individual project components (e.g. code, data, figures, documents) are versioned and linked to this new version. - Finally, the project components are uploaded to GitHub, OSF and/or a local folder.
- This saves the outputs to their final destination (e.g. the
The _projr.yml
file specifies the project directory
structure and on-build actions. The directories
key
specifies the project directory structure, in terms of where raw data,
outputs (intermediate and final) and rendered documents are kept. The
build
key specifies on-build actions, such as whether and
how the code are versioned, and which project components (raw data,
outputs, docs) are uploaded to GitHub, OSF and/or a local folder.
There are only three functions to remember:
projr_init()
, projr_build_output()
and
projr_path_get()
(which is described later). The
projr
configuration file, _projr.yml
,
specifies project defaults that are appropriate for the majority of
projects, but is highly customisable for your needs.
Install projr
Install projr
:
if (!requireNamespace("remotes", quietly = TRUE)) {
install.packages("remotes")
}
remotes::install_github("SATVILab/projr")
The projr
workflow
Initialising the project
- Open
R
in the working directory you want to use it in.
If using RStudio
, then the easiest way to do this is to
open RStudio
, click File
, then
New Project
then choose whether you want to open it in an
existing directory or create a new one.
- Run
projr::projr_init()
.
This will prompt you for metadata about your project, such as a description for it as well as your name and email address. You are also able to choose whether to create a Git repo and, if so, whether to use GitHub (if not already set up to use GitHub, then see the tips below on creating a GitHub account)
Overall, what projr_init
does is set up the project with
metadata, projr
-specific configuration and a Git repo
connected to GitHub (if you so choose).
Building the project
projr_build_output()
One can build the project using projr_build_output()
.
This will automatically detect whatever literate programming documents
are used (Rmd’s, Qmd’s, Quarto projects, bookdown
projects)
and render all of them. By deafult, the rendered documents (e.g. HTML,
PDF) are saved to the _docs
directory.
Versioning
Project versioning
The project, as whole, is versioned. This version is specified in the
DESCRIPTION
file, and is updated automatically by
projr
at each build.
Semantic versioning is used, so the version is of the form
x.y.z
, where x
is the major version,
y
is the minor version and z
is the patch
version. For example, a project version (before a build) might be
0.1.0
.
- If the
patch
version is updated, then the version would become0.1.1
. - If the
minor
version is updated, then the version would become0.2.0
. - If the
major
version is updated, then the version would become1.0.0
.
This helps communicate the size of the changes made in a build. In
software development, the patch
version is updated for
small changes, the minor
version for new features and the
major
version for breaking changes.
For analysis projects (including, e.g. data processing sub-projects),
the patch
version might be updated for small changes, the
minor
version for new analyses and the major
version for changes to the data processing pipeline. It’s up to you to
decide what each one means - just communicate it to your
collaborators.
Component versioning
At each build, projr
also versions project components
(code, R
packages and files). This ensures that the outputs
(e.g. figures, tables and documents) are linked to versioned inputs
(e.g. raw data and code). They are linked by the project version, so
that you can always find the exact inputs that were used to create the
outputs.
Sharing the project
Optionally, immediately after each build projr
uploads
these components to GitHub, OSF and/or a local folder. For example, by
default the raw data, output (e.g. figures) and rendered documents
(e.g. HTMLs, PDFs)s are uploaded to GitHub as GitHub releases.
It is not important to understand exactly what a GitHub release is at this point, other than to note that it is a free, easy, secure way to share non-code files (e.g. data, documents) with others.
This is done automatically at each build, and is highly customisable. Uploads to the Open Science Foundation (OSF) and local folders are also supported.
To understand and customise the above process further, it’s important
to familiarse yourself with the projr
configuration file,
_projr.yml
. Details of all the options are available in
later vignettes.
_projr.yml
The projr
configuration file, _projr.yml
,
specifies two main things:
- What is kept where (e.g. which folder contains the raw data). This
falls under the
directories
key. - If and how project inputs and outputs are shared and stored (e.g. on
GitHub, OSF or a local path). This falls under the
build
key.
directories
Structure
Here is the default directories
key:
# default directory settings:
directories:
data-raw:
path: _data_raw
cache:
path: _tmp
output:
path: _output
docs:
path: docs
The purpose is to specify the paths to the key folders in your project. They are as follows:
-
data-raw
: The folder where raw data is stored. -
cache
: This stores files created during project builds that you do not want to share. For example, if you use theR
packagetargets
to cache intermediate output, the cached intermediate output will go here. -
output
: This also stores files created during project builds - but these you want to share, e.g. figures or tables. -
docs
: This is where the rendered documents are stored, e.g. the HTML or PDF files.
To access the paths to these folders, use the
projr_path_get
function. For example:
projr_path_get("output")
will return
"_output"
It can be given further arguments, specifying sub-directories and file names. For example:
projr_path_get("output", "figure", "nobel_prize_winning_plot.png")
will return
"_output/figure/nobel_prize_winning_plot.png"
So, code saving a figure to this path might look like this:
png(filename = projr_path_get("output", "figure", "nobel_prize_winning_plot.png")
# ... code to create the plot
dev.off()
In future, if the path to the output folder changes in
_projr.yml
(e.g. to _output_figures
instead of
_output
), projr_path_get
will automatically
update the path.
Essentially, projr_path_get
should replace either
hardcoded paths
("_output/figure/nobel_prize_winning_plot.png"
) or
file.path
/here::here
calls.
Benefits
Using folders in the manner described (e.g. storing raw data in the
data-raw
folder and outputs in the output
folder) has the following benefits:
- A person new to the project can read one file
(
_projr.yml
) rather than trawling through source code to understand the project structure. - Sharing parts of the project (e.g. the raw data) and setting up the
project again (e.g. on a new computer or on a collaborator’s computer)
is easier as like things are kept together. This is as opposed to, for
example, keeping some raw data in
_data_raw
, others ininst/extdata
and others at the root of the project, which makes bringing all the pieces together harder. - It helps
projr
help you, as it understands your project structure. This takes the following forms:-
projr
can “protect” the output folder. Theprojr_build_dev()
function (described more below) builds the project, but saves outputs to thecache
folder rather than theoutput
folder. This prevents you accidentally overwriting outputs from a previous successful build whilst iterating on the code in preparation for the next build. -
projr
can upload the data to various places. As described in the next subsection (build
),projr
can send data from these folders (data-raw
,output
, etc.) to various places - at present, this includes GitHub releases, OSF and local folders (e.g. your OneDrive project folder). -
projr
can automatically ignore large files from Git. If you don’t understand what this means, then don’t worry - in fact, it’s great that you’re here as this meansprojr
can help you to use the incredibly useful versioning tool that is Git without having to understand its details.
-
There are other benefits, but they rely on understanding some more
advanced projr
features, which are not covered in this
vignette.
build
The build
key specifies actions that take place
automatically when you build the project.
Here is the first part of the default build
key (if the
project is connected to GitHub):
This specifies that, after the project is built, the
data-raw
directory (i.e. the contents of
_data_raw
) is uploaded to GitHub as a release whose name is
input
.
By default, it is uploaded to a file called
data-raw-v<project_version>.zip
, where
<project_version>
is the version of the project
(e.g. 0.1.0
). Furthermore, by default this only takes place
when anything changes within data-raw
, i.e. file(s) are
added, removed or changed.
These uploads are designed to be highly flexible, and can be
customised to your needs. Please see the destination
vignette for more details (evignette("destination")
).
By default, the build
key can also specify that the
output
and docs
folders are uploaded to GitHub
as releases, or alternatively to OSF or a local folder.
In addition, the build key handles Git settings, such as whether or not Git commits are automatically created before and after each build, and whether they are pushed to GitHub automatically. By default, they are, which means that the Git history is updated with each build, and the changes are pushed to GitHub.
Use projr_build_
functions to build your project
projr
provides two main functions to help you build your
project, projr_build_output()
and
projr_build_dev()
. In a nutshell,
projr_build_dev()
is for checking code as you write it, and
projr_build_output()
is for snapshotting and sharing your
project.
Both will automatically detect the type of source documents you have
(Rmd’s, Qmd’s, Quarto projects, bookdown
projects) and
render them accordingly.
projr_build_output()
projr_build_output()
- This does the same as
projr_build_dev()
, except that the output documents are
saved to their final destination (as specified in
docs
/output
) and the outputs (GitHub/OSF) are
sent to, as desired). - If you set the project up with GitHub, then by
default the data-raw
, docs
and
output
folders are uploaded to GitHub as GitHub
releases.
projr_build_dev()
projr_build_dev()
renders the literate programming
documents in your document.
Any results are saved to the cache
directory, under
<cache>/projr/v<project_version>/<directory_label>
.
For example, suppose we have the code
png(filename = projr_path_get("output", "figure", "nobel_prize_winning_plot.png")
# ... code to create the plot
dev.off()
in our Rmd
file. When using
projr_build_dev
, the plot will be saved to
_tmp/projr/v<project_version>/output/figure/nobel_prize_winning_plot.png
.
This is instead of saving it to
_output/figure/nobel_prize_winning_plot.png
, which is the
final destination.
The same goes for the rendered documents (e.g. HTML files), which
would be saved to
_tmp/projr/v<project_version>/docs
.
In addition, projr_build_dev
allows you to render just a
chosen selection of documents,
e.g. projr_build_dev("analysis.Rmd")
will render just the
analysis.Rmd
document (can do more than one).
Create a GitHub account, and save GitHub credentials
This section is not really necessary to introduct projr
,
but Git and GitHub are so useful for code versioning and sharing that
it’s worth mentioning here.
Do I need Git and GitHub?
Strictly speaking, you do not need a GitHub account.
However, it is a good idea to have one, as it is a great way to
collaborate with and share with others. Don’t worry about learning Git -
projr
can (and will by default) handle that for you.
There are three steps to setting up Git and GitHub:
- Create a GitHub account.
- Generate a GitHub personal authentication token (PAT).
- Save this token to your computer.
One does not need to install Git, as projr
reverts to
the gert
R
package if Git
is not
installed.
The steps
Create a GitHub account
If you don’t have a GitHub account, create one here. You can follow the advice here for choosing a user name.
Generate a GitHub PAT
If you have not set up your GitHub credentials, do the following:
- Install the
usethis
package:
if (!requireNamespace("usethis", quietly = TRUE)) {
install.packages("usethis")
}
- Use
usethis
to set up your GitHub credentials:
Run the following in R
:
usethis::create_github_token()
Your browser should open to GitHub. At the bottom of the page, click
Generate token
. Then, copy the token.
Save your GitHub PAT
- Use
usethis
to save your GitHub credentials:
Run the following in R
:
gitcreds::gitcreds_set()
Paste your Git credentials when prompted.
You should now be able to download and upload private repositories to GitHub (i.e. repositories that are only accessible to you and those you give access to).
Note: The above instructions are taken from here, which provides slightly more details.