Restoring Artifacts
Artifact restoration allows you to download versioned project components from remote sources. This is useful for setting up projects on new machines, sharing reproducible research with collaborators, and recovering data from archived versions.
This documentation covers the three restoration workflows and their configuration options.
Overview
projr provides functions to restore project artifacts (raw data, outputs, documents) from remote sources. These functions download versioned files that were previously archived during builds.
What Are Artifacts?
In projr, artifacts are the versioned components of your project:
- raw-data - Source data files
- cache - Intermediate computation results
- output - Final analysis outputs (figures, tables)
- docs - Rendered documents (HTML, PDF)
- code - All Git-tracked source files
These artifacts are archived to remote destinations during production builds and can be restored later.
When to Restore
Common scenarios for restoration:
- New machine setup - Clone repository and restore data to start working
- Collaboration - Team member needs project data to reproduce analysis
- Disaster recovery - Lost local files need to be recovered from archives
- Reproducibility - Restore the inputs behind an analysis (archive remotes always return the latest remote version, which may not be the one that produced older published outputs)
Restoration Functions
projr provides three restoration functions for different workflows:
- projr_restore_repo() - Clone a repository and restore artifacts (most common)
- projr_restore_repo_wd() - Clone into the current working directory and restore artifacts
- projr_restore() - Restore artifacts in an existing local project
projr_restore_repo()
Clone a GitHub repository and restore artifacts in one step.
Syntax:
library(projr)
projr_restore_repo(
  repo,         # "owner/repo" or "repo"
  path = NULL,  # Where to clone (default: creates a subdirectory)
  label = NULL, # Which artifacts (default: all raw-* artifacts)
  pos = NULL,   # Source position (default: both "source" and "dest")
  type = NULL,  # Remote type (default: first available)
  title = NULL  # Remote title (default: first available for type)
)
Basic usage:
# Clone into subdirectory and restore all raw artifacts
projr_restore_repo("owner/repo")
# Clone to specific location
projr_restore_repo("owner/repo", path = "~/projects/my-analysis")
# Clone into current directory (use with caution!)
projr_restore_repo("owner/repo", path = ".")
What it does:
- Clones the repository into the specified directory (or creates a subdirectory)
- Reads _projr.yml to find configured remotes
- Restores artifacts (default: all raw-* labels such as raw-data; cache is NOT included)
- Returns TRUE if successful, FALSE otherwise
projr_restore_repo_wd()
Clone directly into current working directory, then restore artifacts.
Syntax:
projr_restore_repo_wd(
  repo,         # "owner/repo" or "repo"
  label = NULL, # Which artifacts
  pos = NULL,   # Source position
  type = NULL,  # Remote type
  title = NULL  # Remote title
)
Usage:
# From within target directory
setwd("~/projects/my-analysis")
projr_restore_repo_wd("owner/repo")
Warning: This creates files directly in your current working directory. Make sure you are in the right location!
projr_restore()
Restore artifacts in an existing project without cloning.
Syntax:
projr_restore(
  label = NULL, # Which artifacts (default: all raw-* artifacts)
  pos = NULL,   # Source position (default: both)
  type = NULL,  # Remote type (default: first available)
  title = NULL  # Remote title (default: first available for type)
)
Usage:
# Navigate to project first
setwd("~/projects/my-analysis")
# Restore all raw artifacts
projr_restore()
# Restore specific artifacts
projr_restore(label = "raw-data")
projr_restore(label = c("raw-data", "cache"))
Requires: the project must have a manifest.csv file in its root directory.
Parameters
label Parameter
Controls which artifacts are restored.
label = NULL (default) - Restore all raw-* artifacts
# Restores: raw-data and any other raw-* directories
# Note: cache is NOT restored by default (not a raw-* label)
projr_restore()
# To restore cache explicitly:
projr_restore(label = c("raw-data", "cache"))
label = "raw-data" - Restore a specific artifact
projr_restore(label = "raw-data")
label = c("raw-data", "cache") - Restore multiple artifacts
projr_restore(label = c("raw-data", "cache"))
Valid labels:
Any directory label defined in your _projr.yml:
- raw-data - Source data files
- cache - Cached computation results
- output - Analysis outputs (use cautiously; usually regenerated)
- docs - Rendered documents (use cautiously; usually regenerated)
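The default label = NULL behaviour can be pictured as a simple name filter. The sketch below uses a hypothetical helper, default_restore_labels() (not a projr function), and assumes the rule is "every label that starts with raw-"; the label raw-extra is illustrative only.

```r
# Hypothetical helper, not part of projr: illustrates the documented default
# for label = NULL, assuming it selects every label that starts with "raw-".
default_restore_labels <- function(labels) {
  labels[startsWith(labels, "raw-")]
}

# Labels as they might appear in _projr.yml ("raw-extra" is illustrative)
configured <- c("raw-data", "raw-extra", "cache", "output", "docs")

# cache, output and docs are skipped unless requested explicitly via label
default_restore_labels(configured)
# returns c("raw-data", "raw-extra")
```

To restore anything outside that set, pass the labels explicitly, as in projr_restore(label = c("raw-data", "cache")).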
When to restore outputs/docs:
- Comparing results across versions
- When computation is expensive and results are archived
- Generally, prefer regenerating outputs with projr_build_dev()
type Parameter
Specifies which remote type to use.
type = NULL (default) - Use the first available remote
# Checks remotes in the order they appear in _projr.yml
# (not a fixed order - depends on your configuration)
projr_restore()
type = "github" - Use GitHub releases only
projr_restore(type = "github")
type = "local" - Use a local directory only
projr_restore(type = "local")
type = "osf" - Use OSF only
projr_restore(type = "osf")
When to specify:
- Multiple remotes configured and you want a specific one
- Testing specific remote sources
- One remote is faster or more reliable
title Parameter
Selects a specific remote configuration when multiple exist.
title = NULL (default) - Use the first available title for the selected type
title = "network-backup" - Use a specific remote
projr_restore(type = "local", title = "network-backup")
Example with multiple local remotes:
# _projr.yml
build:
  local:
    local-backup:
      title: "local-backup"
      content: [raw-data]
      path: "~/backup/raw-data"
    network-backup:
      title: "network-backup"
      content: [raw-data]
      path: "/mnt/shared/raw-data"
# Restore from local backup
projr_restore(type = "local", title = "local-backup")
# Restore from network backup
projr_restore(type = "local", title = "network-backup")
pos Parameter
Controls whether to restore from source directories or build destinations.
pos = NULL (default) - Check both, in order
# Checks "source" first, then "dest"
projr_restore()
pos = "source" - Source directories only
projr_restore(pos = "source")
pos = "dest" - Build destinations only
projr_restore(pos = "dest")
pos = c("source", "dest") - Both (explicit)
projr_restore(pos = c("source", "dest"))
When to use:
- The default is usually appropriate
- Use pos = "dest" if you specifically archived outputs and want to restore them
- Use pos = "source" if you only want original source artifacts
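The documented search order for pos can be sketched as a loop that returns the first position able to supply a label. This is a hypothetical illustration, not projr's internal logic: resolve_pos() and the available list (which labels each position can supply) are assumptions made for the example.

```r
# Hypothetical sketch of the documented pos order, not projr internals:
# with pos = NULL, "source" is checked before "dest". The `available` list
# (which labels each position can supply) is an illustrative stand-in.
resolve_pos <- function(label, available, pos = NULL) {
  if (is.null(pos)) pos <- c("source", "dest")
  for (p in pos) {
    # Return the first position that can supply this label
    if (label %in% available[[p]]) return(p)
  }
  NA_character_  # no position in `pos` has the label
}

available <- list(source = "raw-data", dest = c("raw-data", "output"))

resolve_pos("raw-data", available)                # "source" (checked first)
resolve_pos("output", available)                  # "dest" (fallback)
resolve_pos("output", available, pos = "source")  # NA (dest not searched)
```

Restricting pos narrows the search, which is why pos = "source" cannot find an artifact that only exists at a build destination.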
Authentication
GitHub Authentication
Restoring from GitHub requires authentication.
Required: GitHub Personal Access Token (PAT)
Setup:
# Get detailed instructions
projr_instr_auth_github()
Set environment variable:
Verify:
# Check token is set
Sys.getenv("GITHUB_PAT") # Should show your token
OSF Authentication
Restoring from OSF requires authentication.
Required: OSF Personal Access Token
Setup:
# Get detailed instructions
projr_instr_auth_osf()
Set environment variable:
Complete Examples
Example 1: New Collaborator Setup
Set up a project on a new machine:
# Step 1: Clone and restore
projr_restore_repo("satvilab/my-study")
# Step 2: Navigate into project
setwd("my-study")
# Step 3: Install R dependencies
renv::restore()
# Step 4: Run analysis
projr_build_dev() # Test build
projr_build_patch() # Production build when ready
Example 2: Selective Restoration
Restore only specific artifacts, not all:
# Step 1: Clone the repository, restoring only raw-data
# (by default projr_restore_repo() restores every raw-* artifact)
projr_restore_repo("owner/repo", label = "raw-data")
setwd("repo")
# Step 2: Regenerate other artifacts by running the analysis
projr_build_dev()
Example 3: Selective Restoration in an Existing Project
Restore only raw data, not cached results:
# Restore only raw data
projr_restore(label = "raw-data")
# Regenerate cache by running analysis
projr_build_dev()
Example 4: Fallback to Different Remote
Try multiple remote sources:
# Try GitHub first
result <- tryCatch(
  projr_restore(type = "github"),
  error = function(e) FALSE
)
# Fall back to local if GitHub fails (isTRUE() also guards non-logical returns)
if (!isTRUE(result)) {
  projr_restore(type = "local")
}
Example 5: Network Drive Restoration
Restore from shared network drive:
# _projr.yml configuration
build:
  local:
    network-data:
      title: "network-data"
      content: [raw-data]
      path: "/mnt/shared/project-data"
# Restore from network drive
projr_restore(type = "local", title = "network-data")
How Restoration Works
The Manifest System
projr uses manifest.csv to track file versions and
hashes.
Manifest structure:
label | fn | version | hash
-----------|-------------------|---------|----------------------------------
raw-data | data/survey.csv | v0.1.0 | 5d41402abc4b2a76b9719d911017c592
raw-data | data/survey.csv | v0.2.0 | 5d41402abc4b2a76b9719d911017c592
output | figures/plot.png | v0.1.0 | 098f6bcd4621d373cade4e832627b4f6
How it’s used:
- projr reads manifest.csv from your project to identify available artifacts
- Determines which remote version to restore (the latest, for archive remotes)
- Downloads files from configured remotes
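To see what the manifest records, you can load it with base R. The sketch below builds an in-memory copy of the example manifest shown above (columns label, fn, version, hash) rather than reading a real file; projr's actual manifest.csv may contain additional columns, so treat the column set as an assumption.

```r
# Illustrative manifest with the columns shown above (label, fn, version,
# hash); projr's real manifest.csv may include additional columns.
manifest <- read.csv(
  text = c(
    "label,fn,version,hash",
    "raw-data,data/survey.csv,v0.1.0,5d41402abc4b2a76b9719d911017c592",
    "raw-data,data/survey.csv,v0.2.0,5d41402abc4b2a76b9719d911017c592",
    "output,figures/plot.png,v0.1.0,098f6bcd4621d373cade4e832627b4f6"
  ),
  colClasses = "character"
)

# Which files each label records (one entry per file, across all versions)
files_by_label <- lapply(split(manifest$fn, manifest$label), unique)

# Which versions each label has been archived under
versions_by_label <- lapply(split(manifest$version, manifest$label), unique)
```

In a real project you would replace the text argument with read.csv("manifest.csv", colClasses = "character") from the project root.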
Version Resolution
projr determines which version to restore from remotes:
For archive remotes (most common): Always restores the latest available version from the remote, regardless of your current Git checkout or project version.
For latest remotes: Restores the current snapshot.
Example:
Remote has archived versions: v0.1.0, v0.2.0, v0.3.0
Your Git checkout: v0.1.0
Result: Still restores v0.3.0 (latest available on remote)
Note: projr_restore() always fetches
the newest remote version for archive remotes. To work with older
artifact versions, manually download them from the remote source (e.g.,
GitHub Releases).
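The "latest version wins" rule can be illustrated with base R's numeric_version, which compares version components numerically instead of as text. This is a minimal sketch assuming remote versions look like vMAJOR.MINOR.PATCH; latest_version() is a hypothetical helper, not projr's code.

```r
# Minimal sketch of "latest version wins" for archive remotes, assuming
# remote versions are labelled "vMAJOR.MINOR.PATCH". Not projr's code.
# Note that the local Git checkout plays no part in the choice.
latest_version <- function(versions) {
  parsed <- numeric_version(sub("^v", "", versions))
  # Compare as versions, not strings, so v0.10.0 beats v0.9.0
  paste0("v", as.character(max(parsed)))
}

latest_version(c("v0.1.0", "v0.2.0", "v0.3.0"))  # "v0.3.0"
latest_version(c("v0.9.0", "v0.10.0"))           # "v0.10.0"
```

Comparing as versions matters: a plain string sort would rank "v0.9.0" above "v0.10.0".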
Restoration Process
Step-by-step restoration:
- Read configuration - Parse _projr.yml to find remote sources
- Read manifest - Load manifest.csv to identify available artifacts
- Select sources - Choose appropriate remotes for each label
- Determine version - Identify which remote version to restore (the latest, for archive remotes)
- Download files - Retrieve files from remotes and place them in local directories
Remote Source Priority
projr checks remote sources in the order they appear in your _projr.yml configuration, so the order is not fixed: it depends on how you have configured your remotes.
When type = NULL (default), projr:
- Looks for sources or destinations in the order listed under build: in _projr.yml
- Uses the first remote that has the requested artifact configured
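The priority rule, together with the type and title filters, can be sketched as a walk over the remotes in listed order. This is a hypothetical illustration: first_remote_for() is not a projr function, and the build list only mimics a parsed _projr.yml (the local entries mirror the multi-remote example earlier; the github entry is an assumption).

```r
# Hypothetical sketch of the documented priority rule, not projr internals:
# walk remotes in the order listed under build: and return the first whose
# content includes the label, optionally narrowed by type and title.
first_remote_for <- function(build, label, type = NULL, title = NULL) {
  for (remote_type in names(build)) {
    if (!is.null(type) && remote_type != type) next
    for (remote_title in names(build[[remote_type]])) {
      if (!is.null(title) && remote_title != title) next
      if (label %in% build[[remote_type]][[remote_title]]$content) {
        return(c(type = remote_type, title = remote_title))
      }
    }
  }
  NULL  # no configured remote supplies this label
}

# Illustrative stand-in for a parsed _projr.yml
build <- list(
  local = list(
    "local-backup"   = list(content = "raw-data", path = "~/backup/raw-data"),
    "network-backup" = list(content = "raw-data", path = "/mnt/shared/raw-data")
  ),
  github = list(
    "raw-data-@version" = list(content = "raw-data")
  )
)

first_remote_for(build, "raw-data")                   # local / local-backup
first_remote_for(build, "raw-data", type = "github")  # github entry
first_remote_for(build, "raw-data", type = "local", title = "network-backup")
```

Because the walk follows list order, reordering remotes in the configuration changes which one wins when type and title are left as NULL.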
Best Practices
For Project Setup
Archive raw data:
# Configure raw data archiving (one-time setup)
projr_yml_dest_add_github(
  title = "raw-data-@version",
  content = "raw-data",
  send_cue = "if-change"
)
Document restoration:
Add to your project README.md:
For Collaborators
Share restoration instructions:
# Option 1: Automated setup
projr::projr_restore_repo("owner/repo")
setwd("repo")
renv::restore()
# Option 2: Manual clone, then restore
# git clone https://github.com/owner/repo
# cd repo
# R -e "projr::projr_restore()"
Test restoration:
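# Test in a clean, empty directory before sharing
# (avoid naming a variable tempdir, which masks base::tempdir())
test_dir <- file.path(tempdir(), "restore-test")
projr::projr_restore_repo("owner/repo", path = test_dir)
For Reproducibility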
Archive everything needed:
# Archive code (all Git-tracked files)
projr_yml_dest_add_github(
  title = "code-@version",
  content = "code"
)
# Archive raw data
projr_yml_dest_add_github(
  title = "raw-data-@version",
  content = "raw-data"
)
# Archive outputs for comparison
projr_yml_dest_add_github(
  title = "output-@version",
  content = "output"
)
Document dependencies:
Use renv for R package versions:
renv::snapshot()
Document system dependencies in README.md:
For Version Control
Note on version history:
projr_restore() always fetches the latest remote version
for archive remotes. To work with artifacts from older versions, you’ll
need to manually download them from the remote source or implement
custom restoration logic.
Document version history:
Keep notes in NEWS.md or CHANGELOG.md:
Common Pitfalls
No Manifest File
Problem: "No manifest.csv found"
Solutions:
- The project needs to have been built at least once with projr
- Clone a project that has manifest in its repository
- If manifest is missing, manually download files from GitHub releases
No Remote Sources
Problem: "No remote sources configured"
Solutions:
- Check that _projr.yml has remote destinations under build:
- Verify remote configurations are valid with projr_yml_check()
- Manually download files from remote sources
Authentication Errors
Problem: "Authentication required for GitHub"
Solutions:
- Set up the GITHUB_PAT environment variable
- Run projr_instr_auth_github() for instructions
- Verify the token with Sys.getenv("GITHUB_PAT")
- Ensure the token has the repo scope
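Printing Sys.getenv("GITHUB_PAT") exposes the secret on screen. A minimal alternative, using a hypothetical helper github_pat_is_set(), only reports whether the variable is non-empty; it does not check the token's validity or scopes.

```r
# Check whether GITHUB_PAT is set without echoing the token itself.
# Sys.getenv() returns "" for unset variables, so nzchar() is FALSE then.
github_pat_is_set <- function() {
  nzchar(Sys.getenv("GITHUB_PAT"))
}

if (!github_pat_is_set()) {
  message("GITHUB_PAT is not set; see projr::projr_instr_auth_github()")
}
```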
Download Failures
Problem: Failed to download from remote
Solutions:
- Check internet connection
- Verify remote still exists (GitHub release, local path)
- Check authentication if required
- Try a different remote type: projr_restore(type = "local")
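Trying remote types one after another can be wrapped in a small loop. The sketch below is a hypothetical helper, restore_with_fallback() (not a projr function); the restore call is passed in as a function so the pattern can be tested offline, and it assumes the restore call returns TRUE on success (documented above for projr_restore_repo(); assumed here for projr_restore()).

```r
# Hypothetical helper, not a projr function: try each remote type in order
# and stop at the first one whose restore call returns TRUE.
restore_with_fallback <- function(types, restore_fn) {
  for (type in types) {
    ok <- tryCatch(
      isTRUE(restore_fn(type)),
      error = function(e) {
        message("Restore from '", type, "' failed: ", conditionMessage(e))
        FALSE
      }
    )
    if (ok) return(type)  # name of the remote type that worked
  }
  NA_character_  # every remote type failed
}

# With projr (needs configured remotes and, for GitHub/OSF, a token):
# restore_with_fallback(
#   c("github", "local"),
#   function(type) projr::projr_restore(type = type)
# )
```

Returning the type that succeeded makes it easy to log which remote the files actually came from.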
Hash Mismatches
Problem: "File hash mismatch"
Solutions:
- File on remote may be corrupted
- File may have been modified outside projr
- Try restoring from different remote
- Check if remote was updated manually
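To check a restored file by hand, recompute its hash and compare it with the manifest entry. This sketch assumes MD5, chosen only because the example manifest shows 32-character hex values; projr may use a different algorithm, so compare with whatever function produced the manifest. verify_hash() is a hypothetical helper.

```r
# Minimal sketch, assuming MD5 hashes; projr's real algorithm may differ.
verify_hash <- function(path, expected) {
  actual <- unname(tools::md5sum(path))  # NA if the file is missing
  identical(tolower(actual), tolower(expected))
}

f <- tempfile(fileext = ".csv")
cat("hello", file = f)  # no trailing newline
verify_hash(f, "5d41402abc4b2a76b9719d911017c592")  # TRUE
```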
Partial Restoration
Problem: Some artifacts restore, others fail
Behavior: projr continues with available artifacts
Recovery:
# Try different remote type
projr_restore(type = "local")
# Try specific artifact
projr_restore(label = "raw-data")
# Manual download as last resort
# Download from GitHub releases manually
Troubleshooting
Check Configuration
View remote configuration:
# View entire configuration
projr_yml_get()
# View specific sections
projr_yml_get()$build$github
projr_yml_get()$build$local
# Validate configuration
projr_yml_check()
Verbose Output
Enable detailed output:
# Set environment variable
Sys.setenv(PROJR_OUTPUT_LEVEL = "debug")
# Run restoration
projr_restore()
# Check detailed logs
# Located in _tmp/projr/log/
Test Restoration
Test in isolated environment:
# Create test directory
test_dir <- tempfile()
dir.create(test_dir)
# Test restoration
result <- tryCatch(
  projr_restore_repo("owner/repo", path = test_dir),
  error = function(e) {
    message("Restoration failed: ", conditionMessage(e))
    FALSE
  }
)
# Clean up on success; keep the directory on failure for inspection
if (isTRUE(result)) {
  message("Restoration successful!")
  unlink(test_dir, recursive = TRUE)
} else {
  message("Inspect the test directory: ", test_dir)
}
Summary
Quick Start
# Most common: Clone and restore
projr_restore_repo("owner/repo")
# In existing project: Restore artifacts
projr_restore()
# Specific artifact: Restore raw data only
projr_restore(label = "raw-data")
Key Functions
- projr_restore_repo() - Clone a repository and restore artifacts
- projr_restore_repo_wd() - Clone into the current working directory and restore artifacts
- projr_restore() - Restore artifacts in an existing project
Key Parameters
- label - Which artifacts to restore (default: all raw-* artifacts)
- type - Which remote type (github, local, osf; default: first available)
- title - Which remote configuration (default: first available for type)
- pos - Source position (source, dest; default: both)
Key Concepts
- Artifacts - Versioned project components (raw-data, output, docs, code)
- Manifest - CSV file tracking file versions and hashes
- Remotes - Sources from which artifacts can be restored
- Version resolution - Determining which version to restore
See Also
- ?projr_restore - Full restoration documentation
- ?projr_restore_repo - Repository restoration documentation
- vignette("send-to-remotes") - Configuring remotes for archiving
- vignette("environment") - Environment variables including GITHUB_PAT