COVIDScenarioPipeline

Photo by CDC on Unsplash

Disclaimer: The purpose of the Johns Hopkins IDD COVIDScenarioPipeline project is to provide tools for analysis of COVID-19 related data. These materials do not cover all aspects of the research process. We highly suggest that you seek external consultation from scientific experts regarding your data and the interpretation of your data.

This tutorial assumes that users have knowledge of R programming and limited command line experience. It does not require previous knowledge of GitHub. However, the tutorial should be doable by someone without R programming or command line experience.

Summary

Welcome to the Johns Hopkins University Infectious Disease Dynamics COVID-19 Working Group’s COVID Scenario Pipeline, a flexible modeling framework that projects epidemic trajectories and healthcare impacts under different suites of interventions in order to aid in scenario planning.

In other words, this pipeline can help predict the effectiveness of an intervention in specific locations.

This tutorial will get you started with using the Johns Hopkins IDD COVIDScenarioPipeline by running the pipeline on fake data from Hawaii.

The model can be applied to different spatial scales given shapefiles, population data, and COVID-19 confirmed case data. There are multiple components to the pipeline, which may be characterized as follows:

model/epidemic seeding: flight data and other known epidemic data
epidemic dynamics: disease transmission data and non-pharmaceutical intervention scenarios
calculation of health outcomes: hospital and ICU admissions and bed use, ventilator use, and deaths
summarization of model outputs into an easy to read report

The pipeline simulates the influence of interventions for a given location and can adjust for demographics of that location and neighboring locations.

Please post questions to GitHub issues with the question tag. We are prioritizing direct support for individuals engaged in public health planning and emergency response.

For more information on getting started, please visit our Getting Started wiki at HopkinsIDD/COVID19_Minimal.

This open-source project is licensed under GPL v3.0.

Getting Started

Making a pipeline repository

If you are new to GitHub, please see the New to GitHub section before proceeding.

There are two separate GitHub repositories that users need to clone/download:

One is the HopkinsIDD/COVID19_Minimal template repository, which is a template spatial repository for location-specific files. This will allow us to run the model for a specific location. We will call the directory containing these files and scripts COVID_Loc_X.

The second is the HopkinsIDD/COVIDScenarioPipeline repository, which we will download inside the COVID_Loc_X directory. This is where the scripts and files to actually run the model will be located.

First, navigate to the Johns Hopkins IDD HopkinsIDD/COVID19_Minimal template repository by clicking here.

You will see a page that looks like this:

Click on the green button that says “Use this template” as shown in the above image.

This will take you to a new page that looks like this:

Here you will:

Provide the name for your repository that you are about to create - “COVID_Loc_X” would work
Decide if you want your repository to be Public or Private
Press the green “Create repository from template” button

Great! Now you have a repository on GitHub which contains all the current HopkinsIDD/COVID19_Minimal template files and code.

It should look something like this:

Leave this open! You will want this for the next step!

Get the pipeline files onto your computer

Press the green “Clone or download” button in the GitHub repository that you just created

This will bring up a small window. Press the small button with an icon that looks like a clipboard. This will copy the location of your repository on GitHub.

Open a new project in RStudio (if this is new for you, see New to R or RStudio)
Select the Terminal tab in RStudio

Type the following words in the Terminal (but do not press enter yet):

git clone

Paste what is on your clipboard by either using keyboard shortcuts or Edit –> Paste in RStudio

It should look something like this after the dollar sign $:

git clone https://github.com/yourgithubusername/COVID_Loc_X.git

Where your github username is shown in between “github.com” and the name of the repository you created. Make sure you replace this!

Press enter

You should see some messages like:

Cloning into 'COVID_Loc_X'...

Once it is complete you will see that you now have a directory named the same as your GitHub repository that contains all the files in the repository.

Now go inside the repo by typing:

cd COVID_Loc_X and press enter

Now we will get files from another GitHub repository by typing the following command and pressing enter:

git clone https://github.com/HopkinsIDD/COVIDScenarioPipeline.git

You should get some output that looks something like this:

You will also now have a directory called “COVIDScenarioPipeline”.
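As a quick sanity check of the layout described above, the following is a minimal sketch of a shell function that confirms the COVIDScenarioPipeline directory sits inside your location repository. The function name check_layout is a hypothetical helper written for this tutorial; it is not part of the pipeline.

```shell
# check_layout: hypothetical helper (not part of the pipeline).
# Confirms that COVIDScenarioPipeline was cloned inside the location repository.
# Usage: check_layout /path/to/COVID_Loc_X
check_layout() {
    repo_dir="$1"
    if [ -d "$repo_dir/COVIDScenarioPipeline" ]; then
        echo "ok: $repo_dir/COVIDScenarioPipeline found"
    else
        echo "missing: $repo_dir/COVIDScenarioPipeline" >&2
        return 1
    fi
}
```

From inside COVID_Loc_X you could call it as check_layout "$(pwd)". If it reports the directory as missing, the second git clone was probably run from a different directory.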

We also need to get large files from this repository

To do this we will need git large file storage also called git-lfs.

So go to this link and download git-lfs.

Open up a new terminal window (but keep the other one open!). To do this in RStudio you can press on the downward arrowhead next to where it says “Terminal1”, like this:

Click on New Terminal.

For more information about Terminals in RStudio see here.

In this new terminal:

On Mac:

If you don’t already have Homebrew, do the following:

Go to your /usr/local directory by typing this command in a new terminal window:

cd /usr/local and press enter

Then type this command:

mkdir homebrew && curl -L https://github.com/Homebrew/brew/tarball/master | tar xz --strip 1 -C homebrew and press enter

Then type these commands:

brew install git-lfs and press enter
git lfs install and press enter

You should get a message saying: Git LFS initialized.
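Before moving on, it can help to confirm that the git-lfs program is actually available in your terminal. The sketch below uses the standard shell built-in command -v for this; the function name have_cmd is a hypothetical helper introduced only for illustration.

```shell
# have_cmd: hypothetical helper; succeeds if the named program is on your PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether git-lfs is available before attempting "git lfs pull".
if have_cmd git-lfs; then
    echo "git-lfs found"
else
    echo "git-lfs not found; install it before running 'git lfs pull'"
fi
```

The same helper works for any other tool this tutorial relies on, for example have_cmd docker.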

On Windows:
Follow the directions located here.

On Linux:
Follow the directions located here.

Go back to Terminal1 by clicking on the downward arrow next to Terminal2 and clicking on Terminal1:

Make sure you are in the COVIDScenarioPipeline directory (the directory created by your second git clone command) by typing this command:

cd /home/app/covidsp/COVIDScenarioPipeline/ and press enter

Type this command and press enter:

git lfs pull

Great! Now we have the files we need on our local computer!

Accessing the required R and Python tools on your computer

To get the exact required versions of the R packages and Python packages, modules, and scripts, we can simply use something called Docker.

If you are new to Docker and need to set up an account go to the New to Docker section of the tutorial.

Once you are set up with Docker Desktop and Docker Hub you can proceed with the tutorial.

You can use the RStudio terminal for the next Docker commands or any terminal that you prefer.

For the docker commands in this section, if you run into permissions problems, you will need to put sudo in front of the command.

First, we will pull the docker image from hub.docker.com (You’ll only have to do this the first time).

Type the following command into the Terminal 1 tab of RStudio and press enter.

docker pull hopkinsidd/covidscenariopipeline:latest

You will see something like this:

Note: This will take some time (possibly an hour or more)!

You will know it is finished when it stops printing output and the $ is back!

You should get a message that looks something like this:

If that did not happen, Docker suggests this:

Depending on how you’ve installed docker on your system, you might see a permission denied error after running the above command. If you’re on a Mac, make sure the Docker engine is running. If you’re on Linux, then prefix your docker commands with sudo. Alternatively, you can create a docker group to get rid of this issue.

What did we just do exactly?

The pull command caused Docker to grab the latest version of the hopkinsidd/covidscenariopipeline image and put it on your local machine.

If you type docker images you will now see hello-world and hopkinsidd/covidscenariopipeline listed as repositories on your computer.

Now when we run the hopkinsidd/covidscenariopipeline image with Docker, we can run commands in the hopkinsidd/covidscenariopipeline container. This is similar to running a command in a virtual machine but doesn’t require booting up a virtual machine.

Here are the definitions of the various Docker terms according to Docker:

Images - The blueprints of our application which form the basis of containers. In the demo above, we used the docker pull command to download the busybox image.
Containers - Created from Docker images and run the actual application. We create a container using docker run which we did using the busybox image that we downloaded. A list of running containers can be seen using the docker ps command.
Docker Daemon - The background service running on the host that manages building, running and distributing Docker containers. The daemon is the process that runs in the operating system which clients talk to.
Docker Client - The command line tool that allows the user to interact with the daemon. More generally, there can be other forms of clients too - such as Kitematic which provide a GUI to the users.
Docker Hub - A registry of Docker images. You can think of the registry as a directory of all available Docker images. If required, one can host their own Docker registries and can use them for pulling images.

Now you will run the Docker container with your current directory mounted as /home/app/covidsp/ by typing in one of the following commands (depending on your operating system):

On Linux or Mac:

docker run -it --rm -v "$(pwd)":/home/app/covidsp hopkinsidd/covidscenariopipeline

On Windows:

docker run -it --rm -v %CD%:/home/app/covidsp hopkinsidd/covidscenariopipeline

The -it flag creates an interactive tty to allow us to run commands in the container.

You may need to replace the %CD% with your absolute path for the directory you are working in on your machine.

Something like this: docker run -it --rm -v C:/Users/UserName/DirectoryName:/home/app/covidsp hopkinsidd/covidscenariopipeline
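The -v option above maps a directory on your machine (left of the colon) to a directory inside the container (right of the colon). As a small illustration, the sketch below prints the full command for a given host directory without running it, so you can inspect it before pasting. The function name docker_run_cmd is a hypothetical helper; the image name and container path are the ones used in this tutorial.

```shell
# docker_run_cmd: hypothetical helper that prints (does not execute) the
# docker run command for a given host directory.
docker_run_cmd() {
    host_dir="$1"
    printf 'docker run -it --rm -v "%s":/home/app/covidsp hopkinsidd/covidscenariopipeline\n' "$host_dir"
}

# Show the command for the directory you are currently in.
docker_run_cmd "$(pwd)"
```

Printing the command first makes it easy to confirm that the host path is the absolute path of your COVID_Loc_X directory.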

Great! Now that you are inside the Docker container, you can take a look around the files located here by typing ls.

You will see something like this:

You might also notice that the information to the left of the $ has changed as you are now in the container

docker ps shows you the containers that are running

docker ps -a shows you containers that were run in the past and currently running containers

You are running the container from the /home/app directory.

Now, the Docker container needs some local R packages installed. We can do that by typing the following command (followed by enter):

Rscript local_install.R

If there’s a prompt enter one or more numbers, or an empty line to skip updates:, just hit enter.

You will see lots of output printed to the screen.

We also need to mount the COVIDScenarioPipeline files in the Docker container. To do this we will need Terminal2 again. (If you closed it earlier, no worries, just create another new terminal.)

Then run the following command:

On Linux or Mac:

docker run -it --rm -v "$(pwd)":/home/app/covidsp -v "COVIDScenarioPipeline":/home/app/covidsp/COVIDScenarioPipeline hopkinsidd/covidscenariopipeline

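Note: Docker treats a -v source that is not an absolute path (such as "COVIDScenarioPipeline" here) as a named volume rather than a directory on your machine, with or without the quotes. If the pipeline files do not appear in the container, try an absolute path for that source instead, for example "$(pwd)/COVIDScenarioPipeline".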

On Windows:

docker run -it --rm -v %CD%:/home/app/covidsp -v "COVIDScenarioPipeline":/home/app/covidsp/COVIDScenarioPipeline hopkinsidd/covidscenariopipeline

We also need some R packages installed here: Rscript COVIDScenarioPipeline/local_install.R

Generating Data

Generate geodata.csv and mobility.csv

Go to the Terminal1 tab, or the terminal where you were first running Docker.
Go to the covidsp directory. In the terminal type: cd /home/app/covidsp/ and press enter
Use an Rscript to create the files. In the terminal type the following commands, pressing enter after each:

Rscript -e 'devtools::install_github("HopkinsIDD/covidImportation", ref = "v1.6")'

Rscript /home/app/covidsp/COVIDScenarioPipeline/R/scripts/build_US_setup.R -c config.yml -p /home/app/covidsp/COVIDScenarioPipeline -w TRUE

Go to the data directory. In the terminal type: cd /home/app/covidsp/data/ and press enter. Then type: ls -l and press enter.

You should see some files named: mobility.csv and geodata.csv that were created today.
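If you prefer a scripted check over reading the ls -l listing, the sketch below reports whether each of the two expected files exists in a given data directory. The function name check_generated is a hypothetical helper; it only tests for the files and does not check their dates or contents.

```shell
# check_generated: hypothetical helper that reports whether geodata.csv and
# mobility.csv exist in the given data directory. Returns 1 if either is missing.
check_generated() {
    data_dir="$1"
    status=0
    for f in geodata.csv mobility.csv; do
        if [ -f "$data_dir/$f" ]; then
            echo "found $f"
        else
            echo "missing $f"
            status=1
        fi
    done
    return "$status"
}
```

Inside the container you could run check_generated /home/app/covidsp/data after the build_US_setup.R step finishes.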

Generate Shapefiles

To do this you will need a key to gain access to an API about the census data.

To gain access go to this link.

After you fill out the information and press the “Submit Key Request” button, you will receive a message that your request has been successfully submitted and to check your email for instructions on how to activate your new key.

Check your email for a message to activate your API key.

Once complete you will be taken to a page that says this:

Note: If this doesn’t work the first time, request a new key.

Now that you have the API key, you need to update the config.yml file with your key.

To do this you can either use your favorite editor like vim, or you can simply open the config.yml file in RStudio.

This will allow you to easily modify and update the config file with your key to replace the text that says: <your census api key>.

This will open up the file in an editor in RStudio which will allow you to copy paste your API key.

Note: Make sure you copy paste your API key before the comment (# For use with the tidycensus package. ) or replace the comment like below:
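If you would rather make this edit from the terminal, the following is a minimal sketch that swaps the placeholder text for your key using sed. The function name set_census_key is a hypothetical helper, and the sketch assumes your key contains no |, & or backslash characters (these are special to sed as it is used here).

```shell
# set_census_key: hypothetical helper that replaces the placeholder text
# "<your census api key>" in a config file with the given key.
# Assumes the key contains no '|', '&' or backslash characters.
set_census_key() {
    config_file="$1"
    api_key="$2"
    tmp_file="$config_file.tmp"
    # Write to a temporary file first, then move it over the original.
    sed "s|<your census api key>|$api_key|" "$config_file" > "$tmp_file" &&
        mv "$tmp_file" "$config_file"
}
```

For example, set_census_key config.yml YOUR_KEY_HERE, where YOUR_KEY_HERE stands for the key from your email. Any comment after the placeholder on the same line is left in place.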

Now to create our shape files we will run the following commands:

R
config <- covidcommon::load_config("config.yml")
tidycensus::census_api_key(key = config$importation$census_api_key)
covidImportation::get_county_pops(c('HI'), 'HI')

In this example we are running the pipeline for Hawaii (start with this to see if you can get the pipeline to work). Later if you wanted to run the pipeline for a different state you would replace the state abbreviation. Like this for Maryland:

covidImportation::get_county_pops(c('MD'), 'MD')

After running these commands you will get some output like this: (don’t worry if you see some warnings about dplyr)

To exit R we need to run the following command:

type q() and press enter

and in response to this question: Save workspace image? [y/n/c]:

type n and press enter

Now if we go to our data directory we will see new files!

cd /home/app/covidsp/data/
ls to view the files

We now see a new county_pops_2010.csv file and a shp directory. (shp only shows up if we do more changes to the config!!)

Inside the shp directory (see inside with cd shp, followed by ls) you will see several new files:

Edit the rest of the Config file

The config file config.yml controls all of the options for running the pipeline that are currently available. More details can be found here.

Again, to edit the config file, you can find the file named config.yml in the Files pane of RStudio and click on the file name to open it in the RStudio editor.

There are two major config setups to consider: modeling one or more US states, or modeling a location that is not a US state.

Either way, the first thing to edit is the first line that is not a comment - the first line not preceded by ##.

Change this line to say FALSE instead of TRUE:

This is just to make sure people edit the config.yml.

For following along with this tutorial to make sure the pipeline runs, please continue with our Hawaii example:

Edit config file for modeling US state

Change the modeled_states: to modeled_states: -HI

Change the setup_name: minimal to setup_name: HI.

Change popnodes: population to popnodes: pop2010

Make sure shapefile: and shapefile_name: are both set to shp/counties_2010_HI.shp.

Thus the spatial_setup: section of the config file should match this:

Leave the next section the same and scroll down to the seeding: section.

Change folder_path: importation/minimal/ to folder_path: importation/HI/.
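To double check the Hawaii edits described in this section, the sketch below searches the config file for each expected key and value. The function name check_config_values is a hypothetical helper. It only looks for the literal text, so it does not validate the YAML structure, and it skips the modeled_states entry because that value may be laid out across lines.

```shell
# check_config_values: hypothetical helper that looks for the Hawaii settings
# from this tutorial as literal text in the given config file.
# Returns 1 if any of them is not found.
check_config_values() {
    config_file="$1"
    status=0
    for expected in \
        'setup_name: HI' \
        'popnodes: pop2010' \
        'shapefile: shp/counties_2010_HI.shp' \
        'folder_path: importation/HI/'
    do
        # -F treats the text literally; -q suppresses grep's own output.
        if grep -qF -- "$expected" "$config_file"; then
            echo "ok: $expected"
        else
            echo "not found: $expected"
            status=1
        fi
    done
    return "$status"
}
```

Run it as check_config_values config.yml from the directory that holds your config file.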

Edit config file for modeling Location other than US state

See here for details: https://github.com/HopkinsIDD/COVID19_Minimal/wiki/Getting-Started-Non-US-Location

Build and Run

1. cd /home/app/covidsp/
2. Rscript COVIDScenarioPipeline/R/scripts/make_makefile.R -c config.yml
3. mkdir notebooks
4. cd notebooks
5. mkdir HI_today
6. cd /home/app/covidsp/
7. Rscript -e 'rmarkdown::draft("notebooks/HI_today/HI_report.Rmd",template="state_report",package="report.generation",edit=FALSE)'
8. cd notebooks/HI_today/
9. ls
10. cd /home/app/covidsp/
11. echo 'rmarkdown::render("notebooks/HI_today/HI_report.Rmd", params=list(state_usps="HI"))' >compile_Rmd.R
12. make clean
13. make

If you want to rerun

1. cd /home/app/covidsp/
2. ls -l
3. mv .files .oldfiles
4. make
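If you rerun more than once, a fixed name like .oldfiles will already exist the second time. One possible variation, sketched below under that assumption, is to give each backup its own suffix. The function name archive_files and the .oldfiles_ naming are hypothetical choices for illustration, not part of the pipeline.

```shell
# archive_files: hypothetical helper that renames the .files directory in the
# given working directory to .oldfiles_<suffix>, if .files exists.
archive_files() {
    work_dir="$1"
    suffix="$2"   # for example: $(date +%Y%m%d_%H%M%S)
    if [ -d "$work_dir/.files" ]; then
        mv "$work_dir/.files" "$work_dir/.oldfiles_$suffix"
    fi
}
```

For example, archive_files /home/app/covidsp "$(date +%Y%m%d_%H%M%S)" followed by make.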

New to GitHub

If you already have a GitHub account, you can skip this section and move onto the Getting Started section.

What is GitHub?

GitHub is a site that allows users to host and manage code and data files. Thus, you can store your code on the web so that you and others can easily access it (and so that it is safe if something happens to your computer!).

It is especially useful for what is called version control, which allows you to track changes to documents over time.

So although it is intended for version control of code, you can actually use GitHub for version control of many types of documents.

Why do I need an account?

By signing up for an account you can easily access up-to-date files and code for the COVIDScenarioPipeline to allow you to easily run the pipeline on your data.

Better yet, if you learn more about GitHub, you can also use your account to save the files and code for your analysis and track changes over time. You can share your analysis privately with just your team or you can even make it public for others to use.

To learn more about GitHub see here.

Create a GitHub Account:

Click this link

You will see a page that looks something like this:

Fill out a username (any name that works for you), email, and password
Click the green “sign up for GitHub” button

New to R or RStudio

Download and install R and RStudio

If you are new to R or RStudio, don’t worry! You can follow these simple steps to get started.

You will need to download and install RStudio (and possibly R if you do not already have it installed).

To do so follow this tutorial.

Create an RStudio project

Go to File –> New Project

Choose the directory for your covid project - likely you would want “New Directory”

Select “New Project” as the Project Type

Note: you may not see all of the same options as shown here

If you selected a new directory, then designate the name of that new directory and double-check that its location is somewhere on your computer that you would want. Perhaps COVID_Loc_X would be a good name. We will use this in our examples.

Great! Now you are ready to start using RStudio for the COVIDScenarioPipeline. Return to the Getting Started section of the tutorial.

New to Docker

What is Docker?

Docker allows people to easily have the same software and all of the required dependencies. It is similar to a virtual machine, which allows you to run an instance of a particular operating system with particular software. However, Docker uses your own operating system, so it doesn’t require as much overhead.

Check out this guide for more information.

Why do I need an account?

Create a Docker Account and Download Docker Desktop:

Click this link

You will see a page that looks something like this:

Fill out a Docker ID (any name that works for you), email, and password - click that you are not a robot
Click the blue “Sign Up” button
You will be taken to a new window - Select the free Community Docker Plan

Verify your account through your email
This will take you to a new window that looks like this:

Click on “Get started with Docker Desktop”

This will take you to a window with an image like this which should have a button below for downloading Docker Desktop on your computer:

Note: This may take some time (possibly more than an hour)!

Installing Docker Desktop

To install Docker follow the instructions for either:

Once installed, the directions should have gotten you to the point where you can run docker run hello-world from the Terminal tab in RStudio (by typing it in and pressing enter).

This should give some output that starts like this:

Great! Now you are ready to return to the Accessing the required R and Python tools on your computer section of the tutorial (just click the name of the section here to return to it).

COVIDScenarioPipeline

Infectious Disease Dynamics Group (IDD) at Johns Hopkins University
