Disclaimer: The purpose of the Johns Hopkins IDD COVIDScenarioPipeline project is to provide tools for analysis of COVID-19 related data. These materials do not cover all aspects of the research process. We highly suggest that you seek external consultation from scientific experts regarding your data and the interpretation of your data.
This tutorial is written for users with some knowledge of R programming and limited command line experience. It does not require previous knowledge of GitHub. That said, it should still be doable by someone without R programming or command line experience.
Welcome to the Johns Hopkins University Infectious Disease Dynamics COVID-19 Working Group’s COVID Scenario Pipeline, a flexible modeling framework that projects epidemic trajectories and healthcare impacts under different suites of interventions in order to aid in scenario planning.
In other words, this pipeline can help predict the effectiveness of an intervention in specific locations.
This tutorial will get you started with using the Johns Hopkins IDD COVIDScenarioPipeline by running the pipeline on fake data from Hawaii.
The model can be applied to different spatial scales given shapefiles, population data, and COVID-19 confirmed case data. There are multiple components to the pipeline, which may be characterized as follows:
The pipeline simulates the influence of interventions for a given location and can adjust for demographics of that location and neighboring locations.
Please post questions to GitHub issues with the question tag. We are prioritizing direct support for individuals engaged in public health planning and emergency response.
For more information on getting started, please visit our Getting Started wiki at HopkinsIDD/COVID19_Minimal.
This open-source project is licensed under GPL v3.0.
If you are new to GitHub, please see the New to GitHub section before proceeding.
There are two separate GitHub repositories that users need to clone/download:
One is the HopkinsIDD/COVID19_Minimal template repository, which is a template spatial repository for location-specific files. This will allow us to run the model for a specific location. We will call the directory containing these files and scripts COVID_Loc_X.
The second is the HopkinsIDD/COVIDScenarioPipeline repository, which we will download inside the COVID_Loc_X directory. This is where the scripts and files to actually run the model will be located.
First navigate to the Johns Hopkins IDD HopkinsIDD/COVID19_Minimal template repository by clicking here.
You will see a page that looks like this:
Click on the green button that says “Use this template” as shown in the above image.
This will take you to a new page that looks like this:
Here you will:
Great! Now you have a repository on GitHub which contains all the current HopkinsIDD/COVID19_Minimal template files and code.
It should look something like this:
Leave this open! You will want this for the next step!
Open a new project in RStudio (if this is new for you see New to R or RStudio)
Select the Terminal tab in RStudio.
Type git clone followed by the URL of the repository you created.
It should look something like this after the dollar sign $:
git clone https://github.com/yourgithubusername/COVID_Loc_X.git
Here, your GitHub username appears between “github.com” and the name of the repository you created. Make sure you replace it with your own!
You should see some messages like:
Cloning into 'COVID_Loc_X'...
Once it is complete you will see that you now have a directory named the same as your GitHub repository that contains all the files in the repository.
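The clone URL follows a fixed pattern: your GitHub username, then the repository name, then .git. As a minimal sketch of that pattern, the hypothetical helper clone_url below (not part of the pipeline) assembles the URL and prints the full git clone command so you can check it before running it. The username and repository name are the placeholders used in this tutorial.

```shell
#!/bin/sh
# Hypothetical helper (not part of the pipeline): build the clone URL
# for a repository created from the template.
clone_url() {
    # $1 = GitHub username, $2 = repository name
    printf 'https://github.com/%s/%s.git\n' "$1" "$2"
}

# Replace the placeholder with your own GitHub username.
GITHUB_USER="yourgithubusername"
REPO_NAME="COVID_Loc_X"

# Print the command instead of running it, so it can be checked first.
echo "git clone $(clone_url "$GITHUB_USER" "$REPO_NAME")"
```

If the printed command looks right, copy it into the terminal and press enter.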
cd COVID_Loc_X and press enter
git clone https://github.com/HopkinsIDD/COVIDScenarioPipeline.git
You should get some output that looks something like this:
You will also now have a directory called “COVIDScenarioPipeline”.
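Because the pipeline repository has to sit inside the location repository, it can be worth confirming the layout before moving on. The sketch below assumes only the directory names used in this tutorial; check_layout is a hypothetical helper, not something the pipeline provides.

```shell
#!/bin/sh
# Hypothetical helper: confirm that COVIDScenarioPipeline was cloned
# inside the location repository (COVID_Loc_X in this tutorial).
check_layout() {
    # $1 = path to the location repository
    if [ -d "$1/COVIDScenarioPipeline" ]; then
        echo "ok: COVIDScenarioPipeline found inside $1"
    else
        echo "missing: COVIDScenarioPipeline not found inside $1" >&2
        return 1
    fi
}

# Example use, from inside COVID_Loc_X:
#   check_layout .
```

If the check reports the directory as missing, make sure you ran cd COVID_Loc_X before the second git clone.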
To do this we will need Git Large File Storage, also called git-lfs.
So go to this link and download git-lfs.
Open up a new terminal window (but keep the other one open!). To do this in RStudio you can click on the downward arrowhead next to where it says “Terminal1”, like this:
Click on New Terminal.
For more information about Terminals in RStudio see here.
In this new terminal:
On Mac:
If you don’t already have Homebrew, do the following:
cd /usr/local and press enter
mkdir homebrew && curl -L https://github.com/Homebrew/brew/tarball/master | tar xz --strip 1 -C homebrew and press enter
Then type these commands:
brew install git-lfs and press enter
git lfs install and press enter
You should get a message saying: Git LFS initialized.
On Windows:
Follow the directions located here.
On Linux:
Follow the directions located here.
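Whichever operating system you are on, you can check that git-lfs ended up on your PATH before continuing. This is a minimal sketch that assumes a POSIX shell (so it applies to Mac, Linux, or a Unix-style shell on Windows); have_cmd is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a command is available on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd git-lfs; then
    echo "git-lfs is installed"
else
    echo "git-lfs was not found; install it before running git lfs pull"
fi
```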
Go back to Terminal1 by clicking on the downward arrow next to Terminal2 and clicking on Terminal1:
Make sure you are still in the COVIDScenarioPipeline directory (the repository directory that you created with your second git clone command) by typing this command:
cd /home/app/covidsp/COVIDScenarioPipeline/ and press enter
Type this command and press enter:
git lfs pull
Great, now we have the files we need on our local computer!
To get the exact required versions of the R packages and Python packages, modules, and scripts, we can simply use something called Docker.
If you are new to Docker and need to set up an account go to the New to Docker section of the tutorial.
Once you are set up with Docker Desktop and Docker Hub you can proceed with the tutorial.
You can use the RStudio terminal for the next Docker commands or any terminal that you prefer.
For the docker commands in this section, if you run into permissions problems, you will need to put sudo in front of the command.
Type the following command into the Terminal 1 tab of RStudio and press enter.
docker pull hopkinsidd/covidscenariopipeline:latest
You will see something like this:
Note: This will take some time (possibly an hour or more)!
You will know it is finished when it stops printing output and the $ is back!
You should get a message that looks something like this:
If that did not happen, Docker suggests this:
Depending on how you’ve installed docker on your system, you might see a permission denied error after running the above command. If you’re on a Mac, make sure the Docker engine is running. If you’re on Linux, then prefix your docker commands with sudo. Alternatively, you can create a docker group to get rid of this issue.
What did we just do exactly?
The pull command caused Docker to grab the latest version of the hopkinsidd/covidscenariopipeline image and put it on your local machine.
If you type docker images you will now see hello-world and hopkinsidd/covidscenariopipeline listed as repositories on your computer.
Now when we run the hopkinsidd/covidscenariopipeline image with Docker, we can run commands in the hopkinsidd/covidscenariopipeline container. This is similar to running a command in a virtual machine but doesn’t require booting up a virtual machine.
Here are the definitions of the various Docker terms according to Docker:
Images - The blueprints of our application which form the basis of containers. In the demo above, we used the docker pull command to download the busybox image.
Containers - Created from Docker images and run the actual application. We create a container using docker run which we did using the busybox image that we downloaded. A list of running containers can be seen using the docker ps command.
Docker Daemon - The background service running on the host that manages building, running and distributing Docker containers. The daemon is the process that runs in the operating system which clients talk to.
Docker Client - The command line tool that allows the user to interact with the daemon. More generally, there can be other forms of clients too - such as Kitematic which provide a GUI to the users.
Docker Hub - A registry of Docker images. You can think of the registry as a directory of all available Docker images. If required, one can host their own Docker registries and can use them for pulling images.
Start the container with your current directory mounted at /home/app/covidsp/ by typing in one of the following commands (depending on your operating system):
On Linux or Mac:
docker run -it --rm -v "$(pwd)":/home/app/covidsp hopkinsidd/covidscenariopipeline
On Windows:
docker run -it --rm -v %CD%:/home/app/covidsp hopkinsidd/covidscenariopipeline
The -it flag creates an interactive tty to allow us to run commands in the container.
You may need to replace the %CD% with your absolute path for the directory you are working in on your machine.
Something like this: docker run -it --rm -v C:/Users/UserName/DirectoryName:/home/app/covidsp hopkinsidd/covidscenariopipeline
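In these commands, -it gives you an interactive terminal, --rm removes the container when you exit it, and -v mounts a directory from your machine at /home/app/covidsp inside the container. As a minimal sketch for Linux or Mac, the hypothetical helper docker_run_cmd below prints the command with an explicit absolute path, so you can confirm which directory will be mounted before running it.

```shell
#!/bin/sh
# Hypothetical helper: print the docker run command for a given host
# directory. It only prints the command; it does not start Docker.
docker_run_cmd() {
    # $1 = absolute path of the directory to mount
    #      (defaults to the current directory)
    host_dir="${1:-$(pwd)}"
    printf 'docker run -it --rm -v "%s":/home/app/covidsp hopkinsidd/covidscenariopipeline\n' "$host_dir"
}

# Show the command for the directory you are currently in.
docker_run_cmd
```

The directory shown before the colon should be your COVID_Loc_X directory; if it is not, cd there first or pass the path explicitly, for example docker_run_cmd /path/to/COVID_Loc_X (a placeholder path).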
Great! Now that you are inside the Docker container, you can take a look around the files located here by typing ls.
You will see something like this:
You might also notice that the information to the left of the $ has changed, as you are now in the container.
docker ps shows you the containers that are running
docker ps -a shows you containers that were run in the past and currently running containers
You are running the container from the /home/app directory.
Rscript local_install.R
If there’s a prompt saying “Enter one or more numbers, or an empty line to skip updates:”, just hit enter.
You will see lots of output printed to the screen.
Then run the following command:
On Linux or Mac:
docker run -it --rm -v "$(pwd)":/home/app/covidsp -v "COVIDScenarioPipeline":/home/app/covidsp/COVIDScenarioPipeline hopkinsidd/covidscenariopipeline
Note: this command still needs to be checked; the quotes around COVIDScenarioPipeline may not be needed.
On Windows:
docker run -it --rm -v %CD%:/home/app/covidsp -v "COVIDScenarioPipeline":/home/app/covidsp/COVIDScenarioPipeline hopkinsidd/covidscenariopipeline
Rscript COVIDScenarioPipeline/local_install.R
cd /home/app/covidsp/ and press enter
Rscript -e 'devtools::install_github("HopkinsIDD/covidImportation", ref = "v1.6")'
Rscript /home/app/covidsp/COVIDScenarioPipeline/R/scripts/build_US_setup.R -c config.yml -p /home/app/covidsp/COVIDScenarioPipeline -w TRUE
cd /home/app/covidsp/data/ and press enter
Then type: ls -l and press enter.
You should see some files named mobility.csv and geodata.csv that were created today.
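Instead of reading the dates from the ls -l output, you can check them directly. The sketch below treats "created today" as "modified within the last 24 hours", which is an approximation; is_fresh and DATA_DIR are hypothetical names, and the data path is the one used in this tutorial.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file exists and was modified
# within the last 24 hours.
is_fresh() {
    [ -f "$1" ] && [ -n "$(find "$1" -mtime -1 2>/dev/null)" ]
}

# Data directory inside the container; override DATA_DIR if yours differs.
DATA_DIR="${DATA_DIR:-/home/app/covidsp/data}"

for f in mobility.csv geodata.csv; do
    if is_fresh "$DATA_DIR/$f"; then
        echo "ok: $f"
    else
        echo "missing or stale: $f"
    fi
done
```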
To do this you will need a key to gain access to an API about the census data.
To gain access go to this link.
After you fill out the information and press the “Submit Key Request” button, you will receive a message that your request has been successfully submitted and to check your email for instructions on how to activate your new key.
Check your email for a message to activate your API key.
Once complete you will be taken to a page that says this:
Note: if this doesn’t work the first time, request a new key.
Now that you have the API key, you need to update the config.yml file with your key.
To do this you can either use your favorite editor like vim, or you can simply open the config.yml file in RStudio.
This will allow you to easily modify and update the config file with your key to replace the text that says: <your census api key>.
This will open up the file in an editor in RStudio which will allow you to copy paste your API key.
Note: Make sure you copy paste your API key before the comment (# For use with the tidycensus package. ) or replace the comment like below:
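If you prefer the command line to an editor, the placeholder can also be replaced with sed. This is a minimal sketch: set_census_key is a hypothetical helper, it assumes the placeholder text in config.yml is exactly <your census api key>, and it assumes the key contains only letters and digits (other characters could be misread by sed). The trailing comment on the line is left in place, after the key.

```shell
#!/bin/sh
# Hypothetical helper: replace the placeholder text in a config file
# with an API key, writing through a temporary file.
set_census_key() {
    # $1 = config file
    # $2 = API key (assumed to contain only letters and digits)
    tmp="$1.tmp"
    sed "s/<your census api key>/$2/" "$1" > "$tmp" && mv "$tmp" "$1"
}

# Example use (YOURKEYHERE is a placeholder, not a real key):
#   set_census_key config.yml YOURKEYHERE
```

Afterwards, open config.yml once to confirm the key landed before the comment.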
Now to create our shape files we will run the following commands:
R
config <- covidcommon::load_config("config.yml")
tidycensus::census_api_key(key = config$importation$census_api_key)
covidImportation::get_county_pops(c('HI'), 'HI')
In this example we are running the pipeline for Hawaii (start with this to see if you can get the pipeline to work). Later, if you want to run the pipeline for a different state, replace the state abbreviation, like this for Maryland:
covidImportation::get_county_pops(c('MD'), 'MD')
After running these commands you will get some output like this: (don’t worry if you see some warnings about dplyr)
To exit R we need to run the following command:
type q() and press enter
and in response to this question: Save workspace image? [y/n/c]:
type n and press enter
Now if we go to our data directory we will see new files!
cd /home/app/covidsp/data/
ls to view the files
We now see a new county_pops_2010.csv file and a shp directory. (The shp directory only shows up after we make further changes to the config.)
Inside the shp directory (see inside with cd shp, followed by ls) you will see several new files:
The config file config.yml controls all of the options for running the pipeline that are currently available. More details can be found here.
Again, to edit the config file, you can find the file named config.yml in the Files pane of RStudio and click on the file name to open it in the RStudio editor.
There are two major config setups to consider: modeling one or more US states, or modeling a location that is not a US state.
Either way, the first thing to edit is the first line that is not a comment - the first line not preceded by ##.
Change this line to say FALSE instead of TRUE:
This is just to make sure people edit the config.yml.
For following along with this tutorial to make sure the pipeline runs, please continue with our Hawaii example:
Change the modeled_states: to modeled_states: -HI
Change the setup_name: minimal to setup_name: HI.
Change popnodes: population to popnodes: pop2010
Make sure shapefile: and shapefile_name: are both set to shp/counties_2010_HI.shp.
Thus the spatial_setup: section of the config file should match this:
Leave the next section the same and scroll down to the seeding: section.
Change folder_path: importation/minimal/ to folder_path: importation/HI/.
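After making the Hawaii edits, a quick text search can confirm that the new values are present in the file. The sketch below only looks for the literal strings named in the steps above; it does not parse the YAML or check that each value sits in the right section. check_hi_config is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: report whether each expected Hawaii setting
# appears somewhere in the config file (plain text search only).
check_hi_config() {
    # $1 = path to config.yml
    result=0
    for expected in \
        "setup_name: HI" \
        "popnodes: pop2010" \
        "shp/counties_2010_HI.shp" \
        "folder_path: importation/HI/"
    do
        if grep -qF -- "$expected" "$1"; then
            echo "ok: $expected"
        else
            echo "not found: $expected"
            result=1
        fi
    done
    return "$result"
}

# Example use, from /home/app/covidsp/:
#   check_hi_config config.yml
```

Any line reported as not found points to an edit that was missed or mistyped.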
See here for details: https://github.com/HopkinsIDD/COVID19_Minimal/wiki/Getting-Started-Non-US-Location
1. cd /home/app/covidsp/
2. Rscript COVIDScenarioPipeline/R/scripts/make_makefile.R -c config.yml
3. mkdir notebooks
4. cd notebooks
5. mkdir HI_today
6. cd /home/app/covidsp/
7. Rscript -e 'rmarkdown::draft("notebooks/HI_today/HI_report.Rmd",template="state_report",package="report.generation",edit=FALSE)'
8. cd notebooks/HI_today/
9. ls
10. cd /home/app/covidsp/
11. echo 'rmarkdown::render("notebooks/HI_today/HI_report.Rmd", params=list(state_usps="HI"))' >compile_Rmd.R
12. make clean
13. make
1. cd /home/app/covidsp/
2. ls -l
3. mv .files .oldfiles
4. make
If you already have a GitHub account, you can skip this section and move onto the Getting Started section.
GitHub is a site that allows users to host and manage code and data files. Thus, you can store your code on the web so that you and others can easily access it (and so that it is safe if something happens to your computer!).
It is especially useful for what is called version control, which allows you to track changes to documents over time.
So although it is intended for version control of code, you can actually use GitHub for version control of many types of documents.
By signing up for an account you can easily access up-to-date files and code for the COVIDScenarioPipeline to allow you to easily run the pipeline on your data.
Better yet, if you learn more about GitHub, you can also use your account to save the files and code for your analysis and track changes over time. You can share your analysis privately with just your team or you can even make it public for others to use.
To learn more about GitHub see here.
You will see a page that looks something like this:
If you are new to R or RStudio, don’t worry! You can follow these simple steps to get started.
You will need to download and install RStudio (and possibly R if you do not already have it installed).
To do so follow this tutorial.
4) If you selected a new directory, then designate the name of that new directory and double check that its location is somewhere on your computer that you would want. Perhaps COVID_Loc_X would be a good name. We will use this in our examples.
Great! Now you are ready to start using RStudio for the COVIDScenarioPipeline. Return to the Getting Started section of the tutorial.
Docker allows people to easily have the same software and all of the required dependencies. It is similar to a virtual machine, which allows you to run an instance of a particular operating system with particular software. However, Docker uses your own operating system, so it doesn’t require as much overhead.
Check out this guide for more information.
You will see a page that looks something like this:
Click on “Get started with Docker Desktop”
Note: This may take some time (possibly more than an hour)!
Run docker run hello-world from the Terminal tab in RStudio (by typing it in and pressing enter). This should give some output that starts like this:
Great! Now you are ready to return to the Accessing the required R and Python tools on your computer section of the tutorial (just click the name of the section here to return to it).