Getting Started
Installation
THORR is available on PyPI and can be installed using pip:
1pip install thorr
Initial Setup
Use THORR’s command-line interface to create a new project. The new-project command creates a new THORR project with the specified name and directory using the following syntax:
1python -m thorr new-project [OPTIONS] NAME [DIR]
Replace NAME with the name of your project and DIR with the directory where you want to create the project. If DIR is not provided, the project will be created in the current directory. When creating a new project, you can also download THORR’s data files. See thorr new-project for more information on the available options.
To create a new THORR project called my_project in the current directory and download the data files, run the following command:
1python -m thorr new-project --get-data my_project
This command will create a new directory with the project name and the following structure:
my_project/
├── .env/
└── data/
The .env directory contains the configuration files for the project, and the data directory contains the data files. By default, a template configuration file is created in the .env directory. You can customize the configuration file to suit your project’s needs. See Configuration for more information on configuring your project.
To create a new THORR project without downloading the data files or creating a template configuration file, run the following command:
1python -m thorr new-project my_project
Configuration
THORR uses a configuration file to manage project settings and parameters. The configuration file is an INI (.ini) file located in the .env directory of the project.
Here is an example of a THORR configuration file:
[project]
title = my_project
project_dir = my_project
region = global
description =
start_date =
end_date =
[database]
type = postgresql
user = my_username
password = my_password
host = localhost
port = 1234
database = name_of_database
schema = name_of_schema
[data]
gis_geopackage = data/gis/thorr_gis.gpkg
ml_model = data/ml/global_ml.joblib
[data.geopackage_layers]
basins = Basins
rivers = Rivers
dams = Dams
reservoirs = Reservoirs
reaches = Reaches
buffered_reaches = BufferedReaches
[ee]
private_key_path = /path/to/earth/engine/private/key.json
service_account = service_account_email
The configuration file contains the following sections: [project], [database], [database], and [ee]. Each section contains key-value pairs that define the settings and parameters for the project.
[project]
The [project] section contains the project settings, such as the project name, region, description and time frame. The following keys are available in the [project] section:
Key |
Value |
|---|---|
name |
The name or title of the project |
project_dir |
Path to the project directory |
region |
Abbreviation of the region for the project (See Regions) |
description |
Brief description of the project |
start_date |
Start date for THORR water temperature estimates |
end_date |
End date for THORR water temperature estimates |
Regions
The region key in the [project] section specifies the region for the project. THORR has separately trained models for different regions. The following regions are currently available in THORR:
Region |
Abbreviation |
|---|---|
Columbia River Basin |
crb |
[database]
The [database] section contains the database connection settings. The following keys are available in the [database] section:
Key |
Value |
|---|---|
type |
Type of database: postgresql or mysql |
user |
Username |
password |
Password |
host |
Host address |
port |
Port number |
database |
Name of the database where the schema is stored |
schema |
Name of the schema |
See Database Setup for more information on setting up the database.
[data]
The [data] section contains the paths to the GIS and machine learning data files. An additional [data.geopackage_layers] sub-section is dedicated to the GIS geopackage layers. The following keys are available in the [data] section:
Key |
Value |
|---|---|
gis_geopackage |
File path to the GIS geopackage file that contains all the vector files used by THORR. |
ml_model |
Path to the machine learning model used to generate water temperature |
[data.geopackage_layers]
The [data.geopackage_layers] sub-section contains the names of the layers in the GIS geopackage file. The following keys are available in the [data.geopackage_layers] sub-section:
Key |
Value |
|---|---|
basins |
Layer name for basins |
rivers |
Layer name for rivers |
dams |
Layer name for dams |
reservoirs |
Layer name for reservoirs |
reaches |
Layer name for reaches |
See GIS Setup for more information on how THORR’s GIS data is structured.
[ee]
The [ee] section contains the configuration settings for Google Earth Engine (GEE). THORR obtains satellite information from the GEE platform. Therefore, a GEE service account and private key are required. The following keys are available in the [ee] section:
Key |
Value |
|---|---|
private_key_path |
/path/to/earth/engine/private/key.json |
service_account |
service_account_email |
Workflow and Cronjob
Once the project is set up and configured, you can start using THORR to generate water temperature estimates. THORR’s workflow consists of 4 main steps:
Read and process GIS information from database
Retrieve and process satellite data from Google Earth Engine (See retrieve-data)
Generate water temeprature estimates using machine learning models
Save the results to the database
This workflow can be automated to run at regular intervals using a cronjob.