API Reference
Documentation for the FlexDamage estimation pipeline.
Pipeline
FlexDamagePipeline
from flexdamage.pipeline import FlexDamagePipeline
pipeline = FlexDamagePipeline("configs/agriculture_corn.yaml")
result = pipeline.run()
Main pipeline orchestrator for flexible damage function estimation.
Coordinates the full estimation pipeline:
- Standardize source data to fixed parquet format
- Estimate global income elasticity (gamma)
- Fit regional polynomials for each gamma quantile
- Compute error structure parameters
- Export results to CSV and JSON
Parameters:
| Name | Type | Description |
|---|---|---|
| `config_path` | str or Path | Path to YAML configuration file |
Attributes:
| Name | Type | Description |
|---|---|---|
| `config` | RunConfig | Validated configuration object |
| `config_path` | Path | Path to the configuration file |
run() -> dict
Execute full estimation pipeline.
Runs the complete estimation workflow:
- Standardize input data
- Estimate gamma via fixed-effects regression
- Fit regional polynomials for each gamma quantile
- Compute error terms (rho, zeta, eta)
- Export parameters to CSV and JSON
Returns:
| Key | Type | Description |
|---|---|---|
| `gamma` | float | Point estimate of income elasticity |
| `gamma_se` | float | Standard error of gamma |
| `n_regions` | int | Number of regions processed |
| `n_observations` | int | Total observations |
| `output_csv` | str | Path to output CSV file |
| `output_json` | str | Path to global results JSON |
| `timings` | dict | Timing for each pipeline stage |
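
How the stages listed for run() fit together can be sketched with stand-in callables. This is not the library's implementation: `run_stages` and its three callable arguments are hypothetical, and joining the polynomial and error parameters on `region` is an assumption of the sketch.

```python
import time
from typing import Callable


def run_stages(
    estimate_gamma: Callable[[], dict],
    fit_regional: Callable[[float], list],
    compute_errors: Callable[[list, float], list],
) -> dict:
    """Compose the estimation stages; the callables stand in for the real functions."""
    timings = {}

    start = time.perf_counter()
    global_results = estimate_gamma()
    timings["gamma"] = time.perf_counter() - start

    start = time.perf_counter()
    rows = []
    for gamma in global_results["gamma_quantiles"]:
        regional = fit_regional(gamma)  # one polynomial fit per gamma quantile
        errors = {e["region"]: e for e in compute_errors(regional, gamma)}
        # Assumed join: merge polynomial and error parameters by region identifier.
        rows.extend({**r, **errors[r["region"]]} for r in regional)
    timings["regional"] = time.perf_counter() - start

    return {"gamma": global_results["gamma"], "rows": rows, "timings": timings}
```

Each element of `rows` then corresponds to one (region, gamma quantile) pair, which matches the description of `regional_results` as covering all gamma quantiles.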
Data Preparation
standardize(config, output_path=None) -> str
Transform source data into standardized parquet format.
Reads input data from any supported format (zarr, parquet, CSV) and produces a standardized parquet file with fixed column names for downstream estimation.
Parameters:
| Name | Type | Description |
|---|---|---|
| `config` | RunConfig | Pipeline configuration with column mappings and data source path |
| `output_path` | str or Path, optional | Output path for standardized parquet. If None, creates a temp file |
Returns:
str - Path to the standardized parquet file.
Output columns (always present):
| Column | Type | Description |
|---|---|---|
| `region` | str | Region identifier |
| `year` | int | Year |
| `y` | float | Outcome variable |
| `T` | float | Temperature anomaly (°C) |
| `log_income` | float | Natural log of GDP per capita |
| `w` | float | Population weight |
| `sdev` | float or None | MC standard deviation |
| `scenario` | str or None | Scenario identifier |
| `y_sign` | int | Sign of y (+1 or -1) |
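
Because downstream stages rely on these fixed column names, a quick check of a standardized file's columns can catch a bad column mapping early. The sketch below is illustrative only: the helper `missing_standard_columns` is hypothetical and not part of the package; the column names are those in the table above.

```python
# Column names copied from the "Output columns" table; the helper is hypothetical.
STANDARD_COLUMNS = (
    "region", "year", "y", "T", "log_income", "w", "sdev", "scenario", "y_sign",
)


def missing_standard_columns(columns):
    """Return the standardized column names absent from `columns`, in table order."""
    present = set(columns)
    return [name for name in STANDARD_COLUMNS if name not in present]
```

With pandas installed, the column list could come from `pd.read_parquet(path).columns`; an empty result means every expected column is present.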
Estimation
estimate_gamma(con, config) -> dict
from flexdamage.estimation.gamma import estimate_gamma
global_results = estimate_gamma(con, config)
gamma = global_results["gamma"]
Estimate global income elasticity (gamma) via fixed-effects regression.
Uses pyfixest for fast high-dimensional fixed effects regression with two-way clustered standard errors (Cameron, Gelbach & Miller 2011).
The regression model is:
Parameters:
| Name | Type | Description |
|---|---|---|
| `con` | duckdb.DuckDBPyConnection | Active DuckDB connection with 'standardized' view |
| `config` | RunConfig | Pipeline configuration with gamma estimation settings |
Returns:
| Key | Type | Description |
|---|---|---|
| `gamma` | float | Point estimate of income elasticity |
| `gamma_se` | float | Clustered standard error |
| `gamma_quantiles` | list | 19 quantiles from N(gamma, SE) |
| `r_squared` | float | R-squared of the FE regression |
| `n_obs` | int | Number of observations |
| `n_fe_groups` | int | Number of fixed effect groups |
Notes:
Positive gamma indicates adaptation: richer regions experience smaller damages from the same temperature change.
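
The `gamma_quantiles` entry can be illustrated with the standard library. This sketch assumes that SE is the standard deviation of the normal distribution and that the 19 quantiles sit at evenly spaced probabilities 5%, 10%, ..., 95%; neither detail is confirmed here, and the function below is a hypothetical stand-in rather than the package's own code.

```python
from statistics import NormalDist


def gamma_quantiles(gamma: float, gamma_se: float, n: int = 19) -> list:
    """Quantiles of a normal with mean gamma and std dev gamma_se.

    Probabilities are k / (n + 1) for k = 1..n (an assumed spacing).
    """
    dist = NormalDist(mu=gamma, sigma=gamma_se)
    return [dist.inv_cdf(k / (n + 1)) for k in range(1, n + 1)]
```

Under these assumptions the middle (10th) quantile equals the point estimate, and the list is symmetric around it.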
fit_regional_polynomials(con, gamma, config) -> DataFrame
from flexdamage.estimation.regional import fit_regional_polynomials
regional_df = fit_regional_polynomials(con, gamma, config)
Fit regional polynomial coefficients (alpha, beta) for all regions.
For each region i, fits a quadratic polynomial in temperature T to the income-normalized outcome y_norm = y * Y^(-gamma); alpha is the linear coefficient and beta is the quadratic coefficient.
Parameters:
| Name | Type | Description |
|---|---|---|
| `con` | duckdb.DuckDBPyConnection | Active DuckDB connection with 'standardized' view |
| `gamma` | float | Income elasticity value (called once per quantile) |
| `config` | RunConfig | Pipeline configuration with constraints and settings |
Returns:
pandas.DataFrame with columns:
| Column | Type | Description |
|---|---|---|
| `region` | str | Region identifier |
| `gamma` | float | Gamma value used |
| `alpha` | float | Linear temperature coefficient |
| `beta` | float | Quadratic temperature coefficient |
| `sigma11` | float | Variance of alpha |
| `sigma12` | float | Covariance of alpha, beta |
| `sigma22` | float | Variance of beta |
| `rsqr1` | float | R-squared of polynomial fit |
| `n` | int | Number of observations |
Notes:
Uses vectorized OLS via sufficient statistics (no Python loops over regions).
Constraints (e.g., beta <= 0 for agriculture) are applied post-estimation.
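
The idea of OLS via sufficient statistics can be sketched with NumPy: accumulate X'X and X'y per region, then solve all normal equations in one batched call. This is a minimal illustration, not the package's implementation. The function names, the inclusion of an intercept, and the reading of Y as exp(log_income) are assumptions; the real code works against the DuckDB view.

```python
import numpy as np


def income_normalize(y, log_income, gamma):
    """y_norm = y * Y^(-gamma), assuming Y = exp(log_income)."""
    return y * np.exp(-gamma * log_income)


def fit_quadratics(region_idx, T, y_norm, n_regions):
    """Per-region OLS of y_norm on [1, T, T^2] from summed X'X and X'y.

    The intercept column is an assumption of this sketch.
    Returns an (n_regions, 3) array of [intercept, alpha, beta].
    """
    T = np.asarray(T, dtype=float)
    y_norm = np.asarray(y_norm, dtype=float)
    X = np.column_stack([np.ones_like(T), T, T**2])  # (n_obs, 3)
    xtx = np.zeros((n_regions, 3, 3))
    xty = np.zeros((n_regions, 3))
    # Add each observation's contribution to its region's statistics.
    np.add.at(xtx, region_idx, X[:, :, None] * X[:, None, :])
    np.add.at(xty, region_idx, X * y_norm[:, None])
    # One batched solve of the normal equations for all regions.
    return np.linalg.solve(xtx, xty[:, :, None])[:, :, 0]
```

`region_idx` is an integer array mapping each observation to a region index in `0..n_regions-1`. Each region needs at least three distinct temperature values, otherwise its 3x3 system is singular.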
compute_all_error_terms(con, regional_params, gamma, config) -> DataFrame
from flexdamage.estimation.errors import compute_all_error_terms
error_df = compute_all_error_terms(con, regional_params, gamma, config)
Compute error structure parameters (rho, zeta, eta) for all regions.
Computes the error decomposition for Monte Carlo sampling:
Parameters:
| Name | Type | Description |
|---|---|---|
| `con` | duckdb.DuckDBPyConnection | Active DuckDB connection with 'standardized' view |
| `regional_params` | pandas.DataFrame | Regional parameters with columns [region, intercept, alpha, beta] |
| `gamma` | float | Gamma value used for income normalization |
| `config` | RunConfig | Pipeline configuration |
Returns:
pandas.DataFrame with columns:
| Column | Type | Description |
|---|---|---|
| `region` | str | Region identifier |
| `rho` | float | Correlation with global residual process |
| `zeta` | float | Temperature-dependent error scale |
| `eta` | float | Residual noise standard deviation |
| `rsqr2` | float | R-squared of error model fit |
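
Going only by the column descriptions, two of these quantities can be sketched for a single region in plain Python: `rho` as a Pearson correlation between regional and global residuals, and `zeta` as the OLS slope of absolute residuals on temperature. This is one plausible reading, not the package's actual decomposition; both function names are hypothetical, and `eta` is left out because its exact definition is not given.

```python
from statistics import fmean


def residual_correlation(regional_resid, global_resid):
    """Pearson correlation between regional and global residual series."""
    mx, my = fmean(regional_resid), fmean(global_resid)
    sxy = sum((a - mx) * (b - my) for a, b in zip(regional_resid, global_resid))
    sxx = sum((a - mx) ** 2 for a in regional_resid)
    syy = sum((b - my) ** 2 for b in global_resid)
    return sxy / (sxx * syy) ** 0.5


def abs_residual_slope(T, resid):
    """OLS slope of |residual| regressed on temperature T (with intercept)."""
    abs_resid = [abs(r) for r in resid]
    mt, ma = fmean(T), fmean(abs_resid)
    sxy = sum((t - mt) * (v - ma) for t, v in zip(T, abs_resid))
    sxx = sum((t - mt) ** 2 for t in T)
    return sxy / sxx
```

Both helpers assume non-constant inputs; a constant series gives a zero denominator.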
Export
export_parameters(regional_results, global_results, config, output_dir=None) -> dict
from flexdamage.export.parameters import export_parameters
paths = export_parameters(regional_results, global_results, config)
print(f"CSV: {paths['csv']}")
Export estimation results to standardized CSV and JSON files.
Creates three output files:
- `{sector}__{subsector}__regional_parameters.csv` - 12-column parameter file
- `{sector}__{subsector}__global_results.json` - Gamma estimation results
- `{sector}__{subsector}__metadata.json` - Run configuration and statistics
Parameters:
| Name | Type | Description |
|---|---|---|
| `regional_results` | pandas.DataFrame | Regional parameters for all gamma quantiles |
| `global_results` | dict | Global estimation results from estimate_gamma() |
| `config` | RunConfig | Pipeline configuration |
| `output_dir` | str or Path, optional | Output directory. Defaults to config.output.parameters_dir |
Returns:
dict with keys csv, json, metadata containing paths to output files.
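
The file naming pattern can be reproduced with `pathlib` when locating outputs from another script. The helper below is a hypothetical sketch, not the package's own code. It assumes the `json` key refers to the global results file and `metadata` to the metadata file.

```python
from pathlib import Path


def parameter_file_paths(output_dir, sector, subsector):
    """Build the three output paths from the documented naming pattern."""
    base = Path(output_dir)
    stem = f"{sector}__{subsector}"
    return {
        "csv": base / f"{stem}__regional_parameters.csv",
        "json": base / f"{stem}__global_results.json",  # assumed key mapping
        "metadata": base / f"{stem}__metadata.json",
    }
```

For a hypothetical sector `agriculture` and subsector `corn`, the CSV would be named `agriculture__corn__regional_parameters.csv`.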
CSV columns (12 total):
| Column | Type | Description |
|---|---|---|
| `region` | str | Region identifier |
| `gamma` | float | Gamma quantile value used |
| `alpha` | float | Linear temperature coefficient |
| `beta` | float | Quadratic temperature coefficient |
| `sigma11` | float | Variance of alpha |
| `sigma12` | float | Covariance of alpha, beta |
| `sigma22` | float | Variance of beta |
| `rho` | float | Correlation with global residuals |
| `zeta` | float | Slope of \|residuals\| vs T |
| `eta` | float | Std dev of residual noise |
| `rsqr1` | float | R² of regional polynomial fit |
| `rsqr2` | float | R² of error term fit |
Configuration
load_config(path) -> RunConfig
from flexdamage.config import load_config
config = load_config("configs/agriculture_corn.yaml")
print(config.run.name)
Load and validate a YAML configuration file.
Parameters:
| Name | Type | Description |
|---|---|---|
| `path` | str or Path | Path to YAML configuration file |
Returns:
RunConfig - Validated configuration object with nested attributes for run, data, estimation, output, and execution settings.
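
To illustrate what "nested attributes" means in practice, the sketch below turns an already-parsed configuration dictionary into an object with attribute access and checks that the five documented sections exist. It is a stand-in for illustration only. `to_namespace` is hypothetical, the real `RunConfig` performs its own validation, and the `name` field is taken from the `config.run.name` usage shown earlier.

```python
from types import SimpleNamespace

# Section names from the description of RunConfig.
REQUIRED_SECTIONS = ("run", "data", "estimation", "output", "execution")


def to_namespace(raw: dict) -> SimpleNamespace:
    """Check top-level sections, then convert nested dicts to attribute access."""
    missing = [s for s in REQUIRED_SECTIONS if s not in raw]
    if missing:
        raise ValueError(f"missing config sections: {missing}")

    def convert(value):
        if isinstance(value, dict):
            return SimpleNamespace(**{k: convert(v) for k, v in value.items()})
        return value

    return convert(raw)
```

A dictionary loaded from YAML could be passed straight in, after which `cfg.run.name` reads like the example above.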