FlexDamage: Flexible Damage Function Estimation for Climate Impact Assessment

Author

Climate Impact Lab

Published

May 2, 2026

1 FlexDamage

Flexible Damage Functions are statistical emulators of the high-resolution CIL projections, which can be incorporated into other models to project climate impacts and damages. These damage functions map global or national temperature and national income levels to impact levels. This is useful to estimate impacts along new socioeconomic and climate scenarios and for models for which income and/or temperature are endogenous.

The estimated damage function is:

\[D_{it} = (\alpha_i T_t + \beta_i T_t^2) \cdot Y_{it}^{\gamma}\]

where \(D_{it}\) is the impact for region \(i\) at time \(t\), \(T_t\) is the global mean temperature anomaly from pre-industrial, \(Y_{it}\) is GDP per capita, and \(\gamma\) is the income elasticity.

The full projection equation with uncertainty is:

\[D_{it}^k = (\hat{\alpha}_{ik} T_t + \hat{\beta}_{ik} T_t^2) Y_{it}^{\hat{\gamma}_k} + \hat{\theta}_{ik} T_t Y_{it}^{\hat{\gamma}_k} + \hat{\phi}_{it}^k\]

where \(k\) indexes the Monte Carlo draw, \(\hat{\gamma}_k\) is one of 19 quantile values, \(\hat{\alpha}_{ik}\) and \(\hat{\beta}_{ik}\) are drawn from the joint normal with the provided VCV (sigma11, sigma12, sigma22), \(\hat{\theta}_{ik}\) is drawn from \(N(0, \zeta_{ik})\) (the run-specific temperature-dependent error), and \(\hat{\phi}_{it}^k\) is drawn from \(N(0, \eta_{ik})\) (the annual noise). \(\rho_i\) controls the correlation between regional and global \(\theta\) draws.

1.1 For users

The links below describe how to use and interpret the Flexible Damage Function parameter files.

  • Parameters – format, download, and usage
  • Methodology – how the parameters were estimated
  • Reports – diagnostic reports for each sector

1.2 For developers

The links below describe FlexDamage, CIL’s tool for creating Flexible Damage Functions.

2 Methodology

2.1 The damage function

The damage function relates sector-specific impacts to temperature and income:

\[D_{it} = (\alpha_i T_t + \beta_i T_t^2) \cdot Y_{it}^{\gamma}\]

Each region has its own polynomial (\(\alpha_i\), \(\beta_i\)) capturing how sensitive it is to temperature. The global parameter \(\gamma\) captures how income moderates impacts: if \(\gamma < 0\), richer regions experience less damage.

This differs from approaches like Carleton et al. (2022), which use time-varying global coefficients. Here, spatial heterogeneity is explicit through region-specific coefficients, and income effects are captured by a single elasticity rather than explicit adaptation pathways.

2.2 Projection equation

The full projection equation with uncertainty, used for Monte Carlo sampling, is:

\[D_{it}^k = (\hat{\alpha}_{ik} T_t + \hat{\beta}_{ik} T_t^2) Y_{it}^{\hat{\gamma}_k} + \hat{\theta}_{ik} T_t Y_{it}^{\hat{\gamma}_k} + \hat{\phi}_{it}^k\]

where \(k\) indexes the Monte Carlo draw, \(\hat{\gamma}_k\) is one of 19 quantile values, \(\hat{\alpha}_{ik}\) and \(\hat{\beta}_{ik}\) are drawn from the joint normal with the provided VCV (sigma11, sigma12, sigma22), \(\hat{\theta}_{ik}\) is drawn from \(N(0, \zeta_{ik})\) (the run-specific temperature-dependent error), and \(\hat{\phi}_{it}^k\) is drawn from \(N(0, \eta_{ik})\) (the annual noise). \(\rho_i\) controls the correlation between regional and global \(\theta\) draws.

2.3 Estimation steps

2.3.1 1. Data standardization

Raw simulation data (from Monte Carlo runs across GCMs, RCPs, and SSPs) is standardized into a fixed 9-column parquet: region, year, y, T, log_income, w, sdev, scenario, y_sign.

Code: standardize()

2.3.2 2. Income elasticity (gamma)

Taking logs of the damage function isolates \(\gamma\):

\[\log N_{it} = \sum_k \theta_{i,k}(T_t) + \gamma \log Y_{it} + \delta_t\]

The non-parametric temperature terms are absorbed by fixed effects (sign \(\times\) region \(\times\) temperature bin), and year fixed effects control for time trends.

Standard errors are clustered two-way by entity group and year. We extract 19 quantiles (5% through 95%) from \(N(\hat{\gamma}, SE(\hat{\gamma}))\).

Observations where the outcome is negative are assigned to separate fixed-effect groups. Values below the 5th percentile of \(|N|\) are dropped.

Code: estimate_gamma()

2.3.3 3. Global polynomial

Population-weighted global averages by year and scenario are fit with a quadratic:

\[\bar{N}_t / \bar{Y}_t^{\gamma} = \bar{\delta} + \bar{\alpha} T_t + \bar{\beta} T_t^2 + \bar{\epsilon}_t\]

The residuals \(\bar{\epsilon}_t\) are saved for computing the spatial correlation parameter \(\rho\).

2.3.4 4. Regional polynomials

For each gamma quantile, the outcome is normalized by the income effect and fit per region:

\[N_{it} / Y_{it}^{\gamma} = \delta_i + \alpha_i T_t + \beta_i T_t^2 + \epsilon_{it}\]

Sufficient statistics are computed via a single SQL GROUP BY query, then solved as a batch of 3x3 linear systems in numpy. Constraints are config-driven: \(\beta \leq 0\) for agriculture (concavity), \(\beta \geq 0\) for mortality (convexity).

The intercept \(\delta_i\) is dropped from the output: at pre-industrial temperatures (\(T = 0\)), damage is zero.

Code: fit_regional_polynomials()

2.3.5 5. Error terms

Three additional uncertainty components:

\(\rho_i\): Pearson correlation between regional and global polynomial residuals. Maintains spatial covariance in Monte Carlo draws.

\(\zeta_{ik}\): Temperature-dependent error scale, fit without intercept: \(E_{it} = \zeta_{ik} T_t + \phi_{it}\). Zero at pre-industrial by construction.

\(\eta_{ik}\): Standard deviation of \(\phi_{it}\).

Code: compute_all_error_terms()

2.3.6 6. Output

Each row contains 12 fields. region identifies the location. gamma is the income elasticity quantile value for this row (there are 19 rows per region, one per quantile). alpha and beta are the linear and quadratic temperature coefficients in the polynomial \(D(T) = \alpha T + \beta T^2\). The variance-covariance matrix of \((\alpha, \beta)\) is given by sigma11 (variance of alpha), sigma12 (covariance), and sigma22 (variance of beta), enabling joint uncertainty sampling. rho is the correlation between regional and global polynomial residuals, used to maintain spatial covariance in Monte Carlo draws. zeta is the temperature-dependent error scale and eta is the residual noise standard deviation; together they describe the prediction uncertainty that grows with temperature. rsqr1 measures the polynomial fit quality and rsqr2 measures the error model fit.

2.4 Comparison methodology (flex vs raw)

For validation, the fitted damage function is compared to the raw simulation data under specific scenarios (RCP \(\times\) SSP \(\times\) period). For a given scenario:

\[\hat{D}_i = (\alpha_i \bar{T} + \beta_i \bar{T}^2) \cdot \bar{Y}_i^{\gamma}\]

where \(\bar{T}\) and \(\bar{Y}_i\) are scenario-specific averages. This is compared to the raw simulation mean per region. The comparison reports correlation, RMSE, sign agreement, and identifies regions where the polynomial fit diverges most from the simulation data.

2.5 References

  • Carleton, T. et al. (2022). Valuing the Global Mortality Consequences of Climate Change. Quarterly Journal of Economics.
  • Hsiang, S. et al. (2017). Estimating economic damage from climate change in the United States. Science.

3 Units & Methodology

This page documents the units of every variable in the FlexDamage parameter bundles and the upstream sources they come from.

3.1 Damage function

The flexible damage function is

\[D(T, Y) = (\alpha\,T + \beta\,T^2) \cdot Y^{\gamma}\]

where:

Symbol Meaning Units
\(D\) Outcome (damage). Sector-specific (see below) sector-specific
\(T\) Local temperature anomaly °C above 1986 to 2005 climatology
\(Y\) GDP per capita 2005 USD PPP per capita
\(\alpha\), \(\beta\) Region-specific linear and quadratic coefficients \(D\)-units / °C and \(D\)-units / °C²
\(\gamma\) Globally-fitted income elasticity dimensionless

3.2 Outcome units per sector

Sector Subsector Outcome units Description
mortality allcause deaths_per_100k Excess all-cause all-age deaths per 100,000 person-years attributable to climate change (full adaptation including adaptation costs); rebased to 2005 baseline + costs/100k.
labor combined mins/worker/day Change in productive minutes per worker per day for all workers. Negative values indicate productivity loss; positive values indicate gain.
labor high_risk mins/worker/day Same as combined, for the high-heat-exposure worker subpopulation.
labor low_risk mins/worker/day Same as combined, for the low-heat-exposure worker subpopulation.
energy total kWh_per_capita Climate-attributable change in total electricity + non-electricity energy consumption per capita, rebased to 2005 baseline.
energy electricity kWh_per_capita As above, electricity only.
energy non_electricity kWh_per_capita As above, non-electricity (e.g., natural gas, oil) only.
agriculture corn / rice / soy / sorghum / cassava / wheat_combined / wheat_spring / wheat_winter log_yield_change Log change in crop yield (kg/Ha, log scale): \(\log(\text{yield}_{\text{climate}}) - \log(\text{yield}_{\text{histclim}})\).

Sign conventions:

  • Mortality: positive \(D\) means excess deaths.
  • Labor: negative \(D\) means productivity loss (typical sign under warming for hot countries); positive \(D\) means small gain (occasional in temperate countries).
  • Energy (total / electricity): positive \(D\) means more energy consumed.
  • Energy (non-electricity): typically negative under warming.
  • Agriculture: negative \(D\) means yield decline.

3.3 Temperature units

  • Variable: local-area-weighted temperature anomaly.
  • Units: °C.
  • Baseline: 1986 to 2005, computed per impact region from the GMFD reanalysis.

3.4 Income units

  • Variable: GDP per capita.
  • Units: 2005 USD PPP per capita (real, purchasing-power-parity adjusted to 2005 base year).
  • Source: SSP scenario projections from IIASA GDP and OECD Env-Growth IAMs.

3.5 Resolution

Parameters are estimated and published at two resolutions:

  • Impact Region (IR): ~24,000 sub-national regions defined by the CIL region hierarchy. The default scientific resolution.
  • Country: 245 to 253 country codes (ISO3-equivalent), obtained by population-weighted aggregation from the underlying impact-region data.

3.6 Files in each parameter bundle

For every (sector, subsector, resolution) combination on Zenodo:

File Contents
<sector>__<sub>__regional_parameters.csv Per-region parameters (one row per region × gamma quantile)
<sector>__<sub>__global_results.json Globally-fitted gamma + 19 quantile values + R²
<sector>__<sub>__metadata.json Full run config + summary statistics
<sector>__<sub>__README.md Per-sector unit summary and provenance

3.7 Reproducing

Code: https://github.com/ClimateImpactLab/flex-damage.

The estimation and report pipeline is developed at the Climate Impact Lab and runs on the University of Chicago Research Computing Center (RCC). End-to-end reproduction (from raw projection-system Monte Carlo outputs through to parameters and reports) requires access to the RCC and to the shared Climate Impact Lab data directories on Midway. Without that access, the codebase can still be inspected, and the published parameters and reports can be downloaded directly from Zenodo and the reports site.

4 Parameters

4.1 File structure

Each estimation produces three files:

regional_parameters.csv: 12 columns, 19 rows per region.

Each row contains 12 fields. region identifies the location. gamma is the income elasticity quantile value for this row (there are 19 rows per region, one per quantile). alpha and beta are the linear and quadratic temperature coefficients in the polynomial \(D(T) = \alpha T + \beta T^2\). The variance-covariance matrix of \((\alpha, \beta)\) is given by sigma11 (variance of alpha), sigma12 (covariance), and sigma22 (variance of beta), enabling joint uncertainty sampling. rho is the correlation between regional and global polynomial residuals, used to maintain spatial covariance in Monte Carlo draws. zeta is the temperature-dependent error scale and eta is the residual noise standard deviation; together they describe the prediction uncertainty that grows with temperature. rsqr1 measures the polynomial fit quality and rsqr2 measures the error model fit.

global_results.json: Gamma estimate, SE, R-squared, quantiles.

metadata.json: Run configuration and summary statistics.

4.2 Using the parameters

[Section to be expanded with user interface documentation]

4.3 Zenodo

Zenodo is a research data repository that assigns DOIs to datasets, ensuring long-term availability and citability.

4.3.1 DOI structure

Each Zenodo upload has two DOIs:

  • Version DOI: Points to a specific version (e.g., 10.5281/zenodo.19199919). Always returns the same data.
  • Concept DOI: Points to the latest version. When parameters are re-estimated, a new version is uploaded under the same concept DOI.

For reproducibility, cite the version DOI. For always-current data, use the concept DOI.

4.3.2 Current release

Version Date DOI Main changes
1.3.0 2026-04-30 10.5281/zenodo.19916065 Country-level parameters added for every sector, alongside the impact-region variants. Energy also ships an unconstrained country diagnostic.
1.1.0 2026-04-23 10.5281/zenodo.19712742 All sectors at impact-region resolution: agriculture (8 crops), mortality (all-cause all-age), labor (3 subgroups), energy (3 subsectors).
1.0.0-alpha 2026-03-22 10.5281/zenodo.19199919 First release: agriculture (8 crops) at impact-region resolution.

4.3.3 Download

Latest version (auto-resolves via concept DOI)

The concept DOI always redirects to the most recent published version. This is the recommended way for users who want the latest parameters.

Shell (curl with follow-redirects):

# Resolve the concept DOI to the latest record, then list files
curl -sL "https://zenodo.org/api/records/19199918/versions/latest" \
    | python -c "import json,sys; r=json.load(sys.stdin); print(r['id']); \
                 print('\n'.join(f['links']['self'] for f in r['files']))"

# Or just grab the zip directly from the DOI-resolved URL
curl -sLO "$(curl -sL https://zenodo.org/api/records/19199918/versions/latest \
    | python -c "import json,sys; r=json.load(sys.stdin); \
                 print(next(f['links']['self'] for f in r['files'] if f['key'].endswith('.zip')))")"
unzip flexdamage-parameters-v*.zip

Replace 19199918 with the actual concept DOI number for this dataset (find it on any Zenodo record page under “Cite all versions”).

Python (requests):

import requests

CONCEPT_ID = 19199918  # concept DOI numeric part

# Resolve to latest version
r = requests.get(f"https://zenodo.org/api/records/{CONCEPT_ID}/versions/latest")
record = r.json()

print(f"Latest version: {record['metadata']['version']}")
print(f"Record ID: {record['id']}")

# Download the zip
for f in record["files"]:
    if f["key"].endswith(".zip"):
        url = f["links"]["self"]
        zip_bytes = requests.get(url).content
        with open(f["key"], "wb") as out:
            out.write(zip_bytes)
        print(f"Downloaded {f['key']} ({f['size'] / 1e6:.1f} MB)")
        break

Specific version (pin for reproducibility)

When publishing results, cite the version DOI so others can retrieve the exact parameters you used.

Shell:

# Replace with your desired version record ID
VERSION_ID=19199919
curl -sL -O "https://zenodo.org/api/records/${VERSION_ID}/files/flexdamage-parameters-v1.0.0-alpha.zip/content"

Python:

import requests

VERSION_ID = 19199919  # specific version DOI
FILENAME = "flexdamage-parameters-v1.0.0-alpha.zip"

r = requests.get(f"https://zenodo.org/records/{VERSION_ID}/files/{FILENAME}")
with open(FILENAME, "wb") as f:
    f.write(r.content)

Browse all versions

import requests

CONCEPT_ID = 19199918
r = requests.get(f"https://zenodo.org/api/records/{CONCEPT_ID}/versions")
for hit in r.json()["hits"]["hits"]:
    v = hit["metadata"]["version"]
    rid = hit["id"]
    date = hit["metadata"]["publication_date"]
    print(f"  v{v:<12} id={rid:<10} published={date}")

4.3.4 ZIP contents

flexdamage-parameters-v1.0.0-alpha/
├── agriculture/
│   ├── corn/
│   │   ├── agriculture__corn__regional_parameters.csv
│   │   ├── agriculture__corn__global_results.json
│   │   └── agriculture__corn__metadata.json
│   ├── rice/
│   │   └── ...
│   └── ...
└── README.md

Each sector/subsector directory contains the three output files. The CSV is the primary file for downstream use; the JSON files contain metadata and global estimation results.

4.3.5 Versioning

When parameters are re-estimated (e.g., with updated input data or methodology changes), a new version is uploaded under the same concept DOI. Users who pin a specific version DOI will always get the same data. The concept DOI automatically resolves to the latest version.

4.3.6 Citation

Rising, J. and Cadavid Sanchez, S. (2026). Flexible Damage Function Parameters for Climate Impact Assessment (Version 1.3.0) [Data set]. Zenodo. https://zenodo.org/records/19916065

5 Pipeline

5.1 Running

python scripts/run.py configs/agriculture/corn.yaml

5.2 What happens step by step

5.2.1 1. Standardize

Reads the source data (zarr, parquet, CSV) and produces a 9-column parquet with fixed schema. Column mappings are defined in the YAML:

data:
  source: "/path/to/data.zarr"
  columns:
    y: "log_yield_impact"
    temperature: "temperature_anomaly"
    income: "gdppc"
    income_is_log: false
    weight: "population"
    region: "region"
    year: "year"

5.2.2 2. Gamma estimation

Fixed-effects regression using pyfixest:

\[\log N_{it} = \sum_k \theta_{i,k}(T_t) + \gamma \log Y_{it} + \delta_t\]

FE groups: sign(y) \(\times\) region \(\times\) floor(T / 0.5). Clustered SE: two-way by group and year. Output: 19 quantiles from \(N(\hat{\gamma}, SE)\).

5.2.3 3. Regional polynomials (x19 parallel)

For each gamma quantile, DuckDB computes sufficient statistics via GROUP BY, numpy solves the 2x2 system per region.

\[N_{it} / Y_{it}^{\gamma} = \delta_i + \alpha_i T_t + \beta_i T_t^2\]

Constraints:

estimation:
  constraints:
    - parameter: "beta"
      type: "max"
      value: 0        # beta <= 0 for agriculture

5.2.4 4. Error terms

\(\rho\), \(\zeta\), \(\eta\) computed in a single SQL pass.

5.2.5 5. Export

12-column CSV + metadata JSON.

5.3 Configuration

run:
  name: "agriculture_corn_ir"

sector:
  name: "agriculture"
  subsector: "corn"
  units: "log yield impact"

data:
  source: "/path/to/data.zarr"
  columns:
    y: "log_yield_impact"
    temperature: "temperature_anomaly"
    income: "gdppc"
    income_is_log: false
    weight: "population"
    region: "region"
    year: "year"
    scenario_columns: ["rcp", "ssp"]

estimation:
  gamma:
    temperature_bins: 0.5
    cluster_se: true
    include_sign_in_fe: true
    n_quantiles: 19
    trim_percentile: 0.05
  regional:
    min_observations: 10
  constraints:
    - parameter: "beta"
      type: "max"
      value: 0

output:
  results_dir: "/path/to/results"
  parameters_dir: "/path/to/parameters"

execution:
  workers: 0
  memory_limit_gb: 200

6 Diagnostic Reports

A diagnostic report is generated for each sector/subsector. It evaluates how well the fitted polynomials represent the underlying simulation data, following the evaluation framework described in the methodology.

6.1 Parameter distributions

Eight-panel histogram showing the distribution of each estimated parameter across all regions. For gamma, the histogram is generated by sampling from the estimated \(N(\hat{\gamma}, SE)\) distribution (matching the R reference approach). For all other parameters, the histogram shows the values across regions at the median gamma quantile.

What to look for: the shape of the distributions indicates how much heterogeneity exists across regions. A wide alpha distribution means regions differ substantially in their linear temperature response. Beta concentrated near zero suggests the constraint binds frequently.

6.2 Polynomial summary

Evaluates the fitted quadratic \(D(T) = \alpha T + \beta T^2\) across all regions:

  • Zero crossings: At what temperature does the polynomial change sign? Computed as \(T^* = -\alpha / \beta\). Regions where \(\beta = 0\) (constraint binds) have no crossing. Regions with \(T^* > 20\) or \(T^* < 0\) are outside the relevant temperature range.

  • Slope distribution: The maximum slope of the polynomial between 0 and 10 C, computed as \(\max(|\alpha|, |\alpha + 20\beta|)\). Extreme slopes suggest regions where small temperature changes produce large impact changes.

  • Convexity: Fraction of regions where the unconstrained estimate would have had \(\beta > 0\) (and was therefore clipped). For agriculture with the \(\beta \leq 0\) constraint, this indicates how often the constraint binds.

6.3 Damage curves

Spaghetti plot of sampled regional polynomials, with the cross-region mean and interquartile range overlaid. Plotted from 0 to 8 C temperature anomaly.

The spread of curves indicates the range of regional responses. The mean curve shows the “typical” damage function shape. Wide IQR bands suggest substantial regional heterogeneity.

6.4 Flex vs raw comparison

The core validation: does the fitted polynomial, evaluated with scenario-specific temperature and income, reproduce the raw simulation data?

For each scenario (RCP \(\times\) SSP \(\times\) period):

\[\hat{D}_i = (\alpha_i \bar{T} + \beta_i \bar{T}^2) \cdot \bar{Y}_i^{\gamma}\]

is compared to the raw simulation mean \(\bar{N}_i\) for that region and scenario.

Diagnostics reported per scenario:

  • Correlation: Pearson correlation between flex and raw across regions. Values above 0.95 indicate the polynomial captures the spatial pattern well.

  • RMSE: Root mean squared error. Scale depends on the outcome variable units.

  • Sign agreement: Fraction of regions where flex and raw have the same sign. Sign disagreement means the polynomial predicts benefit where the simulation shows damage (or vice versa).

  • Maps: Four maps per scenario showing the spatial distribution of flex predictions, raw means, their difference, and sign agreement. Red indicates damage or overprediction; blue indicates benefit or underprediction.

  • Worst predictions: The 10 regions with the largest absolute residual between flex and raw.

Results are organized by period (2080-2099, 2060-2079) with tabs for each RCP/SSP combination.

6.5 Generating reports

python scripts/generate_report.py --sector agriculture --subsector corn \
    --results-dir /path/to/parameters \
    --format html

Reports are self-contained HTML files. Each report covers one sector/subsector combination.

6.6 Available reports

All reports are hosted at c1587s.github.io/flex-damage-reports. Each subsector name below is a direct link to its report. Empty cells indicate combinations that are not produced.

Resolution Agriculture Mortality Labor Energy
Impact region Cassava · Corn · Rice · Sorghum · Soy · Wheat (combined) · Wheat (spring) · Wheat (winter) All-cause all-age Combined · High-risk · Low-risk Total · Electricity · Non-electricity
Country Cassava · Corn · Rice · Sorghum · Soy · Wheat (combined) · Wheat (spring) · Wheat (winter) All-cause all-age Combined · High-risk · Low-risk Total · Electricity · Non-electricity
Country (unconstrained) Total · Electricity · Non-electricity

The unconstrained column only applies to energy: those reports drop the \(\beta \geq 0\) convexity constraint. Compare them against the standard country reports to see where the constraint binds (\(\beta = 0\) in the constrained run, \(\beta\) non-zero in the unconstrained run). For non-electricity in particular the unconstrained fit gives more regions concave \(\beta\) values, but tropical countries’ positive raw signal is still not recovered. That is a structural limit of the flex form against pop-weighted country data, not a constraint problem.

Each report is a self-contained HTML file with interactive plots, country maps, and scenario comparisons.