Data Formats

SDX licensed content is available in three formats. Choose the format that best fits your data infrastructure.

Supported formats

| Format | Best for | Compression |
| --- | --- | --- |
| JSON | APIs, web applications, programmatic consumption | gzip |
| CSV | Spreadsheets, SQL imports, simple analysis | gzip |
| Parquet | Data warehouses, big data pipelines, analytics | Snappy (built-in) |

All formats contain the same data; only the encoding differs.
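
As a minimal sketch of consuming the gzip-compressed formats, the helpers below read a JSON or CSV delivery using only the Python standard library. The function names and any file names are illustrative assumptions, not part of the SDX product. Parquet files are not covered here because they need a third-party reader such as pyarrow.

```python
import csv
import gzip
import json


def load_json_gz(path):
    """Read a gzip-compressed JSON file and return the parsed object."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def load_csv_gz(path):
    """Read a gzip-compressed CSV file into a list of dicts keyed by the header row."""
    # newline="" lets the csv module handle line endings inside quoted fields.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as fh:
        return list(csv.DictReader(fh))
```

Both helpers decompress on the fly, so the file never has to be unpacked on disk first.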

Schema: Benchmark subscription

JSON

{
  "product": "sdx_market_benchmark",
  "period": "2025-Q4",
  "generated_at": "2026-01-15T00:00:00Z",
  "segments": [
    {
      "use_type": "office",
      "geography": "US",
      "size_band": "large",
      "building_count": 3842,
      "metrics": {
        "eui_kwh_m2": {
          "min": 62.1,
          "p25": 142.3,
          "median": 189.7,
          "p75": 248.5,
          "max": 520.0
        },
        "carbon_kgco2e_m2": {
          "min": 18.4,
          "p25": 48.2,
          "median": 65.1,
          "p75": 89.3,
          "max": 198.0
        },
        "water_litres_m2": {
          "min": 120,
          "p25": 380,
          "median": 520,
          "p75": 710,
          "max": 1850
        }
      }
    }
  ]
}

CSV

use_type,geography,size_band,building_count,eui_min,eui_p25,eui_median,eui_p75,eui_max,carbon_min,carbon_p25,carbon_median,carbon_p75,carbon_max
office,US,large,3842,62.1,142.3,189.7,248.5,520.0,18.4,48.2,65.1,89.3,198.0

The CSV header row is always present. Fields are comma-separated, and strings are quoted where they contain commas.
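
To illustrate how the nested JSON relates to the flat CSV layout, the sketch below flattens one JSON segment into a row with the column names shown in the CSV header above. The mapping from JSON metric keys to the `eui_` and `carbon_` prefixes is inferred from the two examples and is an assumption. The sample CSV header lists no water columns, so the sketch leaves them out as well.

```python
# Assumed mapping from JSON metric keys to CSV column prefixes,
# inferred from the JSON and CSV examples (not an official specification).
CSV_PREFIX = {"eui_kwh_m2": "eui", "carbon_kgco2e_m2": "carbon"}
STATS = ("min", "p25", "median", "p75", "max")
ID_COLUMNS = ("use_type", "geography", "size_band", "building_count")


def segment_to_row(segment):
    """Flatten one JSON segment into a dict keyed by the CSV column names."""
    row = {name: segment[name] for name in ID_COLUMNS}
    for metric, prefix in CSV_PREFIX.items():
        for stat in STATS:
            row[f"{prefix}_{stat}"] = segment["metrics"][metric][stat]
    return row


segment = {
    "use_type": "office",
    "geography": "US",
    "size_band": "large",
    "building_count": 3842,
    "metrics": {
        "eui_kwh_m2": {"min": 62.1, "p25": 142.3, "median": 189.7, "p75": 248.5, "max": 520.0},
        "carbon_kgco2e_m2": {"min": 18.4, "p25": 48.2, "median": 65.1, "p75": 89.3, "max": 198.0},
    },
}
row = segment_to_row(segment)
```

The resulting dict keeps the column order of the CSV header, so it can be passed straight to `csv.DictWriter`.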

Schema: Market index

JSON

{
  "product": "sdx_global_index",
  "index_type": "eui_kwh_m2",
  "base_period": "2020-Q1",
  "base_value": 100.0,
  "data_points": [
    { "period": "2020-Q1", "value": 100.0, "building_count": 45000 },
    { "period": "2020-Q2", "value": 98.2, "building_count": 46200 },
    { "period": "2025-Q4", "value": 87.4, "building_count": 112000 }
  ]
}
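
Because the index is expressed relative to `base_value`, each data point can be read as a percentage change since the base period. The sketch below shows that arithmetic on the example payload; the function name `change_from_base` is a hypothetical helper, not part of any SDX client.

```python
def change_from_base(index):
    """Map each period to its percentage change relative to base_value."""
    base = index["base_value"]
    return {
        point["period"]: (point["value"] - base) / base * 100.0
        for point in index["data_points"]
    }


index = {
    "product": "sdx_global_index",
    "index_type": "eui_kwh_m2",
    "base_period": "2020-Q1",
    "base_value": 100.0,
    "data_points": [
        {"period": "2020-Q1", "value": 100.0, "building_count": 45000},
        {"period": "2020-Q2", "value": 98.2, "building_count": 46200},
        {"period": "2025-Q4", "value": 87.4, "building_count": 112000},
    ],
}
changes = change_from_base(index)
```

With the example data, the 2025-Q4 value of 87.4 corresponds to a fall of roughly 12.6% against the 2020-Q1 base.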

Parquet schema

| Column | Type | Description |
| --- | --- | --- |
| period | string | Quarter identifier (YYYY-QN) |
| value | float64 | Index value (base = 100.0) |
| building_count | int64 | Number of buildings in the calculation |
| index_type | string | Metric (eui_kwh_m2, carbon_kgco2e_m2, water_litres_m2) |
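
Since `period` is stored as a string, time-series tools usually need it converted to a date. The sketch below parses the YYYY-QN identifier and returns the first day of the quarter. It assumes calendar quarters (Q1 starting in January), which the schema does not state explicitly, and both function names are hypothetical.

```python
import datetime
import re

# Four-digit year, a literal "-Q", then a quarter number from 1 to 4.
_PERIOD_RE = re.compile(r"(\d{4})-Q([1-4])")


def parse_period(period):
    """Split a 'YYYY-QN' identifier into (year, quarter) integers."""
    match = _PERIOD_RE.fullmatch(period)
    if match is None:
        raise ValueError(f"not a YYYY-QN period: {period!r}")
    return int(match.group(1)), int(match.group(2))


def quarter_start(period):
    """Return the first day of the quarter, assuming calendar quarters."""
    year, quarter = parse_period(period)
    return datetime.date(year, 3 * (quarter - 1) + 1, 1)
```

The identifiers also sort chronologically as plain strings, so parsing is only needed when real dates are required.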

Schema: Anonymised building-level data pack

JSON

{
  "product": "anonymised_buildings",
  "period": "2025-Q4",
  "records": [
    {
      "record_id": "anon_00001",
      "use_type": "office",
      "country": "US",
      "climate_zone": "4A",
      "gfa_band": "25000-50000",
      "year_built_band": "2000-2009",
      "eui_kwh_m2": 195.3,
      "carbon_kgco2e_m2": 67.8,
      "water_litres_m2": 540,
      "sdx_score": 72,
      "data_quality_grade": "A",
      "crrem_stranding_band": "2035-2040"
    }
  ]
}

CSV columns

| Column | Type | Description |
| --- | --- | --- |
| record_id | string | Anonymised unique identifier |
| use_type | string | Building use type |
| country | string | ISO 3166-1 alpha-2 |
| climate_zone | string | ASHRAE climate zone |
| gfa_band | string | Floor area band in m² |
| year_built_band | string | Construction decade band |
| eui_kwh_m2 | float | Energy Use Intensity |
| carbon_kgco2e_m2 | float | Carbon intensity |
| water_litres_m2 | float | Water use intensity |
| sdx_score | integer | SDX benchmark percentile |
| data_quality_grade | string | Letter grade (A–F) |
| crrem_stranding_band | string | Projected stranding year range |
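
CSV readers return every field as a string, so the numeric columns listed above need converting after loading. The sketch below applies the documented types to one row. It assumes every numeric field is populated, since the handling of missing values is not described here. The sample row is built from the JSON example record for illustration.

```python
import csv
import io

# Column types taken from the CSV columns table; all other columns stay strings.
FLOAT_COLUMNS = ("eui_kwh_m2", "carbon_kgco2e_m2", "water_litres_m2")
INT_COLUMNS = ("sdx_score",)


def coerce_record(row):
    """Return a copy of a CSV row with numeric columns converted from strings."""
    record = dict(row)
    for name in FLOAT_COLUMNS:
        record[name] = float(row[name])
    for name in INT_COLUMNS:
        record[name] = int(row[name])
    return record


# Illustrative CSV text mirroring the JSON example record.
sample = io.StringIO(
    "record_id,use_type,country,climate_zone,gfa_band,year_built_band,"
    "eui_kwh_m2,carbon_kgco2e_m2,water_litres_m2,sdx_score,"
    "data_quality_grade,crrem_stranding_band\r\n"
    "anon_00001,office,US,4A,25000-50000,2000-2009,"
    "195.3,67.8,540,72,A,2035-2040\r\n"
)
records = [coerce_record(row) for row in csv.DictReader(sample)]
```

The band columns (`gfa_band`, `year_built_band`, `crrem_stranding_band`) are left as strings because they hold ranges rather than single numbers.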

Versioning

Data schemas are versioned. The schema version is included in the file metadata (JSON root object, CSV filename, Parquet file metadata). Breaking schema changes are communicated at least 90 days in advance.
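
A consumer can guard against breaking changes by checking the schema version before processing a file. The sketch below does this for the JSON format. The field name `schema_version` and the `MAJOR.MINOR` string layout are assumptions made for illustration; the actual key and version format are not specified on this page.

```python
def schema_major(doc, key="schema_version"):
    """Return the major version from a hypothetical 'MAJOR.MINOR' root field."""
    version = doc.get(key)
    if version is None:
        raise KeyError(f"no {key!r} field in the JSON root object")
    return int(str(version).split(".")[0])


def check_schema(doc, supported_major, key="schema_version"):
    """Raise ValueError if the file's major version is not the supported one."""
    major = schema_major(doc, key)
    if major != supported_major:
        raise ValueError(
            f"unsupported schema major version {major}, expected {supported_major}"
        )
    return doc
```

Failing fast on an unexpected major version keeps a pipeline from silently misreading renamed or removed fields.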

Next steps