Data Formats

SDX licensed content is available in three formats. Choose the format that best fits your data infrastructure.

Supported formats

Format	Best for	Compression
JSON	APIs, web applications, programmatic consumption	gzip
CSV	Spreadsheets, SQL imports, simple analysis	gzip
Parquet	Data warehouses, big data pipelines, analytics	Snappy (built-in)

All formats contain the same data — only the encoding differs.
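
As a rough illustration of how the gzip-compressed JSON and CSV deliveries could be read, the sketch below uses only the Python standard library. The function names and the idea that files arrive as single gzip members are assumptions for this example, not part of the SDX specification.

```python
import csv
import gzip
import json


def load_json_gz(path):
    """Read a gzip-compressed JSON file into a Python object."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def load_csv_gz(path):
    """Read a gzip-compressed CSV file into a list of dicts keyed by the header row."""
    # newline="" lets the csv module handle line endings itself.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as fh:
        return list(csv.DictReader(fh))
```

Parquet files carry their own Snappy compression, so they would typically be opened directly with a Parquet reader rather than decompressed first.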

Schema: Benchmark subscription

JSON

{
  "product": "sdx_market_benchmark",
  "period": "2025-Q4",
  "generated_at": "2026-01-15T00:00:00Z",
  "segments": [
    {
      "use_type": "office",
      "geography": "US",
      "size_band": "large",
      "building_count": 3842,
      "metrics": {
        "eui_kwh_m2": {
          "min": 62.1,
          "p25": 142.3,
          "median": 189.7,
          "p75": 248.5,
          "max": 520.0
        },
        "carbon_kgco2e_m2": {
          "min": 18.4,
          "p25": 48.2,
          "median": 65.1,
          "p75": 89.3,
          "max": 198.0
        },
        "water_litres_m2": {
          "min": 120,
          "p25": 380,
          "median": 520,
          "p75": 710,
          "max": 1850
        }
      }
    }
  ]
}
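
To show how the nested `segments` and `metrics` objects might be traversed, here is a minimal sketch that computes the interquartile range (p75 minus p25) for each metric in each segment. The helper name `interquartile_ranges` is hypothetical, and the embedded document is a trimmed copy of the example above.

```python
import json

# Trimmed copy of the example document above (one segment, two metrics).
EXAMPLE = json.loads("""
{
  "product": "sdx_market_benchmark",
  "period": "2025-Q4",
  "segments": [
    {
      "use_type": "office",
      "geography": "US",
      "size_band": "large",
      "building_count": 3842,
      "metrics": {
        "eui_kwh_m2": {"min": 62.1, "p25": 142.3, "median": 189.7, "p75": 248.5, "max": 520.0},
        "carbon_kgco2e_m2": {"min": 18.4, "p25": 48.2, "median": 65.1, "p75": 89.3, "max": 198.0}
      }
    }
  ]
}
""")


def interquartile_ranges(benchmark):
    """Map each (use_type, geography, size_band) segment to p75 - p25 per metric."""
    result = {}
    for segment in benchmark["segments"]:
        key = (segment["use_type"], segment["geography"], segment["size_band"])
        result[key] = {
            metric: round(stats["p75"] - stats["p25"], 1)
            for metric, stats in segment["metrics"].items()
        }
    return result


iqr = interquartile_ranges(EXAMPLE)
print(iqr[("office", "US", "large")])
```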

CSV

use_type,geography,size_band,building_count,eui_min,eui_p25,eui_median,eui_p75,eui_max,carbon_min,carbon_p25,carbon_median,carbon_p75,carbon_max
office,US,large,3842,62.1,142.3,189.7,248.5,520.0,18.4,48.2,65.1,89.3,198.0

The CSV header row is always present. Fields are comma-separated, and string values are quoted when they contain commas.
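
A CSV reader returns every field as text, so numeric columns need converting after parsing. The sketch below reads the sample rows shown above with `csv.DictReader`, which also handles quoted fields. The function name and the choice to treat every column other than the three dimension columns and `building_count` as a float are assumptions made for illustration.

```python
import csv
import io

SAMPLE = (
    "use_type,geography,size_band,building_count,"
    "eui_min,eui_p25,eui_median,eui_p75,eui_max,"
    "carbon_min,carbon_p25,carbon_median,carbon_p75,carbon_max\n"
    "office,US,large,3842,62.1,142.3,189.7,248.5,520.0,18.4,48.2,65.1,89.3,198.0\n"
)

# Columns kept as text; everything else is treated as numeric (assumption).
TEXT_COLUMNS = {"use_type", "geography", "size_band"}


def parse_benchmark_csv(text):
    """Parse benchmark CSV text into dicts with numeric columns converted."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {}
        for name, value in raw.items():
            if name in TEXT_COLUMNS:
                row[name] = value
            elif name == "building_count":
                row[name] = int(value)
            else:
                row[name] = float(value)
        rows.append(row)
    return rows


rows = parse_benchmark_csv(SAMPLE)
```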

Schema: Market index

JSON

{
  "product": "sdx_global_index",
  "index_type": "eui_kwh_m2",
  "base_period": "2020-Q1",
  "base_value": 100.0,
  "data_points": [
    { "period": "2020-Q1", "value": 100.0, "building_count": 45000 },
    { "period": "2020-Q2", "value": 98.2, "building_count": 46200 },
    { "period": "2025-Q4", "value": 87.4, "building_count": 112000 }
  ]
}
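
Because the index is expressed relative to `base_value`, the percentage change since the base period follows directly from each data point. The following sketch uses the example document above; the helper name `change_from_base` is hypothetical.

```python
EXAMPLE_INDEX = {
    "product": "sdx_global_index",
    "index_type": "eui_kwh_m2",
    "base_period": "2020-Q1",
    "base_value": 100.0,
    "data_points": [
        {"period": "2020-Q1", "value": 100.0, "building_count": 45000},
        {"period": "2020-Q2", "value": 98.2, "building_count": 46200},
        {"period": "2025-Q4", "value": 87.4, "building_count": 112000},
    ],
}


def change_from_base(index):
    """Return {period: percent change relative to base_value}, rounded to one decimal."""
    base = index["base_value"]
    return {
        point["period"]: round((point["value"] / base - 1.0) * 100.0, 1)
        for point in index["data_points"]
    }


changes = change_from_base(EXAMPLE_INDEX)
```

With the example values, the 2025-Q4 index of 87.4 corresponds to a 12.6% decrease from the 2020-Q1 base.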

Parquet schema

Column	Type	Description
`period`	string	Quarter identifier (YYYY-QN)
`value`	float64	Index value (base = 100.0)
`building_count`	int64	Number of buildings in the calculation
`index_type`	string	Metric (eui_kwh_m2, carbon_kgco2e_m2, water_litres_m2)
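
The `period` column is a string, so consumers that need to validate or do arithmetic on quarters may want to parse it first. Here is a minimal sketch, assuming only the `YYYY-QN` pattern documented above; the function name `parse_period` is hypothetical.

```python
import re

# Four-digit year, a literal "-Q", then a quarter number from 1 to 4.
_PERIOD_RE = re.compile(r"(\d{4})-Q([1-4])")


def parse_period(period):
    """Convert a 'YYYY-QN' string into a (year, quarter) tuple of ints."""
    match = _PERIOD_RE.fullmatch(period)
    if match is None:
        raise ValueError(f"not a YYYY-QN period: {period!r}")
    return int(match.group(1)), int(match.group(2))
```

The resulting tuples sort chronologically, for example `sorted(periods, key=parse_period)`.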

Schema: Anonymised building-level data pack

JSON

{
  "product": "anonymised_buildings",
  "period": "2025-Q4",
  "records": [
    {
      "record_id": "anon_00001",
      "use_type": "office",
      "country": "US",
      "climate_zone": "4A",
      "gfa_band": "25000-50000",
      "year_built_band": "2000-2009",
      "eui_kwh_m2": 195.3,
      "carbon_kgco2e_m2": 67.8,
      "water_litres_m2": 540,
      "sdx_score": 72,
      "data_quality_grade": "A",
      "crrem_stranding_band": "2035-2040"
    }
  ]
}
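
Banded fields such as `gfa_band`, `year_built_band`, and `crrem_stranding_band` are strings of the form `low-high` in the example above. The sketch below splits such a value into integer bounds. It assumes every band follows that closed-range pattern; open-ended bands, if the data contains any, would need separate handling. The function name `parse_band` is hypothetical.

```python
def parse_band(band):
    """Split a 'low-high' band string into a (low, high) tuple of ints."""
    low_text, separator, high_text = band.partition("-")
    if not separator:
        raise ValueError(f"band has no '-' separator: {band!r}")
    low, high = int(low_text), int(high_text)
    if low > high:
        raise ValueError(f"band bounds are reversed: {band!r}")
    return low, high
```

For instance, `parse_band("25000-50000")` returns `(25000, 50000)`.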

CSV columns

Column	Type	Description
`record_id`	string	Anonymised unique identifier
`use_type`	string	Building use type
`country`	string	ISO 3166-1 alpha-2
`climate_zone`	string	ASHRAE climate zone
`gfa_band`	string	Floor area band in m²
`year_built_band`	string	Construction decade band
`eui_kwh_m2`	float	Energy Use Intensity
`carbon_kgco2e_m2`	float	Carbon intensity
`water_litres_m2`	float	Water use intensity
`sdx_score`	integer	SDX benchmark percentile
`data_quality_grade`	string	Letter grade (A–F)
`crrem_stranding_band`	string	Projected stranding year range
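
The column types in the table above can be applied when loading the CSV, for example with a small typed record. This is a sketch only: the class name `BuildingRecord` is hypothetical, the sample row is assembled from the JSON example above, and the column order in the sample is assumed to match the table.

```python
import csv
import io
from dataclasses import dataclass

# Converters for the non-string columns, following the types in the table above.
_CONVERTERS = {
    "eui_kwh_m2": float,
    "carbon_kgco2e_m2": float,
    "water_litres_m2": float,
    "sdx_score": int,
}


@dataclass(frozen=True)
class BuildingRecord:
    record_id: str
    use_type: str
    country: str
    climate_zone: str
    gfa_band: str
    year_built_band: str
    eui_kwh_m2: float
    carbon_kgco2e_m2: float
    water_litres_m2: float
    sdx_score: int
    data_quality_grade: str
    crrem_stranding_band: str

    @classmethod
    def from_row(cls, row):
        """Build a record from a csv.DictReader row, converting numeric columns."""
        return cls(**{
            name: _CONVERTERS.get(name, str)(value) for name, value in row.items()
        })


# Sample row assembled from the JSON example; column order is an assumption.
SAMPLE = (
    "record_id,use_type,country,climate_zone,gfa_band,year_built_band,"
    "eui_kwh_m2,carbon_kgco2e_m2,water_litres_m2,sdx_score,"
    "data_quality_grade,crrem_stranding_band\n"
    "anon_00001,office,US,4A,25000-50000,2000-2009,195.3,67.8,540,72,A,2035-2040\n"
)

records = [BuildingRecord.from_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
```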

Versioning

Data schemas are versioned. Each file carries its schema version: in the root object for JSON, in the filename for CSV, and in the file metadata for Parquet. Breaking schema changes are communicated at least 90 days in advance.
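
A consumer can guard against unexpected breaking changes by checking the schema version before processing a file. The sketch below does this for a parsed JSON document. The key name `schema_version`, the `major.minor` version format, and the supported major version are all assumptions made for illustration; this page does not specify them.

```python
SUPPORTED_MAJOR = 1  # hypothetical: the major version this consumer was built for


def check_schema_version(document, supported_major=SUPPORTED_MAJOR):
    """Return the document's schema version, or raise if it is missing or unsupported."""
    # "schema_version" is an assumed key name, not confirmed by the SDX docs.
    version = document.get("schema_version")
    if version is None:
        raise KeyError("document has no schema_version field")
    major = int(str(version).split(".")[0])
    if major != supported_major:
        raise RuntimeError(
            f"unsupported schema major version {major}; expected {supported_major}"
        )
    return version
```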
