A Unified Framework for Representative Subset Selection in Multi-Variate Time-Series Spaces

Unified Framework · Fully Modular · Protocol-Based

Decompose any time-series aggregation method into five interchangeable components. Pick one implementation per pillar, wire them together, and run.

Why this package?

Energy system models, capacity expansion studies, and other time-series-heavy applications often need to reduce a full year of hourly data to a small set of representative periods — days, weeks, or months — without losing what matters.

The literature offers many methods (k-means, k-medoids, MILP-based selection, genetic algorithms, and others), but the landscape is dense and tangled. Each method bundles several decisions (how to represent data, what to optimize, how to search) into a single procedure. That makes it hard to see which choices matter, to compare approaches on equal footing, or to adapt a method to your specific problem.

A Unified Framework

Decomposes any time-series aggregation method into five interchangeable components. Every established methodology is a specific instantiation of this structure. The framework provides a common language for describing, comparing, and assembling methods.

Read the framework paper

A Modular Python Package

Implements this framework as a library of composable, protocol-based modules. You pick one implementation per component, wire them together, and run. Adding a new algorithm or score metric means implementing a single protocol — everything else stays the same.
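As a rough illustration of what "implementing a single protocol" can look like in Python, the sketch below uses `typing.Protocol`. The names `ScoreComponent`, `MeanGap`, and `evaluate` are hypothetical and are not taken from the package; its real protocols may have different names and signatures.

```python
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class ScoreComponent(Protocol):
    """Hypothetical protocol: any object with a matching `score` method qualifies."""

    def score(self, full: Sequence[float], selected: Sequence[float]) -> float:
        ...


class MeanGap:
    """Toy score: absolute gap between the full and selected means (lower is better)."""

    def score(self, full: Sequence[float], selected: Sequence[float]) -> float:
        return abs(sum(full) / len(full) - sum(selected) / len(selected))


def evaluate(component: ScoreComponent, full, selected) -> float:
    # The caller depends only on the protocol, never on a concrete class.
    return component.score(full, selected)


result = evaluate(MeanGap(), [1.0, 2.0, 3.0, 4.0], [2.0, 3.0])
```

`MeanGap` never inherits from `ScoreComponent`; structural typing is what lets a new metric slot in without changes elsewhere.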

Browse all modules

The Five Components

Any time-series aggregation method decomposes into these five pillars. Mix and match implementations to build the exact workflow you need.

Feature Space (F)

How raw time-series are transformed into comparable representations

Objective (O)

How candidate selections are scored for quality

Selection Space (S)

What is being selected (historical subsets, synthetic archetypes, etc.)

Representation Model (R)

How selected periods represent the full dataset

Search Algorithm (A)

The engine that finds optimal selections

How It Works

Define your problem, pick one implementation per component, wire them into a workflow, and run. The result contains the selected periods, their weights, and all objective scores.
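The flow described above can be sketched end to end in plain Python. This is not the package's API: the data are made up, and every function and variable name here is a hypothetical stand-in for one of the five components.

```python
from itertools import combinations
from statistics import mean

# Toy data: six "days" with four readings each (made-up numbers).
periods = {
    "d1": [1.0, 2.0, 3.0, 2.0],
    "d2": [2.0, 2.0, 2.0, 2.0],
    "d3": [5.0, 6.0, 7.0, 6.0],
    "d4": [1.0, 1.0, 1.0, 1.0],
    "d5": [4.0, 4.0, 5.0, 3.0],
    "d6": [3.0, 3.0, 3.0, 3.0],
}
k = 2

# F: one feature vector (mean, peak) per period.
features = {name: (mean(v), max(v)) for name, v in periods.items()}


# R: uniform responsibility weights, 1/k each.
def uniform_weights(selection):
    return {name: 1.0 / len(selection) for name in selection}


# O: two toy score components, lower is better.
def mean_error(selection, weights):
    target = mean(f[0] for f in features.values())
    return abs(target - sum(weights[n] * features[n][0] for n in selection))


def peak_error(selection, weights):
    full_peak = max(f[1] for f in features.values())
    return abs(full_peak - max(features[n][1] for n in selection))


# S + A: enumerate every k-of-n subset and keep the lowest total score.
best = None
for selection in combinations(sorted(periods), k):
    weights = uniform_weights(selection)
    scores = {
        "mean_error": mean_error(selection, weights),
        "peak_error": peak_error(selection, weights),
    }
    total = sum(scores.values())
    if best is None or total < best["total"]:
        best = {"selection": selection, "weights": weights,
                "scores": scores, "total": total}
```

The `best` dictionary mirrors the result described above: selected periods, their weights, and all objective scores.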

Quick Start: Full walkthrough →

Implemented Modules

A growing library of composable components. Each implements a protocol — swap any module without touching the rest of your pipeline.

Feature Engineering

StandardStatsFeatureEngineer, PCAFeatureEngineer, FeaturePipeline

Transform raw time-series into statistical summaries, PCA projections, or chained multi-step feature pipelines.
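To make the idea of chained feature steps concrete, here is a minimal standard-library sketch: a statistical-summary step followed by a standardization step. The function names are hypothetical and do not reflect the classes listed above; the PCA step is omitted to keep the example dependency-free.

```python
from statistics import mean, pstdev


def stats_features(profile):
    """Step 1: summarize one period as [mean, std, min, max]."""
    return [mean(profile), pstdev(profile), min(profile), max(profile)]


def standardize(rows):
    """Step 2: z-score each feature column across periods (constant columns map to 0)."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) for c in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, mus, sigmas)]
        for row in rows
    ]


# Three toy periods with two readings each (made-up numbers).
profiles = [[1.0, 3.0], [2.0, 2.0], [3.0, 7.0]]
raw = [stats_features(p) for p in profiles]   # chained: summary ...
scaled = standardize(raw)                     # ... then scaling
```

Standardizing after summarizing keeps features with large units, such as peaks, from dominating later distance computations.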

Score Components

WassersteinFidelity, CorrelationFidelity, DurationCurveFidelity, DiversityReward, ...

Evaluate candidate selections on distribution similarity, correlation preservation, diversity, and more. 10 components and counting.
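As one example of a distribution-similarity score, the sketch below computes the one-dimensional Wasserstein-1 distance between two empirical samples by integrating the absolute difference of their CDFs. It is a generic textbook construction with a hypothetical function name, not the package's `WassersteinFidelity` implementation, and it ignores responsibility weights.

```python
from bisect import bisect_right


def wasserstein_1d(u, v):
    """Wasserstein-1 distance between two unweighted 1-D samples."""
    u_sorted, v_sorted = sorted(u), sorted(v)
    grid = sorted(set(u_sorted) | set(v_sorted))
    total = 0.0
    # Both empirical CDFs are constant between consecutive grid points,
    # so the integral of |F_u - F_v| is a finite sum of rectangles.
    for left, right in zip(grid, grid[1:]):
        cdf_u = bisect_right(u_sorted, left) / len(u_sorted)
        cdf_v = bisect_right(v_sorted, left) / len(v_sorted)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


# Toy comparison: which 2-value subset better matches the full distribution?
full = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
spread_out = wasserstein_1d(full, [1.0, 6.0])
clustered = wasserstein_1d(full, [1.0, 2.0])
```

A lower distance means the selection's value distribution is closer to the full dataset's, so here the spread-out subset scores better than the clustered one.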

Combination Generators

ExhaustiveCombiGen, GroupQuotaCombiGen, HierarchicalCombiGen

Define the selection space: enumerate all k-of-n combinations, enforce group quotas (e.g. one per season), or compose hierarchically.
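The difference between exhaustive enumeration and a group quota is easy to see with `itertools`. The helper names below are hypothetical and only illustrate the two ideas; the month-to-season grouping is an assumption made for the example.

```python
from itertools import combinations, product


def exhaustive(candidates, k):
    """All k-of-n combinations."""
    return combinations(candidates, k)


def group_quota(groups, quota):
    """Selections that take exactly quota[g] items from each group g."""
    per_group = [combinations(groups[g], quota[g]) for g in groups]
    for parts in product(*per_group):
        yield tuple(item for part in parts for item in part)


months = list(range(1, 13))
seasons = {  # assumed grouping, for illustration only
    "winter": [12, 1, 2],
    "spring": [3, 4, 5],
    "summer": [6, 7, 8],
    "autumn": [9, 10, 11],
}

all_four = list(exhaustive(months, 4))
one_per_season = list(group_quota(seasons, {s: 1 for s in seasons}))
```

With twelve candidate months and k = 4, the exhaustive space holds 495 combinations, while the one-per-season quota narrows it to 81.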

Representation Models

UniformRepresentationModel, KMedoidsClustersizeRepresentation, BlendedRepresentationModel

Assign responsibility weights: equal 1/k, cluster-membership proportional, or soft blended assignment where every period contributes.
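Two of the weighting schemes can be sketched in a few lines: equal 1/k weights, and weights proportional to how many periods sit closest to each selected representative. The function names and the squared-Euclidean nearest-neighbour rule are assumptions for illustration, not the package's implementation; the blended (soft) variant is not shown.

```python
def uniform_weights(selected):
    """Every selected period gets weight 1/k."""
    return {s: 1.0 / len(selected) for s in selected}


def cluster_size_weights(features, selected):
    """Weight each representative by the share of periods nearest to it."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    counts = {s: 0 for s in selected}
    for vec in features.values():
        # Hard assignment: each period counts toward its closest representative.
        nearest = min(selected, key=lambda s: sq_dist(vec, features[s]))
        counts[nearest] += 1
    n = len(features)
    return {s: c / n for s, c in counts.items()}


# Five toy periods described by a single feature (made-up numbers).
features = {"a": (0.0,), "b": (1.0,), "c": (2.0,), "d": (10.0,), "e": (11.0,)}
selected = ("b", "d")

w_uniform = uniform_weights(selected)
w_cluster = cluster_size_weights(features, selected)
```

In this toy case three of the five periods are nearest to `b`, so the cluster-size weights shift from 0.5 / 0.5 to 0.6 / 0.4.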

Search Algorithms

ObjectiveDrivenCombinatorialSearchAlgorithm
WeightedSumPolicy, ParetoMaxMinStrategy

Generate-and-test search with pluggable selection policies: weighted sum, Pareto front, or custom strategies.
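Once every candidate has been scored, a selection policy decides the winner. The sketch below shows a weighted-sum policy and a Pareto-front filter over made-up scores, assuming lower is better for every metric. The function names are hypothetical, and the example does not attempt to reproduce the `ParetoMaxMinStrategy` listed above.

```python
def weighted_sum_best(scores, weights):
    """Pick the candidate with the lowest weighted sum of its scores."""
    return min(
        scores,
        key=lambda name: sum(weights[m] * v for m, v in scores[name].items()),
    )


def pareto_front(scores):
    """Keep candidates that no other candidate dominates."""
    def dominates(a, b):
        # a is no worse on every metric and strictly better on at least one.
        return all(a[m] <= b[m] for m in a) and any(a[m] < b[m] for m in a)

    return [
        name for name in scores
        if not any(dominates(scores[other], scores[name])
                   for other in scores if other != name)
    ]


# Made-up scores for four candidate selections (lower is better).
scores = {
    "A": {"wasserstein": 0.1, "correlation": 0.9},
    "B": {"wasserstein": 0.5, "correlation": 0.2},
    "C": {"wasserstein": 0.6, "correlation": 0.95},
    "D": {"wasserstein": 0.9, "correlation": 0.1},
}

winner = weighted_sum_best(scores, {"wasserstein": 1.0, "correlation": 1.0})
front = pareto_front(scores)
```

The weighted sum collapses the trade-off into a single number, while the Pareto front keeps every non-dominated candidate and leaves the final choice to a further rule or to the user.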

Diagnostics

ResponsibilityBars, FeatureRadar, ScoreHeatmap

Interactive Plotly visualizations for feature distributions, score comparisons, and responsibility weight analysis.

Part of the MESQUAL Family

energy-repset is a standalone library within the MESQUAL family — a collection of open-source tools for energy system analysis, scenario comparison, and modeling support.