This section first describes how to load the provided time-series input data, or your own time-series input data, as ClustData. It then describes how to aggregate the loaded data.

Provided Data

load_timeseries_data_provided() loads the data for a given region for which data is provided in this package. Its optional input parameters are the number of time steps per period T, the years to be imported, and the attributes att to be loaded.

    load_timeseries_data_provided(region::String="GER_1"; T::Int=24, years::Array{Int,1}=[2016], att::Array{String,1}=Array{String,1}())
  • Adds the information in the *.csv file at data_path to the data dictionary

The *.csv files shall have the following structure and must have the same length:

Timestamp | Year | [column names...]

The first column should be called Timestamp if it contains a time iterator. The other columns each specify a single time series, for example at a specific geolocation. Data is provided for the following regions:

  • "GER_1": Germany 1 node
  • "GER_18": Germany 18 nodes
  • "CA_1": California 1 node
  • "CA_14": California 14 nodes
  • "TX_1": Texas 1 node

Your Own Data

The keys of {your-time-series}.data have to match "{time_series (as declared in techs.csv)}-{node}"
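
The naming rule can be checked before a model is run. The following is a minimal sketch in plain Julia and not part of CapacityExpansion: expected_keys and missing_keys are hypothetical helpers, and the time series and node names are placeholders chosen for illustration.

```julia
# Hypothetical helpers (not part of CapacityExpansion): build the keys implied
# by the rule "{time_series}-{node}" and report the ones that are absent.
expected_keys(time_series, nodes) = ["$(ts)-$(n)" for ts in time_series for n in nodes]

function missing_keys(data::AbstractDict, time_series, nodes)
    return [k for k in expected_keys(time_series, nodes) if !haskey(data, k)]
end

# Placeholder data: two time series on one node, with "wind" left out on purpose
data = Dict("solar-germany" => rand(24, 2), "el_demand-germany" => rand(24, 2))
absent = missing_keys(data, ["solar", "wind"], ["germany"])
println(absent)
```

An empty result means every time series and node combination has a matching key.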

function load_timeseries_data(data_path::String;

Return all time series stored as csv files in the specified path as a ClustData struct.

  • Loads *.csv files in the folder or the file data_path
  • Loads all attributes (all *.csv files) if the att-Array is empty or only the files specified in att
  • The *.csv files shall have the following structure and must have the same length:
    Timestamp | Year | [column names...]
  • The first column of a .csv file should be called Timestamp if it contains a time iterator
  • The second column should be called Year and contains the corresponding year
  • Every other column should contain time series data. A one-node system uses a single column; an N-node system uses N columns, each holding the time series data of one specific geolocation.
  • Returns time series as ClustData struct
  • The .data field of the ClustData struct is a dictionary with one entry per data column of each [file name].csv file, stored under the key "[file name]-[column name]". The file name should correspond to the attribute name, and the column name should correspond to the node name.
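
The file layout and key naming described above can be illustrated with a minimal sketch in plain Julia. This is not how CapacityExpansion reads files internally: the file name "solar", the node names node_a and node_b, and all values are made up, and the parsing is done by hand so that the sketch needs no packages.

```julia
# Illustration only: a made-up two-node file, assumed to be named "solar.csv"
file_name = "solar"
csv_text = """
Timestamp,Year,node_a,node_b
1,2016,0.0,0.1
2,2016,0.4,0.3
3,2017,0.5,0.2
"""

rows = [split(line, ',') for line in split(strip(csv_text), '\n')]
header, body = rows[1], rows[2:end]

# Keep only the rows whose Year column matches the requested years
years = [2016]
selected = [r for r in body if parse(Int, r[2]) in years]

# Every column after Timestamp and Year becomes one entry "[file name]-[column name]"
data = Dict{String,Vector{Float64}}()
for (j, column_name) in enumerate(header)
    j <= 2 && continue
    data["$(file_name)-$(column_name)"] = [parse(Float64, r[j]) for r in selected]
end

println(sort(collect(keys(data))))
```

With the file named after the attribute and the columns named after the nodes, the resulting keys follow the "{time_series}-{node}" pattern.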

Optional inputs to load_timeseries_data:

  • region: Region descriptor
  • T: Number of segments
  • years::Array{Int,1}: The years to be selected from the csv file, as specified in the Year column
  • att::Array{String,1}: The attributes to be loaded. If left empty, all attributes will be loaded.

function load_timeseries_data(existing_data::Symbol;

Return time series of example data sets as ClustData struct.

The example data set is selected with existing_data, e.g. existing_data=:CEP_GER1. Example data sets are:

  • :DAM_CA : Hourly Day Ahead Market Electricity prices for California-Stanford 2015
  • :DAM_GER : Hourly Day Ahead Market Electricity prices for Germany 2015
  • :CEP_GER1 : Hourly Wind, Solar, Demand data Germany one node
  • :CEP_GER18: Hourly Wind, Solar, Demand data Germany 18 nodes

Optional inputs to load_timeseries_data:

  • region: Region descriptor
  • T: Number of segments
  • years::Array{Int,1}: The years to be selected from the csv file, as specified in the Year column
  • att::Array{String,1}: The attributes to be loaded. If left empty, all attributes will be loaded.


Time series aggregation can be applied to reduce the temporal dimension while keeping the output precise, provided it is done in a problem-specific way. Aggregation methods are explained in TimeSeriesClustering. If you use aggregation in your model, running a second stage operational validation step is strongly encouraged.
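
To illustrate what aggregation does to the temporal dimension, here is a minimal sketch in plain Julia. It is not the TimeSeriesClustering algorithm: the grouping of periods is fixed by hand instead of being found by a clustering method, and hourly, assignment, centroids and weights are names made up for this illustration. The sketch splits a series into periods of T time steps and represents each group of periods by its centroid (mean profile).

```julia
# Made-up hourly series covering 4 days (96 values)
T = 24
hourly = [sin(2pi * h / T) + 0.1 * (h ÷ T) for h in 0:(4T - 1)]

# Each column is one period of T time steps
periods = reshape(hourly, T, :)

# Hand-picked grouping (assumption): days 1-2 form group 1, days 3-4 group 2.
# A clustering method such as k-means would determine this assignment itself.
assignment = [1, 1, 2, 2]
n_clust = maximum(assignment)

# Centroid representation: mean profile of the periods in each group
centroids = zeros(T, n_clust)
weights = zeros(Int, n_clust)
for k in 1:n_clust
    members = findall(==(k), assignment)
    centroids[:, k] = vec(sum(periods[:, members], dims=2)) ./ length(members)
    weights[k] = length(members)
end

println(size(periods), " -> ", size(centroids), ", weights = ", weights)
```

The weights record how many original periods each representative period stands for, so that totals over the full horizon can still be estimated from the reduced data.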


Loading time series data

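using CapacityExpansion
using Plots
# load the provided time-series input data for Germany (one node)
state = "GER_1"
ts_input_data = load_timeseries_data_provided(state; T=24, years=[2016])
# plot the solar time series stored under the key "solar-germany"
plot(ts_input_data.data["solar-germany"], legend=false, linestyle=:dot, xlabel="Time [h]", ylabel="Solar availability factor [%]")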


Aggregating time series data

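# aggregate to 5 representative periods using k-means with centroid representation
ts_clust_data = run_clust(ts_input_data; method="kmeans", representation="centroid", n_init=50, n_clust=5).clust_data
# plot the aggregated solar time series
plot(ts_clust_data.data["solar-germany"], legend=false, linestyle=:solid, width=3, xlabel="Time [h]", ylabel="Solar availability factor [%]")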
