Note

Download this example as a Jupyter notebook here: https://github.com/pypsa/atlite/examples/historic-comparison-germany.ipynb

Historic comparison PV and wind (Germany)

In this example we are examining the ranalysed power feed-ins calculated by atlite. Based on data for capacity distributions and weather during the year 2013, we try to match historical statistics.

Importing packages

[1]:
import atlite
import xarray as xr
import pandas as pd
import scipy.sparse as sp
import numpy as np

import pgeocode
from collections import OrderedDict
Disable urllib3 and cdsapi warnings.
[2]:
import matplotlib.pyplot as plt
%matplotlib inline

import seaborn as sns
sns.set_style('whitegrid')

Download Necessary Data Files

  1. We need to download the locations of all the PV installations in Germany to later tell atlite where to setup the PV panels and with which capacity. This information is available in Germany (thanks to the EEG feed-in-tariffs in the so-called “Anlagenregister”).

  2. We also download a reference time-series to compare our results against later. We retrieve the data from https://open-power-system-data.org which in return gets it from ENTSO-E.

  3. Finally we also download a cutout of weather data from the ERA5 dataset containing Germany and the year we want to examine (2012).

[3]:
import requests
import os
import zipfile

def download_file(url, local_filename):
    # variant of http://stackoverflow.com/a/16696317
    if not os.path.exists(local_filename):
        r = requests.get(url, stream=True)
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024):
                if chunk:
                    f.write(chunk)
    return local_filename

Reference time-series

The reference is downloaded from Open Power System Data (OPSD) and contains data reported by the TSOs and DSOs.

[4]:
opsd_fn = download_file('https://data.open-power-system-data.org/index.php?package=time_series&version=2019-06-05&action=customDownload&resource=3&filter%5B_contentfilter_cet_cest_timestamp%5D%5Bfrom%5D=2012-01-01&filter%5B_contentfilter_cet_cest_timestamp%5D%5Bto%5D=2013-05-01&filter%5BRegion%5D%5B%5D=DE&filter%5BVariable%5D%5B%5D=solar_generation_actual&filter%5BVariable%5D%5B%5D=wind_generation_actual&downloadCSV=Download+CSV',
                        'time_series_60min_singleindex_filtered.csv')
[5]:
opsd = pd.read_csv(opsd_fn, parse_dates=True, index_col=0)

# we later use the (in current version) timezone unaware datetime64
# to work together with this format, we have to remove the timezone
# timezone information. We are working with UTC everywhere.

opsd.index = opsd.index.tz_convert(None)

# We are only interested in the 2012 data
opsd = opsd[("2011" < opsd.index) & (opsd.index < "2013")]

PV locations (“Anlagenregister”)

Download and unzip the archive containing all reported PV installations in Germany in 2011 from energymap.info.

[6]:
eeg_fn = download_file('http://www.energymap.info/download/eeg_anlagenregister_2015.08.utf8.csv.zip',
                        'eeg_anlagenregister_2015.08.utf8.csv.zip')

with zipfile.ZipFile(eeg_fn, "r") as zip_ref:
    zip_ref.extract("eeg_anlagenregister_2015.08.utf8.csv")

Create a Cutout from ERA5

Load the country shape for Germany and determine its geographic bounds for downloading the appropriate cutout from ECMWF’s ERA5 data set.

[7]:
import cartopy.io.shapereader as shpreader
import geopandas as gpd
shp = shpreader.Reader(shpreader.natural_earth(resolution='10m', category='cultural', name='admin_0_countries'))
de_record = list(filter(lambda c: c.attributes['ISO_A2'] == 'DE', shp.records()))[0]
de = gpd.GeoSeries({**de_record.attributes, 'geometry':de_record.geometry})
x1, y1, x2, y2 = de['geometry'].bounds
/home/hofmann/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:5: FutureWarning:     You are passing non-geometry data to the GeoSeries constructor. Currently,
    it falls back to returning a pandas Series. But in the future, we will start
    to raise a TypeError instead.
  """
[8]:
cutout = atlite.Cutout('germany-2012',
                       module='era5',
                       x=slice(x1-.2,x2+.2), y=slice(y1-.2, y2+.2),
                       chunks={'time':100},
                       time="2012")
[9]:
cutout.prepare()
2020-06-05 12:47:57,449 INFO Downloading data for feature 'wind' to /tmp/tmpjqqv92gm.
[########################################] | 100% Completed | 29.1s
2020-06-05 12:48:27,038 INFO Downloading data for feature 'influx' to /tmp/tmpjqqv92gm.
[########################################] | 100% Completed | 34.0s
2020-06-05 12:49:01,687 INFO Downloading data for feature 'temperature' to /tmp/tmpjqqv92gm.
[########################################] | 100% Completed | 32.4s
2020-06-05 12:49:34,469 INFO Downloading data for feature 'runoff' to /tmp/tmpjqqv92gm.
[########################################] | 100% Completed | 29.6s
[########################################] | 100% Completed | 18.3s
[9]:
<Cutout "germany-2012">
 x = 5.75 ⟷ 15.00, dx = 0.25
 y = 47.25 ⟷ 55.25, dy = 0.25
 time = 2012-01-01 ⟷ 2012-12-31, dt = H
 prepared_features = ['height', 'wind', 'influx', 'temperature', 'runoff']

Downloading the cutout can take a few seconds or even an hour, depending on your internet connection and whether the dataset was recently requested from the data set provider (and is thus cached on their premise). For us this took ~2 minutes the first time. Preparing it again (a second time) is snappy (for whatever reason you would want to download the same cutout twice).

Creating time-series

Generate capacity layout

The capacity layout represents the installed generation capacities in MW in each of the cutout’s grid cells. For this example we have generation capacities in kW on a postal code (and partially even more detailed) level. Using the function below, we load the data, fill in geocoordinates where missing for all capacities and then dissolve them to the grid raster provided by the cutout. The dissolving is done by aggregating the capacities to their respective closest grid cell center obtained from the cutout.grid_coordinates().

[10]:
def capacity_layout(cutout, typ, cap_range=None, until=None):
    """Aggregate selected capacities to the cutouts grid into a capacity layout.

    Parameters
    ----------
        cutout : atlite.cutout
            The cutout for which the capacity layout is contructed.
        typ : str
            Type of energy source, e.g. "Solarstrom" (PV), "Windenergie" (wind).
        cap_range : (optional) list-like
            Two entries, limiting the lower and upper range of capacities (in kW)
            to include. Left-inclusive, right-exclusive.
        until : str
            String representation of a datetime object understood by pandas.to_datetime()
            for limiting to installations existing until this datetime.

    """

    # Load locations of installed capacities and remove incomplete entries
    cols = OrderedDict((('installation_date', 0),
                        ('plz', 2), ('city', 3),
                        ('type', 6),
                        ('capacity', 8), ('level', 9),
                        ('lat', 19), ('lon', 20),
                        ('validation', 22)))
    database = pd.read_csv('eeg_anlagenregister_2015.08.utf8.csv',
                       sep=';', decimal=',', thousands='.',
                       comment='#', header=None,
                       usecols=list(cols.values()),
                       names=list(cols.keys()),
                       # German postal codes can start with '0' so we need to treat them as str
                       dtype={'plz':str},
                       parse_dates=['installation_date'],
                       na_values=('O04WF', 'keine'))

    database = database[(database['validation'] == 'OK') & (database['plz'].notna())]

    # Query postal codes <-> coordinates mapping
    de_nomi = pgeocode.Nominatim('de')
    plz_coords = de_nomi.query_postal_code(database['plz'].unique())
    plz_coords = plz_coords.set_index('postal_code')

    # Fill missing lat / lon using postal codes entries
    database.loc[database['lat'].isna(), 'lat'] = database['plz'].map(plz_coords['latitude'])
    database.loc[database['lon'].isna(), 'lon'] = database['plz'].map(plz_coords['longitude'])

    # Ignore all locations which have not be determined yet
    database = database[database['lat'].notna() & database['lon'].notna()]

    # Select data based on type (i.e. solar/PV, wind, ...)
    data = database[database['type'] == typ].copy()

    # Optional: Select based on installation day
    if until is not None:
        data = data[data['installation_date'] < pd.to_datetime(until)]

    # Optional: Only installations within this caprange (left inclusive, right exclusive)
    if cap_range is not None:
        data = data[(cap_range[0] <= data['capacity']) & (data['capacity'] < cap_range[1])]

    # Determine nearest cells from cutout
    cells = gpd.GeoDataFrame({'geometry': cutout.grid_cells,
                              'lon': cutout.grid_coordinates()[:,0],
                              'lat': cutout.grid_coordinates()[:,1]})

    nearest_cell = cutout.data.sel({'x': data.lon.values,
                                    'y': data.lat.values},
                                   'nearest').coords

    # Map capacities to closest cell coordinate
    data['lon'] = nearest_cell.get('lon').values
    data['lat'] = nearest_cell.get('lat').values

    new_data = data.merge(cells, how='inner')

    # Sum capacities for each grid cell (lat, lon)
    # then: restore lat lon as coumns
    # then: rename and reindex to match cutout coordinates
    new_data = new_data.groupby(['lat','lon']).sum()

    layout = new_data.reset_index().rename(columns={'lat':'y','lon':'x'})\
                        .set_index(['y','x']).capacity\
                        .to_xarray().reindex_like(cutout.data)

    layout = (layout/1e3).fillna(.0).rename('Installed Capacity [MW]')

    return layout

Examine Solar Feed-Ins

The layout defines the production capacity per grid cell. Let’s see how it looked like in 2013.

[11]:
solar_layout = capacity_layout(cutout, 'Solarstrom', until="2012")

solar_layout.plot(cmap="inferno_r", size=8, aspect=1)
plt.title("Installed PV in Germany until 2012")
plt.tight_layout()
../_images/examples_historic-comparison-germany_22_0.png

What did the total production of this capacity distribution looked like? We pass the layout to the conversion funtion cutout.pv. This calculates the total production over the year.

[12]:
pv = cutout.pv(panel="CSi", orientation={'slope': 30., 'azimuth': 0.}, layout=solar_layout)
2020-06-05 12:50:34,975 INFO Convert and aggregate 'pv'.
[########################################] | 100% Completed |  5.4s

As OPSD also provides data on the total production, let’s compare those two.

[13]:
compare = pd.DataFrame(dict(atlite=pv.to_pandas(), opsd=opsd['DE_solar_generation_actual'])) /1e3 # in GW
compare.resample('1W').mean().plot(figsize=(8,5))
plt.ylabel("Feed-In [GW]")
plt.title('PV time-series Germany 2012')
plt.tight_layout()
2020-06-05 12:50:40,818 INFO Note: NumExpr detected 24 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2020-06-05 12:50:40,820 INFO NumExpr defaulting to 8 threads.
../_images/examples_historic-comparison-germany_26_1.png

Looks like atlite is underestimating the power feed-ins. This might be caused by conservative technological assumptions (slope, panel).

For example when we set an optimal slope of the panels. The production per week looks much better

[14]:
pv_opt = cutout.pv(panel="CSi", orientation='latitude_optimal', layout=solar_layout)
compare_opt = pd.DataFrame(dict(atlite=pv_opt.to_pandas(), opsd=opsd['DE_solar_generation_actual']))/1e3 # in GW
compare_opt.resample('1W').mean().plot(figsize=(8,5))
plt.ylabel("Feed-In [GW]")
plt.title('PV time-series Germany 2012')
plt.tight_layout()
2020-06-05 12:50:41,368 INFO Convert and aggregate 'pv'.
[########################################] | 100% Completed |  2.7s
../_images/examples_historic-comparison-germany_28_2.png

Looks better :) How about zooming in? Let’s plot a specific week. We see that the peaks are differing a bit. In this case atlite alternates between over and underestimating a bit.

But for a reanalysis, not too bad…

[15]:
compare_opt.loc["2012-04"].plot(figsize=(10,6))
plt.ylabel('Feed-in [GW]')
plt.title('PV time-series Germany April 2012')
plt.tight_layout()
../_images/examples_historic-comparison-germany_30_0.png
[16]:
fig, (ax1, ax2) = plt.subplots(1,2, subplot_kw={'aspect': 'equal', 'xlim':[0,20]}, figsize=(12,16), sharey=True)
compare.plot(x='atlite', y='opsd', kind='scatter', s=1, ax=ax1, title='Slope 30°')
compare_opt.plot(x='atlite', y='opsd', kind='scatter', s=1, ax=ax2, title='Slope Optimal')

ax1.plot([0,20],[0,20], c='gray')
ax2.plot([0,20],[0,20], c='gray')
plt.tight_layout()
../_images/examples_historic-comparison-germany_31_0.png

Whereas in the spatial aggregation the optimal slope performs better, on a detailed level the spread is very high. Thus the steady slope of 30° might be a saver choice knowing that this tends to underestimate.

Let’s look at the duration curves of the 30° slope pv timeseries.

[17]:
compare['atlite'].sort_values(ascending=False).reset_index(drop=True).plot(figsize=(10,6))
compare['opsd'].sort_values(ascending=False).reset_index(drop=True).plot(figsize=(10,6))
plt.legend()
plt.title('Duration curves')
plt.tight_layout()
../_images/examples_historic-comparison-germany_33_0.png

Examine Wind Feed-Ins

Now we want to examine the wind potentials in Germany for year 2013.

These wind turbines are available in atlite.

[18]:
for t in atlite.windturbines: print(f'* {t}')
* Vestas_V66_1750kW
* Enercon_E101_3000kW
* Vestas_V164_7MW_offshore
* Siemens_SWT_107_3600kW
* Suzlon_S82_1.5_MW
* Enercon_E126_7500kW
* Vestas_V112_3MW_offshore
* Vestas_V80_2MW_gridstreamer
* Bonus_B1000_1000kW
* NREL_ReferenceTurbine_5MW_offshore
* Vestas_V112_3MW
* Vestas_V90_3MW
* Vestas_V47_660kW
* Enercon_E82_3000kW
* Vestas_V25_200kW
* Siemens_SWT_2300kW

We define capacity range to roughly match the wind turbine type.

[19]:
turbine_categories = [
        dict(name='Vestas_V25_200kW', up=400.),
        dict(name='Vestas_V47_660kW', up=700.),
        dict(name='Bonus_B1000_1000kW', up=1100.),
        dict(name='Suzlon_S82_1.5_MW', up=1600.),
        dict(name='Vestas_V66_1750kW', up=1900.),
        dict(name='Vestas_V80_2MW_gridstreamer', up=2200.),
        dict(name='Siemens_SWT_2300kW', up=2500.),
        dict(name='Vestas_V90_3MW', up=50000.)
    ]

[20]:
low = 0
for index, turbine_cat in enumerate(turbine_categories):

    layout = capacity_layout(cutout, 'Windkraft',
                             cap_range=[low,turbine_cat['up']],
                             until="2012")

    turbine_categories[index]['layout'] = layout
    low = turbine_cat['up']

We create a layout for each capacity range, each with a different windturbine model

[21]:
wind = xr.Dataset()
for turbine_cat in turbine_categories:
    name = f"< {turbine_cat['up']} kW"
    wind[name] = cutout.wind(turbine=turbine_cat['name'], layout=turbine_cat['layout'], show_progress=False)

wind['total'] = sum(wind[c] for c in wind)
wind
2020-06-05 12:51:17,460 INFO Convert and aggregate 'wind'.
2020-06-05 12:51:17,703 INFO Convert and aggregate 'wind'.
2020-06-05 12:51:17,946 INFO Convert and aggregate 'wind'.
2020-06-05 12:51:18,188 INFO Convert and aggregate 'wind'.
2020-06-05 12:51:18,452 INFO Convert and aggregate 'wind'.
2020-06-05 12:51:18,739 INFO Convert and aggregate 'wind'.
2020-06-05 12:51:19,037 INFO Convert and aggregate 'wind'.
2020-06-05 12:51:19,361 INFO Convert and aggregate 'wind'.
[21]:
Show/Hide data repr Show/Hide attributes
xarray.Dataset
    • time: 8784
    • time
      (time)
      datetime64[ns]
      2012-01-01 ... 2012-12-31T23:00:00
      long_name :
      time
      array(['2012-01-01T00:00:00.000000000', '2012-01-01T01:00:00.000000000',
             '2012-01-01T02:00:00.000000000', ..., '2012-12-31T21:00:00.000000000',
             '2012-12-31T22:00:00.000000000', '2012-12-31T23:00:00.000000000'],
            dtype='datetime64[ns]')
    • < 400.0 kW
      (time)
      float64
      14.37 17.37 21.26 ... 74.27 72.48
      array([14.36759232, 17.3721168 , 21.25811295, ..., 76.244886  ,
             74.27408618, 72.48494836])
    • < 700.0 kW
      (time)
      float64
      260.0 291.3 ... 1.277e+03 1.271e+03
      array([ 259.97473704,  291.27601501,  344.13749336, ..., 1289.58766627,
             1276.84554856, 1270.75338076])
    • < 1100.0 kW
      (time)
      float64
      175.9 191.8 213.9 ... 829.3 821.9
      array([175.92730143, 191.75951184, 213.85818501, ..., 846.24474466,
             829.33067332, 821.88816193])
    • < 1600.0 kW
      (time)
      float64
      1.125e+03 1.208e+03 ... 4.563e+03
      array([1124.73229295, 1208.41198207, 1352.92136527, ..., 4494.04929787,
             4556.66911555, 4563.42047683])
    • < 1900.0 kW
      (time)
      float64
      322.0 356.2 ... 1.739e+03 1.73e+03
      array([ 321.96966276,  356.20420308,  411.24177172, ..., 1754.01146352,
             1739.32602448, 1730.08824679])
    • < 2200.0 kW
      (time)
      float64
      991.7 1.05e+03 ... 4.819e+03
      array([ 991.69469246, 1049.57301962, 1187.1192299 , ..., 4723.87272773,
             4803.65267905, 4818.76751494])
    • < 2500.0 kW
      (time)
      float64
      545.8 599.2 ... 1.897e+03 1.891e+03
      array([ 545.76049149,  599.21099479,  703.22354969, ..., 1879.43665915,
             1896.83608815, 1890.68363627])
    • < 50000.0 kW
      (time)
      float64
      269.6 298.4 ... 1.023e+03 1.006e+03
      array([ 269.62982404,  298.37592598,  337.07574267, ..., 1028.64973001,
             1023.13628608, 1005.88721901])
    • total
      (time)
      float64
      3.704e+03 4.012e+03 ... 1.617e+04
      array([ 3704.0565945 ,  4012.1837692 ,  4570.83545057, ...,
             16092.09717521, 16200.07050139, 16173.97358488])

Again, let’s compare the result with the feed-in statistics from OPSD. We add a extra column for wind turbines with capacity lower than 1600 kW

[22]:
compare = pd.DataFrame({"atlite":wind['total'].to_pandas(),
                            "< 1600 kW":wind['< 1600.0 kW'].to_pandas(),
                            "opsd":opsd['DE_wind_generation_actual']})

compare = compare/1e3 # in GW
[23]:
compare.resample('1W').mean().plot(figsize=(10,6))
plt.ylabel('Feed-in [GW]')
plt.title('Wind time-series Germany 2012')
plt.tight_layout()
../_images/examples_historic-comparison-germany_43_0.png
[24]:
compare.loc["2012-04"].plot(figsize=(10,6))
plt.ylabel('Feed-in [GW]')
plt.title('Wind time-series Germany April 2012')
plt.tight_layout()
../_images/examples_historic-comparison-germany_44_0.png
[25]:
ax = compare.plot(x='atlite', y='opsd', kind='scatter', figsize=(12,8))
ax.set_aspect('equal')
ax.set_xlim(0,30)
ax.plot([0,30],[0,30],c='gray')
plt.tight_layout()
../_images/examples_historic-comparison-germany_45_0.png
[26]:
compare['atlite'].sort_values(ascending=False).reset_index(drop=True).plot(figsize=(10,6))
compare['opsd'].sort_values(ascending=False).reset_index(drop=True).plot(figsize=(10,6))
plt.legend()
plt.title('Duration curves')
plt.ylabel('Wind Feed-in [GW]')
plt.xlabel('Accumulated hours')
plt.tight_layout()
../_images/examples_historic-comparison-germany_46_0.png

Looks quite aggreeable!

Splitting time-series into shapes

The generation time-series can also be aggregated based on shapes. In this example, we aggregate on the basis of the German “Laender” (federal states).

[27]:
shp = shpreader.Reader(shpreader.natural_earth(resolution='10m',
                                               category='cultural',
                                               name='admin_1_states_provinces'))
de_records = list(filter(lambda r: r.attributes['iso_3166_2'].startswith('DE'), shp.records()))
laender = gpd.GeoDataFrame([{**r.attributes, 'geometry':r.geometry} for r in de_records])\
             .rename(columns={"iso_3166_2":"state"}).set_index("state")
[28]:
x1, y1, x2, y2 = de['geometry'].bounds
[29]:
print(type(laender.loc['DE-TH'].geometry))
laender.loc['DE-TH']
<class 'shapely.geometry.polygon.Polygon'>
[29]:
featurecla                                   Admin-1 scale rank
scalerank                                                     3
adm1_code                                              DEU-1577
diss_me                                                    1577
wikipedia                                                  None
                                    ...
name_tr                                              Türingiya
name_vi                                              Thüringen
name_zh                                               图林根
ne_id                                                1159310983
geometry      POLYGON ((9.949863314740185 51.30301870404992,...
Name: DE-TH, Length: 83, dtype: object
[30]:
pv = cutout.pv(panel="CSi", orientation={'slope': 30., 'azimuth': 0.},
               shapes=laender['geometry'],
               layout=solar_layout)
2020-06-05 12:51:22,198 INFO Convert and aggregate 'pv'.
[########################################] | 100% Completed |  2.4s
[31]:
pv.sel(state=['DE-TH', 'DE-BY']).to_pandas().resample('1W').mean().plot(figsize=(10,6))
[31]:
<matplotlib.axes._subplots.AxesSubplot at 0x7ffb24ea1810>
../_images/examples_historic-comparison-germany_54_1.png