PlantData and PlantMetaData Schema#
openoa.plant.PlantData and openoa.schema.PlantMetaData are the core data
classes used to contain all data relevant to a wind plant, and they are used throughout OpenOA.
PlantData holds multiple pandas DataFrames, each with a specified schema.
You can take advantage of the data structures in the plant module by creating a PlantData
object using one of the available constructors.
A quick reference for the data required by each analysis type, and for the types and units
of the data that openoa.plant.PlantData expects, can be found in the repository at
OpenOA/openoa/schema/ in any of the JSON or YAML files, with the README giving a brief
overview of each.
Additionally, PlantData requires a metadata specification, as provided
through the openoa.schema.PlantMetaData class, which enables a series of data validations
that run at initialization. Optionally, these validations can be re-run later using
openoa.plant.PlantData.validate(). Specifically, using the
PlantMetaData structure, a user can map the column names already present
in their data to those that OpenOA will use internally, set the expected frequency of their
time-dependent data, and check the expected units and datatypes that the data should use. These
configurations can be set in either a dictionary or a metadata file using a JSON or YAML data
format, whichever is preferable to the user. In the examples, the files “examples/data/plant_meta.yml”
and “examples/data/plant_meta.json” are used interchangeably, and can be used as a guide.
Using the metadata configurations specified in a metadata file (or dictionary), a PlantData
object can be created as follows, where “X_df” represents a pandas DataFrame containing data
for a specific data type. Alternatively, these DataFrame arguments can be replaced by file paths to
CSV files where the data are saved:
from openoa.plant import PlantData

plant = PlantData(
    analysis_type=None,  # List of analysis methods for which the data will be validated
    metadata="{path_to_metadata_file}/plant_meta.yml",
    scada=scada_df,
    meter=meter_df,
    curtail=curtail_df,
    asset=asset_df,
    reanalysis=reanalysis_dict,
)
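Because the metadata can also be supplied as a dictionary, the same kind of configuration can be written directly in Python. The following is a minimal sketch, assuming the field names and column names shown in the YAML snippets of this guide; it covers only a subset of the sections and uses only the standard library, so it runs without OpenOA installed.

```python
import json

# Metadata dictionary mirroring a subset of examples/data/plant_meta.yml.
# Keys are OpenOA field names; values are the column names in the user's data.
plant_meta = {
    "scada": {
        "frequency": "10min",
        "asset_id": "Wind_turbine_name",
        "WTUR_W": "P_avg",
        "time": "Date_time",
    },
    "meter": {
        "MMTR_SupWh": "net_energy_kwh",
        "time": "time",
    },
    "curtail": {
        "frequency": "10min",
        "IAVL_DnWh": "availability_kwh",
        "IAVL_ExtPwrDnWh": "curtailment_kwh",
        "time": "time",
    },
}

# Serialize to JSON text, as it could be saved in a JSON metadata file.
meta_json = json.dumps(plant_meta, indent=2)

# Reading the text back recovers the same dictionary.
restored = json.loads(meta_json)
print(restored["scada"]["WTUR_W"])
```

Per the description above, the dictionary itself could be passed as the metadata argument in place of the file path.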
The following sections show how each data type should be configured, and where to check for
these settings in the code itself. Note that neither the dtypes nor the units attribute of a
metadata class (XMetaData.dtypes and XMetaData.units, where “X” represents a specific data type)
can be set manually or updated, as they exist exclusively for user reference.
Each of the metadata classes accepts as inputs the elements under the column “Field Name” in
the following subsections, in addition to the frequency (freq) for time-dependent inputs. All
other attributes of the metadata classes are for user reference, and are therefore immutable. After
setting each of the inputs, users can access the dictionary elements col_map, dtypes, and units to
work with the various mappings. Below is a demonstration of this mapping in practice, showing the
SCADA data mapping used in “examples/data/plant_meta.yml”, where the keys are the OpenOA column names,
and the values are the La Haute Borne data naming conventions. This mapping can be repeated for each
of the other metadata types.
scada:
  frequency: 10min  # timestamp frequency
  asset_id: Wind_turbine_name  # Unique ID of wind turbine
  WROT_BlPthAngVal: Ba_avg  # pitch angle, degrees
  WTUR_W: P_avg  # power produced, kW
  WMET_EnvTmp: Ot_avg  # temperature, C
  time: Date_time  # timestamps
  WMET_HorWdDir: Wa_avg  # wind direction, degrees
  WMET_HorWdDirRel: Va_avg  # wind direction relative to nacelle orientation, degrees
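To make the direction of the mapping concrete, the sketch below expresses the SCADA mapping above as a Python dictionary and inverts it to rename a record from the user's column names to the OpenOA names. The helper rename_record and the sample row are hypothetical and only illustrate the idea; this is not how OpenOA performs the conversion internally.

```python
# SCADA mapping from the YAML above: OpenOA field name -> user's column name.
scada_meta = {
    "frequency": "10min",
    "asset_id": "Wind_turbine_name",
    "WROT_BlPthAngVal": "Ba_avg",
    "WTUR_W": "P_avg",
    "WMET_EnvTmp": "Ot_avg",
    "time": "Date_time",
    "WMET_HorWdDir": "Wa_avg",
    "WMET_HorWdDirRel": "Va_avg",
}

# "frequency" is a setting rather than a column, so it is left out of the column map.
col_map = {field: column for field, column in scada_meta.items() if field != "frequency"}

# Invert the map so that user column names point to OpenOA field names.
rename_map = {column: field for field, column in col_map.items()}


def rename_record(record, rename_map):
    """Return a copy of `record` with any mapped keys replaced by their OpenOA names."""
    return {rename_map.get(name, name): value for name, value in record.items()}


# Hypothetical single row of user data.
row = {"Date_time": "2014-01-01 00:00:00", "Wind_turbine_name": "T01", "P_avg": 1250.0}
renamed = rename_record(row, rename_map)
print(renamed)
```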
Data Schema User Guide#
The following subsections demonstrate the required data mapping schemas that enable
PlantData to validate and convert user-specified data to a validated
OpenOA schema for use throughout the codebase. The data columns and their associated units and
datatypes are shown in a table, followed by a demonstration of how this is used in the La Haute
Borne example data used for all of the example analysis workflows. Note that the
column “Field Name” is the internal naming convention, and should be the dictionary or JSON/YAML
key with the actual column name as its associated value (as is seen in the YAML snippets for each
section).
Note, however, that validating a PlantData object with analysis_type = “all” checks for all of
the field names listed below in all provided data. If the PlantData object is only being
validated for a specific analysis type, or types, then only the data specified in
openoa.plant.ANALYSIS_REQUIREMENTS (shown below) are checked, and in the case of
analysis_type = None, no errors are raised during validation.
Additionally, some analysis types have modified uses, which mean the following:
- MonteCarloAEP-temp adds in the reanalysis temperature data for the long term correction.
- MonteCarloAEP-wd adds in the reanalysis wind direction data for the long term correction.
- MonteCarloAEP-temp-wd adds in the reanalysis temperature and wind direction data for the long term correction.
- WakeLosses-scada uses the wind speed and direction data from the SCADA data.
- WakeLosses-tower uses the wind speed and direction data from the met tower data.
from pprint import pprint
from openoa.schema.metadata import ANALYSIS_REQUIREMENTS

# The valid analysis_type inputs
print("Valid `analysis_type`s with OpenOA-provided schema")
for analysis in sorted(ANALYSIS_REQUIREMENTS):
    print(analysis)

# An example of the contents for required columns from different data sources and their frequencies
print()
print(
    "Requirements for the modified MonteCarloAEP analysis using reanalysis\n"
    "temperature as an additional variable:"
)
pprint(ANALYSIS_REQUIREMENTS["MonteCarloAEP-temp"])

Valid `analysis_type`s with OpenOA-provided schema
ElectricalLosses
MonteCarloAEP
MonteCarloAEP-temp
MonteCarloAEP-temp-wd
MonteCarloAEP-wd
StaticYawMisalignment
TurbineLongTermGrossEnergy
WakeLosses-scada
WakeLosses-tower

Requirements for the modified MonteCarloAEP analysis using reanalysis
temperature as an additional variable:
{'curtail': {'columns': ['IAVL_DnWh', 'IAVL_ExtPwrDnWh'],
             'freq': ('MS', 'ME', 'W', 'D', 'h', 'min', 's', 'ms', 'us', 'ns')},
 'meter': {'columns': ['MMTR_SupWh'],
           'freq': ('MS', 'ME', 'W', 'D', 'h', 'min', 's', 'ms', 'us', 'ns')},
 'reanalysis': {'columns': ['WMETR_HorWdSpd', 'WMETR_AirDen', 'WMETR_EnvTmp'],
                'freq': ('MS',
                         'ME',
                         'W',
                         'D',
                         'h',
                         'min',
                         's',
                         'ms',
                         'us',
                         'ns')}}
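The requirements dictionary printed above lends itself to a simple pre-check before building a PlantData object. The sketch below is a simplified illustration of that idea, not OpenOA's validation logic: the helper missing_columns and the available column sets are hypothetical, and the requirements are copied by hand from the printed output so that the block runs without OpenOA installed.

```python
# Required columns per data type, copied from the printed output above.
requirements = {
    "curtail": ["IAVL_DnWh", "IAVL_ExtPwrDnWh"],
    "meter": ["MMTR_SupWh"],
    "reanalysis": ["WMETR_HorWdSpd", "WMETR_AirDen", "WMETR_EnvTmp"],
}

# Hypothetical columns available in each data set, already using OpenOA field names.
available = {
    "curtail": {"time", "IAVL_DnWh", "IAVL_ExtPwrDnWh"},
    "meter": {"time", "MMTR_SupWh"},
    "reanalysis": {"time", "WMETR_HorWdSpd", "WMETR_EnvTmp"},
}


def missing_columns(requirements, available):
    """Return {data_type: [required columns not found]} for incomplete data types."""
    missing = {}
    for data_type, columns in requirements.items():
        present = available.get(data_type, set())
        absent = [column for column in columns if column not in present]
        if absent:
            missing[data_type] = absent
    return missing


print(missing_columns(requirements, available))
```

Here only the reanalysis data would be flagged, because the air density column is absent from the hypothetical data.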
SCADA#
PlantData.scada
is configured by the openoa.schema.SCADAMetaData
class, which is set in the configuration
data with the “scada” key. Users can set each of the following “Field Name” keys with their own
data’s column names in their SCADA data, plus the “freq” field. Users just have to ensure that the
columns are already using the specified units, and that each column is already using the listed
data type or can be converted to that type.
Field Name | Descriptive Name | Data Type (SCADAMetaData.dtypes) | Units (SCADAMetaData.units) |
---|---|---|---|
time | time stamp | datetime64[ns] | datetime64[ns] |
asset_id | id | string | None |
WTUR_W | power | float | kW |
WMET_HorWdSpd | windspeed | float | m/s |
WMET_HorWdDir | winddirection | float | degrees |
WTUR_TurSt | status | string | None |
WROT_BlPthAngVal | pitch | float | degrees |
WMET_EnvTmp | temp | float | Celsius |
scada:
  frequency: 10min  # timestamp frequency
  asset_id: Wind_turbine_name  # Unique ID of wind turbine
  WROT_BlPthAngVal: Ba_avg  # pitch angle, degrees
  WTUR_W: P_avg  # power produced, kW
  WMET_EnvTmp: Ot_avg  # temperature, C
  time: Date_time  # timestamps
  WMET_HorWdDir: Wa_avg  # wind direction, degrees
  WMET_HorWdDirRel: Va_avg  # wind direction relative to nacelle orientation, degrees
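The requirement that each column "can be converted" to the listed data type can be checked on raw values before loading them. The following is a minimal standard-library sketch under stated assumptions: the helper unconvertible_fields and the sample records are hypothetical, only a subset of the SCADA fields is covered, and ISO-formatted timestamps are assumed. OpenOA's own checks operate on pandas data and may behave differently.

```python
from datetime import datetime

# Expected kinds for a subset of the SCADA fields in the table above.
expected = {
    "time": "datetime",
    "asset_id": "string",
    "WTUR_W": "float",
    "WMET_EnvTmp": "float",
}

# One converter per kind; each raises ValueError or TypeError on bad input.
converters = {
    "datetime": datetime.fromisoformat,
    "string": str,
    "float": float,
}


def unconvertible_fields(record, expected):
    """List the fields of `record` that are missing or cannot be converted."""
    failed = []
    for field, kind in expected.items():
        try:
            converters[kind](record[field])
        except (KeyError, TypeError, ValueError):
            failed.append(field)
    return failed


# Hypothetical records that already use OpenOA field names.
good = {"time": "2014-01-01 00:10:00", "asset_id": "T01", "WTUR_W": "1250.5", "WMET_EnvTmp": 4.2}
bad = {"time": "not a date", "asset_id": "T01", "WTUR_W": "n/a", "WMET_EnvTmp": 4.2}

print(unconvertible_fields(good, expected))
print(unconvertible_fields(bad, expected))
```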
Meter#
PlantData.meter is configured by the openoa.schema.MeterMetaData class, which is set in the
configuration data with the “meter” key. Users can set each of the following “Field Name” keys with
their own data’s column names in their meter data, plus the “freq” field. Users just have to ensure
that the columns are already using the specified units, and that each column is already using the
listed data type or can be converted to that type.
Field Name | Descriptive Name | Data Type (MeterMetaData.dtypes) | Units (MeterMetaData.units) |
---|---|---|---|
time | time stamp | datetime64[ns] | datetime64[ns] |
MMTR_SupWh | energy | float | kWh |
meter:
  MMTR_SupWh: net_energy_kwh  # net energy, kWh
  time: time  # timestamp
Tower#
PlantData.tower
is configured by the openoa.schema.TowerMetaData
class, which is set in the configuration
data with the “tower” key. Users can set each of the following “Field Name” keys with their own
data’s column names in their met tower data, plus the “freq” field. Users just have to ensure that
the columns are already using the specified units, and that each column is already using the listed
data type or can be converted to that type.
Field Name | Descriptive Name | Data Type (TowerMetaData.dtypes) | Units (TowerMetaData.units) |
---|---|---|---|
time | time | datetime64[ns] | datetime64[ns] |
asset_id | id | string | None |
Curtail#
PlantData.curtail
is configured by the openoa.schema.CurtailMetaData
class, which is set in the configuration
data with the “curtail” key. Users can set each of the following “Field Name” keys with their own
data’s column names in their curtailment data, plus the “freq” field. Users just have to ensure that
the columns are already using the specified units, and that each column is already using the listed
data type or can be converted to that type.
Field Name | Descriptive Name | Data Type (CurtailMetaData.dtypes) | Units (CurtailMetaData.units) |
---|---|---|---|
time | time stamp | datetime64[ns] | datetime64[ns] |
IAVL_ExtPwrDnWh | curtailment | float | kWh |
IAVL_DnWh | availability | float | kWh |
curtail:
  IAVL_DnWh: availability_kwh  # availability, kWh
  IAVL_ExtPwrDnWh: curtailment_kwh  # curtailment, kWh
  frequency: 10min  # timestamp frequency
  time: time  # timestamp
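The frequency setting describes the expected spacing of the timestamps. As a rough illustration of what a 10-minute frequency implies, the sketch below checks whether a list of timestamps is evenly spaced. The helper matches_frequency is hypothetical and uses only the standard library; OpenOA's own frequency validation is built on pandas and may be more permissive, for example by accepting any of the frequencies listed in the analysis requirements.

```python
from datetime import datetime, timedelta


def matches_frequency(timestamps, step):
    """Return True if consecutive sorted timestamps are all exactly `step` apart."""
    ordered = sorted(timestamps)
    return all(later - earlier == step for earlier, later in zip(ordered, ordered[1:]))


ten_minutes = timedelta(minutes=10)
start = datetime(2014, 1, 1)

# Six evenly spaced timestamps, and the same series with one record dropped.
regular = [start + i * ten_minutes for i in range(6)]
gappy = regular[:3] + regular[4:]

print(matches_frequency(regular, ten_minutes))
print(matches_frequency(gappy, ten_minutes))
```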
Status#
PlantData.status
is configured by the openoa.schema.StatusMetaData
class, which is set in the configuration
data with the “status” key. Users can set each of the following “Field Name” keys with their own
data’s column names in their turbine status data, plus the “freq” field. Users just have to ensure
that the columns are already using the specified units, and that each column is already using the
listed data type or can be converted to that type.
Note
This section does not get used by OpenOA internally, though it is expected to be used in the future.
Field Name | Descriptive Name | Data Type (StatusMetaData.dtypes) | Units (StatusMetaData.units) |
---|---|---|---|
time | time stamp | datetime64[ns] | datetime64[ns] |
asset_id | id | string | None |
status_id | status id | int | None |
status_code | status code | int | None |
status_text | status text | string | None |
Asset#
PlantData.asset
is configured by the openoa.schema.AssetMetaData
class, which is set in the configuration
data with the “asset” key. Users can set each of the following “Field Name” keys with their own
data’s column names in their turbine and met tower asset data. Users just have to ensure that the
columns are already using the specified units, and that each column is already using the listed data
type or can be converted to that type.
Field Name | Descriptive Name | Data Type (AssetMetaData.dtypes) | Units (AssetMetaData.units) |
---|---|---|---|
asset_id | id | string | None |
latitude | latitude | float | WGS-84 |
longitude | longitude | float | WGS-84 |
rated_power | rated power | float | kW |
hub_height | hub height | float | m |
rotor_diameter | rotor diameter | float | m |
elevation | elevation | float | m |
type | type | string | None |
asset:
  elevation: elevation_m  # elevation above sea level, meters
  hub_height: Hub_height_m  # hub height, meters
  asset_id: Wind_turbine_name
  latitude: Latitude  # WGS-84 latitude
  longitude: Longitude  # WGS-84 longitude
  rated_power: Rated_power  # rated power, kW
  rotor_diameter: Rotor_diameter_m  # rotor diameter, meters
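Since the columns must already be in the units listed in the table, any column recorded in a different unit has to be converted before the data are handed to PlantData. The sketch below shows this for a rated power column stored in MW; the asset records are hypothetical and the conversion is an ordinary unit conversion, not an OpenOA function.

```python
# Hypothetical asset records whose rated power is recorded in MW.
assets_mw = [
    {"Wind_turbine_name": "T01", "Rated_power": 2.05},
    {"Wind_turbine_name": "T02", "Rated_power": 2.05},
]

KW_PER_MW = 1000.0

# Convert rated power to kW, the unit listed for rated_power in the table above.
assets_kw = [
    {**asset, "Rated_power": asset["Rated_power"] * KW_PER_MW}
    for asset in assets_mw
]

for asset in assets_kw:
    print(asset["Wind_turbine_name"], round(asset["Rated_power"], 3))
```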
Reanalysis#
PlantData.reanalysis is configured by the openoa.schema.ReanalysisMetaData class, which is set
in the configuration data with the “reanalysis” key. Note that the reanalysis configuration should
be a dictionary of settings for each of the reanalysis products provided. For instance, if MERRA2
and ERA5 data are both provided, then each data set’s configuration should be provided under
reanalysis as dictionary key-value pairs, where the key is the name of the reanalysis product, and
the values are the reanalysis settings for that product’s data. For each product, users can set each
of the following “Field Name” keys with their own data’s column names in their reanalysis data, plus
the “freq” field. Users just have to ensure that the columns are already using the specified units,
and that each column is already using the listed data type or can be converted to that type.
MERRA-2#
Data are based on the single-level diagnostic data available here: https://disc.gsfc.nasa.gov/datasets/M2T1NXSLV_V5.12.4/summary?keywords=”MERRA-2”
Wind speed and direction are taken directly from the diagnostic 50-m u- and v-wind fields provided in this dataset. Air density at 50m is calculated using temperature and pressure estimations at 50m and the ideal gas law. Temperature at 50m is estimated by taking the 10-m temperature data provided by this dataset and assuming a constant lapse rate of -9.8 degrees Celsius per vertical kilometer. Pressure at 50m is extrapolated from surface pressure data provided in this dataset using the hypsometric equation.
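The density derivation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the code used to prepare the data: dry air is assumed, the gas constant and gravity values are standard textbook constants, the layer temperature in the hypsometric equation is taken as the mean of the 10-m and 50-m temperatures, and the input values are illustrative.

```python
import math

R_DRY_AIR = 287.058   # specific gas constant for dry air, J/(kg K) (assumed value)
GRAVITY = 9.81        # gravitational acceleration, m/s^2 (assumed value)
LAPSE_RATE = -9.8e-3  # temperature change with height, K per metre (-9.8 C per km)


def temperature_at_height(temp_ref_k, z_ref_m, z_target_m):
    """Shift a temperature between heights using a constant lapse rate."""
    return temp_ref_k + LAPSE_RATE * (z_target_m - z_ref_m)


def pressure_at_height(surface_pressure_pa, layer_temp_k, z_target_m):
    """Hypsometric equation for a layer of mean temperature `layer_temp_k`."""
    return surface_pressure_pa * math.exp(-GRAVITY * z_target_m / (R_DRY_AIR * layer_temp_k))


def air_density(pressure_pa, temp_k):
    """Ideal gas law for dry air."""
    return pressure_pa / (R_DRY_AIR * temp_k)


# Illustrative inputs: 10-m temperature in Kelvin and surface pressure in Pa.
temp_10m = 288.15
surface_pressure = 101325.0

temp_50m = temperature_at_height(temp_10m, 10.0, 50.0)
layer_temp = 0.5 * (temp_10m + temp_50m)  # assumed mean layer temperature
pressure_50m = pressure_at_height(surface_pressure, layer_temp, 50.0)
density_50m = air_density(pressure_50m, temp_50m)

print(round(temp_50m, 3), round(pressure_50m, 1), round(density_50m, 4))
```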
NCEP-2#
Data are based on the single-level diagnostic data available here: https://rda.ucar.edu/datasets/ds091.0/
Wind speed and direction are taken directly from the diagnostic 10-m u- and v-wind fields provided in this dataset. Air density at 10m is calculated using temperature and pressure estimations at 10m and the ideal gas law. Temperature at 10m is estimated by taking the 2-m temperature data provided by this dataset and assuming a constant lapse rate of -9.8 degrees Celsius per vertical kilometer. Pressure at 10m is extrapolated from surface pressure data provided in this dataset using the hypsometric equation.
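Deriving wind speed and direction from u- and v-wind components is a standard calculation, sketched below. The function names are hypothetical, and the direction follows the usual meteorological convention (the direction the wind blows from, in degrees clockwise from north); the source does not state which convention was used when preparing the data, so treat that as an assumption.

```python
import math


def wind_speed(u, v):
    """Horizontal wind speed from eastward (u) and northward (v) components."""
    return math.hypot(u, v)


def wind_direction(u, v):
    """Direction the wind blows from, in degrees clockwise from north."""
    return (180.0 + math.degrees(math.atan2(u, v))) % 360.0


# A wind blowing toward the east (u > 0, v = 0) comes from the west.
print(wind_speed(3.0, 4.0))
print(wind_direction(1.0, 0.0))
```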
ERA5#
Data are based on the model-level data available here: https://rda.ucar.edu/datasets/ds627.0/
Model levels are based on sigma coordinates (i.e. fractions of surface pressure). From this dataset, we extract temperature, u-wind, and v-wind at the 58th model level, which is on average about 72m above ground level (https://www.ecmwf.int/en/forecasts/documentation-and-support/60-model-levels). We also extract surface pressure data. Air density at the 58th model level is calculated using temperature data extracted at that level and an estimation of pressure at that level using the ideal gas law. Pressure at the 58th model level is extrapolated from surface pressure data provided in this dataset using the hypsometric equation.
For any and all of the reanalysis data defined, a dictionary should be provided (seen below the table) to determine which data sets are being used (dictionary keys) and their schema (dictionary values) as would be provided for any other schema definition.
Field Name | Descriptive Name | Data Type (ReanalysisMetaData.dtypes) | Units (ReanalysisMetaData.units) |
---|---|---|---|
time | time stamp | datetime64[ns] | datetime64[ns] |
WMETR_HorWdSpd | windspeed | float | m/s |
WMETR_HorWdSpdU | eastward windspeed | float | m/s |
WMETR_HorWdSpdV | northward windspeed | float | m/s |
WMETR_HorWdDir | wind direction | float | degrees |
WMETR_EnvTmp | temperature | float | Kelvin |
WMETR_AirDen | air density | float | kg/m^3 |
WMETR_EnvPres | surface pressure | float | Pa |
reanalysis:
  era5:  # reanalysis product name/ID
    frequency: h  # timestamp frequency
    WMETR_EnvPres: surf_pres  # surface pressure, Pa
    WMETR_EnvTmp: t_2m  # temperature, K
    time: datetime  # timestamps
    WMETR_HorWdSpdU: u_100  # u-direction windspeed, m/s
    WMETR_HorWdSpdV: v_100  # v-direction windspeed, m/s
    WMETR_HorWdDir: winddirection_deg  # wind direction, degrees
  merra2:  # reanalysis product name/ID
    frequency: h  # timestamp frequency
    WMETR_EnvPres: surface_pressure  # surface pressure, Pa
    WMETR_EnvTmp: temp_2m  # temperature, K
    time: datetime  # timestamps
    WMETR_HorWdSpdU: u_50  # u-direction windspeed, m/s
    WMETR_HorWdSpdV: v_50  # v-direction windspeed, m/s
    WMETR_HorWdDir: winddirection_deg  # wind direction, degrees
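Because both the reanalysis metadata and the reanalysis data are keyed by product name, it can help to confirm that the two sets of keys agree before constructing a PlantData object. The sketch below is a hypothetical pre-check, not part of OpenOA: the helper unmatched_products is an assumption, the metadata is trimmed to a few fields, and strings stand in for the per-product DataFrames.

```python
# Per-product metadata, trimmed from the YAML above.
reanalysis_meta = {
    "era5": {"frequency": "h", "time": "datetime", "WMETR_HorWdSpdU": "u_100", "WMETR_HorWdSpdV": "v_100"},
    "merra2": {"frequency": "h", "time": "datetime", "WMETR_HorWdSpdU": "u_50", "WMETR_HorWdSpdV": "v_50"},
}

# Placeholders standing in for the per-product data (DataFrames in practice).
reanalysis_data = {
    "era5": "ERA5 data goes here",
    "merra2": "MERRA-2 data goes here",
}


def unmatched_products(meta, data):
    """Return product names that appear in only one of the two dictionaries."""
    return sorted(set(meta) ^ set(data))


print(unmatched_products(reanalysis_meta, reanalysis_data))
```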