PlantData and PlantMetaData Schema#

openoa.plant.PlantData and openoa.schema.PlantMetaData are the core data classes that contain all data relevant to a wind plant, and they are used throughout OpenOA. PlantData holds multiple pandas DataFrames, each with a specified schema. You can take advantage of the data structures in the plant module by creating a PlantData object with one of the available constructors.

A quick reference for the data required by each analysis type, and for the types and units of the data that openoa.plant.PlantData expects, can be found in the repository at OpenOA/openoa/schema/ in the JSON and YAML files; the README there gives a brief overview of each.

Additionally, PlantData requires a metadata specification, provided through the openoa.schema.PlantMetaData class, which enables a series of data validations that run at initialization. Optionally, these validations can be re-run later using openoa.plant.PlantData.validate(). Specifically, using the PlantMetaData structure, a user can map the column names already present in their data to those that OpenOA uses internally, set the expected frequency of their time-dependent data, and check the expected units and data types that the data should use. These configurations can be set in either a dictionary or a metadata file in JSON or YAML format, whichever the user prefers. In the examples, the files “examples/data/plant_meta.yml” and “examples/data/plant_meta.json” are used interchangeably and can serve as a guide.

Using the metadata configurations specified in a metadata file (or dictionary), a PlantData object can be created as follows, where “X_df” represents a pandas DataFrame containing data for a specific data type. Alternatively, these DataFrame arguments can be replaced by file paths to the CSV files where the data are saved:

plant = PlantData(
    analysis_type=None,  # List of analysis methods for which the data will be validated
    metadata="{path_to_metadata_file}/plant_meta.yml",
    scada=scada_df,
    meter=meter_df,
    curtail=curtail_df,
    asset=asset_df,
    reanalysis=reanalysis_dict,
)

The following sections show how each data type should be configured, and where to check for these settings in the code itself. Note that neither the metadata class dtypes (XMetaData.dtypes, where “X” represents a specific data type) nor the metadata units (XMetaData.units) can be set manually or updated; they exist exclusively as a reference for users.

Each of the metadata classes accepts the elements under the “Field Name” column in the following subsections as inputs, in addition to the frequency (freq) for time-dependent inputs. All other attributes of the metadata classes are for user reference and are therefore immutable. After setting each of the inputs, users can access the dictionary elements col_map, dtypes, and units to work with the various mappings. Below is a demonstration of this mapping in practice, showing the SCADA data mapping used in “examples/data/plant_meta.yml”, where the keys are the OpenOA column names and the values are the La Haute Borne data naming conventions. This mapping can be repeated for each of the other metadata types.

scada:
  frequency: 10min  # timestamp frequency
  asset_id: Wind_turbine_name  # Unique ID of wind turbine
  WROT_BlPthAngVal: Ba_avg  # pitch angle, degrees
  WTUR_W: P_avg  # power produced, kW
  WMET_EnvTmp: Ot_avg  # temperature, C
  time: Date_time  # timestamps
  WMET_HorWdDir: Wa_avg  # wind direction, degrees
  WMET_HorWdDirRel: Va_avg  # wind direction relative to nacelle orientation, degrees
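Because the same settings can also be supplied as a dictionary or as JSON, the YAML mapping above can be written as in the following minimal sketch. The variable names `scada_meta`, `scada_json`, and `rename_map` are illustrative assumptions, not OpenOA identifiers, and the inversion step only illustrates the direction of the mapping (OpenOA name as key, source column as value); it is not a claim about how OpenOA performs the renaming internally.

```python
import json

# OpenOA field names (keys) mapped to the La Haute Borne column names (values),
# mirroring the "scada" entry of the YAML snippet above.
scada_meta = {
    "frequency": "10min",
    "asset_id": "Wind_turbine_name",
    "WROT_BlPthAngVal": "Ba_avg",
    "WTUR_W": "P_avg",
    "WMET_EnvTmp": "Ot_avg",
    "time": "Date_time",
    "WMET_HorWdDir": "Wa_avg",
    "WMET_HorWdDirRel": "Va_avg",
}

# The same content serialized as JSON, as it could appear in a metadata file.
scada_json = json.dumps({"scada": scada_meta}, indent=2)

# Illustration only: invert the mapping to see which OpenOA name each source
# column corresponds to. "frequency" is a setting, not a column, so skip it.
rename_map = {
    source: internal
    for internal, source in scada_meta.items()
    if internal != "frequency"
}

print(rename_map["P_avg"])  # WTUR_W
```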

Data Schema User Guide#

The following subsections demonstrate the data mapping schemas that PlantData requires to validate and convert user-specified data to a validated OpenOA schema for use throughout the codebase. The data columns and their associated units and data types are shown in a table, followed by a demonstration of how this is used in the La Haute Borne example data used for all of the example analysis workflows. Note that the “Field Name” column is the internal naming convention and should be the dictionary or JSON/YAML key, with the actual column name as its associated value (as seen in the YAML snippets for each section).

Note that validating a PlantData object with analysis_type = “all” will check for all of the field names listed below for all provided data. However, if the PlantData object is only being validated for a specific analysis type, or types, then only the data specified in openoa.plant.ANALYSIS_REQUIREMENTS (shown below) will be checked. In the case of analysis_type = None, no errors will be raised during validation.

Additionally, some analysis types have modified variants, which mean the following:

  • MonteCarloAEP-temp adds in the reanalysis temperature data for the long term correction.

  • MonteCarloAEP-wd adds in the reanalysis wind direction data for the long term correction.

  • MonteCarloAEP-temp-wd adds in the reanalysis temperature and wind direction data for the long term correction.

  • WakeLosses-scada uses the wind speed and direction data from the SCADA data.

  • WakeLosses-tower uses the wind speed and direction data from the met tower data.

from pprint import pprint
from openoa.schema.metadata import ANALYSIS_REQUIREMENTS

# The valid analysis_type inputs
print("Valid `analysis_type`s with OpenOA-provided schema")
for analysis in sorted(ANALYSIS_REQUIREMENTS):
    print(analysis)

# An example of the contents for required columns from different data sources and their frequencies
print()
print(
    "Requirements for the modified MonteCarloAEP analysis using reanalysis\n"
    "temperature as an additional variable:"
)
pprint(ANALYSIS_REQUIREMENTS["MonteCarloAEP-temp"])

Valid `analysis_type`s with OpenOA-provided schema
ElectricalLosses
MonteCarloAEP
MonteCarloAEP-temp
MonteCarloAEP-temp-wd
MonteCarloAEP-wd
StaticYawMisalignment
TurbineLongTermGrossEnergy
WakeLosses-scada
WakeLosses-tower

Requirements for the modified MonteCarloAEP analysis using reanalysis
temperature as an additional variable:
{'curtail': {'columns': ['IAVL_DnWh', 'IAVL_ExtPwrDnWh'],
             'freq': ('MS', 'ME', 'W', 'D', 'h', 'min', 's', 'ms', 'us', 'ns')},
 'meter': {'columns': ['MMTR_SupWh'],
           'freq': ('MS', 'ME', 'W', 'D', 'h', 'min', 's', 'ms', 'us', 'ns')},
 'reanalysis': {'columns': ['WMETR_HorWdSpd', 'WMETR_AirDen', 'WMETR_EnvTmp'],
                'freq': ('MS',
                         'ME',
                         'W',
                         'D',
                         'h',
                         'min',
                         's',
                         'ms',
                         'us',
                         'ns')}}
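
To make the requirement structure concrete, the sketch below checks a set of available columns against a requirements dictionary shaped like the printed output above. The helper `find_missing_columns` and the sample `available` columns are hypothetical; they only illustrate the idea and are not part of OpenOA, whose own validation happens inside PlantData.

```python
# Requirements copied from the printed "MonteCarloAEP-temp" output above
# (the frequency tuples are omitted for brevity).
requirements = {
    "curtail": {"columns": ["IAVL_DnWh", "IAVL_ExtPwrDnWh"]},
    "meter": {"columns": ["MMTR_SupWh"]},
    "reanalysis": {"columns": ["WMETR_HorWdSpd", "WMETR_AirDen", "WMETR_EnvTmp"]},
}


def find_missing_columns(requirements, available):
    """Return {data_type: [missing columns]} for each data type with gaps."""
    missing = {}
    for data_type, spec in requirements.items():
        present = set(available.get(data_type, []))
        gaps = [col for col in spec["columns"] if col not in present]
        if gaps:
            missing[data_type] = gaps
    return missing


# Hypothetical columns available after mapping to the OpenOA names.
available = {
    "curtail": ["time", "IAVL_DnWh", "IAVL_ExtPwrDnWh"],
    "meter": ["time", "MMTR_SupWh"],
    "reanalysis": ["time", "WMETR_HorWdSpd", "WMETR_EnvTmp"],
}

print(find_missing_columns(requirements, available))
# {'reanalysis': ['WMETR_AirDen']}
```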

SCADA#

PlantData.scada is configured by the openoa.schema.SCADAMetaData class, which is set in the configuration data with the “scada” key. Users can set each of the following “Field Name” keys with their own data’s column names in their SCADA data, plus the “freq” field. Users just have to ensure that the columns are already using the specified units, and that each column is already using the listed data type or can be converted to that type.

| Field Name | Descriptive Name | Data Type (SCADAMetaData.dtypes) | Units (SCADAMetaData.units) |
|---|---|---|---|
| time | time stamp | datetime64[ns] | datetime64[ns] |
| asset_id | id | string | None |
| WTUR_W | power | float | kW |
| WMET_HorWdSpd | windspeed | float | m/s |
| WMET_HorWdDir | winddirection | float | degrees |
| WTUR_TurSt | status | string | None |
| WROT_BlPthAngVal | pitch | float | degrees |
| WMET_EnvTmp | temp | float | Celsius |

scada:
  frequency: 10min  # timestamp frequency
  asset_id: Wind_turbine_name  # Unique ID of wind turbine
  WROT_BlPthAngVal: Ba_avg  # pitch angle, degrees
  WTUR_W: P_avg  # power produced, kW
  WMET_EnvTmp: Ot_avg  # temperature, C
  time: Date_time  # timestamps
  WMET_HorWdDir: Wa_avg  # wind direction, degrees
  WMET_HorWdDirRel: Va_avg  # wind direction relative to nacelle orientation, degrees
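
The requirement that each column either already uses the listed data type or can be converted to it can be checked before building a PlantData object. The following is a minimal sketch using only the standard library; `can_convert` is a hypothetical helper, and the sample values are made up for illustration.

```python
def can_convert(values, caster):
    """Return True if every value can be passed through `caster` without error."""
    try:
        for value in values:
            caster(value)
    except (TypeError, ValueError):
        return False
    return True


# Made-up samples of a power column (WTUR_W is listed as float in the table).
clean_power = ["1520.5", "1498.0", 1503]
dirty_power = ["1520.5", "n/a", 1503]

print(can_convert(clean_power, float))  # True
print(can_convert(dirty_power, float))  # False
```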

Meter#

PlantData.meter is configured by the openoa.schema.MeterMetaData class, which is set in the configuration data with the “meter” key. Users can set each of the following “Field Name” keys with their own data’s column names in their meter data, plus the “freq” field. Users just have to ensure that the columns are already using the specified units, and that each column is already using the listed data type or can be converted to that type.

| Field Name | Descriptive Name | Data Type (MeterMetaData.dtypes) | Units (MeterMetaData.units) |
|---|---|---|---|
| time | time stamp | datetime64[ns] | datetime64[ns] |
| MMTR_SupWh | energy | float | kWh |

meter:
  MMTR_SupWh: net_energy_kwh  # net energy, kWh
  time: time  # timestamp

Tower#

PlantData.tower is configured by the openoa.schema.TowerMetaData class, which is set in the configuration data with the “tower” key. Users can set each of the following “Field Name” keys with their own data’s column names in their met tower data, plus the “freq” field. Users just have to ensure that the columns are already using the specified units, and that each column is already using the listed data type or can be converted to that type.

| Field Name | Descriptive Name | Data Type (TowerMetaData.dtypes) | Units (TowerMetaData.units) |
|---|---|---|---|
| time | time | datetime64[ns] | datetime64[ns] |
| asset_id | id | string | None |

Curtail#

PlantData.curtail is configured by the openoa.schema.CurtailMetaData class, which is set in the configuration data with the “curtail” key. Users can set each of the following “Field Name” keys with their own data’s column names in their curtailment data, plus the “freq” field. Users just have to ensure that the columns are already using the specified units, and that each column is already using the listed data type or can be converted to that type.

| Field Name | Descriptive Name | Data Type (CurtailMetaData.dtypes) | Units (CurtailMetaData.units) |
|---|---|---|---|
| time | time stamp | datetime64[ns] | datetime64[ns] |
| IAVL_ExtPwrDnWh | curtailment | float | kWh |
| IAVL_DnWh | availability | float | kWh |

curtail:
  IAVL_DnWh: availability_kwh  # availability, kWh
  IAVL_ExtPwrDnWh: curtailment_kwh  # curtailment, kWh
  frequency: 10min  # timestamp frequency
  time: time  # timestamp
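
As an illustration of what an expected frequency such as “10min” implies for the timestamps, the sketch below checks that consecutive records are exactly ten minutes apart. The helper `matches_frequency` and the timestamps are hypothetical; OpenOA's own frequency validation may work differently (for example, it may tolerate gaps).

```python
from datetime import datetime, timedelta


def matches_frequency(timestamps, step):
    """Return True if consecutive timestamps are all exactly `step` apart."""
    pairs = zip(timestamps, timestamps[1:])
    return all(later - earlier == step for earlier, later in pairs)


# Made-up 10-minute timestamps, matching the "frequency: 10min" setting above.
start = datetime(2014, 1, 1, 0, 0)
regular = [start + timedelta(minutes=10 * i) for i in range(4)]
gappy = regular[:2] + [regular[3]]  # drop one record to create a 20-minute gap

print(matches_frequency(regular, timedelta(minutes=10)))  # True
print(matches_frequency(gappy, timedelta(minutes=10)))    # False
```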

Status#

PlantData.status is configured by the openoa.schema.StatusMetaData class, which is set in the configuration data with the “status” key. Users can set each of the following “Field Name” keys with their own data’s column names in their turbine status data, plus the “freq” field. Users just have to ensure that the columns are already using the specified units, and that each column is already using the listed data type or can be converted to that type.

Note

This section does not get used by OpenOA internally, though it is expected to be used in the future.

| Field Name | Descriptive Name | Data Type (StatusMetaData.dtypes) | Units (StatusMetaData.units) |
|---|---|---|---|
| time | time stamp | datetime64[ns] | datetime64[ns] |
| asset_id | id | string | None |
| status_id | status id | int | None |
| status_code | status code | int | None |
| status_text | status text | string | None |

Asset#

PlantData.asset is configured by the openoa.schema.AssetMetaData class, which is set in the configuration data with the “asset” key. Users can set each of the following “Field Name” keys with their own data’s column names in their turbine and met tower asset data. Users just have to ensure that the columns are already using the specified units, and that each column is already using the listed data type or can be converted to that type.

| Field Name | Descriptive Name | Data Type (AssetMetaData.dtypes) | Units (AssetMetaData.units) |
|---|---|---|---|
| asset_id | id | string | None |
| latitude | latitude | float | WGS-84 |
| longitude | longitude | float | WGS-84 |
| rated_power | rated power | float | kW |
| hub_height | hub height | float | m |
| rotor_diameter | rotor diameter | float | m |
| elevation | elevation | float | m |
| type | type | string | None |

asset:
  elevation: elevation_m  # elevation above sea level, meters
  hub_height: Hub_height_m  # hub height, meters
  asset_id: Wind_turbine_name
  latitude: Latitude  # WGS-84 latitude
  longitude: Longitude  # WGS-84 longitude
  rated_power: Rated_power  # rated power, kW
  rotor_diameter: Rotor_diameter_m  # rotor diameter, meters
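
Since the table above lists rated_power in kW, a source that reports rated power in MW would need to be converted before the data are passed to PlantData. A minimal sketch, with made-up asset records and a hypothetical helper name:

```python
def mw_to_kw(rated_power_mw):
    """Convert rated power from megawatts to kilowatts."""
    return rated_power_mw * 1000.0


# Made-up asset records whose rated power is reported in MW.
assets = [
    {"asset_id": "T01", "rated_power_mw": 2.05},
    {"asset_id": "T02", "rated_power_mw": 2.0},
]

# Replace the MW field with a kW field named after the OpenOA field name.
for asset in assets:
    asset["rated_power"] = mw_to_kw(asset.pop("rated_power_mw"))
```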

Reanalysis#

PlantData.reanalysis is configured by the openoa.schema.ReanalysisMetaData class, which is set in the configuration data with the “reanalysis” key. Note, however, that the reanalysis configuration should be a dictionary of settings for each of the reanalysis products provided. For instance, if MERRA2 and ERA5 data are both provided, then each data set’s configuration should be provided under reanalysis as dictionary key-value pairs, where the key is the name of the reanalysis product and the value is the settings for that product’s data. For each product, users can set each of the following “Field Name” keys with their own data’s column names in their reanalysis data, plus the “freq” field. Users just have to ensure that the columns are already using the specified units, and that each column is already using the listed data type or can be converted to that type.

MERRA-2#

Data are based on the single-level diagnostic data available here: https://disc.gsfc.nasa.gov/datasets/M2T1NXSLV_V5.12.4/summary?keywords=”MERRA-2”

Wind speed and direction are taken directly from the diagnostic 50-m u- and v-wind fields provided in this dataset. Air density at 50m is calculated using temperature and pressure estimations at 50m and the ideal gas law. Temperature at 50m is estimated by taking the 10-m temperature data provided by this dataset and assuming a constant lapse rate of -9.8 degrees Celsius per vertical kilometer. Pressure at 50m is extrapolated from surface pressure data provided in this dataset using the hypsometric equation.
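
The density calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used to prepare the example data: the gas constant, standard gravity, the use of a layer-mean temperature in the hypsometric equation, and the function name `density_at_height` are all assumptions of this sketch.

```python
import math

R_DRY_AIR = 287.05    # J/(kg K), specific gas constant for dry air (assumed)
GRAVITY = 9.80665     # m/s^2, standard gravity (assumed)
LAPSE_RATE = -9.8e-3  # K/m, the constant lapse rate quoted in the text


def density_at_height(temp_10m_k, surface_pressure_pa, height_m=50.0):
    """Estimate dry-air density (kg/m^3) at `height_m` above the surface."""
    # Temperature: shift the 10-m temperature along the constant lapse rate.
    temp_k = temp_10m_k + LAPSE_RATE * (height_m - 10.0)
    # Pressure: hypsometric equation from the surface up to `height_m`,
    # using the mean of the two temperatures as the layer temperature
    # (a simplifying assumption of this sketch).
    mean_temp_k = 0.5 * (temp_10m_k + temp_k)
    pressure_pa = surface_pressure_pa * math.exp(
        -GRAVITY * height_m / (R_DRY_AIR * mean_temp_k)
    )
    # Density: ideal gas law for dry air.
    return pressure_pa / (R_DRY_AIR * temp_k)


rho = density_at_height(temp_10m_k=288.15, surface_pressure_pa=101325.0)
print(round(rho, 3))
```

The inputs are in Kelvin and Pa, matching the units listed for WMETR_EnvTmp and WMETR_EnvPres in the reanalysis table.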

NCEP-2#

Data are based on the single-level diagnostic data available here: https://rda.ucar.edu/datasets/ds091.0/

Wind speed and direction are taken directly from the diagnostic 10-m u- and v-wind fields provided in this dataset. Air density at 10m is calculated using temperature and pressure estimations at 10m and the ideal gas law. Temperature at 10m is estimated by taking the 2-m temperature data provided by this dataset and assuming a constant lapse rate of -9.8 degrees Celsius per vertical kilometer. Pressure at 10m is extrapolated from surface pressure data provided in this dataset using the hypsometric equation.
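
For reference, converting u- and v-wind components to a wind speed and direction can be sketched as below. The sketch assumes the meteorological convention (direction the wind blows from, in degrees clockwise from north) and uses a hypothetical function name; it is not presented as the routine OpenOA itself uses.

```python
import math


def wind_speed_and_direction(u, v):
    """Convert eastward (u) and northward (v) wind components in m/s to
    speed (m/s) and meteorological direction (degrees the wind blows from,
    clockwise from north)."""
    speed = math.hypot(u, v)
    direction = (180.0 + math.degrees(math.atan2(u, v))) % 360.0
    return speed, direction


# A wind blowing toward the east comes from the west (270 degrees).
print(wind_speed_and_direction(10.0, 0.0))
# A wind blowing toward the south comes from the north (0 degrees).
print(wind_speed_and_direction(0.0, -5.0))
```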

ERA5#

Data are based on the model-level data available here: https://rda.ucar.edu/datasets/ds627.0/

Model levels are based on sigma coordinates (i.e. fractions of surface pressure). From this dataset, we extract temperature, u-wind, and v-wind at the 58th model level, which is on average about 72m above ground level (https://www.ecmwf.int/en/forecasts/documentation-and-support/60-model-levels). We also extract surface pressure data. Air density at the 58th model level is calculated using temperature data extracted at that level and an estimation of pressure at that level using the ideal gas law. Pressure at the 58th model level is extrapolated from surface pressure data provided in this dataset using the hypsometric equation.

For any and all of the reanalysis data defined, a dictionary should be provided (seen below the table) to determine which data sets are being used (dictionary keys) and their schema (dictionary values), as would be provided for any other schema definition.

| Field Name | Descriptive Name | Data Type (ReanalysisMetaData.dtypes) | Units (ReanalysisMetaData.units) |
|---|---|---|---|
| time | time stamp | datetime64[ns] | datetime64[ns] |
| WMETR_HorWdSpd | windspeed | float | m/s |
| WMETR_HorWdSpdU | eastward windspeed | float | m/s |
| WMETR_HorWdSpdV | northward windspeed | float | m/s |
| WMETR_HorWdDir | wind direction | float | degrees |
| WMETR_EnvTmp | temperature | float | Kelvin |
| WMETR_AirDen | air density | float | kg/m^3 |
| WMETR_EnvPres | surface pressure | float | Pa |

reanalysis:
  era5:  # reanalysis product name/ID
    frequency: h  # timestamp frequency
    WMETR_EnvPres: surf_pres  # surface pressure, Pa
    WMETR_EnvTmp: t_2m  # temperature, K
    time: datetime  # timestamps
    WMETR_HorWdSpdU: u_100  # u-direction windspeed, m/s
    WMETR_HorWdSpdV: v_100  # v-direction windspeed, m/s
    WMETR_HorWdDir: winddirection_deg  # wind direction, degrees
  merra2:  # reanalysis product name/ID
    frequency: h  # timestamp frequency
    WMETR_EnvPres: surface_pressure  # surface pressure, Pa
    WMETR_EnvTmp: temp_2m  # temperature, K
    time: datetime  # timestamps
    WMETR_HorWdSpdU: u_50  # u-direction windspeed, m/s
    WMETR_HorWdSpdV: v_50  # v-direction windspeed, m/s
    WMETR_HorWdDir: winddirection_deg  # wind direction, degrees
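
In dictionary form, the nesting by product looks like the following abbreviated sketch. The variable names are illustrative assumptions; the point is only that each product carries its own settings, so the same OpenOA field name can point to a differently named source column in each product.

```python
# Nested mapping: product name -> {OpenOA field name: source column name},
# abbreviated from the YAML snippet above.
reanalysis_meta = {
    "era5": {
        "frequency": "h",
        "time": "datetime",
        "WMETR_HorWdSpdU": "u_100",
        "WMETR_HorWdSpdV": "v_100",
    },
    "merra2": {
        "frequency": "h",
        "time": "datetime",
        "WMETR_HorWdSpdU": "u_50",
        "WMETR_HorWdSpdV": "v_50",
    },
}

# Look up the source column used for the eastward wind component per product.
u_columns = {
    product: meta["WMETR_HorWdSpdU"] for product, meta in reanalysis_meta.items()
}
print(u_columns)  # {'era5': 'u_100', 'merra2': 'u_50'}
```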