PlantData and the Meta Data APIs

PlantData API

The PlantData object is the primary engine for OpenOA’s data storage, validation, and analysis.

class openoa.plant.PlantData(log_level: str = 'WARNING', metadata: str | Path | dict | PlantMetaData = {}, analysis_type: Sequence | str | int | float | None = None, scada: str | Path | DataFrame | None = None, meter: str | Path | DataFrame | None = None, tower: str | Path | DataFrame | None = None, status: str | Path | DataFrame | None = None, curtail: str | Path | DataFrame | None = None, asset: str | Path | DataFrame | None = None, reanalysis: dict[str | Path | DataFrame] | None = None)[source]#

Bases: object

Overarching data object used for storing, accessing, and acting on the primary operational analysis data types, including: SCADA, meter, tower, status, curtailment, asset, and reanalysis data. As of version 3.0, this class provides an automated validation scheme through the use of analysis_type as well as a secondary scheme that can be run after further manipulations are performed. Additionally, version 3.0 incorporates a metadata scheme PlantMetaData to map between user column naming conventions and the internal column naming conventions for both usability and code consistency.

Parameters:
  • metadata (PlantMetaData) – A nested dictionary of the schema definition for each of the data types that will be input, and some additional plant parameters. See PlantMetaData, SCADAMetaData, MeterMetaData, TowerMetaData, StatusMetaData, CurtailMetaData, AssetMetaData, and/or ReanalysisMetaData for more information.

  • analysis_type (list[str]) – A single, or list of, analysis type(s) that will be run, that are configured in ANALYSIS_REQUIREMENTS. See openoa.schema.metadata.ANALYSIS_REQUIREMENTS for requirements details.

    • None: Don’t raise any errors for issues found in the data. This is intended for loading messy data, but validate() should be run later if any analyses will be run.

    • “all”: This is to check that all columns specified in the metadata schema align with the data provided, as well as data types and frequencies (where applicable).

    • “MonteCarloAEP”: Checks the data components that are relevant to a Monte Carlo AEP analysis.

    • “MonteCarloAEP-temp”: Checks the data components that are relevant to a Monte Carlo AEP analysis with ambient temperature data.

    • “MonteCarloAEP-wd”: Checks the data components that are relevant to a Monte Carlo AEP analysis using an additional wind direction data point.

    • “MonteCarloAEP-temp-wd”: Checks the data components that are relevant to a Monte Carlo AEP analysis with ambient temperature and wind direction data.

    • “TurbineLongTermGrossEnergy”: Checks the data components that are relevant to a turbine long term gross energy analysis.

    • “ElectricalLosses”: Checks the data components that are relevant to an electrical losses analysis.

    • “WakeLosses-scada”: Checks the data components that are relevant to a wake losses analysis that uses the SCADA-based wind speed and direction data.

    • “WakeLosses-tower”: Checks the data components that are relevant to a wake losses analysis that uses the met tower-based wind speed and direction data.

  • scada (pd.DataFrame) – Either the SCADA data that’s been pre-loaded to a pandas DataFrame, or a path to the location of the data to be imported. See SCADAMetaData for column data specifications.

  • meter (pd.DataFrame) – Either the meter data that’s been pre-loaded to a pandas DataFrame, or a path to the location of the data to be imported. See MeterMetaData for column data specifications.

  • tower (pd.DataFrame) – Either the met tower data that’s been pre-loaded to a pandas DataFrame, or a path to the location of the data to be imported. See TowerMetaData for column data specifications.

  • status (pd.DataFrame) – Either the status data that’s been pre-loaded to a pandas DataFrame, or a path to the location of the data to be imported. See StatusMetaData for column data specifications.

  • curtail (pd.DataFrame) – Either the curtailment data that’s been pre-loaded to a pandas DataFrame, or a path to the location of the data to be imported. See CurtailMetaData for column data specifications.

  • asset (pd.DataFrame) – Either the asset summary data that’s been pre-loaded to a pandas DataFrame, or a path to the location of the data to be imported. See AssetMetaData for column data specifications.

  • reanalysis (dict[str, pd.DataFrame]) – Either the reanalysis data that’s been pre-loaded to a dictionary of pandas DataFrame with keys indicating the data source, such as “era5” or “merra2”, or a dictionary of paths to the location of the data to be imported following the same key naming convention. See ReanalysisMetaData for column data specifications.

Raises:

ValueError – Raised if any analysis specific validation checks don’t pass with an error message highlighting the appropriate issues.

Method generated by attrs for class PlantData.

data_validator(instance: Attribute, value: DataFrame | None) None[source]#

Validator function for each of the data buckets in PlantData that checks that the appropriate columns exist for each dataframe, each column is of the right type, and that the timestamp frequencies are appropriate for the given analysis_type.

Parameters:
  • instance (attrs.Attribute) – The attrs.Attribute details.

  • value (pd.DataFrame | None) – The attribute’s user-provided value. A dictionary of dataframes is expected for reanalysis data only.
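The column check described above can be illustrated with a small, library-independent sketch. The `REQUIRED_COLUMNS` table and both helper functions are hypothetical stand-ins, not OpenOA's `ANALYSIS_REQUIREMENTS` or its validator; only the SCADA column names are taken from the TurbineLongTermGrossEnergy schema listed in this reference. Data type and timestamp frequency checks are left out.

```python
# Hypothetical requirements table, loosely modeled on the idea of
# ANALYSIS_REQUIREMENTS; the real structure in OpenOA may differ.
REQUIRED_COLUMNS = {
    "TurbineLongTermGrossEnergy": {
        "scada": ["time", "asset_id", "WMET_HorWdSpd", "WTUR_W", "WTUR_SupWh"],
    },
}


def find_missing_columns(analysis_type, data_columns):
    """Return {data_type: [missing columns]} for one analysis type."""
    missing = {}
    for data_type, required in REQUIRED_COLUMNS[analysis_type].items():
        available = set(data_columns.get(data_type, []))
        absent = [col for col in required if col not in available]
        if absent:
            missing[data_type] = absent
    return missing


def validate_columns(analysis_type, data_columns):
    """Raise one ValueError that lists every missing column."""
    missing = find_missing_columns(analysis_type, data_columns)
    if missing:
        details = "; ".join(f"{name}: {cols}" for name, cols in missing.items())
        raise ValueError(f"Missing columns for {analysis_type}: {details}")


# Example: a SCADA table that lacks wind speed and energy columns.
user_columns = {"scada": ["time", "asset_id", "WTUR_W"]}
missing = find_missing_columns("TurbineLongTermGrossEnergy", user_columns)
```

Collecting every problem before raising mirrors the documented behavior of reporting all issues in a single error message rather than stopping at the first one.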

reanalysis_validator(instance: Attribute, value: dict[str, DataFrame] | None) None[source]#

Validator function for the reanalysis data that checks for both matching reanalysis product keys in the PlantMetaData.reanalysis metadata definition, and the following: appropriate columns exist for each dataframe, each column is of the right type, and that the timestamp frequencies are appropriate for the given analysis_type.

Parameters:
  • instance (attrs.Attribute) – The attrs.Attribute details.

  • value (dict[str, pd.DataFrame] | None) – The attribute’s user-provided value. A dictionary of dataframes is expected for reanalysis data only.

markdown()[source]#

A markdown-formatted version of the __str__.

property data_dict: dict[str, DataFrame]#

Property that returns a dictionary of the data contained in the PlantData object.

Returns:

A mapping of the data type’s name and the DataFrame.

Return type:

(dict[str, pd.DataFrame])

to_csv(save_path: str | Path, with_openoa_col_names: bool = True, metadata: str = 'metadata', scada: str = 'scada', meter: str = 'meter', tower: str = 'tower', asset: str = 'asset', status: str = 'status', curtail: str = 'curtail', reanalysis: str = 'reanalysis') None[source]#

Saves all of the dataframe objects to a CSV file in the provided save_path directory.

Parameters:
  • save_path (str | Path) – The folder where all the data should be saved.

  • with_openoa_col_names (bool, optional) – Use the PlantData column names (True), or convert the column names back to the originally provided values. Defaults to True.

  • metadata (str, optional) – File name (without extension) to be used for the metadata. Defaults to “metadata”.

  • scada (str, optional) – File name (without extension) to be used for the SCADA data. Defaults to “scada”.

  • meter (str, optional) – File name (without extension) to be used for the meter data. Defaults to “meter”.

  • tower (str, optional) – File name (without extension) to be used for the tower data. Defaults to “tower”.

  • asset (str, optional) – File name (without extension) to be used for the asset data. Defaults to “asset”.

  • status (str, optional) – File name (without extension) to be used for the status data. Defaults to “status”.

  • curtail (str, optional) – File name (without extension) to be used for the curtailment data. Defaults to “curtail”.

  • reanalysis (str, optional) – Base file name (without extension) to be used for the reanalysis data, where each dataset will use the name provided to form the following file name: {save_path}/{reanalysis}_{name}. Defaults to “reanalysis”.
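As an illustration of the naming rule for reanalysis files, the sketch below builds the file paths that would result from the documented defaults. The helper `planned_csv_paths` is hypothetical and writes nothing to disk, and the `.csv` suffix is an assumption based on the statement that file names are given without an extension.

```python
from pathlib import Path


def planned_csv_paths(save_path, reanalysis_products, scada="scada",
                      meter="meter", reanalysis="reanalysis"):
    """Map each data set to the CSV path implied by the naming convention."""
    folder = Path(save_path)
    paths = {
        "scada": folder / f"{scada}.csv",
        "meter": folder / f"{meter}.csv",
    }
    # Each reanalysis product gets its own file: {reanalysis}_{name}.csv
    for name in reanalysis_products:
        paths[f"reanalysis:{name}"] = folder / f"{reanalysis}_{name}.csv"
    return paths


paths = planned_csv_paths("output", ["era5", "merra2"])
```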

validate(metadata: dict | str | Path | PlantMetaData | None = None) None[source]#

Secondary method to validate the plant data objects after loading or changing data, with the option to provide an updated metadata object or file as well.

Parameters:

metadata (Optional[dict]) – Updated metadata object, dictionary, or file to create the updated metadata for data validation, which should align with the mapped column names during initialization.

Raises:

ValueError – Raised at the end if errors are caught in the validation steps.

parse_asset_geometry(reference_system: str | None = None, utm_zone: int | None = None, reference_longitude: float | None = None) None[source]#

Calculate UTM coordinates from latitude/longitude.

The UTM system divides the Earth into 60 zones, each 6° of longitude in width. Zone 1 covers longitude 180° to 174° W; zone numbering increases eastward to zone 60, which covers longitude 174° E to 180°. The polar regions south of 80° S and north of 84° N are excluded.

Ref: http://geopandas.org/projections.html

Parameters:
  • reference_system (str, optional) – Used to define the coordinate reference system (CRS). If None is used, then the metadata.reference_system value will be used. Defaults to the European Petroleum Survey Group (EPSG) code 4326 to be used with the World Geodetic System reference system, WGS 84.

  • utm_zone (int, optional) – UTM zone. If None is used, then the metadata.utm_zone value will be used. Defaults to being calculated from reference_longitude.

  • reference_longitude (float, optional) – Reference longitude for calculating the UTM zone. If None is used, then the metadata.reference_longitude value will be used. Defaults to the mean of asset.longitude.

Returns: None

Sets the asset “geometry” column.
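The zone rule described above reduces to a short calculation. The following is a minimal sketch of that rule, not the code path used by parse_asset_geometry; the function names are hypothetical, and the special zone exceptions around Norway and Svalbard are ignored. The EPSG numbering for WGS 84 / UTM zones (326xx in the northern hemisphere, 327xx in the southern) is standard.

```python
import statistics


def utm_zone(longitude):
    """UTM zone number (1-60) for a longitude in degrees east, in [-180, 180]."""
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("longitude must be within [-180, 180] degrees")
    # Zones are 6 degrees wide and start at 180 degrees W; exactly
    # 180 degrees E is folded into zone 60.
    return min(int((longitude + 180.0) // 6) + 1, 60)


def utm_epsg(longitude, latitude):
    """EPSG code of the WGS 84 / UTM zone containing the point."""
    base = 32600 if latitude >= 0 else 32700
    return base + utm_zone(longitude)


# Reference longitude taken as the mean of the asset longitudes, as the
# default described for reference_longitude. Coordinates are made up.
asset_longitudes = [-105.2, -105.0, -104.8]
reference_longitude = statistics.fmean(asset_longitudes)
zone = utm_zone(reference_longitude)
```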

update_column_names(to_original: bool = False) None[source]#

Renames the columns of each dataframe to be the keys from the metadata.xx.col_map that was passed during initialization.

Parameters:

to_original (bool, optional) – An indicator to map the column names back to the originally passed values. Defaults to False.
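The two renaming directions can be sketched with a plain dictionary. This is an illustration of the mapping idea only: `rename_columns` is a hypothetical helper operating on a list of column names, the user-side names are invented, and the assumption that col_map keys are the internal names follows from the description above.

```python
def rename_columns(columns, col_map, to_original=False):
    """Rename columns using col_map, which maps internal name -> user name."""
    if to_original:
        mapping = col_map  # internal -> user
    else:
        mapping = {user: internal for internal, user in col_map.items()}
    # Columns without an entry in the mapping are left unchanged.
    return [mapping.get(col, col) for col in columns]


# Internal names on the left; the user names on the right are made up.
col_map = {"time": "timestamp", "asset_id": "turbine", "WTUR_W": "power_kw"}

internal = rename_columns(["timestamp", "turbine", "power_kw", "extra"], col_map)
original = rename_columns(internal, col_map, to_original=True)
```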

property turbine_ids: ndarray#

The 1D array of turbine IDs. This is created from the asset data, or unique IDs from the SCADA data, if asset is undefined.

property n_turbines: int#

The number of turbines contained in the data.

turbine_df(turbine_id: str) DataFrame[source]#

Filters scada on a single turbine_id and returns the filtered data frame.

Parameters:

turbine_id (str) – The asset_id of the turbine to retrieve its data.

Returns:

The turbine-specific SCADA data frame.

Return type:

pd.DataFrame

property tower_ids: ndarray#

The 1D array of met tower IDs. This is created from the asset data, or unique IDs from the tower data, if asset is undefined.

property n_towers: int#

The number of met towers contained in the data.

tower_df(tower_id: str) DataFrame[source]#

Filters tower on a single tower_id and returns the filtered data frame.

Parameters:

tower_id (str) – The ID of the met tower to retrieve its data.

Returns:

The met tower-specific data frame.

Return type:

pd.DataFrame

property asset_ids: ndarray#

The 1D array of turbine and met tower IDs. This is created from the asset data, or unique IDs from both the SCADA data and tower data, if asset is undefined.

calculate_asset_distance_matrix() DataFrame[source]#

Calculates the distance between all assets on the site with np.inf for the distance between an asset and itself.

Returns:

Dataframe containing distances between each pair of assets

Return type:

pd.DataFrame
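Using np.inf on the diagonal means that a minimum taken over any row never selects the asset itself, which is what makes nearest-neighbor lookups simple. The sketch below shows the idea with nested dictionaries and planar coordinates in meters; the helper names and coordinates are hypothetical, and OpenOA returns a DataFrame rather than a dictionary.

```python
import math


def distance_matrix(coords):
    """coords: {asset_id: (easting_m, northing_m)} -> nested dict of distances."""
    matrix = {}
    for a, (xa, ya) in coords.items():
        matrix[a] = {}
        for b, (xb, yb) in coords.items():
            # An asset is infinitely far from itself by convention.
            matrix[a][b] = math.inf if a == b else math.hypot(xb - xa, yb - ya)
    return matrix


def nearest(matrix, asset_id, candidates=None):
    """ID of the closest asset, optionally restricted to candidate IDs."""
    row = matrix[asset_id]
    pool = row if candidates is None else candidates
    return min(pool, key=row.__getitem__)


coords = {"T01": (0.0, 0.0), "T02": (300.0, 400.0), "M01": (0.0, 100.0)}
distances = distance_matrix(coords)
closest_any = nearest(distances, "T01")
closest_turbine = nearest(distances, "T01", candidates=["T01", "T02"])
```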

turbine_distance_matrix(turbine_id: str | None = None) DataFrame[source]#

Returns the distances between all turbines in the plant with np.inf for the distance between a turbine and itself.

Parameters:

turbine_id (str, optional) – Specific turbine ID for which the distances to other turbines are returned. If None, a matrix containing the distances between all pairs of turbines is returned. Defaults to None.

Returns:

Dataframe containing distances between each pair of turbines

Return type:

pd.DataFrame

tower_distance_matrix(tower_id: str | None = None) DataFrame[source]#

Returns the distances between all towers in the plant with np.inf for the distance between a tower and itself.

Parameters:

tower_id (str, optional) – Specific tower ID for which the distances to other towers are returned. If None, a matrix containing the distances between all pairs of towers is returned. Defaults to None.

Returns:

Dataframe containing distances between each pair of towers

Return type:

pd.DataFrame

calculate_asset_direction_matrix() DataFrame[source]#

Calculates the direction between all assets on the site with np.inf for the direction between an asset and itself, for all assets.

Returns:

Dataframe containing directions between each pair of assets (defined as the direction from the asset given by the row index to the asset given by the column index, relative to north).

Return type:

pd.DataFrame
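A direction "from the row asset to the column asset, relative to north" can be computed as a compass bearing from planar coordinates. The sketch below assumes bearings increase clockwise from north and fall in [0, 360); that convention, the helper name, and the coordinates are assumptions rather than details confirmed by this reference.

```python
import math


def direction_matrix(coords):
    """coords: {asset_id: (easting_m, northing_m)} -> nested dict of bearings."""
    matrix = {}
    for a, (xa, ya) in coords.items():
        matrix[a] = {}
        for b, (xb, yb) in coords.items():
            if a == b:
                matrix[a][b] = math.inf  # no direction to itself
            else:
                # atan2(east offset, north offset) gives the angle clockwise
                # from north; the modulo maps it to [0, 360).
                bearing = math.degrees(math.atan2(xb - xa, yb - ya))
                matrix[a][b] = bearing % 360.0
    return matrix


coords = {"T01": (0.0, 0.0), "T02": (0.0, 500.0), "T03": (500.0, 0.0)}
directions = direction_matrix(coords)
```

Here T02 lies due north of T01 and T03 lies due east, so the bearings from T01 are approximately 0 and 90 degrees, and the reverse directions differ by 180 degrees.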

turbine_direction_matrix(turbine_id: str | None = None) DataFrame[source]#

Returns the directions between all turbines in the plant with np.inf for the direction between a turbine and itself.

Parameters:

turbine_id (str, optional) – Specific turbine ID for which the directions to other turbines are returned. If None, a matrix containing the directions between all pairs of turbines is returned. Defaults to None.

Returns:

Dataframe containing directions between each pair of turbines (defined as the direction from the turbine given by the row index to the turbine given by the column index, relative to north).

Return type:

pd.DataFrame

tower_direction_matrix(tower_id: str | None = None) DataFrame[source]#

Returns the directions between all towers in the plant with np.inf for the direction between a tower and itself.

Parameters:

tower_id (str, optional) – Specific tower ID for which the directions to other towers are returned. If None, a matrix containing the directions between all pairs of towers is returned. Defaults to None.

Returns:

Dataframe containing directions between each pair of towers (defined as the direction from the tower given by the row index to the tower given by the column index, relative to north).

Return type:

pd.DataFrame

calculate_asset_geometries() None[source]#

Calculates the asset distances and parses the asset geometries. This is intended for use during initialization and for when asset data is added after initialization.

get_freestream_turbines(wd: float, freestream_method: str = 'sector', sector_width: float = 90.0)[source]#

Returns a list of freestream (unwaked) turbines for a given wind direction. Freestream turbines can be identified using different methods (“sector” or “IEC” methods). For the sector method, if there are any turbines upstream of a turbine within a fixed wind direction sector centered on the wind direction of interest, defined by the sector_width argument, the turbine is considered waked. The IEC method uses the freestream definition provided in Annex A of IEC 61400-12-1 (2005).

Parameters:
  • wd (float) – Wind direction to identify freestream turbines for (degrees)

  • freestream_method (str, optional) – Method used to identify freestream turbines (“sector” or “IEC”). Defaults to “sector”.

  • sector_width (float, optional) – Width of wind direction sector centered on the wind direction of interest used to determine whether a turbine is waked for the “sector” method (degrees). For a given turbine, if any other upstream turbines are located within the sector, then the turbine is considered waked. Defaults to 90 degrees.

Returns:

List of freestream turbine asset IDs

Return type:

list
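The "sector" method can be sketched as follows: a turbine is treated as waked when the bearing to any other turbine falls within half the sector width of the direction the wind comes from. This is an illustrative reading of the description above, not OpenOA's implementation; the helper names, the inclusive boundary, and the coordinates are assumptions, and the IEC method is not covered.

```python
import math


def bearing_deg(origin, target):
    """Compass bearing from origin to target, clockwise from north."""
    dx = target[0] - origin[0]
    dy = target[1] - origin[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0


def freestream_sector(coords, wd, sector_width=90.0):
    """IDs of turbines with no other turbine inside the upwind sector."""
    half_width = sector_width / 2.0
    freestream = []
    for turbine_id, xy in coords.items():
        waked = False
        for other_id, other_xy in coords.items():
            if other_id == turbine_id:
                continue
            # Signed angular difference in [-180, 180) between the bearing
            # to the other turbine and the wind direction.
            diff = (bearing_deg(xy, other_xy) - wd + 180.0) % 360.0 - 180.0
            if abs(diff) <= half_width:
                waked = True
                break
        if not waked:
            freestream.append(turbine_id)
    return freestream


# Three turbines in a north-south row, 500 m apart (made-up layout).
coords = {"T1": (0.0, 1000.0), "T2": (0.0, 500.0), "T3": (0.0, 0.0)}
freestream_north = freestream_sector(coords, wd=0.0)
freestream_east = freestream_sector(coords, wd=90.0)
```

With wind from the north only the northernmost turbine is unwaked, while wind from the east leaves the whole row in the freestream.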

calculate_nearest_neighbor(turbine_ids: list | ndarray = None, tower_ids: list | ndarray = None) None[source]#

Finds the nearest turbine and met tower neighbors for all of the available turbines and towers in asset, or for those defined in turbine_ids and tower_ids.

Parameters:
  • turbine_ids (list | np.ndarray, optional) – A list of turbine IDs, if not using all turbines in the data. Defaults to None.

  • tower_ids (list | np.ndarray, optional) – A list of met tower IDs, if not using all met towers in the data. Defaults to None.

Returns: None

Creates the “nearest_turbine_id” and “nearest_tower_id” columns in asset.

nearest_turbine(asset_id: str) str[source]#

Finds the nearest turbine to the provided asset_id.

Parameters:

asset_id (str) – A valid asset asset_id.

Returns:

The turbine asset_id closest to the provided asset_id.

Return type:

str

nearest_tower(asset_id: str) str[source]#

Finds the nearest tower to the provided asset_id.

Parameters:

asset_id (str) – A valid asset asset_id.

Returns:

The tower asset_id closest to the provided asset_id.

Return type:

str

EYAGapAnalysis(eya_estimates: dict | EYAEstimate, oa_results: dict | OAResults) EYAGapAnalysis#

Performs a gap analysis between the estimated annual energy production (AEP) from an energy yield assessment (EYA) and the actual AEP as measured from an operational assessment (OA).

The gap analysis is based on comparing the following three key metrics:

  1. Availability loss

  2. Electrical loss

  3. Sum of turbine ideal energy

Here turbine ideal energy is defined as the energy produced during ‘normal’ or ‘ideal’ turbine operation, i.e., no downtime or considerable underperformance events. This value encompasses several different aspects of an EYA (wind resource estimate, wake losses, turbine performance, and blade degradation) and in most cases should have the largest impact in a gap analysis relative to the first two metrics.

This gap analysis method is fairly straightforward. Relevant EYA and OA metrics are passed in when defining the class, differences in EYA estimates and OA results are calculated, and then a ‘waterfall’ plot is created showing the differences between the EYA and OA-estimated AEP values and how they are linked to differences in the three key metrics.

Parameters:
  • plant (PlantData object) – PlantData object from which EYAGapAnalysis should draw data.

  • eya_estimates (EYAEstimate) – Numpy array with EYA estimates listed in required order

  • oa_results (OAResults) – Numpy array with OA results listed in required order.

ElectricalLosses(UQ: bool = False, num_sim: int = 20000, uncertainty_correction_threshold: ndarray[Any, dtype[float64]] | tuple[float, float] | float = (0.9, 0.995), uncertainty_meter: ndarray[Any, dtype[float64]] | tuple[float, float] | float = 0.005, uncertainty_scada: ndarray[Any, dtype[float64]] | tuple[float, float] | float = 0.005) ElectricalLosses#

A serial implementation of calculating the average monthly and annual electrical losses at a wind power plant, and the associated uncertainty. Energy output from the turbine SCADA meter and the wind plant revenue meter are used to estimate electrical losses.

First, the daily sums of turbine and revenue meter energy are calculated over the plant’s period of record where all turbines and the revenue meter contain every considered timestep. Electrical losses are then calculated as the difference between the total turbine energy production and the meter production over those concurrent days.

For uncertainty quantification, a Monte Carlo (MC) approach is used to sample the revenue meter data and SCADA data with a default 0.5% imposed uncertainty, alongside a sampled filtering parameter. The uncertainty in estimated electrical losses is quantified as the standard deviation of the distribution of losses obtained from the MC sampling.

If the revenue meter data is not provided on a daily or sub-daily basis (e.g. monthly), then the sum of daily turbine energy is corrected for any missing reported energy data from the turbines based on the ratio of expected number of data points per day to the actual data points available. The daily corrected sum of turbine energy is then summed on a monthly basis. Electrical loss is then the difference between the total corrected turbine energy production and meter production over those concurrent months.

Parameters:
  • plant (PlantData) – An openoa.plant.PlantData object that has been validated with at least openoa.plant.PlantData.analysis_type = “ElectricalLosses”.

  • UQ (bool) – Indicator to perform (True) or not (False) uncertainty quantification.

  • num_sim (int) – Number of Monte Carlo simulations to perform.

  • uncertainty_meter (float) – Uncertainty imposed on the revenue meter data (for UQ = True case).

  • uncertainty_scada (float) – Uncertainty imposed on the scada data (for UQ = True case).

  • uncertainty_correction_threshold (tuple | float) – Data availability thresholds, in the range of (0, 1), under which months should be eliminated. If UQ = True, then a 2-element tuple containing an upper and lower bound for a randomly selected value should be given, otherwise, a scalar value should be provided.
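The loss calculation and its Monte Carlo uncertainty can be sketched in a few lines. This is a simplified illustration, not the ElectricalLosses implementation: the loss is expressed as a fraction of turbine energy, the uncertainties are applied as normally distributed multipliers, and the sampled filtering threshold is left out. All of these choices, the function names, and the energy values are assumptions.

```python
import random
import statistics


def electrical_loss(turbine_energy, meter_energy):
    """Fractional loss over concurrent periods: 1 - meter / turbine."""
    return 1.0 - sum(meter_energy) / sum(turbine_energy)


def correct_daily_sum(daily_sum, n_actual, n_expected):
    """Scale a daily turbine sum by expected / actual record counts."""
    return daily_sum * n_expected / n_actual


def electrical_loss_uq(turbine_energy, meter_energy, num_sim=2000,
                       uncertainty_meter=0.005, uncertainty_scada=0.005,
                       seed=0):
    """Mean and standard deviation of the loss over Monte Carlo samples."""
    rng = random.Random(seed)
    total_turbine = sum(turbine_energy)
    total_meter = sum(meter_energy)
    samples = []
    for _ in range(num_sim):
        # Perturb each data source by its imposed measurement uncertainty.
        meter_factor = rng.gauss(1.0, uncertainty_meter)
        scada_factor = rng.gauss(1.0, uncertainty_scada)
        samples.append(
            1.0 - (meter_factor * total_meter) / (scada_factor * total_turbine)
        )
    return statistics.fmean(samples), statistics.stdev(samples)


# Made-up daily energy sums over three concurrent days.
turbine_daily = [1000.0, 1200.0, 800.0]
meter_daily = [980.0, 1176.0, 784.0]
loss = electrical_loss(turbine_daily, meter_daily)
mean_loss, std_loss = electrical_loss_uq(turbine_daily, meter_daily)
```

The standard deviation returned by the second function plays the role of the uncertainty estimate described above.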

MonteCarloAEP(reanalysis_products: list[str] | None = None, uncertainty_meter: float = 0.005, uncertainty_losses: float = 0.05, uncertainty_windiness: ndarray[Any, dtype[float64]] = (10.0, 20.0), uncertainty_loss_max: ndarray[Any, dtype[float64]] = (10.0, 20.0), outlier_detection: bool = False, uncertainty_outlier: ndarray[Any, dtype[float64]] = (1.0, 3.0), uncertainty_nan_energy: float = 0.01, time_resolution: str = 'MS', end_date_lt: str | Timestamp | None = None, reg_model: str = 'lin', ml_setup_kwargs: dict = {}, reg_temperature: bool = False, reg_wind_direction: bool = False) MonteCarloAEP#

A serial (Pandas-driven) implementation of the benchmark PRUF operational analysis implementation. This module collects standard processing and analysis methods for estimating plant level operational AEP and uncertainty.

The preprocessing should run in this order:
  1. Process revenue meter energy - creates monthly/daily data frame, gets revenue meter on monthly/daily basis, and adds data flag

  2. Process loss estimates - add monthly/daily curtailment and availability losses to monthly/daily data frame

  3. Process reanalysis data - add monthly/daily density-corrected wind speeds, temperature (if used) and wind direction (if used) from several reanalysis products to the monthly data frame

  4. Set up Monte Carlo - create the necessary Monte Carlo inputs to the OA process

  5. Run AEP Monte Carlo - run the OA process iteratively to get distribution of AEP results

The end result is a distribution of AEP results, which is used to assess the expected AEP and its associated uncertainty.

Parameters:
  • plant (PlantData) – PlantData object from which PlantAnalysis should draw data.

  • reg_temperature (bool) – Indicator to include temperature (True) or not (False) as a regression input. Defaults to False.

  • reg_wind_direction (bool) – Indicator to include wind direction (True) or not (False) as a regression input. Defaults to False.

  • reanalysis_products (list[str]) – List of reanalysis products to use for Monte Carlo sampling. Defaults to None, which pulls all the products contained in plant.reanalysis.

  • uncertainty_meter (float) – Uncertainty on revenue meter data. Defaults to 0.005.

  • uncertainty_losses (float) – Uncertainty on long-term losses. Defaults to 0.05.

  • uncertainty_windiness (tuple[int, int]) – Number of years to use for the windiness correction. Defaults to (10, 20).

  • uncertainty_loss_max (tuple[int, int]) – Threshold for the combined availability and curtailment monthly loss. Defaults to (10, 20).

  • outlier_detection (bool) – Whether to perform (True) or not (False) outlier detection filtering. Defaults to False.

  • uncertainty_outlier (tuple[float, float]) – Min and max thresholds (Monte-Carlo sampled) for the outlier detection filter. At monthly resolution, this is the tuning constant for Huber’s t function for a robust linear regression. At daily/hourly resolution, this is the number of stdev of wind speed used as threshold for the bin filter. Defaults to (1, 3).

  • uncertainty_nan_energy (float) – Threshold to flag days/months based on NaNs. Defaults to 0.01.

  • time_resolution (string) – Whether to perform the AEP calculation at monthly (“ME” or “MS”), daily (“D”) or hourly (“h”) time resolution. Defaults to “MS”.

  • end_date_lt (string or pandas.Timestamp) – The last date to use for the long-term correction. Note that only the component of the date corresponding to the time_resolution argument is considered. If None, the end of the last complete month of reanalysis data will be used. Defaults to None.

  • reg_model (string) – Which model to use for the regression (“lin” for linear, “gam” for general additive, “gbm” for gradient boosting, or “etr” for extra trees). At monthly time resolution, only linear regression is allowed because of the reduced number of data points. Defaults to “lin”.

  • ml_setup_kwargs (kwargs) – Keyword arguments to openoa.utils.machine_learning_setup.MachineLearningSetup class. Defaults to {}.
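The core of the Monte Carlo loop, a regression of energy on wind speed that is then applied to a long-term wind record, can be sketched in a heavily reduced form. This is not the MonteCarloAEP implementation: it samples only the meter uncertainty and the number of long-term years, uses a hand-written least-squares line, and omits losses, density correction, outlier filtering, and reanalysis product sampling. The function names and all data values are hypothetical.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least-squares slope and intercept."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def monte_carlo_aep(ws_por, energy_por, ws_long_term, num_sim=200,
                    uncertainty_meter=0.005, windiness_years=(10, 20), seed=0):
    """Mean and standard deviation of AEP over the Monte Carlo samples.

    ws_por / energy_por: monthly values over the period of record.
    ws_long_term: monthly wind speeds, at least 12 * max(windiness_years) long.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(num_sim):
        # Sample the meter uncertainty and the long-term window length.
        meter_factor = rng.gauss(1.0, uncertainty_meter)
        n_years = rng.randint(*windiness_years)
        slope, intercept = fit_line(
            ws_por, [e * meter_factor for e in energy_por]
        )
        # Predict monthly energy for the most recent n_years of wind data.
        window = ws_long_term[-12 * n_years:]
        predicted = [slope * ws + intercept for ws in window]
        results.append(sum(predicted) / n_years)
    return statistics.fmean(results), statistics.stdev(results)


# Made-up monthly data: energy is roughly proportional to wind speed.
ws_por = [5.0, 6.0, 7.0, 8.0]
energy_por = [10.0, 12.0, 14.0, 16.0]
ws_long_term = [6.5] * (12 * 20)
mean_aep, std_aep = monte_carlo_aep(ws_por, energy_por, ws_long_term)
```

The spread of the returned distribution is what the full method uses to express uncertainty in the long-term AEP.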

StaticYawMisalignment(turbine_ids: list[str] | None = None, UQ: bool = True, num_sim: int = 100, ws_bins: list[float] = [5.0, 6.0, 7.0, 8.0], ws_bin_width: float = 1.0, vane_bin_width: float = 1.0, min_vane_bin_count: int = 100, max_abs_vane_angle: float = 25.0, pitch_thresh: float = 0.5, num_power_bins: int = 25, min_power_filter: float = 0.01, max_power_filter: float | tuple[float, float] = (0.92, 0.98), power_bin_mad_thresh: float | tuple[float, float] = (4.0, 10.0), use_power_coeff: bool = False) StaticYawMisalignment#

A method for estimating static yaw misalignment for different wind speed bins for each specified wind turbine as well as the average static yaw misalignment over all wind speed bins using turbine-level SCADA data.

The method is comprised of the following core steps, which are performed for each specified wind turbine. If UQ is selected, the following steps are performed multiple times using Monte Carlo simulation to produce a distribution of static yaw misalignment estimates from which 95% confidence intervals can be derived:

  1. Timestamps containing power curve outliers are removed. Specifically, pitch angles are limited to a specified threshold to remove timestamps when turbines are operating in or near above-rated conditions where yaw misalignment has little impact on power performance. Next, to increase the likelihood that power performance deviations are caused by yaw misalignment, a power curve outlier detection filter is used to remove timestamps when the turbine is operating abnormally. If UQ is selected, power curve outlier detection parameters will be chosen randomly for each Monte Carlo iteration.

  2. The filtered SCADA data are divided into the specified wind speed bins based on wind speed measured by the nacelle anemometer. If UQ is selected, the data corresponding to each wind speed bin are randomly resampled with replacement each Monte Carlo iteration (i.e., bootstrapping).

  3. For each wind speed bin, the power performance is binned by wind vane angle, where power performance can be defined as the raw power or a normalized power coefficient formed by dividing the raw power by the wind speed cubed.

  4. A cosine exponent curve as a function of wind vane angle is fit to the binned power performance values, where the free parameters are the amplitude, the exponent applied to the cosine, and the wind vane angle offset where the peak of the cosine curve is located.

  5. For each wind speed bin, the static yaw misalignment is estimated as the difference between the wind vane angle where power performance is maximized, based on the wind vane angle offset for the best-fit cosine curve, and the mean wind vane angle.

  6. The overall yaw misalignment is estimated as the average yaw misalignment over all wind speed bins.
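Steps 4 and 5 can be illustrated with a small curve fit. The sketch below swaps in a simple grid search over the offset and exponent, with the amplitude solved in closed form by least squares; it is not the optimizer used by StaticYawMisalignment. The function names, grid ranges, and synthetic data are assumptions, and the mean vane angle is taken over the bin centers for simplicity.

```python
import math
import statistics


def cosine_curve(vane_deg, amplitude, exponent, offset_deg):
    """amplitude * cos(vane - offset) ** exponent, with angles in degrees."""
    return amplitude * math.cos(math.radians(vane_deg - offset_deg)) ** exponent


def fit_cosine_curve(vane_deg, power, offsets, exponents):
    """Grid search over offset and exponent; returns (amplitude, exponent, offset)."""
    best = None
    for offset in offsets:
        for exponent in exponents:
            shape = [cosine_curve(v, 1.0, exponent, offset) for v in vane_deg]
            # For a fixed shape, the least-squares amplitude has a closed form.
            amplitude = (sum(p * s for p, s in zip(power, shape))
                         / sum(s * s for s in shape))
            sse = sum((p - amplitude * s) ** 2 for p, s in zip(power, shape))
            if best is None or sse < best[0]:
                best = (sse, amplitude, exponent, offset)
    return best[1:]


# Synthetic binned power with a known 3 degree offset and exponent of 2.
vane_bins = [float(v) for v in range(-25, 26)]
binned_power = [cosine_curve(v, 1.0, 2.0, 3.0) for v in vane_bins]

offsets = [0.5 * i for i in range(-20, 21)]      # -10 to 10 degrees
exponents = [1.0 + 0.5 * k for k in range(7)]    # 1.0 to 4.0
amplitude, exponent, offset = fit_cosine_curve(
    vane_bins, binned_power, offsets, exponents
)

# Step 5: offset of the best-fit peak relative to the mean vane angle.
yaw_misalignment = offset - statistics.fmean(vane_bins)
```

The grid ranges keep the cosine argument within about 35 degrees, so the base of the power stays positive for fractional exponents.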

Warning

This is a relatively simple method that has not yet been validated using data from wind turbines with known static yaw misalignments. Therefore, the results should be treated with caution. One known issue is that the method currently relies on nacelle wind speed measurements to determine the power performance as a function of wind vane angle. If the measured wind speed is affected by the amount of yaw misalignment, potential biases can exist in the estimated static yaw misalignment values.

Parameters:
  • plant (PlantData) – An openoa.plant.PlantData object that has been validated with at least openoa.plant.PlantData.analysis_type = “StaticYawMisalignment”.

  • turbine_ids (list, optional) – List of turbine IDs for which static yaw misalignment detection will be performed. If None, all turbines will be analyzed. Defaults to None.

  • UQ (bool, optional) – Determines whether to perform uncertainty quantification using Monte Carlo simulation (True) or provide a single yaw misalignment estimate (False). Defaults to True.

  • num_sim (int, optional) – Number of Monte Carlo iterations to perform. Only used if UQ = True. Defaults to 100.

  • ws_bins (list[float], optional) – Wind speed bin centers for which yaw misalignment detection will be performed (m/s). Defaults to [5.0, 6.0, 7.0, 8.0].

  • ws_bin_width (float, optional) – Wind speed bin size to use when detecting yaw misalignment for individual wind speed bins (m/s). Defaults to 1 m/s.

  • vane_bin_width (float, optional) – Wind vane bin size to use when detecting yaw misalignment (degrees). Defaults to 1 degree.

  • min_vane_bin_count (int, optional) – Minimum number of data points needed in a wind vane bin for it to be included when detecting yaw misalignment. Defaults to 100.

  • max_abs_vane_angle (float, optional) – Maximum absolute wind vane angle considered when detecting yaw misalignment. Defaults to 25 degrees.

  • pitch_thresh (float, optional) – Maximum blade pitch angle considered when detecting yaw misalignment. Defaults to 0.5 degrees.

  • num_power_bins (int, optional) – Number of power bins to use for power curve bin filtering to remove outlier data points. Defaults to 25.

  • min_power_filter (float, optional) – Minimum power threshold, defined as a fraction of rated power, to which the power curve bin filter should be applied. Defaults to 0.01.

  • max_power_filter (tuple | float, optional) – Maximum power threshold, defined as a fraction of rated power, to which the power curve bin filter should be applied. This should be a tuple when UQ = True (values are Monte-Carlo sampled within the specified range) or a single value when UQ = False. If undefined (None), a value of 0.95 will be used if UQ = False and values of (0.92, 0.98) will be used if UQ = True. Defaults to None.

  • power_bin_mad_thresh (tuple | float, optional) – The filter threshold for each power bin used to identify abnormal operation, expressed as the number of median absolute deviations from the median wind speed. This should be a tuple when UQ = True (values are Monte-Carlo sampled within the specified range) or a single value when UQ = False. If undefined (None), a value of 7.0 will be used if UQ = False and values of (4.0, 13.0) will be used if UQ = True. Defaults to None.

  • use_power_coeff (bool, optional) – If True, power performance as a function of wind vane angle will be quantified by normalizing power by the cube of the wind speed, approximating the power coefficient. If False, only power will be used. Defaults to False.

TurbineLongTermGrossEnergy(UQ: bool = True, num_sim: int = 20000, reanalysis_products=None, uncertainty_scada: float = 0.005, wind_bin_threshold: ndarray[Any, dtype[float64]] = (1.0, 3.0), max_power_filter: ndarray[Any, dtype[float64]] = (0.8, 0.9), correction_threshold: ndarray[Any, dtype[float64]] = (0.85, 0.95)) TurbineLongTermGrossEnergy#

Calculates long-term gross energy for each turbine in a wind farm using methods implemented in the utils subpackage for data processing and analysis.

The method proceeds as follows:

  1. Filter turbine data for normal operation

  2. Calculate daily means of wind speed, wind direction, and air density from reanalysis products

  3. Calculate daily sums of energy from each turbine

  4. Fit daily data (features are atmospheric variables, response is turbine power) using a generalized additive model (GAM)

  5. Apply model results to long-term atmospheric variables to calculate long-term gross energy for each turbine

A Monte Carlo approach is implemented to obtain a distribution of results, from which uncertainty can be quantified for the long-term gross energy estimate. A pandas DataFrame of long-term gross energy values is produced, containing each turbine in the wind farm. Note that this gross energy metric does not back out losses associated with waking or turbine performance. Rather, gross energy in this context is what each turbine would have produced under normal operation (i.e., excluding downtime and underperformance).
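
The daily aggregation in step 3 can be sketched with the standard library alone. The record layout (time, asset_id, energy) and the function name daily_energy_sums are hypothetical and chosen for illustration. OpenOA itself operates on pandas DataFrames.

```python
from collections import defaultdict
from datetime import datetime


def daily_energy_sums(records):
    """Sum energy per (asset_id, calendar day) from (time, asset_id, energy) rows."""
    totals = defaultdict(float)
    for timestamp, asset_id, energy_kwh in records:
        totals[(asset_id, timestamp.date())] += energy_kwh
    return dict(totals)


# Hypothetical 10-minute SCADA energy records (kWh).
records = [
    (datetime(2020, 1, 1, 0, 0), "T01", 100.0),
    (datetime(2020, 1, 1, 0, 10), "T01", 150.0),
    (datetime(2020, 1, 2, 0, 0), "T01", 50.0),
    (datetime(2020, 1, 1, 0, 0), "T02", 200.0),
]
daily = daily_energy_sums(records)
```

The daily reanalysis means in step 2 follow the same grouping pattern, with an average in place of the sum.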

Required schema of PlantData:

  • _scada_freq

  • reanalysis products with columns [‘time’, ‘WMETR_HorWdSpdU’, ‘WMETR_HorWdSpdV’, ‘WMETR_HorWdSpd’, ‘WMETR_AirDen’]

  • scada with columns: [‘time’, ‘asset_id’, ‘WMET_HorWdSpd’, ‘WTUR_W’, ‘WTUR_SupWh’]

Parameters:
  • UQ (bool) – Indicator to perform (True) or not (False) uncertainty quantification.

  • num_sim (int) – Number of simulations to run when UQ is True, otherwise set to 1. Defaults to 20000.

  • uncertainty_scada (float) – Uncertainty imposed on the SCADA data. Only used when UQ is True. Defaults to 0.005.

  • reanalysis_products (list[str]) – List of reanalysis products to use for Monte Carlo sampling. Defaults to None, which pulls all the products contained in plant.reanalysis.

  • wind_bin_threshold (tuple) – The filter threshold for each vertical bin, expressed as number of standard deviations from the median in each bin. When UQ is True, then this should be a tuple of the lower and upper limits of this threshold, otherwise a single value should be used. Defaults to (1.0, 3.0).

  • max_power_filter (tuple) – Maximum power threshold, in the range (0, 1], to which the bin filter should be applied. When UQ is True, then this should be a tuple of the lower and upper limits of this filter, otherwise a single value should be used. Defaults to (0.8, 0.9).

  • correction_threshold (tuple) – The threshold, in the range of (0, 1], above which daily SCADA energy data should be corrected. When UQ is True, then this should be a tuple of the lower and upper limits of this threshold, otherwise a single value should be used. Defaults to (0.85, 0.95).
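
Several parameters above accept a (lower, upper) tuple when UQ is True and a single value otherwise. A minimal sketch of that convention follows. The helper name sample_parameter is hypothetical, and the uniform distribution is an assumption, since the text only states that values are sampled within the range.

```python
import random


def sample_parameter(value, num_sim, rng):
    """Return one value per simulation: sampled from a (low, high) tuple, or repeated."""
    if isinstance(value, tuple):
        low, high = value
        # Assumption: draws are uniform between the lower and upper limits.
        return [rng.uniform(low, high) for _ in range(num_sim)]
    return [value] * num_sim


rng = random.Random(42)  # seeded so the sketch is reproducible
uq_draws = sample_parameter((0.8, 0.9), num_sim=5, rng=rng)  # UQ = True style input
single = sample_parameter(0.85, num_sim=1, rng=rng)  # UQ = False style input
```

Each Monte Carlo iteration would then use one element of uq_draws, whereas the single-value case always runs one simulation with the supplied value.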

WakeLosses(wind_direction_col: str = 'WMET_HorWdDir', wind_direction_data_type: str = 'scada', wind_direction_asset_ids: list[str] | None = None, UQ: bool = True, num_sim: int = 100, start_date: str | Timestamp | None = None, end_date: str | Timestamp | None = None, reanalysis_products: list[str] | None = None, end_date_lt: str | Timestamp | None = None, wd_bin_width: float = 5.0, freestream_sector_width: float = (50.0, 110.0), freestream_power_method: str = 'mean', freestream_wind_speed_method: str = 'mean', correct_for_derating: bool = True, derating_filter_wind_speed_start: float = (4.0, 5.0), max_power_filter: float = (0.92, 0.98), wind_bin_mad_thresh: float = (4.0, 13.0), wd_bin_width_LT_corr: float = 5.0, ws_bin_width_LT_corr: float = 1.0, num_years_LT: int = (10, 20), assume_no_wakes_high_ws_LT_corr: bool = True, no_wakes_ws_thresh_LT_corr: float = 13.0) WakeLosses#

A serial implementation of a method for estimating wake losses from SCADA data. Wake losses are estimated for the entire wind plant as well as for each individual turbine for a) the period of record for which data are available, and b) the estimated long-term wind conditions the wind plant will experience based on historical reanalysis wind resource data.

The method is comprised of the following core steps:
  1. Calculate a representative wind plant-level wind direction at each time step using the mean wind direction of the specified wind turbines or meteorological (met) towers. Note that time steps for which any necessary plant-level or turbine-level data are missing are discarded.

    1. If UQ is selected, wake losses are calculated multiple times using a Monte Carlo approach with randomly chosen analysis parameters and randomly sampled, with replacement, time steps for each iteration. The remaining steps described below are performed for each Monte Carlo iteration. If UQ is not used, wake losses are calculated once using the specified analysis parameters for the full set of available time steps.

  2. Identify the set of derated, curtailed, or unavailable turbines (i.e., turbines whose power production is limited not by wake losses but by operating mode) for each time step using a power curve outlier detection method.

  3. Calculate the average wind speed and power production for the set of normally operating (i.e., not derated) freestream turbines for each time step.

    1. Freestream turbines are those without any upstream turbines located within a user-specified sector of wind directions centered on the representative plant-level wind direction.

  4. Calculate the period-of-record (POR) wake losses for the wind plant by comparing the potential energy production (sum of the mean freestream power production at each time step multiplied by the number of turbines in the wind power plant) to the actual energy production (sum of the actual wind plant power production at each time step). This procedure is then used to estimate the wake losses for each individual wind turbine.

    1. If correct_for_derating is True, then the potential power production of the wind plant is assumed to be the actual power produced by the derated turbines plus the mean power production of the freestream turbines for all other turbines in the wind plant. Again, a similar procedure is used to estimate individual turbine wake losses.

  5. Finally, estimate the long-term corrected wake losses using the long-term historical reanalysis data. Note that the long-term correction is determined for each reanalysis product specified by the user. If UQ is used, a random reanalysis product is selected each iteration. If UQ is not selected, the long-term corrected wake losses are calculated as the average wake losses determined for all reanalysis products.

    1. Calculate the long-term occurrence frequencies for a set of wind direction and wind speed bins based on the hourly reanalysis data (typically, 10-20 years).

    2. Next, using a linear regression, compare the mean freestream wind speeds calculated from the SCADA data to the wind speeds from the reanalysis data and correct to remove biases.

    3. Compute the average potential and actual wind power plant production using the representative wind plant wind directions from the SCADA or met tower data in conjunction with the corrected freestream wind speeds for each wind direction and wind speed bin.

    4. Estimate the long-term corrected wake losses by comparing the long-term corrected potential and actual energy production. These are computed by weighting the average potential and actual power production for each wind condition bin with the long-term frequencies.

    5. Repeat to estimate the long-term corrected wake losses for each individual turbine.
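
The energy comparisons in steps 4 and 5 can be sketched numerically. The loss definition used here (one minus the ratio of actual to potential energy) is an assumption about how the two quantities are compared. All values, bin labels, and the helper name wake_loss are hypothetical.

```python
def wake_loss(potential_energy, actual_energy):
    """Fractional loss, assuming the comparison is 1 - actual / potential."""
    return 1.0 - actual_energy / potential_energy


# Hypothetical per-time-step data for a 3-turbine plant (kW).
mean_freestream_power = [1000.0, 1500.0, 2000.0]
plant_power = [2700.0, 4000.0, 5500.0]
num_turbines = 3

# Step 4: potential energy is the mean freestream power scaled to the whole plant.
potential = sum(p * num_turbines for p in mean_freestream_power)
actual = sum(plant_power)
por_loss = wake_loss(potential, actual)

# Step 5: weight bin-averaged power by long-term occurrence frequencies.
# Keys are hypothetical (wind direction bin, wind speed bin) labels.
lt_frequency = {(270, 8): 0.6, (90, 8): 0.4}
bin_potential = {(270, 8): 4500.0, (90, 8): 4500.0}
bin_actual = {(270, 8): 4050.0, (90, 8): 4275.0}

lt_potential = sum(lt_frequency[b] * bin_potential[b] for b in lt_frequency)
lt_actual = sum(lt_frequency[b] * bin_actual[b] for b in lt_frequency)
lt_loss = wake_loss(lt_potential, lt_actual)
```

In this example the long-term loss differs from the period-of-record loss only because the bins are reweighted by their long-term frequencies.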

Parameters:
  • plant (PlantData) – An openoa.plant.PlantData object that has been validated with at least openoa.plant.PlantData.analysis_type = “WakeLosses”.

  • wind_direction_col (string, optional) – Column name to use for wind direction. Defaults to “WMET_HorWdDir”.

  • wind_direction_data_type (string, optional) – Data type to use for wind directions (“scada” for turbine measurements or “tower” for meteorological tower measurements). Defaults to “scada”.

  • wind_direction_asset_ids (list, optional) – List of asset IDs (turbines or met towers) used to calculate the average wind direction at each time step. If None, all assets of the corresponding data type will be used. Defaults to None.

  • UQ (bool, optional) – Determines whether to perform uncertainty quantification using Monte Carlo simulation (True) or provide a single wake loss estimate (False). Defaults to True.

  • start_date (pandas.Timestamp or string, optional) – Start datetime for wake loss analysis. If None, the earliest SCADA datetime will be used. Default is None.

  • end_date (pandas.Timestamp or string, optional) – End datetime for wake loss analysis. If None, the latest SCADA datetime will be used. Default is None.

  • reanalysis_products (list, optional) – List of reanalysis products to use for long-term correction. If UQ = True, a single product will be selected from this list each Monte Carlo iteration. Defaults to [“merra2”, “era5”].

  • end_date_lt (string or pandas.Timestamp) – The last date to use for the long-term correction. If None, the most recent date common to all reanalysis products will be used.

  • wd_bin_width (float, optional) – Wind direction bin size when identifying freestream wind turbines (degrees). Defaults to 5 degrees.

  • freestream_sector_width (tuple | float, optional) – Wind direction sector size to use when identifying freestream wind turbines (degrees). If no turbines are located upstream of a particular turbine within the sector, the turbine will be classified as a freestream turbine. When UQ = True, then this should be a tuple of the lower and upper bounds for the Monte Carlo sampling, and when UQ = False this should be a single value. If None, then a default value of 90 degrees will be used if UQ = False and a default value of (50, 110) will be used if UQ = True. Defaults to None.

  • freestream_power_method (str, optional) – Method used to determine the representative power production of the freestream turbines (“mean”, “median”, “max”). Defaults to “mean”.

  • freestream_wind_speed_method (str, optional) – Method used to determine the representative wind speed of the freestream turbines (“mean”, “median”). Defaults to “mean”.

  • correct_for_derating (bool, optional) – Indicates whether derated, curtailed, or otherwise unavailable turbines should be flagged and excluded from the calculation of ideal freestream wind plant power production for a given time stamp. If True, ideal freestream power production will be calculated as the sum of the derated turbine powers added to the mean power of the freestream turbines in normal operation multiplied by the number of turbines operating normally in the wind plant. Defaults to True.

  • derating_filter_wind_speed_start (tuple | float, optional) – The wind speed above which turbines will be flagged as derated/curtailed/shutdown if power is less than 1% of rated power (m/s). Only used when correct_for_derating is True. This should be a tuple when UQ = True (values are Monte-Carlo sampled within the specified range) or a single value when UQ = False. If undefined (None), a value of 4.5 m/s will be used if UQ = False and values of (4.0, 5.0) will be used if UQ = True. Defaults to None.

  • max_power_filter (tuple | float, optional) – Maximum power threshold, defined as a fraction of rated power, to which the power curve bin filter should be applied. Only used when correct_for_derating = True. This should be a tuple when UQ = True (values are Monte-Carlo sampled within the specified range) or a single value when UQ = False. If undefined (None), a value of 0.95 will be used if UQ = False and values of (0.92, 0.98) will be used if UQ = True. Defaults to None.

  • wind_bin_mad_thresh (tuple | float, optional) – The filter threshold for each power bin used to identify derated/curtailed/shutdown turbines, expressed as the number of median absolute deviations above the median wind speed. Only used when correct_for_derating is True. This should be a tuple when UQ = True (values are Monte-Carlo sampled within the specified range) or a single value when UQ = False. If undefined (None), a value of 7.0 will be used if UQ = False and values of (4.0, 13.0) will be used if UQ = True. Defaults to None.

  • wd_bin_width_LT_corr (float, optional) – Size of wind direction bins used to calculate long-term frequencies from historical reanalysis data and correct wake losses during the period of record (degrees). Defaults to 5 degrees.

  • ws_bin_width_LT_corr (float, optional) – Size of wind speed bins used to calculate long-term frequencies from historical reanalysis data and correct wake losses during the period of record (m/s). Defaults to 1 m/s.

  • num_years_LT (tuple | int, optional) – Number of years of historical reanalysis data to use for long-term correction. This should be a tuple when UQ = True (values are Monte-Carlo sampled within the specified range) or a single value when UQ = False. If undefined (None), a value of 20 will be used if UQ = False and values of (10, 20) will be used if UQ = True. Defaults to None.

  • assume_no_wakes_high_ws_LT_corr (bool, optional) – If True, wind direction and wind speed bins for which operational data are missing above a certain wind speed threshold are corrected by assigning the wind turbines’ rated power to both the actual and potential power production variables during the long-term correction process. This assumes there are no wake losses above the wind speed threshold. Defaults to True.

  • no_wakes_ws_thresh_LT_corr (float, optional) – The wind speed threshold (inclusive) above which rated power is assigned to both the actual and potential power production variables if operational data are missing for any wind direction and wind speed bin during the long-term correction process. This wind speed corresponds to the wind speed measured at freestream wind turbines. Only used if assume_no_wakes_high_ws_LT_corr = True. Defaults to 13 m/s.

  • min_ws_bin_lin_reg (float, optional) – The minimum wind speed bin to consider when finding linear regression from SCADA freestream wind speeds to reanalysis wind speeds. Defaults to 3.0.

  • bin_count_thresh_lin_reg (int, optional) – The minimum number of samples required in a wind speed bin to include when finding linear regression from SCADA freestream wind speeds to reanalysis wind speeds. Defaults to 50.
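
The role of freestream_sector_width can be illustrated with a small geometric sketch. It assumes turbine positions in a flat easting/northing frame (meters) and wind directions given as the direction the wind blows from, in degrees clockwise from north. The function names and layout are hypothetical and this is not the OpenOA implementation.

```python
import math


def angular_difference(a, b):
    """Smallest absolute difference between two compass angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def freestream_turbines(positions, wind_direction, sector_width=90.0):
    """Return IDs of turbines with no other turbine inside the upstream sector.

    positions: {asset_id: (easting_m, northing_m)}
    wind_direction: direction the wind blows from, degrees clockwise from north.
    """
    freestream = []
    for asset_id, (x, y) in positions.items():
        has_upstream = False
        for other_id, (ox, oy) in positions.items():
            if other_id == asset_id:
                continue
            # Compass bearing from this turbine toward the other turbine.
            bearing = math.degrees(math.atan2(ox - x, oy - y)) % 360.0
            if angular_difference(bearing, wind_direction) <= sector_width / 2:
                has_upstream = True
                break
        if not has_upstream:
            freestream.append(asset_id)
    return freestream


positions = {"T01": (0.0, 0.0), "T02": (0.0, 500.0)}  # T02 is due north of T01
north_wind = freestream_turbines(positions, wind_direction=0.0)
east_wind = freestream_turbines(positions, wind_direction=90.0)
```

With wind from the north, T02 sits upstream of T01, so only T02 is freestream. With wind from the east, neither turbine lies in the other's upstream sector.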

attach_eia_data(api_key: str, plant_id: str, file_path: str | Path, plant_file: str | Path, plant_sheet: str | Path, wind_file: str | Path, wind_sheet: str | Path)#

Assign EIA meta data to PlantData object, which is by default an empty dictionary.

Parameters:
  • project (PlantData) – PlantData object for a particular project

  • api_key (str) – 32-character user-specific API key, obtained from EIA.

  • plant_id (str) – 5-character EIA power plant code.

  • file_path (str) – Directory with EIA metadata .xlsx files.

  • plant_file (str | Path) – Name of the plant metadata Excel file in file_path. Formerly hard-coded to: “2___Plant_Y2017.xlsx”.

  • plant_sheet (str) – The name of the sheet containing the data in plant_file.

  • wind_file (str | Path) – Name of the wind metadata Excel file in file_path. Formerly hard-coded to: “3_2_Wind_Y2017.xlsx”.

  • wind_sheet (str) – The name of the sheet containing the data in wind_file.

Returns:

(None)

PlantMetaData API#

Without the metadata schema provided in each data category’s metadata class, and compiled through PlantMetaData, the data standardization provided by PlantData would not be possible.

class openoa.schema.PlantMetaData(latitude=0, longitude=0, reference_system: str = 'epsg:4326', reference_longitude: float | None = None, utm_zone: int | None = None, capacity=0, scada: dict = {}, meter: dict = {}, tower: dict = {}, status: dict = {}, curtail: dict = {}, asset: dict = {}, reanalysis: dict[str, dict] = {'product': {}})[source]#

Bases: FromDictMixin

Composes the metadata/validation requirements from each of the individual data types that can compose a PlantData object.

Parameters:
  • latitude (float) – The wind power plant’s center point latitude.

  • longitude (float) – The wind power plant’s center point longitude.

  • reference_system (str, optional) – Used to define the coordinate reference system (CRS). Defaults to the European Petroleum Survey Group (EPSG) code 4326 to be used with the World Geodetic System reference system, WGS 84.

  • utm_zone (int, optional) – UTM zone. If set to None (default), then calculated from the longitude.

  • reference_longitude (float, optional) – Reference longitude for calculating the UTM zone. If None (default), then taken as the average longitude of all assets when the geometry is parsed.

  • capacity (float) – The capacity of the plant, in MW.

  • scada (SCADAMetaData) – A dictionary containing the SCADAMetaData column mapping and frequency parameters. See SCADAMetaData for more details.

  • meter (MeterMetaData) – A dictionary containing the MeterMetaData column mapping and frequency parameters. See MeterMetaData for more details.

  • tower (TowerMetaData) – A dictionary containing the TowerMetaData column mapping and frequency parameters. See TowerMetaData for more details.

  • status (StatusMetaData) – A dictionary containing the StatusMetaData column mapping parameters. See StatusMetaData for more details.

  • curtail (CurtailMetaData) – A dictionary containing the CurtailMetaData column mapping and frequency parameters. See CurtailMetaData for more details.

  • asset (AssetMetaData) – A dictionary containing the AssetMetaData column mapping parameters. See AssetMetaData for more details.

  • reanalysis (dict[str, ReanalysisMetaData]) – A dictionary containing the reanalysis type (as keys, such as “era5” or “merra2”) and ReanalysisMetaData column mapping and frequency parameters for each type of reanalysis data provided. See ReanalysisMetaData for more details.

Method generated by attrs for class PlantMetaData.
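
For reference, the standard relationship between longitude and UTM zone number mentioned under utm_zone can be sketched as follows. The function name is hypothetical, and the sketch ignores the special zone exceptions around Norway and Svalbard. Whether OpenOA applies exactly this formula is an assumption.

```python
import math


def utm_zone_from_longitude(longitude):
    """Standard 6-degree UTM zone number (1-60) for a longitude in degrees."""
    # The modulo keeps a longitude of exactly 180 degrees in zone 1.
    return int(math.floor((longitude + 180.0) / 6.0)) % 60 + 1


zone_colorado = utm_zone_from_longitude(-105.0)
zone_greenwich = utm_zone_from_longitude(0.0)
```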

property column_map: dict[str, dict]#

Provides the column mapping for all of the available data types with the name of each data type as the key and the dictionary mapping as the values.
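
As a sketch of how such a mapping might be used, the example below inverts one data type's mapping (internal name to user column name, matching the parameter descriptions of the metadata classes) to standardize a record. The user column names and the helper rename_map are hypothetical.

```python
def rename_map(column_map):
    """Invert {internal_name: user_column} into {user_column: internal_name}."""
    return {user: internal for internal, user in column_map.items()}


# Hypothetical SCADA mapping: keys are internal names, values are user columns.
scada_columns = {"time": "Timestamp", "asset_id": "Turbine", "WTUR_W": "Power_kW"}
to_internal = rename_map(scada_columns)

row = {"Timestamp": "2020-01-01 00:00", "Turbine": "T01", "Power_kW": 1250.0}
standardized = {to_internal[name]: value for name, value in row.items()}
```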

property dtype_map: dict[str, dict]#

Provides the column dtype matching for all of the available data types with the name of each data type as the keys, and the column dtype mapping as values.

property coordinates: tuple[float, float]#

Returns the latitude, longitude pair for the wind power plant.

Returns:

The (latitude, longitude) pair

Return type:

tuple[float, float]

classmethod from_json(metadata_file: str | Path) PlantMetaData[source]#

Loads the metadata from a JSON file.

Parameters:

metadata_file (str | Path) – The full path and file name of the JSON file.

Raises:

FileExistsError – Raised if the file doesn’t exist at the provided location.

Returns:

PlantMetaData

classmethod from_yaml(metadata_file: str | Path) PlantMetaData[source]#

Loads the metadata from a YAML file with a PyYAML encoding.

Parameters:

metadata_file (str | Path) – The full path and file name of the YAML file.

Raises:

FileExistsError – Raised if the file doesn’t exist at the provided location.

Returns:

PlantMetaData

classmethod load(data: str | Path | dict | PlantMetaData) PlantMetaData[source]#

Loads the metadata from either a dictionary or file such as a JSON or YAML file.

Parameters:

data (str | Path | dict) – Either a pre-loaded dictionary or the full path and file name of the JSON or YAML file.

Raises:
  • ValueError – Raised if the file name doesn’t reflect a JSON or YAML encoding.

  • ValueError – Raised if the data provided isn’t of the correct data type.

Returns:

PlantMetaData
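
The dispatch behavior described for load can be sketched with a stand-alone helper. The name load_metadata_dict is hypothetical, it returns a plain dictionary rather than a PlantMetaData object, and accepting both ".yml" and ".yaml" suffixes is an assumption. The YAML branch relies on PyYAML being installed.

```python
import json
import tempfile
from pathlib import Path


def load_metadata_dict(data):
    """Return a metadata dictionary from a dict or a JSON/YAML file path."""
    if isinstance(data, dict):
        return data
    if isinstance(data, (str, Path)):
        path = Path(data)
        suffix = path.suffix.lower()
        if suffix == ".json":
            with path.open() as f:
                return json.load(f)
        if suffix in (".yml", ".yaml"):
            import yaml  # PyYAML, imported lazily so JSON-only use needs no extra package

            with path.open() as f:
                return yaml.safe_load(f)
        raise ValueError(f"Expected a JSON or YAML file, got: {path.name}")
    raise ValueError(f"Unsupported metadata input type: {type(data).__name__}")


# Usage: write a small hypothetical metadata file, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "plant_meta.json"
    json_path.write_text(json.dumps({"latitude": 48.45, "capacity": 8.2}))
    from_file = load_metadata_dict(json_path)
from_dict = load_metadata_dict({"latitude": 48.45})
```

Both ValueError cases listed above are mirrored here: one for an unrecognized file suffix and one for an unsupported input type.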

frequency_requirements(analysis_types: list[str | None]) dict[str, set[str]][source]#

Creates a frequency requirements dictionary for each data type with the name as the key and a set of valid frequency fields as the values.

Parameters:

analysis_types (list[str | None]) – The analyses the data is intended to be used for, which will determine what data need to be checked.

Returns:

The dictionary of data type name and valid frequencies for the datetime stamps.

Return type:

dict[str, set[str]]
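
One way to picture the returned structure is sketched below. The requirement table is hypothetical and is not OpenOA's ANALYSIS_REQUIREMENTS. Combining multiple analyses by intersecting their allowed frequencies is also an assumption made for illustration.

```python
def combine_frequency_requirements(analysis_types, requirements):
    """Intersect the allowed frequencies of each data type across analyses."""
    combined = {}
    for analysis in analysis_types:
        if analysis is None:
            continue  # no analysis selected, so nothing to enforce
        for data_type, freqs in requirements[analysis].items():
            if data_type in combined:
                combined[data_type] &= set(freqs)
            else:
                combined[data_type] = set(freqs)
    return combined


# Hypothetical requirement table; not OpenOA's ANALYSIS_REQUIREMENTS.
requirements = {
    "AnalysisA": {"scada": {"10min", "h"}, "meter": {"MS", "D", "h"}},
    "AnalysisB": {"scada": {"10min"}},
}
freqs = combine_frequency_requirements(["AnalysisA", "AnalysisB", None], requirements)
```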

class openoa.schema.SCADAMetaData(time: str = 'time', asset_id: str = 'asset_id', WTUR_W: str = 'WTUR_W', WMET_HorWdSpd: str = 'WMET_HorWdSpd', WMET_HorWdDir: str = 'WMET_HorWdDir', WMET_HorWdDirRel: str = 'WMET_HorWdDirRel', WTUR_TurSt: str = 'WTUR_TurSt', WROT_BlPthAngVal: str = 'WROT_BlPthAngVal', WMET_EnvTmp: str = 'WMET_EnvTmp', frequency: str = '10min')[source]#

Bases: FromDictMixin

A metadata schematic to create the necessary column mappings and other validation components, or other data about the SCADA data, that will contribute to a larger plant metadata schema/routine.

Parameters:
  • time (str) – The datetime stamp for the SCADA data, by default “time”. This data should be of type: np.datetime64[ns], or able to be converted to a pandas DatetimeIndex. Additional columns describing the datetime stamps are: frequency

  • asset_id (str) – The turbine identifier column in the SCADA data, by default “asset_id”. This data should be of type: str.

  • WTUR_W (str) – The power produced, in kW, column in the SCADA data, by default “WTUR_W”. This data should be of type: float.

  • WMET_HorWdSpd (str) – The measured windspeed, in m/s, column in the SCADA data, by default “WMET_HorWdSpd”. This data should be of type: float.

  • WMET_HorWdDir (str) – The measured wind direction, in degrees, column in the SCADA data, by default “WMET_HorWdDir”. This data should be of type: float.

  • WMET_HorWdDirRel (str) – The measured wind direction relative to the nacelle orientation (i.e., the wind vane measurement), in degrees, column in the SCADA data, by default “WMET_HorWdDirRel”. This data should be of type: float.

  • WTUR_TurSt (str) – The status code column in the SCADA data, by default “WTUR_TurSt”. This data should be of type: str.

  • WROT_BlPthAngVal (str) – The pitch, in degrees, column in the SCADA data, by default “WROT_BlPthAngVal”. This data should be of type: float.

  • WMET_EnvTmp (str) – The temperature column in the SCADA data, by default “WMET_EnvTmp”. This data should be of type: float.

  • frequency (str) – The frequency of time in the SCADA data, by default “10min”. The input should align with the Pandas frequency offset aliases.

Method generated by attrs for class SCADAMetaData.

class openoa.schema.MeterMetaData(time: str = 'time', MMTR_SupWh: str = 'MMTR_SupWh', frequency: str = '10min')[source]#

Bases: FromDictMixin

A metadata schematic to create the necessary column mappings and other validation components, or other data about energy meter data, that will contribute to a larger plant metadata schema/routine.

Parameters:
  • time (str) – The datetime stamp for the meter data, by default “time”. This data should be of type: np.datetime64[ns], or able to be converted to a pandas DatetimeIndex. Additional columns describing the datetime stamps are: frequency

  • MMTR_SupWh (str) – The energy produced, in kWh, column in the meter data, by default “MMTR_SupWh”. This data should be of type: float.

  • frequency (str) – The frequency of time in the meter data, by default “10min”. The input should align with the Pandas frequency offset aliases.

Method generated by attrs for class MeterMetaData.

class openoa.schema.TowerMetaData(time: str = 'time', asset_id: str = 'asset_id', WMET_HorWdSpd: str = 'WMET_HorWdSpd', WMET_HorWdDir: str = 'WMET_HorWdDir', WMET_EnvTmp: str = 'WMET_EnvTmp', frequency: str = '10min')[source]#

Bases: FromDictMixin

A metadata schematic to create the necessary column mappings and other validation components, or other data about meteorological tower (met tower) data, that will contribute to a larger plant metadata schema/routine.

Parameters:
  • time (str) – The datetime stamp for the met tower data, by default “time”. This data should be of type: np.datetime64[ns], or able to be converted to a pandas DatetimeIndex. Additional columns describing the datetime stamps are: frequency

  • asset_id (str) – The met tower identifier column in the met tower data, by default “asset_id”. This data should be of type: str.

  • WMET_HorWdSpd (str) – The measured windspeed, in m/s, column in the met tower data, by default “WMET_HorWdSpd”. This data should be of type: float.

  • WMET_HorWdDir (str) – The measured wind direction, in degrees, column in the met tower data, by default “WMET_HorWdDir”. This data should be of type: float.

  • WMET_EnvTmp (str) – The temperature column in the met tower data, by default “WMET_EnvTmp”. This data should be of type: float.

  • frequency (str) – The frequency of time in the met tower data, by default “10min”. The input should align with the Pandas frequency offset aliases.

Method generated by attrs for class TowerMetaData.

class openoa.schema.CurtailMetaData(time: str = 'time', IAVL_ExtPwrDnWh: str = 'IAVL_ExtPwrDnWh', IAVL_DnWh: str = 'IAVL_DnWh', frequency: str = '10min')[source]#

Bases: FromDictMixin

A metadata schematic to create the necessary column mappings and other validation components, or other data about the plant curtailment data, that will contribute to a larger plant metadata schema/routine.

Parameters:
  • time (str) – The datetime stamp for the curtailment data, by default “time”. This data should be of type: np.datetime64[ns], or able to be converted to a pandas DatetimeIndex. Additional columns describing the datetime stamps are: frequency

  • IAVL_ExtPwrDnWh (str) – The curtailment, in kWh, column in the curtailment data, by default “IAVL_ExtPwrDnWh”. This data should be of type: float.

  • IAVL_DnWh (str) – The availability, in kWh, column in the curtailment data, by default “IAVL_DnWh”. This data should be of type: float.

  • frequency (str) – The frequency of time in the curtailment data, by default “10min”. The input should align with the Pandas frequency offset aliases.

Method generated by attrs for class CurtailMetaData.

class openoa.schema.StatusMetaData(time: str = 'time', asset_id: str = 'asset_id', status_id: str = 'status_id', status_code: str = 'status_code', status_text: str = 'status_text', frequency: str = '10min')[source]#

Bases: FromDictMixin

A metadata schematic to create the necessary column mappings and other validation components, or other data about the turbine status log data, that will contribute to a larger plant metadata schema/routine.

Parameters:
  • time (str) – The datetime stamp for the status data, by default “time”. This data should be of type: np.datetime64[ns], or able to be converted to a pandas DatetimeIndex. Additional columns describing the datetime stamps are: frequency

  • asset_id (str) – The turbine identifier column in the status data, by default “asset_id”. This data should be of type: str.

  • status_id (str) – The status code identifier column in the status data, by default “status_id”. This data should be of type: str.

  • status_code (str) – The status code column in the status data, by default “status_code”. This data should be of type: str.

  • status_text (str) – The status text description column in the status data, by default “status_text”. This data should be of type: str.

  • frequency (str) – The frequency of time in the status data, by default “10min”. The input should align with the Pandas frequency offset aliases.

Method generated by attrs for class StatusMetaData.

class openoa.schema.AssetMetaData(asset_id: str = 'asset_id', latitude: str = 'latitude', longitude: str = 'longitude', rated_power: str = 'rated_power', hub_height: str = 'hub_height', rotor_diameter: str = 'rotor_diameter', elevation: str = 'elevation', type: str = 'type')[source]#

Bases: FromDictMixin

A metadata schematic to create the necessary column mappings and other validation components, or other data about the site’s asset metadata, that will contribute to a larger plant metadata schema/routine.

Parameters:
  • asset_id (str) – The asset identifier column in the asset metadata, by default “asset_id”. This data should be of type: str.

  • latitude (str) – The asset’s latitudinal position, in WGS84, column in the asset metadata, by default “latitude”. This data should be of type: float.

  • longitude (str) – The asset’s longitudinal position, in WGS84, column in the asset metadata, by default “longitude”. This data should be of type: float.

  • rated_power (str) – The asset’s rated power, in kW, column in the asset metadata, by default “rated_power”. This data should be of type: float.

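  • hub_height (str) – The asset’s hub height, in m, column in the asset metadata, by default “hub_height”. This data should be of type: float.

  • rotor_diameter (str) – The asset’s rotor diameter, in m, column in the asset metadata, by default “rotor_diameter”. This data should be of type: float.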

  • elevation (str) – The asset’s elevation above sea level, in m, column in the asset metadata, by default “elevation”. This data should be of type: float.

  • type (str) – The type of asset column in the asset metadata, by default “type”. This data should be of type: str.

Method generated by attrs for class AssetMetaData.

class openoa.schema.ReanalysisMetaData(time: str = 'time', WMETR_HorWdSpd: str = 'WMETR_HorWdSpd', WMETR_HorWdSpdU: str = 'WMETR_HorWdSpdU', WMETR_HorWdSpdV: str = 'WMETR_HorWdSpdV', WMETR_HorWdDir: str = 'WMETR_HorWdDir', WMETR_EnvTmp: str = 'WMETR_EnvTmp', WMETR_AirDen: str = 'WMETR_AirDen', WMETR_EnvPres: str = 'surface_pressure', frequency: str = '10min')[source]#

Bases: FromDictMixin

A metadata schematic for each of the reanalysis products to be used for operational analyses to create the necessary column mappings and other validation components, or other data about the reanalysis data, that will contribute to a larger plant metadata schema/routine.

Parameters:
  • time (str) – The datetime stamp for the reanalysis data, by default “time”. This data should be of type: np.datetime64[ns], or able to be converted to a pandas DatetimeIndex. Additional columns describing the datetime stamps are: frequency

  • WMETR_HorWdSpd (str) – The reanalysis non-directional windspeed data column name, in m/s, by default “WMETR_HorWdSpd”.

  • WMETR_HorWdSpdU (str) – The reanalysis u-direction windspeed data column name, in m/s, by default “WMETR_HorWdSpdU”.

  • WMETR_HorWdSpdV (str) – The reanalysis v-directional windspeed data column name, in m/s, by default “WMETR_HorWdSpdV”.

  • WMETR_HorWdDir (str) – The reanalysis windspeed horizontal direction data column name, in degrees, by default “WMETR_HorWdDir”.

  • WMETR_EnvTmp (str) – The temperature data column name in the reanalysis data, in Kelvin, by default “WMETR_EnvTmp”.

  • WMETR_AirDen (str) – The air density reanalysis data column name, in kg/m^3, by default “WMETR_AirDen”.

  • WMETR_EnvPres (str) – The surface air pressure reanalysis data column name, in Pa, by default “WMETR_EnvPres”.

  • frequency (str) – The frequency of the timestamps in the time column, by default “10min”.

Method generated by attrs for class ReanalysisMetaData.