Utils#

The utils subpackage provides module-level methods that operate on pandas DataFrames and Series. These modules and their methods are used throughout the OpenOA codebase, and can be imported and used individually in your own scripts.

Quality Assurance#

Provides the Quality Assurance (QA) methods for SCADA data checking.

openoa.utils.qa.determine_offset_dst(df: pd.DataFrame, local_tz: str) → pd.DataFrame[source]#

Creates a column of “utc_offset” and “is_dst”.

Parameters:

df (pd.DataFrame) – The dataframe object to manipulate with a tz-aware pandas.DatetimeIndex.
local_tz (str) – The pytz-compatible timezone for the input timestamps, by default UTC. This should be in the format of “Country/City” or “Region/City” such as “America/Denver” or “Europe/Paris”.

Returns:

The updated dataframe with “utc_offset” and “is_dst” columns created.

Return type:

(pd.DataFrame)
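As a rough illustration of what these two columns hold, the sketch below derives them from a tz-aware index using plain pandas. The helper name `add_offset_and_dst` is hypothetical, and this is not the OpenOA implementation; it assumes the offset is wanted in hours.

```python
import pandas as pd


def add_offset_and_dst(df: pd.DataFrame, local_tz: str) -> pd.DataFrame:
    """Hypothetical helper: add "utc_offset" (hours) and "is_dst" columns."""
    out = df.copy()
    # View the tz-aware index in the local timezone
    local_index = out.index.tz_convert(local_tz)
    # utcoffset() and dst() are the standard datetime methods on each Timestamp
    out["utc_offset"] = [ts.utcoffset().total_seconds() / 3600 for ts in local_index]
    out["is_dst"] = [bool(ts.dst()) for ts in local_index]
    return out


# Three hourly UTC timestamps spanning the 2023 start of DST in Denver
index = pd.date_range("2023-03-12 08:00", periods=3, freq="60min", tz="UTC")
scada = add_offset_and_dst(
    pd.DataFrame({"power": [1.0, 2.0, 3.0]}, index=index), "America/Denver"
)
```

The first timestamp falls before the transition, so its offset differs by one hour from the other two.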

openoa.utils.qa.convert_datetime_column(df: DataFrame, time_col: str, local_tz: str, tz_aware: bool) → DataFrame[source]#

Converts the passed timestamp data to a pandas-encoded Datetime, and creates a corresponding localized and UTC timestamp using the time_col column name suffixed with either “localized” or “utc”, respectively. The _df object then uses the local timezone timestamp for its index.

Parameters:

df (pd.DataFrame) – The SCADA pd.DataFrame.
time_col (str) – The string name of the datetime stamp column in df.
local_tz (str) – The pytz-compatible timezone for the input time_col, by default UTC. This should be in the format of “Country/City” or “Region/City” such as “America/Denver” or “Europe/Paris”.
tz_aware (bool) – Indicator for if the provided data in time_col has the timezone information embedded (True), or not (False).

Returns:

The updated pd.DataFrame with a UTC-encoded pd.DatetimeIndex index and the following new columns:

<time_col>_utc: A UTC-converted timestamp column.
<time_col>_localized: The fully converted and localized timestamp column.
utc_offset: The difference, in hours, between the localized and UTC time.
is_dst: Indicator for whether or not the timestamp is considered to be DST (True) or not (False).

Return type:

pd.DataFrame

openoa.utils.qa.duplicate_time_identification(df: DataFrame, time_col: str, id_col: str) → tuple[Series, None | Series, None | Series][source]#

Identifies the time duplications on the modified SCADA data frame to highlight the duplications from the original time data (time_col), the UTC timestamps, and the localized timestamps, if the latter are available.

Parameters:

df (pd.DataFrame) – The resulting SCADA dataframe from convert_datetime_column(), otherwise the UTC and localized column checks will return None.
time_col (str) – The string name of the timestamp column.
id_col (str) – The string name of the turbine asset_id column, to ensure that duplicates aren’t based off multiple turbines’ data.

Returns:

The dataframe subsets with duplicate timestamps based on the original timestamp column, the localized timestamp column (None if the column does not exist), and the UTC-converted timestamp column (None if the column does not exist).

Return type:

(tuple[pd.Series, None | pd.Series, None | pd.Series])

openoa.utils.qa.gap_time_identification(df: DataFrame, time_col: str, freq: str) → tuple[Series, None | Series, None | Series][source]#

Identifies the time gaps on the modified SCADA data frame to highlight the missing timestamps from the original time data (time_col), the UTC timestamps, and the localized timestamps, if the latter are available.

Parameters:

df (pd.DataFrame) – The resulting SCADA dataframe from convert_datetime_column(), otherwise the UTC and localized column checks will return None.
time_col (str) – The string name of the timestamp column.
freq (str) – The expected frequency of the timestamps, which should align with the pandas timestamp conventions (https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases).

Returns:

The time gaps (missing timestamps) based on the original timestamp column, the localized timestamp column (None if the column does not exist), and the UTC-converted timestamp column (None if the column does not exist).

Return type:

(tuple[pd.Series, None | pd.Series, None | pd.Series])

openoa.utils.qa.describe(df: DataFrame, **kwargs) → DataFrame[source]#

Thin wrapper for pd.DataFrame.describe(), but transposes the results to be easier to read.

Parameters:

df (pd.DataFrame) – The dataframe to summarize, such as the resulting SCADA dataframe from convert_datetime_column().
kwargs (dict) – Dictionary of additional arguments to pass to df.describe().

Returns:

The results of df.describe().T.

Return type:

pd.DataFrame

openoa.utils.qa.daylight_savings_plot(df: DataFrame, local_tz: str, id_col: str, time_col: str, power_col: str, freq: str, hour_window: int = 3)[source]#

Produce a timeseries plot showing daylight savings events for each year of the SCADA data frame, highlighting potential duplications and gaps with the original timestamps compared against the UTC-converted timestamps.

Parameters:

df (pd.DataFrame) – The resulting SCADA dataframe from convert_datetime_column().
local_tz (str) – The pytz-compatible timezone for the input time_col, by default UTC. This should be in the format of “Country/City” or “Region/City” such as “America/Denver” or “Europe/Paris”.
id_col (str) – The string name of the turbine asset_id column in df, to ensure that duplicates aren’t based off multiple turbines’ data.
time_col (str) – The string name of the timestamp column in df.
power_col (str) – The string name of the power column in df.
freq (str) – The expected frequency of the timestamps, which should align with the pandas timestamp conventions (https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases).
hour_window (int) – Number of hours before and after the Daylight Savings Time transitions to view in the plot, by default 3.

openoa.utils.qa.wtk_coordinate_indices(fn: h5pyd.File, latitude: float, longitude: float) → tuple[float, float][source]#

Finds the nearest x/y coordinates for a given latitude and longitude using the Proj4 library to find the nearest valid point in the Wind Toolkit coordinates database, and converts it to an (x, y) pair.

Note: This relies on the Wind Toolkit HSDS API, and h5pyd must be installed.

Parameters:

fn (h5pyd.File) – The h5pyd file to be used for coordinate extraction.
latitude (float) – The latitude of the wind power plant’s center.
longitude (float) – The longitude of the wind power plant’s center.

Returns:

The nearest valid x and y coordinates to the provided latitude and longitude.

Return type:

tuple[float, float]

openoa.utils.qa.wtk_diurnal_prep(latitude: float, longitude: float, fn: str = '/nrel/wtk-us.h5', start_date: str = '2007-01-01', end_date: str = '2013-12-31') → Series[source]#

Links to the WIND Toolkit (WTK) data on AWS as a data source to capture the wind speed data and calculate the diurnal hourly averages.

Parameters:

latitude (float) – The latitude of the wind power plant’s center.
longitude (float) – The longitude of the wind power plant’s center.
fn (str, optional) – The path and name of the WTK API file. Defaults to “/nrel/wtk-us.h5”.
start_date (str, optional) – Starting date for the WTK data. Defaults to “2007-01-01”.
end_date (str, optional) – Ending date for the WTK data. Defaults to “2013-12-31”.

Raises:

IndexError – Raised if the latitude and longitude are not found within the WTK data set.

Returns:

The diurnal hourly average wind speed.

Return type:

pd.Series

openoa.utils.qa.wtk_diurnal_plot(wtk_df: DataFrame | None, scada_df: DataFrame, time_col: str, power_col: str, *, latitude: float = 0, longitude: float = 0, fn: str = '/nrel/wtk-us.h5', start_date: str = '2007-01-01', end_date: str = '2013-12-31', return_fig: bool = False) → None[source]#

Plots the WTK diurnal wind profile alongside the hourly power averages from the scada_df.

Parameters:

wtk_df (pd.DataFrame | None) – The WTK diurnal profile data produced in wtk_diurnal_prep. If None, then wtk_diurnal_prep is run internally using the following keyword arguments: latitude, longitude, fn, start_date, and end_date.
scada_df (pd.DataFrame) – The SCADA data that was produced in convert_datetime_column().
time_col (str) – The name of the time column in scada_df.
power_col (str) – The name of the power column in scada_df.
latitude (float) – The latitude of the wind power plant’s center.
longitude (float) – The longitude of the wind power plant’s center.
fn (str, optional) – WTK API file path and location. Defaults to “/nrel/wtk-us.h5”.
start_date (str | None, optional) – Starting date for the WTK data. If None, then it uses the starting date of scada_df. Defaults to “2007-01-01”, per the signature.
end_date (str | None, optional) – Ending date for the WTK data. If None, then it uses the ending date of scada_df. Defaults to “2013-12-31”, per the signature.
return_fig (bool) – Indicator for if the figure and axes objects should be returned, by default False.

Filters#

This module provides functions for flagging pandas data series based on a range of criteria. The functions are largely intended for application in wind plant operational energy analysis, particularly wind speed vs. power curves.

Flag data for which the specified data is outside the provided range of [lower, upper].

Parameters:

data (pandas.Series | pandas.DataFrame) – data frame containing the column to be flagged; can either be a pandas.Series or pandas.DataFrame. If a pandas.DataFrame, a list of threshold values and columns (if checking a subset of the columns) must be provided.
col (list[str]) – column(s) in data to be flagged, by default None. Only required when the data is a pandas.DataFrame and a subset of the columns will be checked. Must be the same length as lower and upper.
lower (float | list[float]) – lower threshold (inclusive) for each element of data, if it’s a pd.Series, or the list of lower thresholds for each column in col. If the same threshold is applied to each column, then pass the single value, otherwise, it must be the same length as col and upper.
upper (float | list[float]) – upper threshold (inclusive) for each element of data, if it’s a pd.Series, or the list of upper thresholds for each column in col. If the same threshold is applied to each column, then pass the single value, otherwise, it must be the same length as lower and col.

Returns:

Series or DataFrame (depending on data type) with boolean entries.

Return type:

pandas.Series | pandas.DataFrame
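The range check described above amounts to two comparisons joined with a logical OR. A minimal sketch for the pandas.Series case follows; the helper name `outside_range` and the sample wind speeds are illustrative assumptions, not part of OpenOA.

```python
import pandas as pd


def outside_range(data: pd.Series, lower: float, upper: float) -> pd.Series:
    """Hypothetical helper: True where a value lies outside [lower, upper]."""
    # Values equal to either bound are inside the inclusive range, so not flagged
    return (data < lower) | (data > upper)


wind_speed = pd.Series([-1.0, 0.0, 12.5, 40.0, 41.0])
flag = outside_range(wind_speed, lower=0.0, upper=40.0)
```

Only the first and last samples are flagged, since 0.0 and 40.0 sit exactly on the inclusive bounds.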

openoa.utils.filters.unresponsive_flag(data: DataFrame | Series, threshold: int = 3, col: list[str] | None = None) → Series | DataFrame[source]#

Flag time stamps for which the reported data does not change for threshold repeated intervals.

Parameters:

data (pandas.Series | pandas.DataFrame) – data frame containing the column to be flagged; can either be a pandas.Series or pandas.DataFrame. If a pandas.DataFrame, a list of threshold values and columns (if checking a subset of the columns) must be provided.
col (list[str]) – column(s) in data to be flagged, by default None. Only required when the data is a pandas.DataFrame and a subset of the columns will be checked.
threshold (int) – number of intervals over which the measurement does not change for each element of data, regardless of whether it’s a pd.Series or pd.DataFrame. Defaults to 3.

Returns:

Series or DataFrame (depending on data type) with boolean entries.

Return type:

pandas.Series | pandas.DataFrame
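One way to picture an unresponsive-sensor check is to group consecutive identical readings into runs and flag every sample that belongs to a run of at least threshold readings. The sketch below takes that interpretation for a single pandas.Series; the helper name `repeated_value_flag` is hypothetical and the exact counting convention in OpenOA may differ.

```python
import pandas as pd


def repeated_value_flag(data: pd.Series, threshold: int = 3) -> pd.Series:
    """Hypothetical helper: flag samples in runs of >= threshold equal values."""
    # A new run starts whenever a value differs from the previous one
    run_id = (data != data.shift()).cumsum()
    # Number of samples in the run that each sample belongs to
    run_length = data.groupby(run_id).transform("count")
    return run_length >= threshold


power = pd.Series([1.0, 2.0, 2.0, 2.0, 3.0])
flag = repeated_value_flag(power, threshold=3)
```

Here the three consecutive readings of 2.0 are flagged, while the isolated values on either side are not.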

openoa.utils.filters.std_range_flag(data: DataFrame | Series, threshold: float | list[float] = 2.0, col: list[str] | None = None) → Series | DataFrame[source]#

Flag time stamps for which the measurement is outside of the threshold number of standard deviations from the mean across the data.

Note: This method does not distinguish between asset IDs.

Parameters:

data (pandas.Series | pandas.DataFrame) – data frame containing the column to be flagged; can either be a pandas.Series or pandas.DataFrame. If a pandas.DataFrame, a list of threshold values and columns (if checking a subset of the columns) must be provided.
col (list[str]) – column(s) in data to be flagged, by default None. Only required when the data is a pandas.DataFrame and a subset of the columns will be checked. Must be the same length as threshold, if threshold is a list.
threshold (float | list[float]) – multiplicative factor on the standard deviation of data, if it’s a pd.Series, or the list of multiplicative factors on the standard deviation for each column in col. If the same factor is applied to each column, then pass the single value, otherwise, it must be the same length as col.

Returns:

Series or DataFrame (depending on data type) with boolean entries.

Return type:

pandas.Series | pandas.DataFrame
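The standard deviation check can be sketched for a single pandas.Series as a comparison of each sample's distance from the mean against a multiple of the standard deviation. The helper name `std_outlier_flag` is an assumption, and the sketch uses the pandas default (sample) standard deviation.

```python
import pandas as pd


def std_outlier_flag(data: pd.Series, threshold: float = 2.0) -> pd.Series:
    """Hypothetical helper: flag values more than threshold stds from the mean."""
    # Series.std() uses the sample standard deviation (ddof=1) by default
    return (data - data.mean()).abs() > threshold * data.std()


temperature = pd.Series([1.0] * 9 + [100.0])
flag = std_outlier_flag(temperature, threshold=2.0)
```

Because the mean and standard deviation are computed over all of the data, a single extreme value both shifts the center and widens the band, which is worth keeping in mind when choosing threshold.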

openoa.utils.filters.window_range_flag(window_col: str | Series = None, window_start: float = -inf, window_end: float = inf, value_col: str | Series = None, value_min: float = -inf, value_max: float = inf, data: DataFrame = None) → Series[source]#

Flag time stamps for which the measurements in window_col are within the range [window_start, window_end], and the measurements in value_col are outside of the range [value_min, value_max].

Parameters:

data (pandas.DataFrame) – data frame containing the columns window_col and value_col, by default None.
window_col (str | pandas.Series) – Name of the column used to define the window range or the data as a pandas Series, by default None.
window_start (float) – minimum value for the inclusive window, by default -np.inf.
window_end (float) – maximum value for the inclusive window, by default np.inf.
value_col (str | pandas.Series) – Name of the column used to define the value range or the data as a pandas Series, by default None.
value_max (float) – upper threshold for the inclusive data range; default np.inf
value_min (float) – lower threshold for the inclusive data range; default -np.inf

Returns:

Series with boolean entries.

Return type:

pandas.Series
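The window logic combines two conditions: the sample is inside the window on one variable and outside the accepted range on the other. A minimal sketch, assuming two aligned pandas Series and a hypothetical helper name `window_flag`:

```python
import pandas as pd


def window_flag(
    window: pd.Series,
    window_start: float,
    window_end: float,
    value: pd.Series,
    value_min: float,
    value_max: float,
) -> pd.Series:
    """Hypothetical helper: inside the window AND outside the value range."""
    in_window = (window >= window_start) & (window <= window_end)
    out_of_range = (value < value_min) | (value > value_max)
    return in_window & out_of_range


wind_speed = pd.Series([3.0, 8.0, 12.0, 12.0])
power = pd.Series([0.0, 500.0, 10.0, 1900.0])
# Flag near-zero power at wind speeds where high output is expected
flag = window_flag(wind_speed, 10.0, 25.0, power, 1500.0, 2100.0)
```

Only the third sample is flagged: it is inside the 10 to 25 m/s window but its power is far below the expected range. The sample values and limits are illustrative only.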

openoa.utils.filters.bin_filter(bin_col: Series | str, value_col: Series | str, bin_width: float, threshold: float = 2, center_type: str = 'mean', bin_min: float = None, bin_max: float = None, threshold_type: str = 'std', direction: str = 'all', data: DataFrame = None)[source]#

Flag time stamps for which data in value_col, when binned by data in bin_col into bins of width bin_width, are outside the threshold bin. The center_type of each bin can be either the median or mean, and flagging can be applied directionally (i.e. above or below the center, or both).

Parameters:

bin_col (pandas.Series | str) – The Series or column in data to be used for binning.
value_col (pandas.Series | str) – The Series or column in data to be flagged.
bin_width (float) – Width of bin in units of bin_col
threshold (float) – Outlier threshold (multiplicative factor of std of value_col in bin)
bin_min (float) – Minimum bin value below which flag should not be applied
bin_max (float) – Maximum bin value above which flag should not be applied
threshold_type (str) – Option to apply a ‘std’, ‘scalar’, or ‘mad’ (median absolute deviation) based threshold
center_type (str) – Option to use a ‘mean’ or ‘median’ center for each bin
direction (str) – Option to apply flag only to data ‘above’ or ‘below’ the mean, by default ‘all’
data (pd.DataFrame) – DataFrame containing both bin_col and value_col, if data are part of the same DataFrame, by default None.

Returns:

Array-like object with boolean entries.

Return type:

pandas.Series(bool)
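To make the binning idea concrete, the sketch below covers only one combination of the options above: a mean center, a standard-deviation threshold, and flagging in both directions. The helper name `bin_std_flag` is hypothetical, and bins are formed by flooring bin_col / bin_width, which is an assumption about bin alignment.

```python
import numpy as np
import pandas as pd


def bin_std_flag(
    bin_vals: pd.Series, values: pd.Series, bin_width: float, threshold: float = 2.0
) -> pd.Series:
    """Hypothetical helper: flag values far from the mean of their own bin."""
    # Integer bin label for every sample
    bins = np.floor(bin_vals / bin_width)
    grouped = values.groupby(bins)
    # Per-bin center and spread, broadcast back to the sample level
    center = grouped.transform("mean")
    spread = grouped.transform("std")
    # Bins with a single sample have a NaN spread and are never flagged
    return (values - center).abs() > threshold * spread


wind_speed = pd.Series([5.1] * 7 + [9.2, 9.3])
power = pd.Series([10.0] * 6 + [50.0, 100.0, 102.0])
flag = bin_std_flag(wind_speed, power, bin_width=1.0, threshold=2.0)
```

The reading of 50.0 is flagged against the other samples in its own wind speed bin, while the much larger values in the 9 m/s bin are not, because each bin is judged against its own center.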

openoa.utils.filters.cluster_mahalanobis_2d(data_col1: Series | str, data_col2: Series | str, n_clusters: int = 13, dist_thresh: float = 3.0, data: DataFrame = None) → Series[source]#

K-means clustering of data into n_clusters clusters; the Mahalanobis distance is evaluated for each cluster, and points with distances outside of dist_thresh are flagged; distinguishes between asset IDs.

Parameters:

data_col1 (pandas.Series | str) – Series or column data corresponding to the first data column in a 2D cluster analysis
data_col2 (pandas.Series | str) – Series or column data corresponding to the second data column in a 2D cluster analysis
n_clusters (int) – number of clusters to use
dist_thresh (float) – maximum Mahalanobis distance within each cluster for data to remain unflagged
data (pd.DataFrame) – DataFrame containing both data_col1 and data_col2, if data are part of the same DataFrame, by default None.

Returns:

Array-like object with boolean entries.

Return type:

pandas.Series(bool)

Power Curve#

This module provides methods to fit power curve models and use them to make predictions about ‘ideal’ power generation.

This module holds ready-to-use power curve functions. They take windspeed and power columns as arguments and return a python function which can be used to evaluate the power curve at arbitrary locations.

openoa.utils.power_curve.functions.IEC(windspeed_col: str | Series, power_col: str | Series, bin_width: float = 0.5, windspeed_start: float = 0, windspeed_end: float = 30.0, data: DataFrame = None) → Callable[source]#

Use IEC 61400-12-1-2 method for creating a binned wind-speed power curve. Power is set to zero for values outside the cutoff range: [windspeed_start, windspeed_end].

Parameters:

windspeed_col (str | pandas.Series) – Windspeed data, or the name of the column in data.
power_col (str | pandas.Series) – Power data, or the name of the column in data.
bin_width (float) – Width of windspeed bin. Defaults to 0.5 m/s, per the standard.
windspeed_start (float) – Left edge of first windspeed bin. Defaults to 0.0.
windspeed_end (float) – Right edge of last windspeed bin. Defaults to 30.0
data (pandas.DataFrame, optional) – a pandas DataFrame containing windspeed_col and power_col. Defaults to None.

Returns:

Python function of type (Array[float] -> Array[float]) implementing the power curve.

Return type:

Callable
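The binned approach can be sketched with NumPy alone: average the power in each wind speed bin, then return a function that looks up the bin mean for new wind speeds and returns zero outside the cutoff range. The names below (`binned_power_curve`, `ws_start`, `ws_end`) are hypothetical, empty bins are assumed to produce zero power, and the lookup is a plain step function, so this may differ from the OpenOA implementation in its details.

```python
import numpy as np


def binned_power_curve(windspeed, power, bin_width=0.5, ws_start=0.0, ws_end=30.0):
    """Hypothetical helper: return a callable mapping wind speed to bin-mean power."""
    windspeed = np.asarray(windspeed, dtype=float)
    power = np.asarray(power, dtype=float)
    n_bins = int(round((ws_end - ws_start) / bin_width))

    def bin_index(ws):
        # Index of the bin containing each wind speed, clipped to valid bins
        idx = np.floor((ws - ws_start) / bin_width).astype(int)
        return np.clip(idx, 0, n_bins - 1)

    fit_idx = bin_index(windspeed)
    in_range = (windspeed >= ws_start) & (windspeed <= ws_end)
    bin_means = np.zeros(n_bins)  # empty bins keep a mean of zero (assumption)
    for i in range(n_bins):
        members = in_range & (fit_idx == i)
        if members.any():
            bin_means[i] = power[members].mean()

    def curve(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        inside = (x >= ws_start) & (x <= ws_end)
        # Power is zero outside [ws_start, ws_end]
        return np.where(inside, bin_means[bin_index(x)], 0.0)

    return curve


curve = binned_power_curve([5.1, 5.2, 5.6], [100.0, 200.0, 400.0])
predicted = curve([5.3, 5.7, 35.0])
```

Returning a closure mirrors the Callable return type described above: the fitted bin means are captured once and reused for every later evaluation.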

openoa.utils.power_curve.functions.logistic_5_parametric(windspeed_col: str | Series, power_col: str | Series, data: DataFrame = None) → Callable[source]#

Fits the 5-parameter logistic function to the observed data via a least-squares optimization (i.e. minimizing the sum of the squares of the residuals between the points as evaluated by the parameterized function and the points of observed data).

Extra: The present implementation follows the filtering method reported in:

M. Yesilbudak, “Partitional clustering-based outlier detection for power curve optimization of wind turbines,” 2016 IEEE International Conference on Renewable Energy Research and Applications (ICRERA), Birmingham, 2016, pp. 1080-1084.

and the power curve method developed and reviewed in:

M. Lydia, A.I. Selvakumar, S.S. Kumar, G.E.P. Kumar, “Advanced algorithms for wind turbine power curve modeling,” IEEE Trans. Sustainable Energy, 4 (2013), pp. 827-835.

M. Lydia, S.S. Kumar, I. Selvakumar, G.E. Prem Kumar, “A comprehensive review on wind turbine power curve modeling techniques,” Renew. Sust. Energy Rev., 30 (2014), pp. 452-460.

Parameters:

windspeed_col (str | pandas.Series) – Windspeed data, or the name of the column in data.
power_col (str | pandas.Series) – Power data, or the name of the column in data.
data (pandas.DataFrame, optional) – a pandas DataFrame containing windspeed_col and power_col. Defaults to None.

Returns:

Python function of type (Array[float] -> Array[float]) implementing the power curve.

Return type:

Callable

openoa.utils.power_curve.functions.gam(windspeed_col: str | Series, power_col: str | Series, n_splines: int = 20, data: DataFrame = None) → Callable[source]#

Use the generalized additive model, pygam.LinearGAM to fit power to wind speed.

Parameters:

windspeed_col (str | pandas.Series) – Windspeed data, or the name of the column in data.
power_col (str | pandas.Series) – Power data, or the name of the column in data.
n_splines (int) – Number of splines to use in the fit. Defaults to 20.
data (pandas.DataFrame, optional) – a pandas DataFrame containing windspeed_col and power_col. Defaults to None.

Returns:

Python function of type (Array[float] -> Array[float]) implementing the power curve.

Return type:

Callable

openoa.utils.power_curve.functions.gam_3param(windspeed_col: str | Series, wind_direction_col: str | Series, air_density_col: str | Series, power_col: str | Series, n_splines: int = 20, data: DataFrame = None) → Callable[source]#

Use a generalized additive model to fit power to wind speed, wind direction and air density.

Parameters:

windspeed_col (str | pandas.Series) – Windspeed data, or the name of the column in data.
wind_direction_col (str | pandas.Series) – Wind direction data, or the name of the column in data.
air_density_col (str | pandas.Series) – Air density data, or the name of the column in data.
power_col (str | pandas.Series) – Power data, or the name of the column in data.
n_splines (int) – Number of splines to use in the fit. Defaults to 20.
data (pandas.DataFrame, optional) – a pandas DataFrame containing windspeed_col, wind_direction_col, air_density_col, and power_col. Defaults to None.

Returns:

Python function of type (Array[float] -> Array[float]) implementing the power curve.

Return type:

Callable

Imputing#

This module provides methods for filling in null data with interpolated (imputed) values.

openoa.utils.imputing.asset_correlation_matrix(data: DataFrame, value_col: str) → DataFrame[source]#

Create a correlation matrix on a MultiIndex DataFrame with time (or a different alignment value) and asset_id values as its indices, respectively.

Parameters:

data (pandas.DataFrame) – input data frame such as PlantData.scada that uses a MultiIndex with a timestamp and asset_id column for indices, in that order.
value_col (str) – the column containing the data values to be used when assessing correlation

Returns:

Correlation matrix with <id_col> as index and column names

Return type:

pandas.DataFrame

openoa.utils.imputing.impute_data(target_col: str, reference_col: str, target_data: DataFrame | None = None, reference_data: DataFrame | None = None, align_col: str | None = None, method: str = 'linear', degree: int = 1, data: DataFrame | None = None) → Series[source]#

Replaces NaN data in a target pandas Series with imputed data from a reference pandas Series based on a linear regression relationship.

Steps include:

Merge the target and reference data frames on <align_col>, which is shared between the two
Determine the linear regression relationship between the target and reference data series
Apply that relationship to NaN data in the target series for which there is finite data in the reference series
Return the imputed results as well as the index matching the target data frame

Parameters:

target_col (str) – the name of the column in either data or target_data to be imputed.
reference_col (str) – the name of the column in either data or reference_data to be used for imputation.
data (pandas.DataFrame) – input data frame such as PlantData.scada that uses a MultiIndex with a timestamp and asset_id column for indices, in that order, by default None.
target_data (pandas.DataFrame) – the DataFrame with NaN data to be imputed.
reference_data (pandas.DataFrame) – the DataFrame to be used in imputation
align_col (str) – the name of the column used to join target_data and reference_data.

Returns:

Copy of the target_col series with NaN occurrences imputed where possible.

Return type:

pandas.Series
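The regression and fill steps listed above can be sketched as follows for the linear case. The sketch assumes the target and reference series are already aligned on a shared index (so the merge step is skipped), and the helper name `impute_linear` is hypothetical.

```python
import numpy as np
import pandas as pd


def impute_linear(target: pd.Series, reference: pd.Series) -> pd.Series:
    """Hypothetical helper: fill NaNs in target from a linear fit on reference."""
    # Fit only where both series have data
    both = target.notna() & reference.notna()
    slope, intercept = np.polyfit(
        reference[both].to_numpy(), target[both].to_numpy(), deg=1
    )
    # Fill only where the target is missing and the reference is available
    fill = target.isna() & reference.notna()
    imputed = target.copy()
    imputed.loc[fill] = slope * reference.loc[fill] + intercept
    return imputed


reference = pd.Series([1.0, 2.0, 3.0, 4.0])
target = pd.Series([2.0, 4.0, np.nan, 8.0])
result = impute_linear(target, reference)
```

In this toy data the target is exactly twice the reference, so the missing value is filled with approximately 6.0 while the original values are left unchanged.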

openoa.utils.imputing.impute_all_assets_by_correlation(data: DataFrame, impute_col: str, reference_col: str, asset_id_col: str = 'asset_id', r2_threshold: float = 0.7, method: str = 'linear', degree: int = 1)[source]#

Imputes NaN data in a Pandas data frame to the best extent possible by considering available data across different assets in the data frame. Highest correlated assets are prioritized in the imputation process.

Steps include:

Establish correlation matrix of specified data between different assets
For each asset in the data frame, sort neighboring assets by correlation strength
Then impute asset data based on available data in the highest correlated neighbor
If NaN data still remains in asset, move on to next highest correlated neighbor, etc.
Continue until either:
1. There are no NaN data remaining in asset data
2. There are no more neighbors to consider
3. The neighboring asset does not meet the specified correlation threshold, r2_threshold

Parameters:

data (pandas.DataFrame) – input data frame such as PlantData.scada that uses a MultiIndex with a timestamp and asset_id column for indices, in that order.
impute_col (str) – the name of the column in data to be imputed.
reference_col (str) – the name of the column in data to be used in imputation.
asset_id_col (str) – The name of the asset_id column, should be one of the turbine or tower index column names. Defaults to the turbine column name “asset_id”.
r2_threshold (float) – the correlation threshold for a neighboring asset to be considered valid for use in imputation, by default 0.7.
method (str) – The imputation method, should be one of “linear” or “polynomial”, by default “linear”.
degree (int) – The polynomial degree, i.e. linear is a 1 degree polynomial, by default 1

Returns:

The imputation results

Return type:

pandas.Series

Timeseries#

This module provides useful functions for processing timeseries data.

openoa.utils.timeseries.offset_to_seconds(offset: int | float | str | datetime64) → int | float[source]#

Converts pandas datetime offset alias to its corresponding number of seconds.

Parameters:

offset (int | float | str | numpy.datetime64) – The pandas offset alias or numpy timestamp to be converted to seconds. If a number (int or float) is passed, then it must be in nanoseconds, the pandas default.

Returns:

The number of seconds corresponding to offset.

Return type:

int | float

openoa.utils.timeseries.determine_frequency_seconds(data: DataFrame, index_col: str | None = None) → int | float[source]#

Calculates the most common time difference between all non-duplicate timestamps and returns that difference in seconds.

Parameters:

data (pandas.DataFrame) – The pandas DataFrame to determine the DatetimeIndex frequency.
index_col (str | None, optional) – The name of the index column if data uses a MultiIndex, otherwise leave as None. Defaults to None.

Returns:

The most common time difference between timestamps, in seconds.

Return type:

int | float
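The “most common time difference” can be sketched with pandas by differencing the sorted, de-duplicated timestamps and taking the mode. The helper name `most_common_step_seconds` is hypothetical and the sketch takes a Series of timestamps rather than a DataFrame.

```python
import pandas as pd


def most_common_step_seconds(timestamps: pd.Series) -> float:
    """Hypothetical helper: modal spacing, in seconds, between unique timestamps."""
    unique_sorted = timestamps.drop_duplicates().sort_values()
    steps = unique_sorted.diff().dropna().dt.total_seconds()
    # mode() returns the most frequent value(s) in ascending order; take the first
    return steps.mode().iloc[0]


times = pd.Series(
    pd.to_datetime(
        [
            "2023-01-01 00:00",
            "2023-01-01 00:10",
            "2023-01-01 00:20",
            "2023-01-01 00:20",  # duplicate, dropped before differencing
            "2023-01-01 00:40",  # follows a 20-minute gap
        ]
    )
)
step = most_common_step_seconds(times)
```

The duplicate is dropped first, leaving two 10-minute steps and one 20-minute step, so the result is 600 seconds.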

openoa.utils.timeseries.determine_frequency(data: DataFrame, index_col: str | None = None) → str | int | float[source]#

Gets the offset alias from the datetime index of data, or calculates the most common time difference between all non-duplicate timestamps.

Parameters:

data (pandas.DataFrame) – The pandas DataFrame to determine the DatetimeIndex frequency.
index_col (str | None, optional) – The name of the index column if data uses a MultiIndex, otherwise leave as None. Defaults to None.

Returns:

The offset string or number of seconds between timestamps.

Return type:

str | int | float

openoa.utils.timeseries.convert_local_to_utc(d: str | datetime, tz_string: str) → datetime[source]#

Convert timestamps in local time to UTC. The function can only act on a single timestamp at a time, so for example use the .apply function in Pandas:

date_utc = df['time'].apply(convert_local_to_utc, args=('US/Pacific',))

Also note that this function does not resolve the end of DST, when times between 1:00 and 2:00 are repeated in November. Those timestamps are left repeated in UTC time and need to be shifted manually.

The function does address the missing 2:00-3:00 times at the start of DST in March.

Parameters:

d (str | datetime.datetime) – the local date, tzinfo must not be set
tz_string (str) – the local timezone

Returns:

the local date converted to UTC time

Return type:

datetime.datetime

openoa.utils.timeseries.convert_dt_to_utc(dt_col: Series | str, tz_string: str, data: DataFrame = None) → Series[source]#

Converts a pandas Series of timestamps, string-formatted or datetime.datetime objects that are in a local timezone tz_string, to a UTC-encoded pandas Series.

Parameters:

dt_col (pandas.Series | str) – A pandas Series of datetime objects or string-encoded timestamps, or the name of the column in data.
tz_string (str) – The string name for the expected timezone of the provided timestamps in dt_col.
data (pandas.DataFrame, optional) – The pandas DataFrame containing the timestamp column: dt_col. Defaults to None.

Returns:

The UTC-encoded timestamps.

Return type:

pd.Series
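A local-to-UTC conversion of a whole Series can be sketched with the pandas `.dt` accessor: parse the timestamps, attach the local timezone, then convert. The helper name `local_to_utc` is hypothetical and this is not necessarily how OpenOA performs the conversion.

```python
import pandas as pd


def local_to_utc(timestamps: pd.Series, tz_string: str) -> pd.Series:
    """Hypothetical helper: interpret naive timestamps as tz_string, return UTC."""
    # Attach the local timezone to the parsed, timezone-naive timestamps
    localized = pd.to_datetime(timestamps).dt.tz_localize(tz_string)
    # Express the same instants in UTC
    return localized.dt.tz_convert("UTC")


local = pd.Series(["2023-07-01 12:00", "2023-12-01 12:00"])
utc = local_to_utc(local, "America/Denver")
```

The two results differ by an hour because the first timestamp falls in daylight saving time. By default, `tz_localize` raises an error for ambiguous or nonexistent local times around the DST transitions; its `ambiguous` and `nonexistent` arguments control that behavior.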

openoa.utils.timeseries.find_time_gaps(dt_col: Series | str, freq: str, data: DataFrame = None) → Series[source]#

Finds gaps in dt_col based on the expected frequency, freq, and returns them.

Parameters:

dt_col (pandas.Series | str) – Pandas Series of datetime.datetime objects or the name of the column in data.
freq (string) – The expected frequency of the timestamps, which should align with the pandas timestamp conventions (https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases).
data (pandas.DataFrame, optional) – The pandas DataFrame containing the timestamp column: dt_col. Defaults to None.

Returns:

Series of missing time stamps in datetime.datetime format

Return type:

pandas.Series
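Gap detection can be pictured as a set difference between the timestamps that should exist at the expected frequency and the ones actually present. A minimal sketch, with a hypothetical helper name `missing_timestamps`:

```python
import pandas as pd


def missing_timestamps(timestamps: pd.Series, freq: str) -> pd.Series:
    """Hypothetical helper: expected timestamps that are absent from the data."""
    # Every timestamp that should exist between the first and last observation
    expected = pd.date_range(timestamps.min(), timestamps.max(), freq=freq)
    return pd.Series(expected.difference(pd.DatetimeIndex(timestamps)))


times = pd.Series(
    pd.to_datetime(["2023-01-01 00:00", "2023-01-01 00:10", "2023-01-01 00:30"])
)
gaps = missing_timestamps(times, "10min")
```

For this 10-minute data, the only missing timestamp is 00:20.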

openoa.utils.timeseries.find_duplicate_times(dt_col: Series | str, data: DataFrame = None)[source]#

Find duplicate input data and report them. The first duplicated item is not reported, only subsequent duplicates.

Parameters:

dt_col (pandas.Series | str) – Pandas series of datetime.datetime objects or the name of the column in data.
data (pandas.DataFrame, optional) – The pandas DataFrame containing the timestamp column: dt_col. Defaults to None.

Returns:

Duplicates from input data

Return type:

pandas.Series
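The “first occurrence is not reported” behavior described above matches the default of `pandas.Series.duplicated`, which marks only the repeats. A short sketch with illustrative timestamps:

```python
import pandas as pd

times = pd.Series(
    pd.to_datetime(
        ["2023-01-01 00:00", "2023-01-01 00:10", "2023-01-01 00:10", "2023-01-01 00:20"]
    )
)
# duplicated() uses keep="first" by default, so only later repeats are marked
duplicates = times[times.duplicated()]
```

The result keeps the original index, so the position of each repeated timestamp in the input can be recovered.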

openoa.utils.timeseries.gap_fill_data_frame(data: DataFrame, dt_col: str, freq: str) → DataFrame[source]#

Insert any missing timestamps into data while filling the data columns with NaNs.

Parameters:

data (pandas.DataFrame) – The dataframe with potentially missing timestamps.
dt_col (str) – Name of the column in ‘data’ with timestamps.
freq (str) – The expected frequency of the timestamps.

Returns:

output data frame with NaN data for the data gaps

Return type:

pandas.DataFrame
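Inserting missing timestamps can be sketched by reindexing the data against a complete timestamp range, which leaves NaN in every data column for the newly inserted rows. The helper name `fill_gaps` is hypothetical, and the sketch assumes the timestamps in dt_col are unique.

```python
import pandas as pd


def fill_gaps(data: pd.DataFrame, dt_col: str, freq: str) -> pd.DataFrame:
    """Hypothetical helper: add rows of NaN for timestamps missing from dt_col."""
    full_range = pd.date_range(
        data[dt_col].min(), data[dt_col].max(), freq=freq, name=dt_col
    )
    # Reindexing against the full range creates NaN rows for the gaps
    return data.set_index(dt_col).reindex(full_range).reset_index()


scada = pd.DataFrame(
    {
        "time": pd.to_datetime(["2023-01-01 00:00", "2023-01-01 00:20"]),
        "power": [1.0, 3.0],
    }
)
filled = fill_gaps(scada, "time", "10min")
```

The missing 00:10 row is inserted with a NaN power value, and the original rows are unchanged.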

openoa.utils.timeseries.percent_nan(col: Series | str, data: DataFrame = None)[source]#

Return the percentage of data that are NaN, or 1 if the series is empty.

Parameters:

col (pandas.Series | str) – The pandas Series to be checked for NaNs, or the name of the column in data.
data (pandas.DataFrame, optional) – The pandas DataFrame containing the column: col. Defaults to None.

Returns:

Percentage of NaN data in the data series

Return type:

float
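A minimal sketch of this calculation follows. It assumes the result is expressed as a fraction between 0 and 1, which is consistent with returning 1 for an empty series; the helper name `fraction_nan` is hypothetical.

```python
import numpy as np
import pandas as pd


def fraction_nan(col: pd.Series) -> float:
    """Hypothetical helper: share of NaN entries, or 1.0 for an empty series."""
    if len(col) == 0:
        return 1.0
    # The mean of a boolean mask is the share of True entries
    return float(col.isna().mean())


half_missing = fraction_nan(pd.Series([1.0, np.nan, np.nan, 4.0]))
empty = fraction_nan(pd.Series([], dtype=float))
```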

openoa.utils.timeseries.num_days(dt_col: Series | str, data: DataFrame = None) → int[source]#

Calculates the number of non-duplicate days in dt_col.

Parameters:

dt_col (pandas.Series | str) – A pandas Series with a timeseries index to be checked for the number of days contained in the data.
data (pandas.DataFrame, optional) – The pandas DataFrame containing the timestamp column: dt_col and having a timeseries index. Defaults to None.

Returns:

Number of days in the data

Return type:

int

openoa.utils.timeseries.num_hours(dt_col: Series | str, *, data: DataFrame = None) → int[source]#

Calculates the number of non-duplicate hours in dt_col.

Parameters:

dt_col (pandas.Series | str) – A pandas Series of timeseries data to be checked for the number of hours contained in the data
data (pandas.DataFrame, optional) – The pandas DataFrame containing the timestamp column: dt_col. Defaults to None.

Returns:

Number of hours in the data

Return type:

int

Met Data Processing#

This module provides methods for processing meteorological data.

openoa.utils.met_data_processing.wrap_180(x: float | ndarray | Series | DataFrame)[source]#

Converts an angle, an array of angles, or a pandas Series or DataFrame of angles in degrees to the range -180 to +180 degrees.

Parameters:

x (float | np.ndarray | pd.Series | pd.DataFrame) – Input angle(s) (degrees)

Returns:

The input angle(s) converted to the range -180 to +180 degrees, returned as a float or numpy array (degrees)

Return type:

float | np.ndarray
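For a single angle, the wrapping can be sketched as below. This is an illustrative scalar version only; the name `wrap_180_scalar` and the choice to map exactly 180 degrees to +180 rather than -180 are assumptions.

```python
def wrap_180_scalar(angle: float) -> float:
    """Wrap an angle in degrees into the range -180 to +180."""
    wrapped = angle % 360.0  # first bring the angle into [0, 360)
    # Angles in the upper half of the circle are expressed as negative values.
    return wrapped - 360.0 if wrapped > 180.0 else wrapped


examples = [wrap_180_scalar(a) for a in (190.0, -190.0, 360.0, 180.0)]
```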

openoa.utils.met_data_processing.circular_mean(x: DataFrame | Series | ndarray, axis: int = 0)[source]#

Compute circular mean of wind direction data for a pandas Series or 1-dimensional numpy array, or along any dimension of a multi-dimensional pandas DataFrame or numpy array

Parameters:

x (pd.DataFrame | pd.Series | np.ndarray) – A pandas DataFrame or Series, or a numpy array containing wind direction data in degrees.
axis (int) – The axis to which the circular mean will be applied. This value must be less than the number of dimensions in x. Defaults to 0.

Returns:

The circular mean of the wind directions along the specified axis, between 0 and 360 degrees.

Return type:

pd.Series | float | np.ndarray
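A circular mean averages the sine and cosine of each direction and converts the result back to an angle, which avoids the jump at 0/360 degrees. The sketch below shows the standard technique for a flat list; `circular_mean_deg` is a hypothetical name and the function is not taken from the OpenOA source.

```python
import math


def circular_mean_deg(directions: list) -> float:
    """Circular mean of directions in degrees, returned in [0, 360)."""
    n = len(directions)
    sin_mean = sum(math.sin(math.radians(d)) for d in directions) / n
    cos_mean = sum(math.cos(math.radians(d)) for d in directions) / n
    # atan2 recovers the angle of the mean unit vector.
    return math.degrees(math.atan2(sin_mean, cos_mean)) % 360.0


mean_dir = circular_mean_deg([350.0, 30.0])
```

For these two inputs the arithmetic mean would be 190 degrees, whereas the circular mean is approximately 10 degrees.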

openoa.utils.met_data_processing.compute_wind_direction(u: Series | str, v: Series | str, data: DataFrame = None) → Series[source]#

Compute wind direction given u and v wind vector components

Parameters:

u (pandas.Series | str) – A pandas Series of the zonal component of the wind, in m/s, or the name of the column in data.
v (pandas.Series | str) – A pandas Series of the meridional component of the wind, in m/s, or the name of the column in data.
data (pandas.DataFrame) – The pandas DataFrame containing the columns u and v.

Returns:

wind direction; units of degrees

Return type:

pandas.Series
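As a rough illustration, the conversion can be written for scalar inputs as follows. The sketch assumes the meteorological convention, in which direction is the bearing the wind blows from, measured clockwise from north; the documentation above does not state the convention, and `wind_direction_deg` is a hypothetical name.

```python
import math


def wind_direction_deg(u: float, v: float) -> float:
    """Direction the wind blows FROM, in degrees clockwise from north (assumed convention)."""
    # atan2(u, v) is the bearing the wind blows toward; adding 180 flips it.
    return (180.0 + math.degrees(math.atan2(u, v))) % 360.0


from_north = wind_direction_deg(0.0, -1.0)  # air moving southward
from_east = wind_direction_deg(-1.0, 0.0)   # air moving westward
from_west = wind_direction_deg(1.0, 0.0)    # air moving eastward
```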

openoa.utils.met_data_processing.compute_u_v_components(wind_speed: Series | str, wind_dir: Series | str, data: DataFrame = None) → Series[source]#

Compute vector components of the horizontal wind given wind speed and direction

Parameters:

wind_speed (pandas.Series | str) – A pandas Series of the horizontal wind speed, in m/s, or the name of the column in data.
wind_dir (pandas.Series | str) – A pandas Series of the wind direction, in degrees, or the name of the column in data.
data (pandas.DataFrame) – The pandas DataFrame containing the columns wind_speed and wind_dir.

Raises:

ValueError – Raised if any of the wind_speed or wind_dir values are negative.

Returns:

u (pandas.Series): the zonal component of the wind; units of m/s.
v (pandas.Series): the meridional component of the wind; units of m/s.

Return type:

(tuple)
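A scalar sketch of the decomposition is shown below. It assumes the meteorological convention (direction is where the wind blows from, clockwise from north), which is why both components carry a minus sign; `u_v_components` is a hypothetical name.

```python
import math


def u_v_components(wind_speed: float, wind_dir_deg: float) -> tuple:
    """Return (u, v) in m/s from speed (m/s) and direction (degrees, assumed 'from')."""
    if wind_speed < 0 or wind_dir_deg < 0:
        raise ValueError("wind_speed and wind_dir must be non-negative")
    theta = math.radians(wind_dir_deg)
    # The wind blows toward the opposite bearing, hence the negative signs.
    return -wind_speed * math.sin(theta), -wind_speed * math.cos(theta)


u, v = u_v_components(10.0, 270.0)  # a 10 m/s wind from the west
```

A westerly wind gives a positive zonal component of about 10 m/s and a meridional component of about zero.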

openoa.utils.met_data_processing.compute_air_density(temp_col: Series | str, pres_col: Series | str, humi_col: Series | str = None, data: DataFrame = None)[source]#

Calculate air density from the ideal gas law based on the definition provided by IEC 61400-12 given pressure, temperature and relative humidity.

This function assumes temperature and pressure are reported in standard units of measurement (i.e. Kelvin for temperature, Pascal for pressure, humidity has no dimension).

Humidity values are optional. According to the IEC, a humidity of 50% (0.5) is set as the default value.

Parameters:

temp_col (pandas.Series | str) – A pandas Series of the temperature values, in Kelvin, or the name of the column in data.
pres_col (pandas.Series | str) – A pandas Series of the pressure values, in Pascals, or the name of the column in data.
humi_col (pandas.Series | str) – An optional pandas Series of the relative humidity values, as a decimal in the range (0, 1), or the name of the column in data. Defaults to None.
data (pandas.DataFrame) – The pandas DataFrame containing the columns temp_col and pres_col, and optionally humi_col.

Raises:

ValueError – Raised if any of the temp_col, pres_col, or humi_col values are negative.

Returns:

Rho, calculated air density; units of kg/m3

Return type:

pandas.Series
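The humidity-corrected ideal gas calculation can be sketched for scalar inputs as follows. The gas constants and the vapor pressure fit are the values commonly quoted for IEC 61400-12-1 and are assumptions here, as is the name `air_density`; check them against the standard before relying on the numbers.

```python
import math

R_DRY_AIR = 287.05     # J/(kg K), gas constant of dry air (assumed value)
R_WATER_VAPOR = 461.5  # J/(kg K), gas constant of water vapor (assumed value)


def air_density(temp_k: float, pres_pa: float, rel_humidity: float = 0.5) -> float:
    """Air density in kg/m3 from temperature (K), pressure (Pa), and relative humidity (0-1)."""
    if temp_k < 0 or pres_pa < 0 or rel_humidity < 0:
        raise ValueError("temperature, pressure, and humidity must be non-negative")
    # Empirical vapor pressure fit, in Pa (assumed form).
    vapor_pressure = 0.0000205 * math.exp(0.0631846 * temp_k)
    return (1.0 / temp_k) * (
        pres_pa / R_DRY_AIR
        - rel_humidity * vapor_pressure * (1.0 / R_DRY_AIR - 1.0 / R_WATER_VAPOR)
    )


rho = air_density(288.15, 101325.0)                        # default 50% humidity
rho_dry = air_density(288.15, 101325.0, rel_humidity=0.0)  # dry air
```

Humid air is slightly less dense than dry air at the same temperature and pressure, so `rho` comes out just below `rho_dry`.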

openoa.utils.met_data_processing.pressure_vertical_extrapolation(p0: Series | str, temp_avg: Series | str, z0: Series | str, z1: Series | str, data: DataFrame = None) → Series[source]#

Extrapolate pressure from height z0 to height z1 given the average temperature in the layer. The hydrostatic equation is used to perform the extrapolation.

Parameters:

p0 (pandas.Series | str) – A pandas Series of the pressure at height z0, in Pascals, or the name of the column in data.
temp_avg (pandas.Series | str) – A pandas Series of the mean temperature between z0 and z1, in Kelvin, or the name of the column in data.
z0 (pandas.Series | str) – A pandas Series of the height above surface, in meters, or the name of the column in data.
z1 (pandas.Series | str) – A pandas Series of the extrapolation height, in meters, or the name of the column in data.
data (pandas.DataFrame) – The pandas DataFrame containing the columns p0, temp_avg, z0, and z1.

Raises:

ValueError – Raised if any of the p0 or temp_avg values are negative.

Returns:

p1, extrapolated pressure at z1, in Pascals

Return type:

pandas.Series
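Integrating the hydrostatic equation over a layer with a constant mean temperature gives an exponential pressure profile. The scalar sketch below illustrates that relation; the constants and the name `extrapolate_pressure` are assumptions, not values confirmed from the OpenOA source.

```python
import math

GRAVITY = 9.81       # m/s2 (assumed value)
R_DRY_AIR = 287.058  # J/(kg K), gas constant of dry air (assumed value)


def extrapolate_pressure(p0: float, temp_avg: float, z0: float, z1: float) -> float:
    """Pressure (Pa) at height z1 from pressure p0 at z0 and the layer-mean temperature (K)."""
    if p0 < 0 or temp_avg < 0:
        raise ValueError("p0 and temp_avg must be non-negative")
    # Hypsometric form of the hydrostatic equation.
    return p0 * math.exp(-GRAVITY * (z1 - z0) / (R_DRY_AIR * temp_avg))


p1 = extrapolate_pressure(101325.0, 288.15, 0.0, 100.0)
p_same = extrapolate_pressure(101325.0, 288.15, 50.0, 50.0)
```

Moving 100 m upward from sea-level conditions lowers the pressure by roughly 1.2 kPa.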

openoa.utils.met_data_processing.air_density_adjusted_wind_speed(wind_col: Series | str, density_col: Series | str, data: DataFrame = None) → Series[source]#

Apply an air density correction to wind speed measurements following the IEC 61400-12-1 standard.

Parameters:

wind_col (pandas.Series | str) – A pandas Series containing the wind speed data, in m/s, or the name of the column in data
density_col (pandas.Series | str) – A pandas Series containing the air density data, in kg/m3, or the name of the column in data
data (pandas.DataFrame) – The pandas DataFrame containing the columns wind_col and density_col.

Returns:

density-adjusted wind speeds, in m/s

Return type:

pandas.Series
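The density correction scales each wind speed by the cube root of the ratio between the measured density and a reference density, so that the corrected speed carries the same power density at the reference conditions. The sketch below uses the mean of the supplied densities as the reference, which is an assumption, as is the name `density_adjusted_wind_speed`.

```python
def density_adjusted_wind_speed(wind_speeds: list, densities: list) -> list:
    """Scale wind speeds (m/s) by (rho / rho_ref)^(1/3), with rho_ref the mean density."""
    mean_density = sum(densities) / len(densities)  # assumed reference density
    return [
        v * (rho / mean_density) ** (1.0 / 3.0)
        for v, rho in zip(wind_speeds, densities)
    ]


adjusted = density_adjusted_wind_speed([8.0, 8.0], [1.10, 1.30])
unchanged = density_adjusted_wind_speed([5.0, 7.0], [1.2, 1.2])
```

Readings taken in thinner-than-average air are adjusted downward and readings in denser air upward; with a constant density nothing changes.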

openoa.utils.met_data_processing.compute_turbulence_intensity(mean_col: Series | str, std_col: Series | str, data: DataFrame = None) → Series[source]#

Compute turbulence intensity

Parameters:

mean_col (pandas.Series | str) – A pandas Series containing the wind speed mean data, in m/s, or the name of the column in data.
std_col (pandas.Series | str) – A pandas Series containing the wind speed standard deviation data, in m/s, or the name of the column in data.
data (pandas.DataFrame) – The pandas DataFrame containing the columns mean_col and std_col.

Returns:

turbulence intensity, (unitless ratio)

Return type:

pd.Series

openoa.utils.met_data_processing.compute_shear(data: DataFrame, ws_heights: dict[str, float], return_reference_values: bool = False) → Series | tuple[Series, float, Series][source]#

Computes shear coefficient between wind speed measurements using the power law. The shear coefficient is obtained by evaluating the expression for an OLS regression coefficient.

Parameters:

data (pandas.DataFrame) – A pandas DataFrame with wind speed columns that correspond to the keys of ws_heights.
ws_heights (dict[str, float]) – A dictionary with wind speed column names of data as keys and their respective sensor heights (m) as values.
return_reference_values (bool) – If True, this function returns a three-element tuple where the first element is the array of shear exponents, the second element is the reference height (float), and the third element is the array of reference wind speeds. These reference values can be used for extrapolating wind speed. Defaults to False.

Returns:

If return_reference_values is False, return just the shear coefficient (unitless); otherwise return the shear coefficient (unitless), reference height (m), and reference wind speed (m/s).

Return type:

pandas.Series | tuple[pandas.Series, float, pandas.Series]
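Under the power law, the logarithm of wind speed is linear in the logarithm of height, so the shear exponent is the OLS slope of ln(speed) on ln(height). The sketch below applies that to a single timestamp; the name `shear_exponent` and the height-to-speed dictionary layout are assumptions chosen for brevity and differ from the ws_heights argument described above.

```python
import math


def shear_exponent(speed_by_height: dict) -> float:
    """OLS slope of ln(speed) on ln(height) for one set of simultaneous readings."""
    log_z = [math.log(z) for z in speed_by_height]
    log_v = [math.log(v) for v in speed_by_height.values()]
    mean_z = sum(log_z) / len(log_z)
    mean_v = sum(log_v) / len(log_v)
    covariance = sum((z - mean_z) * (v - mean_v) for z, v in zip(log_z, log_v))
    variance = sum((z - mean_z) ** 2 for z in log_z)
    return covariance / variance


# Synthetic readings that follow a power law with exponent 0.2 exactly.
readings = {z: 8.0 * (z / 80.0) ** 0.2 for z in (40.0, 60.0, 80.0)}
alpha = shear_exponent(readings)
```

Because the synthetic readings follow the power law exactly, the regression recovers the exponent of 0.2.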

openoa.utils.met_data_processing.extrapolate_windspeed(v1: Series | str, z1: float, z2: float, shear: Series | str, data: DataFrame = None)[source]#

Extrapolates wind speed vertically using the Power Law.

Parameters:

v1 (pandas.Series | float | str) – A pandas Series of the wind speed measurements at the reference height, or the name of the column in data.
z1 (float) – Height of reference wind speed measurements; units in meters
z2 (float) – Target extrapolation height; units in meters
shear (pandas.Series | float | str) – A pandas Series of the shear values, or the name of the column in data.
data (pandas.DataFrame) – The pandas DataFrame containing the columns v1 and shear.

Returns:

Wind speed extrapolated to target height.

Return type:

pandas.Series | numpy.array | float
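For scalar inputs, the power law extrapolation reduces to a single expression. The sketch below is illustrative only, and `extrapolate_power_law` is a hypothetical name.

```python
def extrapolate_power_law(v1: float, z1: float, z2: float, shear: float) -> float:
    """Wind speed at height z2 from speed v1 at height z1 and a shear exponent."""
    return v1 * (z2 / z1) ** shear


v2 = extrapolate_power_law(8.0, 80.0, 100.0, 0.2)
v_same = extrapolate_power_law(8.0, 80.0, 80.0, 0.2)
```

With a shear exponent of 0.2, an 8 m/s reading at 80 m extrapolates to a little under 8.4 m/s at 100 m.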

openoa.utils.met_data_processing.compute_veer(wind_a: Series | str, height_a: float, wind_b: Series | str, height_b: float, data: DataFrame = None)[source]#

Compute veer between wind direction measurements

Parameters:

wind_a (pandas.Series | str) – A pandas Series containing the wind direction mean data, in degrees, or the name of the column in data.
height_a (float) – sensor height for wind_a
wind_b (pandas.Series | str) – A pandas Series containing the wind direction mean data, in degrees, or the name of the column in data.
height_b (float) – sensor height for wind_b
data (pandas.DataFrame) – The pandas DataFrame containing the columns wind_a and wind_b.

Returns:

veer (deg/m)

Return type:

veer(array)
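Veer is the change in wind direction per unit height. The scalar sketch below takes the smallest angular difference between the two directions, so that 350 and 10 degrees differ by 20 rather than 340, and divides by the height difference. Treating the direction difference as unsigned is an assumption, and `veer_deg_per_m` is a hypothetical name.

```python
def veer_deg_per_m(dir_a: float, height_a: float, dir_b: float, height_b: float) -> float:
    """Direction change per meter between two sensors (degrees per meter)."""
    # Smallest angle between the two directions, always in [0, 180].
    delta_dir = 180.0 - abs(abs(dir_a - dir_b) - 180.0)
    return delta_dir / (height_a - height_b)


veer = veer_deg_per_m(10.0, 80.0, 350.0, 40.0)
```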

Metadata Fetch#

This module fetches metadata of wind farms

Read in EIA data of wind farm of interest:

from EIA API for monthly productions, return monthly net energy generation time series
from local Excel files for wind farm metadata, return dictionary of metadata

Parameters:

api_key (str) – 32-character user-specific API key, obtained from EIA.
plant_id (str) – 5-character EIA power plant code.
file_path (str) – Directory with EIA metadata .xlsx files.
plant_file (str | Path) – Name of the plant metadata Excel file in file_path. Formerly hard-coded to: “2___Plant_Y2017.xlsx”.
plant_sheet (str) – The name of the sheet containing the data in plant_file. Formerly hard-coded as “Plant”.
wind_file (str | Path) – Name of the wind metadata Excel file in file_path. Formerly hard-coded to: “3_2_Wind_Y2017.xlsx”.
wind_sheet (str) – The name of the sheet containing the data in wind_file. Formerly hard-coded as “Operable”.

Returns:

Monthly net energy generation, in MWh; and a dictionary of metadata for the wind farm with ‘plant_id’.

Return type:

pandas.Series

Assign EIA metadata to a PlantData object, which is by default an empty dictionary.

Parameters:

project (PlantData) – PlantData object for a particular project
api_key (str) – 32-character user-specific API key, obtained from EIA.
plant_id (str) – 5-character EIA power plant code.
file_path (str) – Directory with EIA metadata .xlsx files.
plant_file (str | Path) – Name of the plant metadata Excel file in file_path. Formerly hard-coded to: “2___Plant_Y2017.xlsx”.
plant_sheet (str) – The name of the sheet containing the data in plant_file.
wind_file (str | Path) – Name of the wind metadata Excel file in file_path. Formerly hard-coded to: “3_2_Wind_Y2017.xlsx”.
wind_sheet (str) – The name of the sheet containing the data in wind_file.

Returns:

(None)

Unit Conversion#

This module provides basic methods for unit conversion and calculation of basic wind plant variables

openoa.utils.unit_conversion.convert_power_to_energy(power_col: str | Series, sample_rate_min='10min', data: DataFrame = None) → Series[source]#

Compute energy [kWh] from power [kW] and return the data column.

Parameters:

power_col (str | pandas.Series) – The power data, in kW, or the name of the column in data.
sample_rate_min (str) – Sampling rate as a pandas offset alias, in minutes, to use for conversion. Defaults to “10min”.
data (pandas.DataFrame) – The pandas DataFrame containing the col power_col.

Returns:

Energy, in kWh, with the same length as the input power data.

Return type:

pandas.Series
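The conversion multiplies average power by the length of the sampling interval in hours. The sketch below shows the arithmetic for a single reading, taking the interval directly as a number of minutes rather than a pandas offset alias; `power_to_energy_kwh` is a hypothetical name.

```python
def power_to_energy_kwh(power_kw: float, sample_minutes: float = 10.0) -> float:
    """Energy (kWh) produced at an average power (kW) over one sampling interval."""
    return power_kw * sample_minutes / 60.0


energy = power_to_energy_kwh(600.0)           # 600 kW sustained for 10 minutes
hourly = power_to_energy_kwh(600.0, 60.0)     # the same power over a full hour
```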

openoa.utils.unit_conversion.compute_gross_energy(net_energy: str | Series, availability: str | Series, curtailment: str | Series, availability_type: str = 'frac', curtailment_type: str = 'frac', data: str | DataFrame = None)[source]#

Computes gross energy for a wind plant or turbine by adding reported availability and curtailment losses to reported net energy.

Parameters:

net_energy (str | pandas.Series) – A pandas Series, or the name of the column in data, corresponding to the reported net energy for the wind plant or turbine.
availability (str | pandas.Series) – A pandas Series, or the name of the column in data, corresponding to the reported availability losses for the wind plant or turbine.
curtailment (str | pandas.Series) – A pandas Series, or the name of the column in data, corresponding to the reported curtailment losses for the wind plant or turbine.
availability_type (str) – Either one of “frac” or “energy” corresponding to if the data provided in availability is in the range of [0, 1], or representing the energy lost.
curtailment_type (str) – Either one of “frac” or “energy” corresponding to if the data provided in curtailment is in the range of [0, 1], or representing the energy lost.
data (pd.DataFrame, optional) – The pandas DataFrame containing the net_energy, availability, and curtailment columns.

Returns:

Calculated gross energy for wind plant or turbine

Return type:

gross(pandas.Series)
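The two simplest cases can be sketched as follows: when both losses are energies they are added to the net energy, and when both are fractions of gross energy the net energy is divided by the remaining fraction. The fractional formula, the omission of the mixed "frac"/"energy" combinations, and the name `gross_energy` are all assumptions made for this illustration.

```python
def gross_energy(net: float, availability: float, curtailment: float, kind: str = "frac") -> float:
    """Gross energy from net energy and losses given as fractions or as energies."""
    if kind == "energy":
        # Losses are already energies: add them back to the net energy.
        return net + availability + curtailment
    if kind == "frac":
        # Losses are fractions of gross energy (assumed interpretation).
        return net / (1.0 - availability - curtailment)
    raise ValueError("kind must be 'frac' or 'energy'")


gross_from_frac = gross_energy(90.0, 0.05, 0.05, kind="frac")
gross_from_energy = gross_energy(90.0, 5.0, 5.0, kind="energy")
```

Both calls describe the same situation, 90 units of net energy with 5% lost to each category, and both give a gross energy of about 100 units.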

openoa.utils.unit_conversion.convert_feet_to_meter(variable: str | Series, data: DataFrame = None)[source]#

Compute variable in [meter] from [feet] and return the data column

Parameters:

variable (str | pandas.Series) – A pandas Series, or the name of the column in data, corresponding to the data, in feet, needing to be converted to meters.
data (pandas.DataFrame) – The pandas DataFrame containing the column variable.

Returns:

variable in meters

Return type:

pandas.Series

Plotting#

This module provides helpful functions for creating various plots

openoa.utils.plot.set_styling() → None[source]#

Sets some of the matplotlib plotting styling to be consistent throughout any module where plotting is implemented.

openoa.utils.plot.map_wgs84_to_cartesian(longitude_origin: ndarray[Any, dtype[float64]] | float, latitude_origin: ndarray[Any, dtype[float64]] | float, longitude_points: ndarray[Any, dtype[float64]] | Series | float, latitude_points: ndarray[Any, dtype[float64]] | Series | float) → tuple[ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]]] | tuple[Series, Series] | tuple[float, float][source]#

Maps WGS-84 latitude and longitude to local cartesian coordinates using an origin coordinate pair.

Parameters:

longitude_origin (numpy array of shape (1, ) | float) – longitude of cartesian coordinate system origin.
latitude_origin (numpy array of shape (1, ) | float) – latitude of cartesian coordinate system origin.
longitude_points (numpy array of shape (n, ) | pd.Series | float) – longitude(s) of points of interest.
latitude_points (numpy array of shape (n, ) | pd.Series | float) – latitude(s) of points of interest.

Returns:

Tuple representing cartesian coordinates (x, y); returned as a tuple of numpy arrays, pandas Series, or scalars, dependent upon the originally passed data.

openoa.utils.plot.luminance(rgb: tuple[int, int, int])[source]#

Calculates the brightness of an rgb 255 color. See https://en.wikipedia.org/wiki/Relative_luminance

Parameters:

rgb (tuple) – Tuple of red, green, and blue values in the range of 0-255.

Returns:

relative luminance.

Return type:

luminance(float)

Example

>>> rgb = (255,127,0)
>>> luminance(rgb)
0.5687976470588235

>>> luminance((0,50,255))
0.21243529411764706

openoa.utils.plot.color_to_rgb(color: str | tuple[int, int, int])[source]#

Converts named colors, hex and normalised RGB to 255 RGB values

Parameters:

color (color) – RGB, HEX or named color.

Returns:

255 RGB values.

Return type:

rgb(tuple)

Example

>>> color_to_rgb("Red")
(255, 0, 0)

>>> color_to_rgb((1,1,0))
(255, 255, 0)

>>> color_to_rgb("#ff00ff")
(255, 0, 255)

openoa.utils.plot.plot_windfarm(asset_df, tile_name='OpenMap', plot_width=800, plot_height=800, marker_size=14, figure_kwargs={}, marker_kwargs={})[source]#

Plot the windfarm spatially on a map using the Bokeh plotting library.

Parameters:

asset_df (pd.DataFrame) – PlantData.asset object containing the asset metadata.
tile_name (str) – tile set to be used for the underlay, e.g. OpenMap, ESRI, OpenTopoMap
plot_width (int) – width of plot
plot_height (int) – height of plot
marker_size (int) – size of markers
figure_kwargs (dict) – additional figure options for advanced users, see Bokeh docs
marker_kwargs (dict) – additional marker options for advanced users, see Bokeh docs. We have some custom behavior around the “fill_color” attribute. If “fill_color” is not defined, OpenOA will use an internally defined color palette. If “fill_color” is the name of a column in the asset table, OpenOA will use the value of that column as the marker color. Otherwise, “fill_color” is passed through to Bokeh.

Returns:

windfarm map

Return type:

Bokeh_plot(axes handle)

Example

import pandas as pd
from bokeh.plotting import figure, output_file, show

from openoa.utils.plot import plot_windfarm

from examples import project_ENGIE

# Load plant object
project = project_ENGIE.prepare("../examples/data/la_haute_borne")

# Create the bokeh wind farm plot
show(plot_windfarm(project.asset, tile_name="ESRI", plot_width=600, plot_height=600))

openoa.utils.plot.plot_by_id(df: DataFrame, id_col: str, x_axis: str, y_axis: str, max_cols: int = 4, xlim: tuple[float, float] = (None, None), ylim: tuple[float, float] = (None, None), xlabel: str | None = None, ylabel: str | None = None, return_fig: bool = False, figure_kwargs: dict = {}, plot_kwargs: dict = {}) → None[source]#

Function to plot any two fields against each other in a dataframe with unique plots for each asset_id.

Parameters:

df (pd.DataFrame) – The dataframe for comparing values.
id_col (str) – The asset_id column (or index column) in df.
x_axis (str) – Independent variable to plot, should align with a column in df.
y_axis (str) – Dependent variable to plot, should align with a column in df.
max_cols (int, optional) – The maximum number of columns in the plot. Defaults to 4.
xlim (tuple[float, float], optional) – A tuple of the x-axis (min, max) values. Defaults to (None, None).
ylim (tuple[float, float], optional) – A tuple of the y-axis (min, max) values. Defaults to (None, None).
xlabel (str | None) – The x-axis label, if None, then x_axis will be used. Defaults to None.
ylabel (str | None) – The y-axis label, if None, then y_axis will be used. Defaults to None.
return_fig (bool, optional) – Set to True to return the figure and axes objects, otherwise set to False. Defaults to False.
figure_kwargs (dict, optional) – Additional keyword arguments that should be passed to plt.figure(). Defaults to {}.
plot_kwargs (dict, optional) – Additional keyword arguments that should be passed to ax.scatter. Defaults to {}.

Returns:

(None)

openoa.utils.plot.column_histograms(df: DataFrame, columns: list | None = None, return_fig: bool = False)[source]#

Produces a histogram plot for each numeric column in df.

Parameters:

df (pd.DataFrame) – The dataframe for plotting.
return_fig (bool) – Indicator for if the figure and axes objects should be returned, by default False.

Returns:

(None)

openoa.utils.plot.plot_power_curve(wind_speed: Series, power: Series, flag: ndarray | Series, flag_labels: tuple[str, str] = ('Flagged Readings', 'Power Curve'), xlim: tuple[float, float] = (None, None), ylim: tuple[float, float] = (None, None), legend: bool = False, return_fig: bool = False, figure_kwargs: dict = {}, legend_kwargs: dict = {}, scatter_kwargs: dict = {}) → None | tuple[Figure, Axes][source]#

Plots the individual points on a power curve, with an optional flag filtering for singling out readings in the figure. If flag is all false values, then no overlaid flagged scatter points will be created.

Parameters:

wind_speed (pandas.Series) – A pandas Series or numpy array of the recorded wind speeds, in m/s.
power (pandas.Series | np.ndarray) – A pandas Series or numpy array of the recorded power, in kW.
flag (numpy.ndarray | pd.Series) – A pandas Series or numpy array of booleans for which points to flag in the windspeed and power data.
flag_labels (tuple[str, str], optional) – The labels to give to the scatter points, corresponding to the flagged points and raw points, respectively. Defaults to (“Flagged Readings”, “Power Curve”).
xlim (tuple[float, float], optional) – A tuple of the x-axis (min, max) values. Defaults to (None, None).
ylim (tuple[float, float], optional) – A tuple of the y-axis (min, max) values. Defaults to (None, None).
legend (bool, optional) – Set to True to place a legend in the figure, otherwise set to False. Defaults to False.
return_fig (bool, optional) – Set to True to return the figure and axes objects, otherwise set to False. Defaults to False.
figure_kwargs (dict, optional) – Additional keyword arguments that should be passed to plt.figure(). Defaults to {}.
scatter_kwargs (dict, optional) – Additional keyword arguments that should be passed to ax.scatter(). Defaults to {}.
legend_kwargs (dict, optional) – Additional keyword arguments that should be passed to ax.legend(). Defaults to {}.

Returns:

If return_fig is True, then the figure and axes objects are returned.

Return type:

None | tuple[plt.Figure, plt.Axes]

openoa.utils.plot.plot_monthly_reanalysis_windspeed(data: dict[str, DataFrame], windspeed_col: str, plant_por: tuple[datetime, datetime], normalize: bool = True, xlim: tuple[datetime, datetime] = (None, None), ylim: tuple[float, float] = (None, None), return_fig: bool = False, figure_kwargs: dict = {}, plot_kwargs: dict = {}, legend_kwargs: dict = {}) → None | tuple[Figure, Axes][source]#

Make a plot of the normalized annual average wind speeds from reanalysis data to show general trends for each, and highlighting the period of record for the plant data.

Parameters:

data (dict[pandas.DataFrame]) – The dictionary of reanalysis dataframes.
windspeed_col (str) – The name of the column for the windspeed data to be plotted.
plant_por (tuple[datetime.datetime, datetime.datetime]) – The start and end datetimes for a plant’s period of record (POR).
normalize (bool) – Indicator of if the windspeeds should be normalized (True), or not (False). Defaults to True.
xlim (tuple[datetime.datetime, datetime.datetime], optional) – A tuple of datetimes representing the x-axis plotting display limits. Defaults to (None, None).
ylim (tuple[float, float], optional) – A tuple of the y-axis plotting display limits. Defaults to (None, None).
return_fig (bool, optional) – Flag to return the figure and axes objects. Defaults to False.
figure_kwargs (dict, optional) – Additional figure instantiation keyword arguments that are passed to plt.figure(). Defaults to {}.
plot_kwargs (dict, optional) – Additional plotting keyword arguments that are passed to ax.plot(). Defaults to {}.
legend_kwargs (dict, optional) – Additional legend keyword arguments that are passed to ax.legend(). Defaults to {}.

Returns:

If return_fig is True, then the figure and axes objects are returned for further tinkering/saving.

Return type:

None | tuple[matplotlib.pyplot.Figure, matplotlib.pyplot.Axes]

openoa.utils.plot.plot_plant_energy_losses_timeseries(data: DataFrame, energy_col: str, loss_cols: list[str], energy_label: str, loss_labels: list[str], xlim: tuple[datetime, datetime] = (None, None), ylim_energy: tuple[float, float] = (None, None), ylim_loss: tuple[float, float] = (None, None), return_fig: bool = False, figure_kwargs: dict = {}, plot_kwargs: dict = {}, legend_kwargs: dict = {})[source]#

Plot timeseries of energy, and the loss categories of interest.

Parameters:

data (pandas.DataFrame) – A pandas DataFrame containing energy production and losses.
energy_col (str) – The name of the column in data containing the energy production.
loss_cols (list[str]) – The name(s) of the column(s) in data containing the loss data.
energy_label (str) – The legend label and y-axis label for the energy plot.
loss_labels (list[str]) – The legend labels for the losses plot.
xlim (tuple[datetime.datetime, datetime.datetime], optional) – A tuple of datetimes representing the x-axis plotting display limits. Defaults to (None, None).
ylim_energy (tuple[float, float], optional) – A tuple of the y-axis plotting display limits for the gross energy plot (top figure). Defaults to (None, None).
ylim_loss (tuple[float, float], optional) – A tuple of the y-axis plotting display limits for the loss plot (bottom figure). Defaults to (None, None).
return_fig (bool, optional) – Flag to return the figure and axes objects. Defaults to False.
figure_kwargs (dict, optional) – Additional figure instantiation keyword arguments that are passed to plt.figure(). Defaults to {}.
plot_kwargs (dict, optional) – Additional plotting keyword arguments that are passed to ax.plot(). Defaults to {}.
legend_kwargs (dict, optional) – Additional legend keyword arguments that are passed to ax.legend(). Defaults to {}.

Returns:

If return_fig is True, then the figure and axes objects are returned for further tinkering/saving.

Return type:

None | tuple[matplotlib.pyplot.Figure, tuple[matplotlib.pyplot.Axes, matplotlib.pyplot.Axes]]

openoa.utils.plot.plot_distributions(data: DataFrame, which: list[str], xlabels: list[str], xlim: tuple[tuple[float, float], ...] | None = None, ylim: tuple[tuple[float, float], ...] | None = None, return_fig: bool = False, figure_kwargs: dict = {}, plot_kwargs: dict = {}, annotate_kwargs: dict = {}, title: str | None = None) → None | tuple[Figure, Axes][source]#

Plot a distribution of AEP values from the Monte-Carlo OA method

Parameters:

data (pandas.DataFrame) – The pandas DataFrame of results data.
which (list[str]) – The list of columns in data that should have their distributions plotted.
xlabels (list[str]) – The list of x-axis labels.
xlim (tuple[tuple[float, float], ...], optional) – A tuple of tuples (or None) corresponding to each of elements of which that get passed to ax.set_xlim(). Defaults to None.
ylim (tuple[tuple[float, float], ...], optional) – A tuple of tuples (or None) corresponding to each of elements of which that get passed to ax.set_ylim(). Defaults to None.
return_fig (bool, optional) – Flag to return the figure and axes objects. Defaults to False.
figure_kwargs (dict, optional) – Additional figure instantiation keyword arguments that are passed to plt.figure(). Defaults to {}.
plot_kwargs (dict, optional) – Additional plotting keyword arguments that are passed to ax.hist(). Defaults to {}.
annotate_kwargs (dict, optional) – Additional annotation keyword arguments that are passed to ax.annotate(). Defaults to {}.
title (str, optional) – Title to place over all subplots.

Returns:

If return_fig is True, then the figure and axes objects are returned for further tinkering/saving.

Return type:

None | tuple[matplotlib.pyplot.Figure, matplotlib.pyplot.Axes]

openoa.utils.plot.plot_boxplot(x: Series, y: Series, xlabel: str, ylabel: str, ylim: tuple[float | None, float | None] = (None, None), with_points: bool = False, points_label: str | None = None, return_fig: bool = False, figure_kwargs: dict = {}, plot_kwargs_box: dict = {}, plot_kwargs_points: dict = {}, legend_kwargs: dict = {}) → None | tuple[Figure, Axes][source]#

Plot box plots of AEP results sliced by a specified Monte Carlo parameter

Parameters:

x (pandas.Series) – The data that splits the results in y.
y (pandas.Series) – The resulting data to be split by x.
xlabel (str) – The x-axis label.
ylabel (str) – The y-axis label.
ylim (tuple[float, float], optional) – A tuple of the y-axis plotting display limits. Defaults to (None, None).
with_points (bool, optional) – Flag to plot the individual points like a seaborn swarmplot. Defaults to False.
points_label (str | None, optional) – Legend label for the points, if plotting. Defaults to None.
return_fig (bool, optional) – Flag to return the figure and axes objects. Defaults to False.
figure_kwargs (dict, optional) – Additional figure instantiation keyword arguments that are passed to plt.figure(). Defaults to {}.
plot_kwargs_box (dict, optional) – Additional plotting keyword arguments that are passed to ax.boxplot(). Defaults to {}.
plot_kwargs_points (dict, optional) – Additional plotting keyword arguments that are passed to ax.boxplot(). Defaults to {}.
legend_kwargs (dict, optional) – Additional legend keyword arguments that are passed to ax.legend(). Defaults to {}.

Returns:

If return_fig is True, then the figure object, axes object, and a dictionary of the boxplot objects are returned for further tinkering/saving.

Return type:

None | tuple[matplotlib.pyplot.Figure, matplotlib.pyplot.Axes, dict]

openoa.utils.plot.plot_waterfall(data: list[float] | ndarray[Any, dtype[float64]], index: list[str], ylabel: str | None = None, ylim: tuple[float, float] = (None, None), return_fig: bool = False, plot_kwargs: dict = {}, figure_kwargs: dict = {}) → None | tuple[source]#

Produce a waterfall plot showing the progression from the EYA estimates to the calculated OA estimates of AEP.

Parameters:

data (array-like) – data to be used to create waterfall.
index (list) – List of string values to be used for x-axis labels, which should have one more value than the number of points in data to account for the calculated OA total.
ylabel (str) – The y-axis label. Defaults to None.
ylim (tuple[float | None, float | None]) – The y-axis minimum and maximum display range. Defaults to (None, None).
return_fig (bool, optional) – Set to True to return the figure and axes objects, otherwise set to False. Defaults to False.
figure_kwargs (dict, optional) – Additional keyword arguments that should be passed to plt.figure(). Defaults to {}.
plot_kwargs (dict, optional) – Additional keyword arguments that should be passed to ax.plot(). Defaults to {}.
legend_kwargs (dict, optional) – Additional keyword arguments that should be passed to ax.legend(). Defaults to {}.

Returns:

If return_fig is True, then return the figure and axes objects in addition to showing the plot.

Return type:

None | tuple[plt.Figure, plt.Axes]

openoa.utils.plot.plot_power_curves(data: dict[str, DataFrame], power_col: str, windspeed_col: str, flag_col: str | None = None, turbines: list[str] | None = None, flag_labels: tuple[str, str] = ('Flagged Readings', 'Power Curve'), max_cols: int = 3, xlim: tuple[float, float] = (None, None), ylim: tuple[float, float] = (None, None), legend: bool = False, return_fig: bool = False, figure_kwargs: dict = {}, legend_kwargs: dict = {}, plot_kwargs: dict = {})[source]#

Plots a series of power curves for a dictionary of turbine data, allowing for an optional filtering for singling out readings in the figure.

Parameters:

data (dict[str, pd.DataFrame]) – The dictionary of turbine IDs and SCADA data.
windspeed_col (str) – The name of the column in each dataframe of data containing the recorded wind speeds, in m/s.
power_col (str) – The name of the column in each dataframe of data containing the recorded power, in kW.
flag_col (str | None, optional) – The name of the column in each dataframe of data containing the booleans for which points to flag in the windspeed and power data. Defaults to None.
turbines (list[str], optional) – The list of turbines to be plotted, if not all of the keys in data.
flag_labels (tuple[str, str], optional) – The labels to give to the scatter points, corresponding to the flagged readings and raw readings, respectively. Defaults to (“Flagged Readings”, “Power Curve”).
max_cols (int, optional) – The maximum number of columns in the plot. Defaults to 3.
xlim (tuple[float, float], optional) – A tuple of the x-axis (min, max) values. Defaults to (None, None).
ylim (tuple[float, float], optional) – A tuple of the y-axis (min, max) values. Defaults to (None, None).
legend (bool, optional) – Set to True to place a legend in the figure, otherwise set to False. Defaults to False.
return_fig (bool, optional) – Set to True to return the figure and axes objects, otherwise set to False. Defaults to False.
figure_kwargs (dict, optional) – Additional keyword arguments that should be passed to plt.figure(). Defaults to {}.
plot_kwargs (dict, optional) – Additional keyword arguments that should be passed to ax.scatter(). Defaults to {}.
legend_kwargs (dict, optional) – Additional keyword arguments that should be passed to ax.legend(). Defaults to {}.

Returns:

Returns the figure and axes objects if return_fig is True.

Return type:

None | tuple[plt.Figure, plt.Axes]
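As a minimal sketch of how the inputs could be prepared, the example below builds a data dictionary from synthetic readings. The turbine IDs ("T01", "T02"), the column names ("windspeed", "power", "flagged"), the toy cubic power curve, and the flagging rule are all hypothetical and chosen only for illustration. The call itself is left as a comment so the block runs without OpenOA installed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)


def make_turbine_frame(n: int = 200) -> pd.DataFrame:
    """Build a synthetic SCADA-like frame (illustrative data only)."""
    windspeed = rng.uniform(0.0, 25.0, size=n)
    # Toy cubic power curve capped at a hypothetical 2000 kW rating
    power = np.clip(2000.0 * (windspeed / 12.0) ** 3, 0.0, 2000.0)
    # Simulate occasional downtime so that some readings can be flagged
    downtime = rng.random(n) < 0.05
    power[downtime] = 0.0
    df = pd.DataFrame({"windspeed": windspeed, "power": power})
    # Hypothetical flag rule: near-zero power despite sufficient wind
    df["flagged"] = (df["windspeed"] > 6.0) & (df["power"] < 50.0)
    return df


# One DataFrame per turbine ID, as the data argument expects
data = {turbine_id: make_turbine_frame() for turbine_id in ("T01", "T02")}

# Keyword arguments mirroring the documented signature
plot_args = dict(
    data=data,
    power_col="power",
    windspeed_col="windspeed",
    flag_col="flagged",
    turbines=["T01", "T02"],
    max_cols=2,
    legend=True,
    return_fig=True,
)

# With OpenOA installed, the call would then be:
# from openoa.utils.plot import plot_power_curves
# fig, axes = plot_power_curves(**plot_args)
```

Note that power_col, windspeed_col, and flag_col are passed as column names, so every DataFrame in data needs to share the same column naming.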

openoa.utils.plot.plot_wake_losses(bins: ndarray[Any, dtype[float64]], efficiency_data_por: ndarray[Any, dtype[float64]], efficiency_data_lt: ndarray[Any, dtype[float64]], energy_data_por: ndarray[Any, dtype[float64]] | None = None, energy_data_lt: ndarray[Any, dtype[float64]] | None = None, bin_axis_label: str = 'wd', turbine_id: str | None = None, xlim: tuple[float, float] = (None, None), ylim_efficiency: tuple[float, float] = (None, None), ylim_energy: tuple[float, float] = (None, None), return_fig: bool = False, figure_kwargs: dict | None = None, plot_kwargs_line: dict = {}, plot_kwargs_fill: dict = {}, legend_kwargs: dict = {})[source]#

Plots wake losses in the form of wind farm efficiency as well as normalized wind plant energy production for both the period of record and with the long-term correction as a function of either wind direction or wind speed. If the data arguments contain two dimensions, 95% confidence intervals will be plotted for each variable.

Parameters:

bins (np.ndarray) – Wind direction or wind speed bin values representing the x-axis in the plots.
efficiency_data_por (np.ndarray) – 1D or 2D array containing wind farm or wind turbine efficiency for the period of record for each bin in the bins argument. If a 2D array is provided, the second dimension should contain results from different Monte Carlo iterations and 95% confidence intervals will be plotted.
efficiency_data_lt (np.ndarray) – 1D or 2D array containing long-term corrected wind farm or wind turbine efficiency for each bin in the bins argument. If a 2D array is provided, the second dimension should contain results from different Monte Carlo iterations and 95% confidence intervals will be plotted.
energy_data_por (np.ndarray, optional) – Optional 1D or 2D array containing normalized energy production for the period of record for each bin in the bins argument. If a 2D array is provided, the second dimension should contain results from different Monte Carlo iterations and 95% confidence intervals will be plotted. If a value of None is provided, normalized energy will not be plotted. Defaults to None.
energy_data_lt (np.ndarray, optional) – Optional 1D or 2D array containing normalized long-term corrected energy production for each bin in the bins argument. If a 2D array is provided, the second dimension should contain results from different Monte Carlo iterations and 95% confidence intervals will be plotted. If a value of None is provided, normalized energy will not be plotted. Defaults to None.
bin_axis_label (str, optional) – The label to use for the bin variable (x) axis. Defaults to “wd”.
turbine_id (str, optional) – Name of turbine if data are provided for a single wind turbine. Used to determine title and plot axis labels. Defaults to None.
xlim (tuple[float, float], optional) – A tuple of floats representing the x-axis wind direction plotting display limits (degrees). Defaults to (None, None).
ylim_efficiency (tuple[float, float], optional) – A tuple of the y-axis plotting display limits for the wind farm efficiency plot (top plot). Defaults to (None, None).
ylim_energy (tuple[float, float], optional) – If energy_data_por and energy_data_lt arguments are provided, a tuple of the y-axis plotting display limits for the wind farm energy distribution plot (bottom plot). Defaults to (None, None).
return_fig (bool, optional) – Flag to return the figure and axes objects. Defaults to False.
figure_kwargs (dict, optional) – Additional figure instantiation keyword arguments that are passed to plt.figure(). Defaults to None.
plot_kwargs_line (dict, optional) – Additional plotting keyword arguments that are passed to ax.plot() for plotting lines for the wind farm efficiency and, if energy_data_por and energy_data_lt arguments are provided, energy distributions subplots. Defaults to {}.
plot_kwargs_fill (dict, optional) – If 2D data arrays are provided, additional plotting keyword arguments that are passed to ax.fill_between() for plotting shading regions for 95% confidence intervals for the wind farm efficiency and, if energy_data_por and energy_data_lt arguments are provided, energy distributions subplots. Defaults to {}.
legend_kwargs (dict, optional) – Additional legend keyword arguments that are passed to ax.legend() for the wind farm efficiency and, if energy_data_por and energy_data_lt arguments are provided, energy distributions subplots. Defaults to {}.

Returns:

If return_fig is True, then the figure and axes object(s), corresponding to the wake loss plot or, if energy_data_por and energy_data_lt arguments are provided, wake loss and normalized energy plots, are returned for further tinkering/saving.

Return type:

None | tuple[matplotlib.pyplot.Figure, matplotlib.pyplot.Axes] | tuple[matplotlib.pyplot.Figure, tuple[matplotlib.pyplot.Axes, matplotlib.pyplot.Axes]]
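The 2D input layout described above (bins on the first dimension, Monte Carlo iterations on the second) can be sketched with NumPy as follows. The data are synthetic, the number of iterations is hypothetical, and the percentile-based interval is an assumption about how a 95% confidence interval might be derived from such an array, not a statement of how the function computes it.

```python
import numpy as np

rng = np.random.default_rng(42)

bins = np.arange(0.0, 360.0, 5.0)  # wind direction bins, in degrees
n_sim = 100  # hypothetical number of Monte Carlo iterations

# Shape (n_bins, n_sim): bins on the first axis, iterations on the second
efficiency_data_por = 0.9 + 0.02 * rng.standard_normal((bins.size, n_sim))

# Assumed percentile-based 95% interval taken across the iteration axis
mean_efficiency = efficiency_data_por.mean(axis=1)
lower, upper = np.percentile(efficiency_data_por, [2.5, 97.5], axis=1)
```

A 1D array of length bins.size would instead represent a single result per bin, in which case no interval is available to plot.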

openoa.utils.plot.plot_yaw_misalignment(ws_bins: list[float], vane_bins: list[float], power_values_vane_ws: ndarray[Any, dtype[float64]], curve_fit_params_ws: ndarray[Any, dtype[float64]], mean_vane_angle_ws: ndarray[Any, dtype[float64]], yaw_misalignment_ws: ndarray[Any, dtype[float64]], turbine_id: str, power_performance_label: str = 'Normalized Cp (-)', xlim: tuple[float, float] = (None, None), ylim: tuple[float, float] = (None, None), return_fig: bool = False, figure_kwargs: dict | None = None, plot_kwargs_curve: dict = {}, plot_kwargs_line: dict = {}, plot_kwargs_fill: dict = {}, legend_kwargs: dict = {})[source]#

Plots power performance vs. wind vane angle along with the best-fit cosine curve for each wind speed bin for a single turbine. The mean wind vane angle and the wind vane angle where power performance is maximized are shown for each wind speed bin. Additionally, the yaw misalignments for each wind speed bin as well as the mean yaw misalignment averaged over all wind speed bins are listed. If UQ is used, 95% confidence intervals will be plotted for the binned power performance values and listed for the yaw misalignment estimates.

Parameters:

ws_bins (list[float]) – Wind speed bin values for which yaw misalignment plots are produced (m/s).
vane_bins (list[float]) – Wind vane angle bin values for which power performance values are plotted (degrees).
power_values_vane_ws (np.ndarray) – 2D or 3D array containing power performance data for each wind speed bin in the ws_bins argument (first dimension if a 2D array) and each wind vane bin in the vane_bins argument (second dimension if a 2D array). If a 3D array is provided, the first dimension should contain results from different Monte Carlo iterations and 95% confidence intervals will be plotted.
curve_fit_params_ws (np.ndarray) – 2D or 3D array containing optimal cosine curve fit parameters (magnitude, offset (degrees), and cosine exponent) for each wind speed bin in the ws_bins argument (first dimension if a 2D array). If a 3D array is provided, the first dimension should contain results from different Monte Carlo iterations and 95% confidence intervals will be plotted. The last dimension contains the optimal curve fit parameters.
mean_vane_angle_ws (np.ndarray) – Array containing mean wind vane angles for each wind speed bin in the ws_bins argument (degrees).
yaw_misalignment_ws (np.ndarray) – 1D or 2D array containing yaw misalignment values for each wind speed bin in the ws_bins argument (degrees). If a 2D array is provided, the first dimension should contain results from different Monte Carlo iterations and 95% confidence intervals will be plotted.
turbine_id (str) – Name of turbine for which yaw misalignment data are provided. Used to determine title and plot axis labels.
power_performance_label (str, optional) – The label to use for the power performance (y) axis. Defaults to “Normalized Cp (-)”.
xlim (tuple[float, float], optional) – A tuple of floats representing the x-axis wind vane angle plotting display limits (degrees). Defaults to (None, None).
ylim (tuple[float, float], optional) – A tuple of the y-axis plotting display limits for the power performance vs. wind vane plots. Defaults to (None, None).
return_fig (bool, optional) – Flag to return the figure and axes objects. Defaults to False.
figure_kwargs (dict, optional) – Additional figure instantiation keyword arguments that are passed to plt.figure(). Defaults to None.
plot_kwargs_curve (dict, optional) – Additional plotting keyword arguments that are passed to ax.plot() for plotting lines for the power performance vs. wind vane plots. Defaults to {}.
plot_kwargs_line (dict, optional) – Additional plotting keyword arguments that are passed to ax.plot() for plotting vertical lines indicating mean vane angle and vane angle where power is maximized. Defaults to {}.
plot_kwargs_fill (dict, optional) – If UQ is used, additional plotting keyword arguments that are passed to ax.fill_between() for plotting shading regions for 95% confidence intervals for power performance vs. wind vane. Defaults to {}.
legend_kwargs (dict, optional) – Additional legend keyword arguments that are passed to ax.legend() for the power performance vs. wind vane plots. Defaults to {}.

Returns:

If return_fig is True, then the figure and axes object(s) corresponding to the yaw misalignment plots are returned for further tinkering/saving.

Return type:

None | tuple[matplotlib.pyplot.Figure, matplotlib.pyplot.Axes]
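The curve fit parameters are described above as a magnitude, an offset in degrees, and a cosine exponent. The sketch below shows how such parameters could describe a curve over the wind vane bins, assuming the functional form magnitude * cos(vane - offset) ** exponent. That form, the helper name cosine_curve, and all numeric values are assumptions made for illustration and are not taken from the documentation above.

```python
import numpy as np


def cosine_curve(vane_deg, magnitude, offset_deg, exponent):
    """Assumed form: magnitude * cos(vane - offset) ** exponent."""
    return magnitude * np.cos(np.radians(vane_deg - offset_deg)) ** exponent


ws_bins = [5.0, 6.0, 7.0, 8.0]  # wind speed bins, in m/s
vane_bins = np.arange(-25.0, 26.0, 1.0)  # wind vane angle bins, in degrees

# One row of (magnitude, offset, exponent) per wind speed bin: shape (4, 3)
curve_fit_params_ws = np.array(
    [
        [1.0, 2.0, 2.0],
        [1.0, 1.0, 2.0],
        [1.0, 3.0, 2.0],
        [1.0, 2.0, 2.0],
    ]
)

# Evaluate each curve over the vane bins: shape (n_ws_bins, n_vane_bins)
power_values_vane_ws = np.array(
    [cosine_curve(vane_bins, *params) for params in curve_fit_params_ws]
)

# Vane angle at which each evaluated curve reaches its maximum
vane_at_max = vane_bins[power_values_vane_ws.argmax(axis=1)]
```

Under this assumed form, each curve peaks where the vane angle equals the offset parameter, which is why the offset is the quantity of interest when reading the plots.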