Utils#
The utils subpackage provides module-level methods that operate on pandas DataFrame and Series objects. These modules and their methods are used throughout the OpenOA codebase, and can be imported individually into your own scripts.
Quality Assurance#
Provides the Quality Assurance (QA) methods for SCADA data checking.
- openoa.utils.qa.determine_offset_dst(df: pd.DataFrame, local_tz: str) pd.DataFrames [source]#
Creates a column of “utc_offset” and “is_dst”.
- Parameters:
df (pd.DataFrame) – The dataframe object to manipulate with a tz-aware pandas.DatetimeIndex.
local_tz (str) – The pytz-compatible timezone for the input time_field, by default UTC. This should be in the format of “Country/City” or “Region/City” such as “America/Denver” or “Europe/Paris”.
- Returns:
The updated dataframe with “utc_offset” and “is_dst” columns created.
- Return type:
(
pd.DataFrame
)
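As a rough illustration of what the “utc_offset” and “is_dst” columns contain, the sketch below derives both from a tz-aware index using plain pandas. The helper name add_offset_and_dst is hypothetical, and expressing the offset in hours is an assumption made here; this is not the OpenOA implementation.

```python
import pandas as pd


def add_offset_and_dst(df: pd.DataFrame, local_tz: str) -> pd.DataFrame:
    """Hypothetical helper: add UTC offset (hours) and a DST indicator."""
    out = df.copy()
    # View the tz-aware index in the requested local timezone
    local_index = out.index.tz_convert(local_tz)
    out["utc_offset"] = [ts.utcoffset().total_seconds() / 3600 for ts in local_index]
    # dst() is a zero timedelta outside of daylight saving time
    out["is_dst"] = [bool(ts.dst()) for ts in local_index]
    return out


# Three hourly UTC timestamps spanning the 2021 spring transition in Denver
index = pd.date_range("2021-03-14 08:00", periods=3, freq="h", tz="UTC")
example = add_offset_and_dst(pd.DataFrame({"power": [1.0, 2.0, 3.0]}, index=index), "America/Denver")
```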
- openoa.utils.qa.convert_datetime_column(df: DataFrame, time_col: str, local_tz: str, tz_aware: bool) DataFrame [source]#
Converts the passed timestamp data to a pandas-encoded Datetime, and creates a corresponding localized and UTC timestamp using the time_field column name with either “localized” or “utc”, respectively. The df object then uses the local timezone timestamp for its index.
- Parameters:
df (pd.DataFrame) – The SCADA pd.DataFrame.
time_col (str) – The string name of the datetime stamp column in df.
local_tz (str) – The pytz-compatible timezone for the input time_field, by default UTC. This should be in the format of “Country/City” or “Region/City” such as “America/Denver” or “Europe/Paris”.
tz_aware (bool) – Indicator for whether the provided data in time_col has the timezone information embedded (True), or not (False).
- Returns:
The updated pd.DataFrame with an index of pd.DatetimeIndex with UTC time-encoding, and the following new columns:
- <time_col>_utc: A UTC-converted timestamp column
- <time_col>_localized: The fully converted and localized timestamp column
- utc_offset: The difference, in hours, between the localized and UTC time
- is_dst: Indicator for whether or not the timestamp is considered to be DST (True) or not (False)
- Return type:
pd.DataFrame
- openoa.utils.qa.duplicate_time_identification(df: DataFrame, time_col: str, id_col: str) tuple[Series, None | Series, None | Series] [source]#
Identifies the time duplications on the modified SCADA data frame to highlight the duplications from the original time data (time_col), the UTC timestamps, and the localized timestamps, if the latter are available.
- Parameters:
df (pd.DataFrame) – The resulting SCADA dataframe from convert_datetime_column(), otherwise the UTC and localized column checks will return None.
time_col (str) – The string name of the timestamp column.
id_col (str) – The string name of the turbine asset_id column, to ensure that duplicates aren’t based off multiple turbines’ data.
- Returns:
The dataframe subsets with duplicate timestamps based on the original timestamp column, the localized timestamp column (None if the column does not exist), and the UTC-converted timestamp column (None if the column does not exist).
- Return type:
(tuple[pd.Series, None | pd.Series, None | pd.Series])
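The role of id_col can be seen in a small pandas sketch: a timestamp only counts as a duplicate when it repeats for the same turbine. The column names and data below are made up for illustration, and the approach is an assumption rather than the library's code.

```python
import pandas as pd

scada = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2021-01-01 00:00", "2021-01-01 00:10", "2021-01-01 00:10", "2021-01-01 00:00"]
        ),
        "asset_id": ["T01", "T01", "T01", "T02"],
    }
)

# A row is a duplicate only if the same turbine repeats the same timestamp;
# the first occurrence is kept and later repeats are marked
duplicate_mask = scada.duplicated(subset=["asset_id", "time"])
duplicates = scada.loc[duplicate_mask, "time"]
```

Here the 00:00 reading for T02 is not reported, because it belongs to a different turbine than the 00:00 reading for T01.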
- openoa.utils.qa.gap_time_identification(df: DataFrame, time_col: str, freq: str) tuple[Series, None | Series, None | Series] [source]#
Identifies the time gaps on the modified SCADA data frame to highlight the missing timestamps from the original time data (time_col), the UTC timestamps, and the localized timestamps, if the latter are available.
- Parameters:
df (pd.DataFrame) – The resulting SCADA dataframe from convert_datetime_column(), otherwise the UTC and localized column checks will return None.
time_col (str) – The string name of the timestamp column.
freq (str) – The expected frequency of the timestamps, which should align with the pandas timestamp conventions (https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases).
- Returns:
The dataframe subsets with missing timestamps based on the original timestamp column, the localized timestamp column (None if the column does not exist), and the UTC-converted timestamp column (None if the column does not exist).
- Return type:
(tuple[pd.Series, None | pd.Series, None | pd.Series])
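One common way to locate gaps, sketched here with plain pandas under the assumption of a fixed 10-minute frequency, is to build the full expected range and take the set difference with the observed timestamps. The data are illustrative only.

```python
import pandas as pd

# Observed timestamps with the 00:20 reading missing
times = pd.to_datetime(["2021-01-01 00:00", "2021-01-01 00:10", "2021-01-01 00:30"])

# Every timestamp that should exist at the expected frequency
expected = pd.date_range(times.min(), times.max(), freq="10min")

# Timestamps that were expected but never observed
missing = expected.difference(times)
```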
- openoa.utils.qa.describe(df: DataFrame, **kwargs) DataFrame [source]#
Thin wrapper for pd.DataFrame.describe(), but transposes the results to be easier to read.
- Parameters:
df (pd.DataFrame) – The resulting SCADA dataframe from convert_datetime_column(), otherwise the UTC and localized column checks will return None.
kwargs (dict) – Dictionary of additional arguments to pass to df.describe().
- Returns:
The results of df.describe().T.
- Return type:
pd.DataFrame
- openoa.utils.qa.daylight_savings_plot(df: DataFrame, local_tz: str, id_col: str, time_col: str, power_col: str, freq: str, hour_window: int = 3)[source]#
Produce a timeseries plot showing daylight savings events for each year of the SCADA data frame, highlighting potential duplications and gaps with the original timestamps compared against the UTC-converted timestamps.
- Parameters:
df (pd.DataFrame) – The resulting SCADA dataframe from convert_datetime_column().
local_tz (str) – The pytz-compatible timezone for the input time_field, by default UTC. This should be in the format of “Country/City” or “Region/City” such as “America/Denver” or “Europe/Paris”.
id_col (str) – The string name of the turbine asset_id column in df, to ensure that duplicates aren’t based off multiple turbines’ data.
time_col (str) – The string name of the timestamp column in df.
power_col (str) – String name of the power column in df.
freq (str) – The expected frequency of the timestamps, which should align with the pandas timestamp conventions (https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases).
hour_window (int) – Number of hours before and after the Daylight Savings Time transitions to view in the plot, by default 3.
- openoa.utils.qa.wtk_coordinate_indices(fn: h5pyd.File, latitude: float, longitude: float) tuple[float, float] [source]#
Finds the nearest x/y coordinates for a given latitude and longitude using the Proj4 library to find the nearest valid point in the Wind Toolkit coordinates database, and converts it to an (x, y) pair.
Note: This relies on the Wind Toolkit HSDS API, and h5pyd must be installed.
- Parameters:
fn (h5pyd.File) – The h5pyd file to be used for coordinate extraction.
latitude (float) – The latitude of the wind power plant’s center.
longitude (float) – The longitude of the wind power plant’s center.
- Returns:
The nearest valid x and y coordinates to the provided latitude and longitude.
- Return type:
tuple[float, float]
- openoa.utils.qa.wtk_diurnal_prep(latitude: float, longitude: float, fn: str = '/nrel/wtk-us.h5', start_date: str = '2007-01-01', end_date: str = '2013-12-31') Series [source]#
Links to the WIND Toolkit (WTK) data on AWS as a data source to capture the wind speed data and calculate the diurnal hourly averages.
- Parameters:
latitude (float) – The latitude of the wind power plant’s center.
longitude (float) – The longitude of the wind power plant’s center.
fn (str, optional) – The path and name of the WTK API file. Defaults to “/nrel/wtk-us.h5”.
start_date (str, optional) – Starting date for the WTK data. Defaults to “2007-01-01”.
end_date (str, optional) – Ending date for the WTK data. Defaults to “2013-12-31”.
- Raises:
IndexError – Raised if the latitude and longitude are not found within the WTK data set.
- Returns:
The diurnal hourly average wind speed.
- Return type:
pd.Series
- openoa.utils.qa.wtk_diurnal_plot(wtk_df: DataFrame | None, scada_df: DataFrame, time_col: str, power_col: str, *, latitude: float = 0, longitude: float = 0, fn: str = '/nrel/wtk-us.h5', start_date: str = '2007-01-01', end_date: str = '2013-12-31', return_fig: bool = False) None [source]#
Plots the WTK diurnal wind profile alongside the hourly power averages from the scada_df.
- Parameters:
wtk_df (pd.DataFrame | None) – The WTK diurnal profile data produced in wtk_diurnal_prep. If None, then this method will be run internally using the following keyword arguments: latitude, longitude, fn, start_date, and end_date.
scada_df (pd.DataFrame | None) – The SCADA data that was produced in convert_datetime_column().
time_col (str) – The name of the time column in scada_df.
power_col (str) – The name of the power column in scada_df.
latitude (float) – The latitude of the wind power plant’s center.
longitude (float) – The longitude of the wind power plant’s center.
fn (str, optional) – WTK API file path and location. Defaults to “/nrel/wtk-us.h5”.
start_date (str | None, optional) – Starting date for the WTK data. If None, then it uses the starting date of scada_df. Defaults to None.
end_date (str | None, optional) – Ending date for the WTK data. If None, then it uses the ending date of scada_df. Defaults to None.
return_fig (bool) – Indicator for if the figure and axes objects should be returned, by default False.
Filters#
This module provides functions for flagging pandas data series based on a range of criteria. The functions are largely intended for application in wind plant operational energy analysis, particularly wind speed vs. power curves.
- openoa.utils.filters.range_flag(data: DataFrame | Series, lower: float | list[float], upper: float | list[float], col: list[str] | None = None) Series | DataFrame [source]#
Flag data for which the specified data is outside the provided range of [lower, upper].
- Parameters:
data (pandas.Series | pandas.DataFrame) – data frame containing the column to be flagged; can either be a pandas.Series or pandas.DataFrame. If a pandas.DataFrame, a list of threshold values and columns (if checking a subset of the columns) must be provided.
col (list[str]) – column(s) in data to be flagged, by default None. Only required when the data is a pandas.DataFrame and a subset of the columns will be checked. Must be the same length as lower and upper.
lower (float | list[float]) – lower threshold (inclusive) for each element of data, if it’s a pd.Series, or the list of lower thresholds for each column in col. If the same threshold is applied to each column, then pass the single value, otherwise, it must be the same length as col and upper.
upper (float | list[float]) – upper threshold (inclusive) for each element of data, if it’s a pd.Series, or the list of upper thresholds for each column in col. If the same threshold is applied to each column, then pass the single value, otherwise, it must be the same length as lower and col.
- Returns:
Series or DataFrame (depending on data type) with boolean entries.
- Return type:
pandas.Series
| pandas.DataFrame
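The range check described above can be sketched with a pair of comparisons in pandas. The values and thresholds are illustrative, and the snippet is a stand-in for the idea rather than the library's code; because the thresholds are inclusive, values equal to a bound are not flagged.

```python
import pandas as pd

wind_speed = pd.Series([-1.0, 0.0, 12.5, 40.0, 41.0])
lower, upper = 0.0, 40.0

# True where the value falls outside the inclusive range [lower, upper]
flag = (wind_speed < lower) | (wind_speed > upper)
```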
- openoa.utils.filters.unresponsive_flag(data: DataFrame | Series, threshold: int = 3, col: list[str] | None = None) Series | DataFrame [source]#
Flag time stamps for which the reported data does not change for threshold repeated intervals.
- Parameters:
data (pandas.Series | pandas.DataFrame) – data frame containing the column to be flagged; can either be a pandas.Series or pandas.DataFrame. If a pandas.DataFrame, a list of threshold values and columns (if checking a subset of the columns) must be provided.
col (list[str]) – column(s) in data to be flagged, by default None. Only required when the data is a pandas.DataFrame and a subset of the columns will be checked. Must be the same length as lower and upper.
threshold (int) – number of intervals over which the measurement does not change for each element of data, regardless of whether it’s a pd.Series or pd.DataFrame. Defaults to 3.
- Returns:
Series or DataFrame (depending on data type) with boolean entries.
- Return type:
pandas.Series
| pandas.DataFrame
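One way to detect unresponsive sensor readings is to label runs of identical consecutive values and flag runs that are at least threshold long. The helper below, flag_flat_runs, is hypothetical; in particular, flagging every member of a run (including its first value) is an assumption and may differ from how unresponsive_flag treats run boundaries.

```python
import pandas as pd


def flag_flat_runs(series: pd.Series, threshold: int = 3) -> pd.Series:
    """Hypothetical helper: flag runs of >= threshold identical consecutive values."""
    # A new run starts wherever the value differs from the previous one
    run_id = series.ne(series.shift()).cumsum()
    # Length of the run that each observation belongs to
    run_length = run_id.map(run_id.value_counts())
    return run_length >= threshold


power = pd.Series([1.0, 2.0, 2.0, 2.0, 3.0, 3.0])
flag = flag_flat_runs(power)
```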
- openoa.utils.filters.std_range_flag(data: DataFrame | Series, threshold: float | list[float] = 2.0, col: list[str] | None = None) Series | DataFrame [source]#
Flag time stamps for which the measurement is outside of the threshold number of standard deviations from the mean across the data.
Note: This method does not distinguish between asset IDs.
- Parameters:
data (pandas.Series | pandas.DataFrame) – data frame containing the column to be flagged; can either be a pandas.Series or pandas.DataFrame. If a pandas.DataFrame, a list of threshold values and columns (if checking a subset of the columns) must be provided.
col (list[str]) – column(s) in data to be flagged, by default None. Only required when the data is a pandas.DataFrame and a subset of the columns will be checked. Must be the same length as lower and upper.
threshold (float | list[float]) – multiplicative factor on the standard deviation of data, if it’s a pd.Series, or the list of multiplicative factors on the standard deviation for each column in col. If the same factor is applied to each column, then pass the single value, otherwise, it must be the same length as col and upper.
- Returns:
Series or DataFrame (depending on data type) with boolean entries.
- Return type:
pandas.Series
| pandas.DataFrame
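The standard deviation test can be written in a few lines of pandas. The helper name flag_std_outliers is hypothetical, and the use of the sample standard deviation (the pandas default) is an assumption.

```python
import pandas as pd


def flag_std_outliers(series: pd.Series, threshold: float = 2.0) -> pd.Series:
    """Hypothetical helper: flag values more than threshold std devs from the mean."""
    deviation = (series - series.mean()).abs()
    return deviation > threshold * series.std()


# Nine identical readings and one extreme value
values = pd.Series([1.0] * 9 + [100.0])
flag = flag_std_outliers(values)
```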
- openoa.utils.filters.window_range_flag(window_col: str | Series = None, window_start: float = -inf, window_end: float = inf, value_col: str | Series = None, value_min: float = -inf, value_max: float = inf, data: DataFrame = None) Series [source]#
Flag time stamps for which the measurements in window_col are within the range [window_start, window_end], and the measurements in value_col are outside of the range [value_min, value_max].
- Parameters:
data (pandas.DataFrame) – data frame containing the columns window_col and value_col, by default None.
window_col (str | pandas.Series) – Name of the column used to define the window range or the data as a pandas Series, by default None.
window_start (float) – minimum value for the inclusive window, by default -np.inf.
window_end (float) – maximum value for the inclusive window, by default np.inf.
value_col (str | pandas.Series) – Name of the column used to define the value range or the data as a pandas Series, by default None.
value_max (float) – upper threshold for the inclusive data range; default np.inf
value_min (float) – lower threshold for the inclusive data range; default -np.inf
- Returns:
Series with boolean entries.
- Return type:
pandas.Series
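A typical use is flagging low power at wind speeds where the turbine should be producing. The sketch below reproduces the logic with pandas' between(), which is inclusive on both ends by default; the column names and limits are illustrative assumptions.

```python
import pandas as pd

scada = pd.DataFrame(
    {
        "wind_speed": [3.0, 8.0, 9.0, 15.0],
        "power": [50.0, 900.0, 10.0, 5.0],
    }
)

# Inside the wind speed window [6, 12] m/s ...
in_window = scada["wind_speed"].between(6.0, 12.0)
# ... but outside the acceptable power range [500, 2000] kW
in_value_range = scada["power"].between(500.0, 2000.0)
flag = in_window & ~in_value_range
```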
- openoa.utils.filters.bin_filter(bin_col: Series | str, value_col: Series | str, bin_width: float, threshold: float = 2, center_type: str = 'mean', bin_min: float = None, bin_max: float = None, threshold_type: str = 'std', direction: str = 'all', data: DataFrame = None)[source]#
Flag time stamps for which data in value_col, when binned by data in bin_col into bins of width bin_width, are outside the threshold bin. The center_type of each bin can be either the median or mean, and flagging can be applied directionally (i.e., above or below the center, or both).
- Parameters:
bin_col (pandas.Series | str) – The Series or column in data to be used for binning.
value_col (pandas.Series | str) – The Series or column in data to be flagged.
bin_width (float) – Width of bin in units of bin_col
threshold (float) – Outlier threshold (multiplicative factor of std of value_col in bin)
bin_min (float) – Minimum bin value below which flag should not be applied
bin_max (float) – Maximum bin value above which flag should not be applied
threshold_type (str) – Option to apply a ‘std’, ‘scalar’, or ‘mad’ (median absolute deviation) based threshold
center_type (str) – Option to use a ‘mean’ or ‘median’ center for each bin
direction (str) – Option to apply flag only to data ‘above’ or ‘below’ the mean, by default ‘all’
data (pd.DataFrame) – DataFrame containing both bin_col and value_col, if data are part of the same DataFrame, by default None.
- Returns:
Array-like object with boolean entries.
- Return type:
pandas.Series(bool)
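The binning idea can be sketched for the simplest configuration only: a ‘mean’ center, a ‘std’ threshold, and flagging in both directions. The bin assignment via floor division and the example data are assumptions for illustration; bin_min, bin_max, and the other threshold types are not covered here.

```python
import numpy as np
import pandas as pd

bin_width, threshold = 0.5, 2.0
wind_speed = pd.Series([5.1] * 9 + [7.2] * 3)
power = pd.Series([100.0] * 8 + [300.0] + [200.0, 210.0, 190.0])

# Assign each observation to a wind speed bin of width bin_width
bin_index = np.floor(wind_speed / bin_width)

# Per-bin center and spread, broadcast back to every observation
grouped = power.groupby(bin_index)
center = grouped.transform("mean")
spread = grouped.transform("std")

# Flag points farther than threshold standard deviations from their bin mean
flag = (power - center).abs() > threshold * spread
```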
- openoa.utils.filters.cluster_mahalanobis_2d(data_col1: Series | str, data_col2: Series | str, n_clusters: int = 13, dist_thresh: float = 3.0, data: DataFrame = None) Series [source]#
K-means clustering of data into n_cluster clusters; Mahalanobis distance evaluated for each cluster and points with distances outside of dist_thresh are flagged; distinguishes between asset IDs.
- Parameters:
data_col1 (pandas.Series | str) – Series or column in data corresponding to the first data column in a 2D cluster analysis
data_col2 (pandas.Series | str) – Series or column in data corresponding to the second data column in a 2D cluster analysis
n_clusters (int) – number of clusters to use
dist_thresh (float) – maximum Mahalanobis distance within each cluster for data to remain unflagged
data (pd.DataFrame) – DataFrame containing both data_col1 and data_col2, if data are part of the same DataFrame, by default None.
- Returns:
Array-like object with boolean entries.
- Return type:
pandas.Series(bool)
Power Curve#
This module provides methods to fit power curve models and use them to make predictions about ‘ideal’ power generation.
This module holds ready-to-use power curve functions. They take windspeed and power columns as arguments and return a python function which can be used to evaluate the power curve at arbitrary locations.
- openoa.utils.power_curve.functions.IEC(windspeed_col: str | Series, power_col: str | Series, bin_width: float = 0.5, windspeed_start: float = 0, windspeed_end: float = 30.0, data: DataFrame = None) Callable [source]#
Use IEC 61400-12-1-2 method for creating a binned wind-speed power curve. Power is set to zero for values outside the cutoff range: [windspeed_start, windspeed_end].
- Parameters:
windspeed_col (str | pandas.Series) – Windspeed data, or the name of the column in data.
power_col (str | pandas.Series) – Power data, or the name of the column in data.
bin_width (float) – Width of windspeed bin. Defaults to 0.5 m/s, per the standard.
windspeed_start (float) – Left edge of first windspeed bin. Defaults to 0.0.
windspeed_end (float) – Right edge of last windspeed bin. Defaults to 30.0.
data (pandas.DataFrame, optional) – a pandas DataFrame containing windspeed_col and power_col. Defaults to None.
- Returns:
Python function of type (Array[float] -> Array[float]) implementing the power curve.
- Return type:
Callable
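To make the "returns a Python function" idea concrete, here is a minimal binned-curve sketch in NumPy. The name binned_power_curve is hypothetical, bins are treated as left-closed, and empty bins return zero; these choices are assumptions and are not taken from the IEC standard or the OpenOA source.

```python
import numpy as np


def binned_power_curve(windspeed, power, bin_width=0.5, ws_start=0.0, ws_end=30.0):
    """Hypothetical helper: return a callable mapping wind speed to mean binned power."""
    n_bins = int(round((ws_end - ws_start) / bin_width))
    ws = np.asarray(windspeed, dtype=float)
    pw = np.asarray(power, dtype=float)
    inside = (ws >= ws_start) & (ws < ws_end)
    idx = np.floor((ws[inside] - ws_start) / bin_width).astype(int)
    # Mean power per bin; bins without data stay at zero
    sums = np.bincount(idx, weights=pw[inside], minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    means = np.divide(sums, counts, out=np.zeros(n_bins), where=counts > 0)

    def curve(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        ok = (x >= ws_start) & (x < ws_end)
        out = np.zeros_like(x)  # zero outside the cutoff range
        out[ok] = means[np.floor((x[ok] - ws_start) / bin_width).astype(int)]
        return out

    return curve


curve = binned_power_curve([4.1, 4.3, 6.2, 6.4], [100.0, 120.0, 300.0, 340.0])
predicted = curve([4.2, 6.3, 35.0, 10.0])
```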
- openoa.utils.power_curve.functions.logistic_5_parametric(windspeed_col: str | Series, power_col: str | Series, data: DataFrame = None) Callable [source]#
Fits the 5-parameter logistic function to observed data via a least-squares optimization (i.e., minimizing the sum of the squares of the residuals between the points as evaluated by the parameterized function and the points of observed data).
Extra: The present implementation follows the filtering method reported in:
M. Yesilbudaku Partitional clustering-based outlier detection for power curve optimization of wind turbines 2016 IEEE International Conference on Renewable Energy Research and Applications (ICRERA), Birmingham, 2016, pp. 1080-1084.
and the power curve method developed and reviewed in:
M Lydia, AI Selvakumar, SS Kumar, GEP. Kumar Advanced algorithms for wind turbine power curve modeling IEEE Trans Sustainable Energy, 4 (2013), pp. 827-835
M. Lydia, S.S. Kumar, I. Selvakumar, G.E. Prem Kumar A comprehensive review on wind turbine power curve modeling techniques Renew. Sust. Energy Rev., 30 (2014), pp. 452-460
- Parameters:
windspeed_col (str | pandas.Series) – Windspeed data, or the name of the column in data.
power_col (str | pandas.Series) – Power data, or the name of the column in data.
data (pandas.DataFrame, optional) – a pandas DataFrame containing windspeed_col and power_col. Defaults to None.
- Returns:
Python function of type (Array[float] -> Array[float]) implementing the power curve.
- Return type:
function
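For orientation, a commonly used form of the 5-parameter logistic function and the least-squares objective it would be fitted with are sketched below. The parameter names (a, b, c, d, g) and this exact parameterization are assumptions; the source does not state the form used internally, and no optimizer is run here.

```python
import numpy as np


def logistic_5_param(x, a, b, c, d, g):
    """Assumed 5-parameter logistic form: d + (a - d) / (1 + (x / c)**b)**g."""
    x = np.asarray(x, dtype=float)
    return d + (a - d) / (1.0 + (x / c) ** b) ** g


def sum_squared_residuals(params, windspeed, power):
    """Least-squares objective: sum of squared differences between model and data."""
    residuals = logistic_5_param(windspeed, *params) - np.asarray(power, dtype=float)
    return float(np.sum(residuals**2))


# Illustrative parameters: zero power at zero wind, approaching 2000 at high wind
params = (0.0, 8.0, 9.0, 2000.0, 1.0)
windspeed = np.array([0.0, 9.0, 25.0])
power = logistic_5_param(windspeed, *params)
```

An optimizer would search for the params that minimize sum_squared_residuals over the observed data.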
- openoa.utils.power_curve.functions.gam(windspeed_col: str | Series, power_col: str | Series, n_splines: int = 20, data: DataFrame = None) Callable [source]#
Use the generalized additive model, pygam.LinearGAM, to fit power to wind speed.
- Parameters:
windspeed_col (str | pandas.Series) – Windspeed data, or the name of the column in data.
power_col (str | pandas.Series) – Power data, or the name of the column in data.
n_splines (int) – Number of splines to use in the fit. Defaults to 20.
data (pandas.DataFrame, optional) – a pandas DataFrame containing windspeed_col and power_col. Defaults to None.
- Returns:
Python function of type (Array[float] -> Array[float]) implementing the power curve.
- Return type:
Callable
- openoa.utils.power_curve.functions.gam_3param(windspeed_col: str | Series, wind_direction_col: str | Series, air_density_col: str | Series, power_col: str | Series, n_splines: int = 20, data: DataFrame = None) Callable [source]#
Use a generalized additive model to fit power to wind speed, wind direction and air density.
- Parameters:
windspeed_col (str | pandas.Series) – Windspeed data, or the name of the column in data.
wind_direction_col (str | pandas.Series) – Wind direction data, or the name of the column in data.
air_density_col (str | pandas.Series) – Air density data, or the name of the column in data.
power_col (str | pandas.Series) – Power data, or the name of the column in data.
n_splines (int) – Number of splines to use in the fit. Defaults to 20.
data (pandas.DataFrame, optional) – a pandas DataFrame containing windspeed_col, wind_direction_col, air_density_col, and power_col. Defaults to None.
- Returns:
Python function of type (Array[float] -> Array[float]) implementing the power curve.
- Return type:
Callable
Imputing#
This module provides methods for filling in null data with interpolated (imputed) values.
- openoa.utils.imputing.asset_correlation_matrix(data: DataFrame, value_col: str) DataFrame [source]#
Create a correlation matrix on a MultiIndex DataFrame with time (or a different alignment value) and asset_id values as its indices, respectively.
- Parameters:
data (pandas.DataFrame) – input data frame such as PlantData.scada that uses a MultiIndex with a timestamp and asset_id column for indices, in that order.
value_col (str) – the column containing the data values to be used when assessing correlation
- Returns:
Correlation matrix with <id_col> as index and column names
- Return type:
pandas.DataFrame
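The shape of the result can be illustrated by pivoting the asset level of the MultiIndex into columns and correlating them. The index names, asset IDs, and data below are made up, and this is a sketch of the idea rather than the library's implementation.

```python
import pandas as pd

times = pd.date_range("2021-01-01", periods=4, freq="10min")
index = pd.MultiIndex.from_product([times, ["T01", "T02"]], names=["time", "asset_id"])
# T02 always produces exactly twice the power of T01
scada = pd.DataFrame({"power": [1.0, 2.0, 2.0, 4.0, 3.0, 6.0, 5.0, 10.0]}, index=index)

# One column per asset, aligned on time, then pairwise correlation
corr = scada["power"].unstack("asset_id").corr()
```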
- openoa.utils.imputing.impute_data(target_col: str, reference_col: str, target_data: DataFrame | None = None, reference_data: DataFrame | None = None, align_col: str | None = None, method: str = 'linear', degree: int = 1, data: DataFrame | None = None) Series [source]#
Replaces NaN data in a target pandas Series with imputed data from a reference pandas Series based on a linear regression relationship.
Steps include:
Merge the target and reference data frames on <align_col>, which is shared between the two
Determine the linear regression relationship between the target and reference data series
Apply that relationship to NaN data in the target series for which there is finite data in the reference series
Return the imputed results as well as the index matching the target data frame
- Parameters:
target_col (str) – the name of the column in either data or target_data to be imputed.
reference_col (str) – the name of the column in either data or reference_data to be used for imputation.
data (pandas.DataFrame) – input data frame such as PlantData.scada that uses a MultiIndex with a timestamp and asset_id column for indices, in that order, by default None.
target_data (pandas.DataFrame) – the DataFrame with NaN data to be imputed.
reference_data (pandas.DataFrame) – the DataFrame to be used in imputation.
align_col (str) – the name of the column used to join target_data and reference_data.
- Returns:
Copy of target_data_col series with NaN occurrences imputed where possible.
- Return type:
pandas.Series
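The regression-and-fill steps listed above can be sketched for already-aligned data with NumPy's polyfit. The column names and values are illustrative, and the merge on align_col is skipped because both series already share one DataFrame; this is not the library's code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "target": [2.0, 4.0, np.nan, 8.0],
        "reference": [1.0, 2.0, 3.0, 4.0],
    }
)

# Fit target = slope * reference + intercept on rows where both are present
valid = df.dropna()
slope, intercept = np.polyfit(valid["reference"], valid["target"], 1)

# Fill only the NaN entries of the target with the regression estimate
imputed = df["target"].fillna(slope * df["reference"] + intercept)
```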
- openoa.utils.imputing.impute_all_assets_by_correlation(data: DataFrame, impute_col: str, reference_col: str, asset_id_col: str = 'asset_id', r2_threshold: float = 0.7, method: str = 'linear', degree: int = 1)[source]#
Imputes NaN data in a Pandas data frame to the best extent possible by considering available data across different assets in the data frame. Highest correlated assets are prioritized in the imputation process.
Steps include:
Establish correlation matrix of specified data between different assets
For each asset in the data frame, sort neighboring assets by correlation strength
Then impute asset data based on available data in the highest correlated neighbor
If NaN data still remains in asset, move on to next highest correlated neighbor, etc.
- Continue until either:
There are no NaN data remaining in asset data
There are no more neighbors to consider
The neighboring asset does not meet the specified correlation threshold, r2_threshold
- Parameters:
data (pandas.DataFrame) – input data frame such as PlantData.scada that uses a MultiIndex with a timestamp and asset_id column for indices, in that order.
impute_col (str) – the name of the column in data to be imputed.
reference_col (str) – the name of the column in data to be used in imputation.
asset_id_col (str) – The name of the asset_id column, should be one of the turbine or tower index column names. Defaults to the turbine column name “asset_id”.
r2_threshold (float) – the correlation threshold for a neighboring asset to be considered valid for use in imputation, by default 0.7.
method (str) – The imputation method, should be one of “linear” or “polynomial”, by default “linear”.
degree (int) – The polynomial degree, i.e. linear is a 1 degree polynomial, by default 1
- Returns:
The imputation results
- Return type:
pandas.Series
Timeseries#
This module provides useful functions for processing timeseries data.
- openoa.utils.timeseries.offset_to_seconds(offset: int | float | str | datetime64) int | float [source]#
Converts pandas datetime offset alias to its corresponding number of seconds.
- Parameters:
offset (int | float | str | numpy.datetime64) – The pandas offset alias or numpy timestamp to be converted to seconds. If a number (int or float) is passed, then it must be in nanoseconds, the pandas default.
- Returns:
The number of seconds corresponding to offset.
- Return type:
int
| float
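The conversion itself can be reproduced with pandas alone, which may help when checking what a given alias means. The snippet uses to_offset and Timedelta as stand-ins and is not the library's implementation; it only covers fixed-length aliases such as minutes and hours.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# Offset aliases: convert to a fixed-length Timedelta, then to seconds
ten_minutes = pd.Timedelta(to_offset("10min")).total_seconds()
one_hour = pd.Timedelta(to_offset("h")).total_seconds()

# A bare integer is interpreted by pandas as nanoseconds
from_nanoseconds = pd.Timedelta(600_000_000_000).total_seconds()
```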
- openoa.utils.timeseries.determine_frequency_seconds(data: DataFrame, index_col: str | None = None) int | float [source]#
Calculates the most common time difference between all non-duplicate timestamps and returns that difference in seconds.
- Parameters:
data (pandas.DataFrame) – The pandas DataFrame to determine the DatetimeIndex frequency.
index_col (str | None, optional) – The name of the index column if data uses a MultiIndex, otherwise leave as None. Defaults to None.
- Returns:
The number of seconds corresponding to offset.
- Return type:
int
| float
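"Most common time difference between non-duplicate timestamps" can be sketched as a drop-duplicates, diff, and mode chain. The data are illustrative and the snippet is an assumption about the approach, not the library's code.

```python
import pandas as pd

times = pd.Series(
    pd.to_datetime(
        [
            "2021-01-01 00:00",
            "2021-01-01 00:10",
            "2021-01-01 00:20",
            "2021-01-01 00:20",  # duplicate reading
            "2021-01-01 00:40",  # a gap precedes this reading
        ]
    )
)

# Differences between consecutive unique timestamps
diffs = times.drop_duplicates().sort_values().diff().dropna()

# The most frequent difference, expressed in seconds
seconds = diffs.mode().iloc[0].total_seconds()
```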
- openoa.utils.timeseries.determine_frequency(data: DataFrame, index_col: str | None = None) str | int | float [source]#
Gets the offset alias from the datetime index of data, or calculates the most common time difference between all non-duplicate timestamps.
- Parameters:
data (pandas.DataFrame) – The pandas DataFrame to determine the DatetimeIndex frequency.
index_col (str | None, optional) – The name of the index column if data uses a MultiIndex, otherwise leave as None. Defaults to None.
- Returns:
The offset string or number of seconds between timestamps.
- Return type:
str | int | float
- openoa.utils.timeseries.convert_local_to_utc(d: str | datetime, tz_string: str) datetime [source]#
Convert timestamps in local time to UTC. The function can only act on a single timestamp at a time, so, for example, use the .apply function in pandas:
date_utc = df['time'].apply(convert_local_to_utc, args=('US/Pacific',))
Also note that this function doesn’t solve the end of DST when times between 1:00-2:00 are repeated in November. Those dates are left repeated in UTC time and need to be shifted manually.
The function does address the missing 2:00-3:00 times at the start of DST in March.
- Parameters:
d (datetime.datetime) – the local date, tzinfo must not be set
tz_string (str) – the local timezone
- Returns:
the local date converted to UTC time
- Return type:
datetime.datetime
- openoa.utils.timeseries.convert_dt_to_utc(dt_col: Series | str, tz_string: str, data: DataFrame = None) Series [source]#
Converts a pandas Series of timestamps, string-formatted or datetime.datetime objects that are in a local timezone tz_string, to a UTC-encoded pandas Series.
- Parameters:
dt_col (pandas.Series | str) – A pandas Series of datetime objects or string-encoded timestamps, or the name of the column in data.
tz_string (str) – The string name for the expected timezone of the provided timestamps in dt_col.
data (pandas.DataFrame, optional) – The pandas DataFrame containing the timestamp column: dt_col. Defaults to None.
- Returns:
The timestamps in dt_col converted to UTC.
- Return type:
pd.Series
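For a whole Series of naive local timestamps, the equivalent operation in plain pandas is a localize followed by a convert. The timezone and timestamps are illustrative, and this sketch does not handle ambiguous or nonexistent times around DST transitions.

```python
import pandas as pd

# Naive timestamps recorded in local (Pacific) time
local_times = pd.Series(pd.to_datetime(["2021-07-01 12:00", "2021-12-01 12:00"]))

# Attach the local timezone, then express the same instants in UTC
utc_times = local_times.dt.tz_localize("America/Los_Angeles").dt.tz_convert("UTC")
```

The summer timestamp shifts by seven hours and the winter timestamp by eight, reflecting daylight saving time.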
- openoa.utils.timeseries.find_time_gaps(dt_col: Series | str, freq: str, data: DataFrame = None) Series [source]#
Finds gaps in dt_col based on the expected frequency, freq, and returns them.
- Parameters:
dt_col (pandas.Series) – Pandas Series of datetime.datetime objects or the name of the column in data.
freq (string) – The expected frequency of the timestamps, which should align with the pandas timestamp conventions (https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases).
data (pandas.DataFrame, optional) – The pandas DataFrame containing the timestamp column: dt_col. Defaults to None.
- Returns:
Series of missing time stamps in datetime.datetime format
- Return type:
pandas.Series
- openoa.utils.timeseries.find_duplicate_times(dt_col: Series | str, data: DataFrame = None)[source]#
Find duplicate input data and report them. The first duplicated item is not reported, only subsequent duplicates.
- Parameters:
dt_col (pandas.Series | str) – Pandas series of datetime.datetime objects or the name of the column in data.
data (pandas.DataFrame, optional) – The pandas DataFrame containing the timestamp column: dt_col. Defaults to None.
- Returns:
Duplicates from input data
- Return type:
pandas.Series
- openoa.utils.timeseries.gap_fill_data_frame(data: DataFrame, dt_col: str, freq: str) DataFrame [source]#
Insert any missing timestamps into data while filling the data columns with NaNs.
- Parameters:
data (pandas.DataFrame) – The dataframe with potentially missing timestamps.
dt_col (str) – Name of the column in ‘data’ with timestamps.
freq (str) – The expected frequency of the timestamps.
- Returns:
output data frame with NaN data for the data gaps
- Return type:
pandas.DataFrame
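Gap filling of this kind is often done by reindexing against the complete expected range, as in the pandas sketch below. The column names and the 10-minute frequency are illustrative, and the approach is an assumption rather than the library's code.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "time": pd.to_datetime(["2021-01-01 00:00", "2021-01-01 00:10", "2021-01-01 00:30"]),
        "power": [1.0, 2.0, 4.0],
    }
)

# Complete set of timestamps at the expected frequency
full_range = pd.date_range(df["time"].min(), df["time"].max(), freq="10min")

# Reindexing inserts the missing rows and leaves their data columns as NaN
filled = df.set_index("time").reindex(full_range).rename_axis("time").reset_index()
```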
- openoa.utils.timeseries.percent_nan(col: Series | str, data: DataFrame = None)[source]#
Return the percentage of data that are NaN, or 1 if the series is empty.
- Parameters:
col (pandas.Series) – The pandas Series to be checked for NaNs, or the name of the column in data.
data (pandas.DataFrame, optional) – The pandas DataFrame containing the timestamp column: col. Defaults to None.
- Returns:
Percentage of NaN data in the data series
- Return type:
float
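A minimal sketch of the calculation follows. Since an empty series returns 1, the example assumes the result is expressed as a fraction between 0 and 1 rather than on a 0 to 100 scale; the helper name nan_fraction is hypothetical.

```python
import numpy as np
import pandas as pd


def nan_fraction(series: pd.Series) -> float:
    """Hypothetical helper: fraction of NaN entries, or 1.0 for an empty series."""
    if len(series) == 0:
        return 1.0
    return float(series.isna().mean())


half_missing = nan_fraction(pd.Series([1.0, np.nan, np.nan, 4.0]))
empty = nan_fraction(pd.Series([], dtype=float))
```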
- openoa.utils.timeseries.num_days(dt_col: Series | str, data: DataFrame = None) int [source]#
Calculates the number of non-duplicate days in dt_col.
- Parameters:
dt_col (pandas.Series | str) – A pandas Series with a timeseries index to be checked for the number of days contained in the data.
data (pandas.DataFrame, optional) – The pandas DataFrame containing the timestamp column: dt_col and having a timeseries index. Defaults to None.
- Returns:
Number of days in the data
- Return type:
int
- openoa.utils.timeseries.num_hours(dt_col: Series | str, *, data: DataFrame = None) int [source]#
Calculates the number of non-duplicate hours in dt_col.
- Parameters:
dt_col (
pandas.Series
| str) – A pandasSeries
of timeseries data to be checked for the number of hours contained in the data.data (
pandas.DataFrame
, optional) – The pandas DataFrame containing the timestamp column:dt_col
. Defaults to None.
- Returns:
Number of hours in the data
- Return type:
int
Met Data Processing#
This module provides methods for processing meteorological data.
- openoa.utils.met_data_processing.wrap_180(x: float | ndarray | Series | DataFrame)[source]#
Converts an angle, an array of angles, or a pandas Series or DataFrame of angles in degrees to the range -180 to +180 degrees.
- Parameters:
x (float | np.ndarray | pd.Series | pd.DataFrame) – Input angle(s) (degrees)
- Returns:
- The input angle(s) converted to the range -180 to +180 degrees, returned
as a float or numpy array (degrees)
- Return type:
float | np.ndarray
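A minimal scalar sketch of angle wrapping follows. The helper name `wrap_180_scalar` is hypothetical, and the handling of the boundary (here +180 maps to -180) is an assumption that may differ from OpenOA's behavior.

```python
def wrap_180_scalar(angle_deg):
    """Map an angle in degrees into the half-open range [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0


wrapped = [wrap_180_scalar(a) for a in (190.0, -190.0, 360.0, 45.0)]
# wrapped == [-170.0, 170.0, 0.0, 45.0]
```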
- openoa.utils.met_data_processing.circular_mean(x: DataFrame | Series | ndarray, axis: int = 0)[source]#
Compute circular mean of wind direction data for a pandas Series or 1-dimensional numpy array, or along any dimension of a multi-dimensional pandas DataFrame or numpy array
- Parameters:
x (pd.DataFrame | pd.Series | np.ndarray) – A pandas DataFrame or Series, or a numpy array containing wind direction data in degrees.
axis (int) – The axis to which the circular mean will be applied. This value must be less than the number of dimensions in
x
. Defaults to 0.
- Returns:
- The circular mean of the wind directions along the specified
axis between 0 and 360 degrees (degrees).
- Return type:
pd.Series | float | np.ndarray
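The reason a circular mean is needed can be seen with two directions either side of north: the arithmetic mean of 350 and 10 degrees is 180, while the circular mean is 0 (equivalently 360). The sketch below uses the common sine/cosine averaging approach on a plain list; `circular_mean_deg` is a hypothetical name, and the sketch does not reproduce the `axis` handling of the function above.

```python
import math


def circular_mean_deg(angles_deg):
    """Circular mean of angles in degrees, returned in the range [0, 360)."""
    n = len(angles_deg)
    sin_mean = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    cos_mean = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    return math.degrees(math.atan2(sin_mean, cos_mean)) % 360.0


mean_dir = circular_mean_deg([350.0, 10.0])  # close to 0 (or 360), not 180
```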
- openoa.utils.met_data_processing.compute_wind_direction(u: Series | str, v: Series | str, data: DataFrame = None) Series [source]#
Compute wind direction given u and v wind vector components
- Parameters:
u (
pandas.Series
| str) – A pandasSeries
of the zonal component of the wind, in m/s, or the name of the column indata
.v (
pandas.Series
| str) – A pandasSeries
of the meridional component of the wind, in m/s, or the name of the column indata
.data (
pandas.DataFrame
) – The pandasDataFrame
containing the columns u
and v
.
- Returns:
wind direction; units of degrees
- Return type:
pandas.Series
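A scalar sketch of the conversion follows, assuming the meteorological convention (the direction the wind blows from, measured clockwise from north). That convention and the helper name `wind_direction_deg` are assumptions, not taken from the OpenOA source.

```python
import math


def wind_direction_deg(u, v):
    """Direction the wind blows FROM, in degrees clockwise from north (assumed convention)."""
    return (180.0 + math.degrees(math.atan2(u, v))) % 360.0


northerly = wind_direction_deg(0.0, -5.0)  # air moving southward: from the north
westerly = wind_direction_deg(5.0, 0.0)    # air moving eastward: from the west
```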
- openoa.utils.met_data_processing.compute_u_v_components(wind_speed: Series | str, wind_dir: Series | str, data: DataFrame = None) Series [source]#
Compute vector components of the horizontal wind given wind speed and direction
- Parameters:
wind_speed (
pandas.Series
| str) – A pandasSeries
of the horizontal wind speed, in m/s, or the name of the column indata
.wind_dir (
pandas.Series
| str) – A pandas Series of the wind direction, in degrees, or the name of the column indata
.data (
pandas.DataFrame
) – The pandasDataFrame
containing the columns wind_speed
and wind_dir
.
- Raises:
ValueError – Raised if any of the
wind_speed
orwind_dir
values are negative.- Returns:
u(pandas.Series): the zonal component of the wind; units of m/s. v(pandas.Series): the meridional component of the wind; units of m/s
- Return type:
(tuple)
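The inverse conversion can be sketched the same way. The signs below assume the meteorological "from" convention for wind direction; the helper `u_v_components` is hypothetical and operates on scalars rather than pandas Series.

```python
import math


def u_v_components(wind_speed, wind_dir_deg):
    """Zonal (u) and meridional (v) components, assuming a 'from' direction convention."""
    if wind_speed < 0 or wind_dir_deg < 0:
        raise ValueError("wind speed and direction must be non-negative")
    theta = math.radians(wind_dir_deg)
    u = -wind_speed * math.sin(theta)
    v = -wind_speed * math.cos(theta)
    return u, v


u, v = u_v_components(10.0, 270.0)  # a westerly wind: u is about +10, v is about 0
```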
- openoa.utils.met_data_processing.compute_air_density(temp_col: Series | str, pres_col: Series | str, humi_col: Series | str = None, data: DataFrame = None)[source]#
Calculate air density from the ideal gas law based on the definition provided by IEC 61400-12 given pressure, temperature and relative humidity.
This function assumes temperature and pressure are reported in standard units of measurement (i.e. Kelvin for temperature, Pascal for pressure, humidity has no dimension).
Humidity values are optional. Following the IEC, a relative humidity of 50% (0.5) is used as the default value.
- Parameters:
temp_col (
pandas.Series
| str) – A pandasSeries
of the temperature values, in Kelvin, or the name of the column indata
.pres_col (
pandas.Series
| str) – A pandas Series of the pressure values, in Pascals, or the name of the column indata
.humi_col (
pandas.Series
| str) – An optional pandas Series of the relative humidity values, as a decimal in the range (0, 1), or the name of the column indata
. Defaults to None.data (
pandas.DataFrame
) – The pandasDataFrame
containing the columns temp_col
and pres_col
, and optionally humi_col
.
- Raises:
ValueError – Raised if any of the
temp_col
orpres_col
, orhumi_col
values are negative.- Returns:
Rho, the calculated air density; units of kg/m3
- Return type:
pandas.Series
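For orientation, the sketch below evaluates the humid-air density expression as it is commonly quoted from IEC 61400-12-1. The gas constants and the vapor pressure fit are assumptions to be checked against the standard, and `air_density` is a hypothetical scalar helper, not the OpenOA function.

```python
import math

R_DRY = 287.05    # gas constant of dry air, J/(kg K) (assumed value)
R_VAPOR = 461.5   # gas constant of water vapor, J/(kg K) (assumed value)


def air_density(temp_k, pres_pa, rel_humidity=0.5):
    """Humid-air density in kg/m3 from temperature (K), pressure (Pa), and humidity (0-1)."""
    # Empirical vapor pressure fit in Pa (assumed form of the IEC expression)
    vapor_pressure = 0.0000205 * math.exp(0.0631846 * temp_k)
    return (1.0 / temp_k) * (
        pres_pa / R_DRY
        - rel_humidity * vapor_pressure * (1.0 / R_DRY - 1.0 / R_VAPOR)
    )


rho = air_density(288.15, 101325.0)           # default 50% relative humidity
rho_dry = air_density(288.15, 101325.0, 0.0)  # reduces to the dry ideal gas law
```

With zero humidity the expression collapses to p / (R T), which gives a quick sanity check on the constants.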
- openoa.utils.met_data_processing.pressure_vertical_extrapolation(p0: Series | str, temp_avg: Series | str, z0: Series | str, z1: Series | str, data: DataFrame = None) Series [source]#
Extrapolate pressure from height z0 to height z1 given the average temperature in the layer. The hydrostatic equation is used to perform the extrapolation.
- Parameters:
p0 (
pandas.Series
) – A pandasSeries
of the pressure at height z0, in Pascals, or the name of the column in data.temp_avg (
pandas.Series
) – A pandasSeries
of the mean temperature between z0 and z1, in Kelvin, or the name of the column indata
.z0 (
pandas.Series
) – A pandasSeries
of the height above surface, in meters, or the name of the column indata
.z1 (
pandas.Series
) – A pandasSeries
of the extrapolation height, in meters, or the name of the column indata
.data (
pandas.DataFrame
) – The pandasDataFrame
containing the columns p0
, temp_avg
, z0
, and z1
.
- Raises:
ValueError – Raised if any of the
p0
ortemp_avg
values are negative.- Returns:
p1
, extrapolated pressure atz1
, in Pascals- Return type:
pandas.Series
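Integrating the hydrostatic equation over a layer with constant mean temperature gives an exponential pressure profile. The sketch below shows that form for scalars; the constants and the helper name `extrapolate_pressure` are assumptions, and OpenOA may use slightly different values.

```python
import math

G = 9.80665      # gravitational acceleration, m/s^2 (assumed value)
R_DRY = 287.05   # gas constant of dry air, J/(kg K) (assumed value)


def extrapolate_pressure(p0, temp_avg, z0, z1):
    """Pressure at z1 (Pa) from pressure at z0 (Pa) and mean layer temperature (K)."""
    if p0 < 0 or temp_avg < 0:
        raise ValueError("pressure and temperature must be non-negative")
    return p0 * math.exp(-G * (z1 - z0) / (R_DRY * temp_avg))


p1 = extrapolate_pressure(100000.0, 288.15, 0.0, 100.0)  # roughly 1.2% lower than p0
```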
- openoa.utils.met_data_processing.air_density_adjusted_wind_speed(wind_col: Series | str, density_col: Series | str, data: DataFrame = None) Series [source]#
Apply air density correction to wind speed measurements following IEC-61400-12-1 standard
- Parameters:
wind_col (
pandas.Series
| str) – A pandas Series containing the wind speed data, in m/s, or the name of the column indata
density_col (
pandas.Series
| str) – A pandas Series containing the air density data, in kg/m3, or the name of the column indata
data (
pandas.DataFrame
) – The pandas DataFrame containing the columns wind_col
and density_col
.
- Returns:
density-adjusted wind speeds, in m/s
- Return type:
pandas.Series
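The density correction scales wind speed by the cube root of a density ratio, so that the corrected speed carries the same kinetic energy flux at the reference density. In the sketch below the reference density is taken as the mean of the supplied densities; that choice, like the helper name, is an assumption.

```python
def density_adjusted_wind_speed(wind_speeds, densities):
    """Scale each wind speed by (rho / rho_ref) ** (1/3), with rho_ref the mean density."""
    rho_ref = sum(densities) / len(densities)  # assumed reference density
    return [
        ws * (rho / rho_ref) ** (1.0 / 3.0)
        for ws, rho in zip(wind_speeds, densities)
    ]


adjusted = density_adjusted_wind_speed([10.0, 10.0], [1.1, 1.3])
```

Low-density samples are adjusted downward and high-density samples upward, since denser air delivers more power at the same speed.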
- openoa.utils.met_data_processing.compute_turbulence_intensity(mean_col: Series | str, std_col: Series | str, data: DataFrame = None) Series [source]#
Compute turbulence intensity
- Parameters:
mean_col (
pandas.Series
| str) – A pandasSeries
containing the wind speed mean data, in m/s, or the name of the column indata
.std_col (
pandas.Series
| str) – A pandasSeries
containing the wind speed standard deviation data, in m/s, or the name of the column indata
.data (
pandas.DataFrame
) – The pandas DataFrame containing the columns mean_col and std_col
.
- Returns:
turbulence intensity, (unitless ratio)
- Return type:
pd.Series
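Turbulence intensity is conventionally the ratio of the wind speed standard deviation to the mean wind speed over the same averaging interval. A minimal list-based sketch, with a hypothetical helper name:

```python
def turbulence_intensity(mean_ws, std_ws):
    """Element-wise ratio of standard deviation to mean wind speed (unitless)."""
    return [std / mean for mean, std in zip(mean_ws, std_ws)]


ti = turbulence_intensity([8.0, 10.0], [0.8, 1.5])
```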
- openoa.utils.met_data_processing.compute_shear(data: DataFrame, ws_heights: dict[str, float], return_reference_values: bool = False) Series | tuple[Series, float, Series] [source]#
Computes shear coefficient between wind speed measurements using the power law. The shear coefficient is obtained by evaluating the expression for an OLS regression coefficient.
- Parameters:
data (
pandas.DataFrame
) – A pandasDataFrame
with wind speed columns that correspond to the keys ofws_heights
.ws_heights (
dict[str, float]
) – A dictionary with wind speed column names ofdata
as keys and their respective sensor heights (m) as values.return_reference_values (bool) – If True, this function returns a three-element tuple where the first element is the array of shear exponents, the second element is the reference height (float), and the third element is the array of reference wind speeds. These reference values can be used for extrapolating wind speed. Defaults to False.
- Returns:
- If
return_reference_values
is False, return just the shear coefficient (unitless), else return the shear coefficient (unitless), reference height (m), and reference wind speed (m/s).
- Return type:
pandas.Series
|tuple[pandas.Series, float, pandas.Series]
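Under the power law, ln(wind speed) is linear in ln(height), so the shear exponent is the slope of an ordinary least squares fit in log-log space. The sketch below does this for a single timestamp using a dict of height to wind speed; the helper name and the input layout are assumptions and differ from the DataFrame interface above.

```python
import math


def shear_exponent(speeds_by_height):
    """OLS slope of ln(speed) against ln(height) for one set of measurements."""
    xs = [math.log(z) for z in speeds_by_height]
    ys = [math.log(v) for v in speeds_by_height.values()]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var = sum((x - x_bar) ** 2 for x in xs)
    return cov / var


# Synthetic profile that follows the power law exactly with an exponent of 0.2
profile = {z: 8.0 * (z / 80.0) ** 0.2 for z in (40.0, 60.0, 80.0)}
alpha = shear_exponent(profile)
```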
- openoa.utils.met_data_processing.extrapolate_windspeed(v1: Series | str, z1: float, z2: float, shear: Series | str, data: DataFrame = None)[source]#
Extrapolates wind speed vertically using the Power Law.
- Parameters:
v1 (pandas.Series | float | str) – A pandas
Series
of the wind speed measurements at the reference height, or the name of the column indata
.z1 (
float
) – Height of reference wind speed measurements; units in meters.z2 (
float
) – Target extrapolation height; units in meters.shear (pandas.Series | float | str) – A pandas
Series
of the shear values, or the name of the column indata
.data (
pandas.DataFrame
) – The pandasDataFrame
containing the columns v1
and shear
.
- Returns:
Wind speed extrapolated to the target height, as a pandas.Series, numpy array, or float.
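The power law itself is a one-line relation, v2 = v1 * (z2 / z1) ** shear. A scalar sketch with a hypothetical helper name:

```python
def extrapolate_wind_speed(v1, z1, z2, shear):
    """Power-law extrapolation of wind speed from height z1 to height z2."""
    return v1 * (z2 / z1) ** shear


v2 = extrapolate_wind_speed(8.0, 80.0, 100.0, 0.2)  # slightly above 8 m/s
```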
- openoa.utils.met_data_processing.compute_veer(wind_a: Series | str, height_a: float, wind_b: Series | str, height_b: float, data: DataFrame = None)[source]#
Compute veer between wind direction measurements
- Parameters:
wind_a (
pandas.Series
| str) – A pandasSeries
containing the wind direction mean data, in degrees, or the name of the column indata
.height_a (
float
) – sensor height forwind_a
wind_b (
pandas.Series
| str) – A pandasSeries
containing the wind direction mean data, in degrees, or the name of the column indata
.height_b (
float
) – sensor height forwind_b
data (
pandas.DataFrame
) – The pandasDataFrame
containing the columns wind_a
and wind_b
.
- Returns:
veer (deg/m)
- Return type:
veer(
array
)
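Veer is a direction change per unit height, so the direction difference has to be taken the short way around the circle before dividing by the height difference. The sketch below assumes that interpretation; `veer_deg_per_m` is a hypothetical scalar helper.

```python
def veer_deg_per_m(dir_a, height_a, dir_b, height_b):
    """Signed direction change per meter between two measurement heights."""
    # Shortest signed angular difference, in [-180, 180)
    delta = (dir_b - dir_a + 180.0) % 360.0 - 180.0
    return delta / (height_b - height_a)


veer = veer_deg_per_m(350.0, 40.0, 10.0, 80.0)  # 20 degrees over 40 m
# veer == 0.5
```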
Metadata Fetch#
This module fetches metadata of wind farms
- openoa.utils.metadata_fetch.fetch_eia(api_key: str, plant_id: str, file_path: str | Path, plant_file: str | Path, plant_sheet: str | Path, wind_file: str | Path, wind_sheet: str | Path)[source]#
- Read in EIA data for the wind farm of interest:
from the EIA API for monthly production, returning a monthly net energy generation time series
from local Excel files for wind farm metadata, returning a dictionary of metadata
- Parameters:
api_key (
str
) – 32-character user-specific API key, obtained from EIA.plant_id (
str
) – 5-character EIA power plant code.file_path (
str
) – Directory with EIA metadata .xlsx files.plant_file (
str
| Path) – Name of the plant metadata Excel file infile_path
. Formerly hard-coded to: “2___Plant_Y2017.xlsx”.plant_sheet (
str
) – The name of the sheet containing the data inplant_file
. Formerly hard-coded as “Plant”.wind_file (
str
| Path) – Name of the wind metadata Excel file infile_path
. Formerly hard-coded to: “3_2_Wind_Y2017.xlsx”.wind_sheet (
str
) – The name of the sheet containing the data in wind_file
. Formerly hard-coded as “Operable”.
- Returns:
monthly net energy generation in MWh
dictionary
: metadata of the wind farm with ‘plant_id’- Return type:
pandas.Series
- openoa.utils.metadata_fetch.attach_eia_data(project: PlantData, api_key: str, plant_id: str, file_path: str | Path, plant_file: str | Path, plant_sheet: str | Path, wind_file: str | Path, wind_sheet: str | Path)[source]#
Assign EIA metadata, which is an empty dictionary by default, to a PlantData object.
- Parameters:
project (
PlantData
) – PlantData object for a particular projectapi_key (
str
) – 32-character user-specific API key, obtained from EIA.plant_id (
str
) – 5-character EIA power plant code.file_path (
str
) – Directory with EIA metadata .xlsx files.plant_file (
str
| Path) – Name of the plant metadata Excel file infile_path
. Formerly hard-coded to: “2___Plant_Y2017.xlsx”.plant_sheet (
str
) – The name of the sheet containing the data inplant_file
.wind_file (
str
| Path) – Name of the wind metadata Excel file infile_path
. Formerly hard-coded to: “3_2_Wind_Y2017.xlsx”.wind_sheet (
str
) – The name of the sheet containing the data in wind_file
.
- Returns:
(None)
Unit Conversion#
This module provides basic methods for unit conversion and calculation of basic wind plant variables
- openoa.utils.unit_conversion.convert_power_to_energy(power_col: str | Series, sample_rate_min='10min', data: DataFrame = None) Series [source]#
Compute energy [kWh] from power [kW] and return the data column
- Parameters:
power_col (
str
|pandas.Series
) – The power data, in kW, or the name of the column indata
.sample_rate_min (
str
) – Sampling rate as a pandas offset alias, in minutes, to use for conversion. Defaults to “10min”.data (
pandas.DataFrame
) – The pandas DataFrame containing the column power_col
.
- Returns:
Energy, in kWh, matching the length of the input data
- Return type:
pandas.Series
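The conversion is a multiplication of average power by the sample duration in hours. The sketch below takes the sample length as a number of minutes rather than a pandas offset alias, which is a simplification; the helper name is hypothetical.

```python
def power_to_energy(power_kw, sample_minutes=10.0):
    """Energy in kWh for each sample of average power in kW."""
    return [p * sample_minutes / 60.0 for p in power_kw]


energy = power_to_energy([600.0, 1200.0, 0.0])
# energy == [100.0, 200.0, 0.0]
```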
- openoa.utils.unit_conversion.compute_gross_energy(net_energy: str | Series, availability: str | Series, curtailment: str | Series, availability_type: str = 'frac', curtailment_type: str = 'frac', data: str | DataFrame = None)[source]#
Computes gross energy for a wind plant or turbine by adding reported
availability
andcurtailment
losses to reported net energy.- Parameters:
net_energy (
str
| pandas.Series) – A pandas Series, or the name of the column in data
corresponding to the reported net energy for wind plant or turbine.availability (
str
| pandas.Series) – A pandas Series, or the name of the column in data
corresponding to the reported availability losses for wind plant or turbine.curtailment (
str
| pandas.Series) – A pandas Series, or the name of the column in data
corresponding to the reported curtailment losses for wind plant or turbine.availability_type (
str
) – Either one of “frac” or “energy” corresponding to if the data provided inavailability
is in the range of [0, 1], or representing the energy lost.curtailment_type (
str
) – Either one of “frac” or “energy” corresponding to if the data provided incurtailment
is in the range of [0, 1], or representing the energy lost.data (
pd.DataFrame
, optional) – The pandas DataFrame containing the net_energy, availability, and curtailment columns.
- Returns:
Calculated gross energy for wind plant or turbine
- Return type:
gross(
pandas.Series
)
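One way to reason about the "frac" and "energy" options is to treat fractional losses as fractions of gross energy and energy losses as absolute amounts, which gives gross = (net + energy losses) / (1 - fractional losses). The sketch below implements that reading for scalars. It is an assumption about the arithmetic, not a copy of OpenOA's implementation, which may combine mixed types differently.

```python
def gross_energy(net, availability, curtailment,
                 availability_type="frac", curtailment_type="frac"):
    """Gross energy from net energy plus availability and curtailment losses."""
    frac_loss = 0.0    # losses given as a fraction of gross energy
    energy_loss = 0.0  # losses given in energy units
    for loss, kind in ((availability, availability_type),
                       (curtailment, curtailment_type)):
        if kind == "frac":
            frac_loss += loss
        elif kind == "energy":
            energy_loss += loss
        else:
            raise ValueError("loss type must be 'frac' or 'energy'")
    return (net + energy_loss) / (1.0 - frac_loss)


gross_from_energy = gross_energy(90.0, 5.0, 5.0, "energy", "energy")
gross_from_frac = gross_energy(90.0, 0.05, 0.05)
```

Both calls describe the same plant (10% total losses on 100 units of gross energy), so they should agree up to floating point error.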
- openoa.utils.unit_conversion.convert_feet_to_meter(variable: str | Series, data: DataFrame = None)[source]#
Compute variable in [meter] from [feet] and return the data column
- Parameters:
variable (
str
| pandas.Series) – A pandas Series, or the name of the column in data
corresponding to the data needing to be converted to meters.data (
pandas.DataFrame
) – The pandas DataFrame containing the columnvariable
.
- Returns:
variable
in meters- Return type:
pandas.Series
Plotting#
This module provides helpful functions for creating various plots
- openoa.utils.plot.set_styling() None [source]#
Sets some of the matplotlib plotting styling to be consistent throughout any module where plotting is implemented.
- openoa.utils.plot.map_wgs84_to_cartesian(longitude_origin: ndarray[Any, dtype[float64]] | float, latitude_origin: ndarray[Any, dtype[float64]] | float, longitude_points: ndarray[Any, dtype[float64]] | Series | float, latitude_points: ndarray[Any, dtype[float64]] | Series | float) tuple[ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]]] | tuple[Series, Series] | tuple[float, float] [source]#
Maps WGS-84 latitude and longitude to local cartesian coordinates using an origin coordinate pair.
- Parameters:
longitude_origin (
numpy array of shape (1, ) | float
) – longitude of cartesian coordinate system origin.latitude_origin (
numpy array of shape (1, ) | float
) – latitude of cartesian coordinate system origin.longitude_points (
numpy array of shape (n, ) | pd.Series | float
) – longitude(s) of points of interest.latitude_points (
numpy array of shape (n, ) | pd.Series | float
) – latitude(s) of points of interest.
- Returns:
Tuple representing cartesian coordinates (x, y); returned as a tuple of numpy arrays, pandas Series, or scalars, dependent upon the originally passed data.
- openoa.utils.plot.luminance(rgb: tuple[int, int, int])[source]#
Calculates the brightness of an rgb 255 color. See https://en.wikipedia.org/wiki/Relative_luminance
- Parameters:
rgb (
tuple
) – Tuple of red, green, and blue values in the range of 0-255.- Returns:
relative luminance.
- Return type:
luminance(
float
)
Example
>>> rgb = (255,127,0)
>>> luminance(rgb)
0.5687976470588235
>>> luminance((0,50,255))
0.21243529411764706
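The example values above are consistent with a linear weighting of the channels after scaling them to the range 0 to 1, using the relative luminance coefficients 0.2126, 0.7152, and 0.0722, with no gamma correction. That reading is inferred from the example outputs rather than from the source code; `relative_luminance` is a hypothetical name.

```python
def relative_luminance(rgb):
    """Weighted sum of 0-255 RGB channels, scaled to the range 0-1."""
    r, g, b = (channel / 255.0 for channel in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


orange = relative_luminance((255, 127, 0))
blue = relative_luminance((0, 50, 255))
```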
- openoa.utils.plot.color_to_rgb(color: str | tuple[int, int, int])[source]#
Converts named colors, hex and normalised RGB to 255 RGB values
- Parameters:
color (
color
) – RGB, HEX or named color.- Returns:
255 RGB values.
- Return type:
rgb(
tuple
)
Example
>>> color_to_rgb("Red")
(255, 0, 0)
>>> color_to_rgb((1,1,0))
(255,255,0)
>>> color_to_rgb("#ff00ff")
(255,0,255)
- openoa.utils.plot.plot_windfarm(asset_df, tile_name='OpenMap', plot_width=800, plot_height=800, marker_size=14, figure_kwargs={}, marker_kwargs={})[source]#
Plot the windfarm spatially on a map using the Bokeh plotting library.
- Parameters:
asset_df (
pd.DataFrame
) – PlantData.asset object containing the asset metadata.tile_name (
str
) – tile set to be used for the underlay, e.g. OpenMap, ESRI, OpenTopoMap.plot_width (
int
) – width of plot.plot_height (
int
) – height of plot.marker_size (
int
) – size of markers.figure_kwargs (
dict
) – additional figure options for advanced users, see Bokeh docs.marker_kwargs (
dict
) – additional marker options for advanced users, see Bokeh docs. We have some custom behavior around the “fill_color” attribute. If “fill_color” is not defined, OpenOA will use an internally defined color palette. If “fill_color” is the name of a column in the asset table, OpenOA will use the value of that column as the marker color. Otherwise, “fill_color” is passed through to Bokeh.
- Returns:
windfarm map
- Return type:
Bokeh_plot(
axes handle
)
Example
import pandas as pd
from bokeh.plotting import figure, output_file, show

from openoa.utils.plot import plot_windfarm

from examples import project_ENGIE

# Load plant object
project = project_ENGIE.prepare("../examples/data/la_haute_borne")

# Create the bokeh wind farm plot
show(plot_windfarm(project.asset, tile_name="ESRI", plot_width=600, plot_height=600))
- openoa.utils.plot.plot_by_id(df: DataFrame, id_col: str, x_axis: str, y_axis: str, max_cols: int = 4, xlim: tuple[float, float] = (None, None), ylim: tuple[float, float] = (None, None), xlabel: str | None = None, ylabel: str | None = None, return_fig: bool = False, figure_kwargs: dict = {}, plot_kwargs: dict = {}) None [source]#
Function to plot any two fields against each other in a dataframe with unique plots for each asset_id.
- Parameters:
df (
pd.DataFrame
) – The dataframe for comparing values.id_col (
str
) – The asset_id column (or index column) in df.x_axis (
str
) – Independent variable to plot, should align with a column indf
.y_axis (
str
) – Dependent variable to plot, should align with a column indf
.max_cols (
int
, optional) – The maximum number of columns in the plot. Defaults to 4.xlim (
tuple[float, float]
, optional) – A tuple of the x-axis (min, max) values. Defaults to (None, None).ylim (
tuple[float, float]
, optional) – A tuple of the y-axis (min, max) values. Defaults to (None, None).xlabel (
str
| None) – The x-axis label, if None, thenx_axis
will be used. Defaults to None.ylabel (
str
| None) – The y-axis label, if None, then y_axis
will be used. Defaults to None.return_fig (
bool
, optional) – Set to True to return the figure and axes objects, otherwise set to False. Defaults to False.figure_kwargs (
dict
, optional) – Additional keyword arguments that should be passed to plt.figure(). Defaults to {}.plot_kwargs (
dict
, optional) – Additional keyword arguments that should be passed to ax.scatter. Defaults to {}.
- Returns:
(
None
)
- openoa.utils.plot.column_histograms(df: DataFrame, columns: list | None = None, return_fig: bool = False)[source]#
Produces a histogram plot for each numeric column in
df
.- Parameters:
df (
pd.DataFrame
) – The dataframe for plotting.return_fig (
bool
) – Indicator for if the figure and axes objects should be returned, by default False.
- Returns:
(None)
- openoa.utils.plot.plot_power_curve(wind_speed: Series, power: Series, flag: ndarray | Series, flag_labels: tuple[str, str] = ('Flagged Readings', 'Power Curve'), xlim: tuple[float, float] = (None, None), ylim: tuple[float, float] = (None, None), legend: bool = False, return_fig: bool = False, figure_kwargs: dict = {}, legend_kwargs: dict = {}, scatter_kwargs: dict = {}) None | tuple[Figure, Axes] [source]#
Plots the individual points on a power curve, with an optional
flag
filtering for singling out readings in the figure. If flag is all False values, then no overlaid flagged scatter points will be created.- Parameters:
wind_speed (
pandas.Series
) – A pandas Series or numpy array of the recorded wind speeds, in m/s.power (
pandas.Series
| np.ndarray) – A pandas Series or numpy array of the recorded power, in kW.flag (
numpy.ndarray
| pd.Series) – A pandas Series or numpy array of booleans for which points to flag in the windspeed and power data.flag_labels (
tuple[str, str]
, optional) – The labels to give to the scatter points, corresponding to the flagged points and raw points, respectively. Defaults to (“Flagged Readings”, “Power Curve”).xlim (
tuple[float, float]
, optional) – A tuple of the x-axis (min, max) values. Defaults to (None, None).ylim (
tuple[float, float]
, optional) – A tuple of the y-axis (min, max) values. Defaults to (None, None).legend (
bool
, optional) – Set to True to place a legend in the figure, otherwise set to False. Defaults to False.return_fig (
bool
, optional) – Set to True to return the figure and axes objects, otherwise set to False. Defaults to False.figure_kwargs (
dict
, optional) – Additional keyword arguments that should be passed toplt.figure()
. Defaults to {}.scatter_kwargs (
dict
, optional) – Additional keyword arguments that should be passed toax.scatter()
. Defaults to {}.legend_kwargs (
dict
, optional) – Additional keyword arguments that should be passed toax.legend()
. Defaults to {}.
- Returns:
If return_fig is True, then the figure and axes objects are returned; otherwise nothing is returned.
- Return type:
None | tuple[plt.Figure, plt.Axes]
- openoa.utils.plot.plot_monthly_reanalysis_windspeed(data: dict[str, DataFrame], windspeed_col: str, plant_por: tuple[datetime, datetime], normalize: bool = True, xlim: tuple[datetime, datetime] = (None, None), ylim: tuple[float, float] = (None, None), return_fig: bool = False, figure_kwargs: dict = {}, plot_kwargs: dict = {}, legend_kwargs: dict = {}) None | tuple[Figure, Axes] [source]#
Make a plot of the normalized annual average wind speeds from reanalysis data to show general trends for each, and highlighting the period of record for the plant data.
- Parameters:
data (
dict[pandas.DataFrame]
) – The dictionary of reanalysis dataframes.windspeed_col (
str
) – The name of the column for the windspeed data to be plotted.plant_por (
tuple[datetime.datetime, datetime.datetime]
) – The start and end datetimes for a plant’s period of record (POR).normalize (
bool
) – Indicator of whether the windspeeds should be normalized (True), or not (False). Defaults to True.xlim (
tuple[datetime.datetime, datetime.datetime]
, optional) – A tuple of datetimes representing the x-axis plotting display limits. Defaults to (None, None).ylim (
tuple[float, float]
, optional) – A tuple of the y-axis plotting display limits. Defaults to (None, None).return_fig (
bool
, optional) – Flag to return the figure and axes objects. Defaults to False.figure_kwargs (
dict
, optional) – Additional figure instantiation keyword arguments that are passed toplt.figure()
. Defaults to {}.plot_kwargs (
dict
, optional) – Additional plotting keyword arguments that are passed toax.plot()
. Defaults to {}.legend_kwargs (
dict
, optional) – Additional legend keyword arguments that are passed toax.legend()
. Defaults to {}.
- Returns:
- If
return_fig
is True, then the figure and axes objects are returned for further tinkering/saving.
- Return type:
None | tuple[matplotlib.pyplot.Figure, matplotlib.pyplot.Axes]
- openoa.utils.plot.plot_plant_energy_losses_timeseries(data: DataFrame, energy_col: str, loss_cols: list[str], energy_label: str, loss_labels: list[str], xlim: tuple[datetime, datetime] = (None, None), ylim_energy: tuple[float, float] = (None, None), ylim_loss: tuple[float, float] = (None, None), return_fig: bool = False, figure_kwargs: dict = {}, plot_kwargs: dict = {}, legend_kwargs: dict = {})[source]#
Plot timeseries of energy, and the loss categories of interest.
- Parameters:
data (
pandas.DataFrame
) – A pandas DataFrame containing energy production and losses.energy_col (
str
) – The name of the column indata
containing the energy production.loss_cols (
list[str]
) – The name(s) of the column(s) indata
containing the loss data.energy_label (
str
) – The legend label and y-axis label for the energy plot.loss_labels (
list[str]
) – The legend labels for the losses plot.xlim (
tuple[datetime.datetime, datetime.datetime]
, optional) – A tuple of datetimes representing the x-axis plotting display limits. Defaults to None.ylim_energy (
tuple[float, float]
, optional) – A tuple of the y-axis plotting display limits for the gross energy plot (top figure). Defaults to None.ylim_loss (
tuple[float, float]
, optional) – A tuple of the y-axis plotting display limits for the loss plot (bottom figure). Defaults to (None, None).return_fig (
bool
, optional) – Flag to return the figure and axes objects. Defaults to False.figure_kwargs (
dict
, optional) – Additional figure instantiation keyword arguments that are passed toplt.figure()
. Defaults to {}.plot_kwargs (
dict
, optional) – Additional plotting keyword arguments that are passed toax.plot()
. Defaults to {}.legend_kwargs (
dict
, optional) – Additional legend keyword arguments that are passed toax.legend()
. Defaults to {}.
- Returns:
If
return_fig
is True, then the figure and axes objects are returned for further tinkering/saving.- Return type:
None | tuple[matplotlib.pyplot.Figure, tuple[matplotlib.pyplot.Axes, matplotlib.pyplot.Axes]]
- openoa.utils.plot.plot_distributions(data: DataFrame, which: list[str], xlabels: list[str], xlim: tuple[tuple[float, float], ...] | None = None, ylim: tuple[tuple[float, float], ...] | None = None, return_fig: bool = False, figure_kwargs: dict = {}, plot_kwargs: dict = {}, annotate_kwargs: dict = {}, title: str | None = None) None | tuple[Figure, Axes] [source]#
Plot a distribution of AEP values from the Monte-Carlo OA method
- Parameters:
data (
pandas.DataFrame
) – The pandas DataFrame of results data.which (
list[str]
) – The list of columns in data that should have their distributions plotted.xlabels (list[str]) – The list of x-axis labels.
xlim (
tuple[tuple[float, float], ...]
, optional) – A tuple of tuples (or None) corresponding to each of elements ofwhich
that get passed toax.set_xlim()
. Defaults to None.ylim (
tuple[tuple[float, float], ...]
, optional) – A tuple of tuples (or None) corresponding to each of elements ofwhich
that get passed toax.set_ylim()
. Defaults to None.return_fig (
bool
, optional) – Flag to return the figure and axes objects. Defaults toFalse
.figure_kwargs (
dict
, optional) – Additional figure instantiation keyword arguments that are passed to plt.figure()
. Defaults to {}.plot_kwargs (
dict
, optional) – Additional plotting keyword arguments that are passed toax.hist()
. Defaults to {}.annotate_kwargs (
dict
, optional) – Additional annotation keyword arguments that are passed toax.annotate()
. Defaults to {}.title (str, optional) – Title to place over all subplots.
- Returns:
- If
return_fig
is True, then the figure and axes objects are returned for further tinkering/saving.
- Return type:
None | tuple[matplotlib.pyplot.Figure, matplotlib.pyplot.Axes]
- openoa.utils.plot.plot_boxplot(x: Series, y: Series, xlabel: str, ylabel: str, ylim: tuple[float | None, float | None] = (None, None), with_points: bool = False, points_label: str | None = None, return_fig: bool = False, figure_kwargs: dict = {}, plot_kwargs_box: dict = {}, plot_kwargs_points: dict = {}, legend_kwargs: dict = {}) None | tuple[Figure, Axes] [source]#
Plot box plots of AEP results sliced by a specified Monte Carlo parameter
- Parameters:
x (
pandas.Series
) – The data that splits the results in y.y (
pandas.Series
) – The resulting data to be split by x.xlabel (
str
) – The x-axis label.ylabel (
str
) – The y-axis label.ylim (
tuple[float, float]
, optional) – A tuple of the y-axis plotting display limits. Defaults to None.with_points (
bool
, optional) – Flag to plot the individual points like a seabornswarmplot
. Defaults to False.points_label (
str
| None, optional) – Legend label for the points, if plotting. Defaults to None.return_fig (
bool
, optional) – Flag to return the figure and axes objects. Defaults to False.figure_kwargs (
dict
, optional) – Additional figure instantiation keyword arguments that are passed to plt.figure(). Defaults to {}.plot_kwargs_box (
dict
, optional) – Additional plotting keyword arguments that are passed toax.boxplot()
. Defaults to {}.plot_kwargs_points (
dict
, optional) – Additional plotting keyword arguments that are passed toax.boxplot()
. Defaults to {}.legend_kwargs (
dict
, optional) – Additional legend keyword arguments that are passed toax.legend()
. Defaults to {}.
- Returns:
- If
return_fig
is True, then the figure object, axes object, and a dictionary of the boxplot objects are returned for further tinkering/saving.
- Return type:
None | tuple[matplotlib.pyplot.Figure, matplotlib.pyplot.Axes, dict]
- openoa.utils.plot.plot_waterfall(data: list[float] | ndarray[Any, dtype[float64]], index: list[str], ylabel: str | None = None, ylim: tuple[float, float] = (None, None), return_fig: bool = False, plot_kwargs: dict = {}, figure_kwargs: dict = {}) None | tuple [source]#
Produce a waterfall plot showing the progression from the EYA estimates to the calculated OA estimates of AEP.
- Parameters:
data (array-like) – data to be used to create waterfall.
index (
list
) – List of string values to be used for x-axis labels, which should have one more value than the number of points indata
to account for the calculated OA total.ylabel (
str
) – The y-axis label. Defaults to None.ylim (
tuple[float | None, float | None]
) – The y-axis minimum and maximum display range. Defaults to (None, None).return_fig (
bool
, optional) – Set to True to return the figure and axes objects, otherwise set to False. Defaults to False.figure_kwargs (
dict
, optional) – Additional keyword arguments that should be passed toplt.figure()
. Defaults to {}.plot_kwargs (
dict
, optional) – Additional keyword arguments that should be passed toax.plot()
. Defaults to {}.legend_kwargs (
dict
, optional) – Additional keyword arguments that should be passed to ax.legend(). Defaults to {}.
- Returns:
- If
return_fig
, then return the figure and axes objects in addition to showing the plot.
- Return type:
None | tuple[plt.Figure, plt.Axes]
- openoa.utils.plot.plot_power_curves(data: dict[str, DataFrame], power_col: str, windspeed_col: str, flag_col: str | None = None, turbines: list[str] | None = None, flag_labels: tuple[str, str] = ('Flagged Readings', 'Power Curve'), max_cols: int = 3, xlim: tuple[float, float] = (None, None), ylim: tuple[float, float] = (None, None), legend: bool = False, return_fig: bool = False, figure_kwargs: dict = {}, legend_kwargs: dict = {}, plot_kwargs: dict = {})[source]#
Plots a series of power curves for a dictionary of turbine data, allowing for an optional filtering for singling out readings in the figure.
- Parameters:
data (
dict[str, pd.DataFrame]
) – The dictionary of turbine IDs and SCADA data.windspeed_col (
str
) – The name of the column containing the recorded wind speeds, in m/s.power_col (
str
) – The name of the column containing the recorded power, in kW.flag_col (
str
| None, optional) – The name of the column containing booleans for which points to flag in the wind speed and power data.turbines (
list[str]
, optional) – The list of turbines to be plot, if not all of the keys indata
.flag_labels (
tuple[str, str]
, optional) – The labels to give to the scatter points, corresponding to the flagged readings and raw readings, respectively. Defaults to (“Flagged Readings”, “Power Curve”).max_cols (
int
, optional) – The maximum number of columns in the plot. Defaults to 3.xlim (
tuple[float, float]
, optional) – A tuple of the x-axis (min, max) values. Defaults to (None, None).ylim (
tuple[float, float]
, optional) – A tuple of the y-axis (min, max) values. Defaults to (None, None).legend (
bool
, optional) – Set to True to place a legend in the figure, otherwise set to False. Defaults to False.return_fig (
bool
, optional) – Set to True to return the figure and axes objects, otherwise set to False. Defaults to False.figure_kwargs (
dict
, optional) – Additional keyword arguments that should be passed toplt.figure()
. Defaults to {}.plot_kwargs (
dict
, optional) – Additional keyword arguments that should be passed toax.scatter()
. Defaults to {}.legend_kwargs (
dict
, optional) – Additional keyword arguments that should be passed toax.legend()
. Defaults to {}.
- Returns:
Returns the figure and axes objects if return_fig is True.
- Return type:
None | tuple[plt.Figure, plt.Axes]
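To get a feel for how turbines and max_cols interact, the sketch below works out a subplot grid for a chosen subset of turbines. It is a standard-library illustration under the assumption that one panel is drawn per turbine and that rows are added once max_cols is reached. The helper name grid_shape and the turbine IDs are hypothetical, and the library's actual layout logic may differ.

```python
import math


def grid_shape(n_turbines: int, max_cols: int = 3) -> tuple[int, int]:
    """Return (rows, cols) for a grid of panels that is at most max_cols wide.

    Hypothetical helper, not part of the documented API.
    """
    if n_turbines < 1 or max_cols < 1:
        raise ValueError("n_turbines and max_cols must both be at least 1")
    cols = min(n_turbines, max_cols)     # never wider than needed
    rows = math.ceil(n_turbines / cols)  # add rows until every turbine has a panel
    return rows, cols


# Made-up turbine IDs: the keys of the data dictionary and the subset to plot.
data_keys = ["T01", "T02", "T03", "T04", "T05"]
turbines = ["T01", "T02", "T04", "T05"]

# Every requested turbine should be a key of the data dictionary.
missing = [t for t in turbines if t not in data_keys]
shape = grid_shape(len(turbines), max_cols=3)
print(missing, shape)
```

Checking the requested turbines against the dictionary keys before plotting catches a misspelled ID early.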
- openoa.utils.plot.plot_wake_losses(bins: ndarray[Any, dtype[float64]], efficiency_data_por: ndarray[Any, dtype[float64]], efficiency_data_lt: ndarray[Any, dtype[float64]], energy_data_por: ndarray[Any, dtype[float64]] | None = None, energy_data_lt: ndarray[Any, dtype[float64]] | None = None, bin_axis_label: str = 'wd', turbine_id: str | None = None, xlim: tuple[float, float] = (None, None), ylim_efficiency: tuple[float, float] = (None, None), ylim_energy: tuple[float, float] = (None, None), return_fig: bool = False, figure_kwargs: dict | None = None, plot_kwargs_line: dict = {}, plot_kwargs_fill: dict = {}, legend_kwargs: dict = {})[source]#
Plots wake losses in the form of wind farm efficiency as well as normalized wind plant energy production for both the period of record and with the long-term correction as a function of either wind direction or wind speed. If the data arguments contain two dimensions, 95% confidence intervals will be plotted for each variable.
- Parameters:
bins (np.ndarray) – Wind direction or wind speed bin values representing the x-axis in the plots.
efficiency_data_por (np.ndarray) – 1D or 2D array containing wind farm or wind turbine efficiency for the period of record for each bin in the bins argument. If a 2D array is provided, the second dimension should contain results from different Monte Carlo iterations and 95% confidence intervals will be plotted.
efficiency_data_lt (np.ndarray) – 1D or 2D array containing long-term corrected wind farm or wind turbine efficiency for each bin in the bins argument. If a 2D array is provided, the second dimension should contain results from different Monte Carlo iterations and 95% confidence intervals will be plotted.
energy_data_por (np.ndarray, optional) – Optional 1D or 2D array containing normalized energy production for the period of record for each bin in the bins argument. If a 2D array is provided, the second dimension should contain results from different Monte Carlo iterations and 95% confidence intervals will be plotted. If a value of None is provided, normalized energy will not be plotted. Defaults to None.
energy_data_lt (np.ndarray, optional) – Optional 1D or 2D array containing normalized long-term corrected energy production for each bin in the bins argument. If a 2D array is provided, the second dimension should contain results from different Monte Carlo iterations and 95% confidence intervals will be plotted. If a value of None is provided, normalized energy will not be plotted. Defaults to None.
bin_axis_label (str, optional) – The label to use for the bin variable (x) axis. Defaults to “wd”.
turbine_id (str, optional) – Name of turbine if data are provided for a single wind turbine. Used to determine title and plot axis labels. Defaults to None.
xlim (tuple[float, float], optional) – A tuple of floats representing the x-axis wind direction plotting display limits (degrees). Defaults to (None, None).
ylim_efficiency (tuple[float, float], optional) – A tuple of the y-axis plotting display limits for the wind farm efficiency plot (top plot). Defaults to (None, None).
ylim_energy (tuple[float, float], optional) – If energy_data_por and energy_data_lt arguments are provided, a tuple of the y-axis plotting display limits for the wind farm energy distribution plot (bottom plot). Defaults to (None, None).
return_fig (bool, optional) – Flag to return the figure and axes objects. Defaults to False.
figure_kwargs (dict, optional) – Additional figure instantiation keyword arguments that are passed to plt.figure(). Defaults to None.
plot_kwargs_line (dict, optional) – Additional plotting keyword arguments that are passed to ax.plot() for plotting lines for the wind farm efficiency and, if energy_data_por and energy_data_lt arguments are provided, energy distributions subplots. Defaults to {}.
plot_kwargs_fill (dict, optional) – If UQ is used (2D data arrays are provided), additional plotting keyword arguments that are passed to ax.fill_between() for plotting shading regions for 95% confidence intervals for the wind farm efficiency and, if energy_data_por and energy_data_lt arguments are provided, energy distributions subplots. Defaults to {}.
legend_kwargs (dict, optional) – Additional legend keyword arguments that are passed to ax.legend() for the wind farm efficiency and, if energy_data_por and energy_data_lt arguments are provided, energy distributions subplots. Defaults to {}.
- Returns:
If return_fig is True, then the figure and axes object(s), corresponding to the wake loss plot or, if energy_data_por and energy_data_lt arguments are provided, wake loss and normalized energy plots, are returned for further tinkering/saving.
- Return type:
None | tuple[matplotlib.pyplot.Figure, matplotlib.pyplot.Axes] | tuple[matplotlib.pyplot.Figure, tuple[matplotlib.pyplot.Axes, matplotlib.pyplot.Axes]]
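The 2D input layout described for plot_wake_losses (bins along the first dimension, Monte Carlo iterations along the second) can be sketched with NumPy as follows. The synthetic efficiency values are made up. Taking the 2.5th and 97.5th percentiles across iterations is one common way to form a 95% interval and is an assumption here, not a statement of how the function computes its shaded regions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Twelve 30-degree wind direction bins (the x-axis of the plot).
bins = np.arange(0.0, 360.0, 30.0)
n_iterations = 200

# Synthetic efficiency array with shape (n_bins, n_iterations):
# the second dimension holds the Monte Carlo iterations.
efficiency = 0.9 + 0.02 * rng.standard_normal((bins.size, n_iterations))

# Collapse the iteration dimension to a central estimate and a 95% band.
mean = efficiency.mean(axis=1)
lower, upper = np.percentile(efficiency, [2.5, 97.5], axis=1)

print(efficiency.shape, mean.shape, lower.shape, upper.shape)
```

Each of mean, lower, and upper has one value per bin, which is the shape needed to draw a line with a shaded band over bins.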
- openoa.utils.plot.plot_yaw_misalignment(ws_bins: list[float], vane_bins: list[float], power_values_vane_ws: ndarray[Any, dtype[float64]], curve_fit_params_ws: ndarray[Any, dtype[float64]], mean_vane_angle_ws: ndarray[Any, dtype[float64]], yaw_misalignment_ws: ndarray[Any, dtype[float64]], turbine_id: str, power_performance_label: str = 'Normalized Cp (-)', xlim: tuple[float, float] = (None, None), ylim: tuple[float, float] = (None, None), return_fig: bool = False, figure_kwargs: dict | None = None, plot_kwargs_curve: dict = {}, plot_kwargs_line: dict = {}, plot_kwargs_fill: dict = {}, legend_kwargs: dict = {})[source]#
Plots power performance vs. wind vane angle along with the best-fit cosine curve for each wind speed bin for a single turbine. The mean wind vane angle and the wind vane angle where power performance is maximized are shown for each wind speed bin. Additionally, the yaw misalignments for each wind speed bin as well as the mean yaw misalignment averaged over all wind speed bins are listed. If UQ is used, 95% confidence intervals will be plotted for the binned power performance values and listed for the yaw misalignment estimates.
- Parameters:
ws_bins (list[float]) – Wind speed bin values for which yaw misalignment plots are produced (m/s).
vane_bins (list[float]) – Wind vane angle bin values for which power performance values are plotted (degrees).
power_values_vane_ws (np.ndarray) – 2D or 3D array containing power performance data for each wind speed bin in the ws_bins argument (first dimension if a 2D array) and each wind vane bin in the vane_bins argument (second dimension if a 2D array). If a 3D array is provided, the first dimension should contain results from different Monte Carlo iterations and 95% confidence intervals will be plotted.
curve_fit_params_ws (np.ndarray) – 2D or 3D array containing optimal cosine curve fit parameters (magnitude, offset (degrees), and cosine exponent) for each wind speed bin in the ws_bins argument (first dimension if a 2D array). If a 3D array is provided, the first dimension should contain results from different Monte Carlo iterations and 95% confidence intervals will be plotted. The last dimension contains the optimal curve fit parameters.
mean_vane_angle_ws (np.ndarray) – Array containing mean wind vane angles for each wind speed bin in the ws_bins argument (degrees).
yaw_misalignment_ws (np.ndarray) – 1D or 2D array containing yaw misalignment values for each wind speed bin in the ws_bins argument (degrees). If a 2D array is provided, the first dimension should contain results from different Monte Carlo iterations and 95% confidence intervals will be plotted.
turbine_id (str) – Name of turbine for which yaw misalignment data are provided. Used to determine title and plot axis labels.
power_performance_label (str, optional) – The label to use for the power performance (y) axis. Defaults to “Normalized Cp (-)”.
xlim (tuple[float, float], optional) – A tuple of floats representing the x-axis wind vane angle plotting display limits (degrees). Defaults to (None, None).
ylim (tuple[float, float], optional) – A tuple of the y-axis plotting display limits for the power performance vs. wind vane plots. Defaults to (None, None).
return_fig (bool, optional) – Flag to return the figure and axes objects. Defaults to False.
figure_kwargs (dict, optional) – Additional figure instantiation keyword arguments that are passed to plt.figure(). Defaults to None.
plot_kwargs_curve (dict, optional) – Additional plotting keyword arguments that are passed to ax.plot() for plotting lines for the power performance vs. wind vane plots. Defaults to {}.
plot_kwargs_line (dict, optional) – Additional plotting keyword arguments that are passed to ax.plot() for plotting vertical lines indicating mean vane angle and vane angle where power is maximized. Defaults to {}.
plot_kwargs_fill (dict, optional) – If UQ is used, additional plotting keyword arguments that are passed to ax.fill_between() for plotting shading regions for 95% confidence intervals for power performance vs. wind vane. Defaults to {}.
legend_kwargs (dict, optional) – Additional legend keyword arguments that are passed to ax.legend() for the power performance vs. wind vane plots. Defaults to {}.
- Returns:
If return_fig is True, then the figure and axes object(s) corresponding to the yaw misalignment plots are returned for further tinkering/saving.
- Return type:
None | tuple[matplotlib.pyplot.Figure, matplotlib.pyplot.Axes]
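The three curve fit parameters named for curve_fit_params_ws (magnitude, offset in degrees, and cosine exponent) suggest a curve of the form magnitude * cos(vane angle - offset) ** exponent. The sketch below evaluates that assumed form over a set of vane bins and finds the angle where it peaks. The functional form, the helper name cosine_curve, and the parameter values are illustrative assumptions rather than the library's confirmed implementation.

```python
import math


def cosine_curve(vane_angle_deg, magnitude, offset_deg, exponent):
    """Assumed curve: magnitude * cos(vane_angle - offset) ** exponent.

    Valid while the angle stays within 90 degrees of the offset, so the
    cosine is non-negative and a fractional exponent stays real-valued.
    """
    return magnitude * math.cos(math.radians(vane_angle_deg - offset_deg)) ** exponent


# Wind vane angle bins from -15 to 15 degrees in 1-degree steps (made up).
vane_bins = [float(angle) for angle in range(-15, 16)]

# Hypothetical fit for one wind speed bin: magnitude, offset (degrees), exponent.
params = (1.0, 3.0, 2.0)

curve = [cosine_curve(angle, *params) for angle in vane_bins]

# The curve peaks where the vane angle equals the offset parameter.
best_angle = vane_bins[curve.index(max(curve))]
print(best_angle)
```

Under this form, the offset parameter is the vane angle at which power performance is maximized, which is the quantity the plot compares against the mean vane angle.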