Addressing editorial comments

sandialabs · Nov 14, 2023 · 2fbf131
1 parent 27198ed
Showing 1 changed file with 6 additions and 5 deletions.
diff --git a/paper.md b/paper.md
@@ -43,9 +43,9 @@ The purpose of `pvOps` is to support empirical evaluations of data collected in
 
 # Statement of Need
 
-Continued interest in PV deployment across the world has resulted in increased awareness of needs associated with managing reliability and performance of these systems during operation. Current open-source packages for PV analysis focus on theoretical evaluations of solar power simulations (e.g. `pvlib` [@holmgren2018pvlib]), data cleaning and feature development for production data (e.g. `pvanalytics` [@perry2022pvanalytics]), specific use cases of empirical evaluations (e.g. `RdTools` [@deceglie2018rdtools] and `Pecos` [@klise2016performance] for degradation analysis), or analysis of electroluminescence images (e.g. `PVimage` [@pierce2020identifying]); see [openpvtools](https://openpvtools.readthedocs.io/en/latest/) for a list of additional open-source PV packages. However, a general package that can support data-driven, exploratory evaluations of diverse field-collected information is currently lacking. For example, a maintenance log that describes an inverter failure may be temporally correlated to a dip in production levels. Identifying such relationships across different types of field data can improve understanding of the impacts of certain types of failures on a PV plant. To address this gap, we present `pvOps`, an open-source, Python package that can be used by researchers and industry analysts alike to evaluate and extract insights from different types of data routinely collected during PV field operations.
+Continued interest in PV deployment across the world has resulted in increased awareness of needs associated with managing reliability and performance of these systems during operation. Current open-source packages for PV analysis focus on theoretical evaluations of solar power simulations (e.g. `pvlib` [@holmgren2018pvlib]), data cleaning and feature development for production data (e.g. `pvanalytics` [@perry2022pvanalytics]), specific use cases of empirical evaluations (e.g. `RdTools` [@deceglie2018rdtools] and `Pecos` [@klise2016performance] for degradation analysis), or analysis of electroluminescence images (e.g. `PVimage` [@pierce2020identifying]); see [openpvtools](https://openpvtools.readthedocs.io/en/latest/) for a list of additional open-source PV packages. However, a general package that can support data-driven, exploratory evaluations of diverse field-collected information is currently lacking. For example, a maintenance log that describes an inverter failure may be temporally correlated to a dip in production levels. Identifying such relationships across different types of field data can improve understanding of the impacts of certain types of failures on a PV plant. To address this gap, we present `pvOps`, an open-source Python package that can be used by researchers and industry analysts alike to evaluate and extract insights from different types of data routinely collected during PV field operations.
 
-PV data collected in the field varies greatly in structure (i.e., timeseries and text records) and quality (i.e., completeness and consistency). The data available for analysis is frequently semi-structured. Furthermore, the level of detail collected between different owners/operators might vary. For example, some may capture a general start and end time for an associated event whereas others might include additional time details for different resolution activities. This diversity in data types and structures often leads to data being under-utilized due to the amount of manual processing required. To address these issues, `pvOps` provides a suite of data processing, cleaning, and visualization methods to leverage insights across a broad range of data types, including operations and maintenance records,  production timeseries, and IV curves. The functions within `pvOps` enable users to better parse available data to understand patterns in outages and production losses. 
+PV data collected in the field varies greatly in structure (e.g., timeseries and text records) and quality (e.g., completeness and consistency). The data available for analysis is frequently semi-structured. Furthermore, the level of detail collected between different owners/operators might vary. For example, some may capture a general start and end time for an associated event whereas others might include additional time details for different resolution activities. This diversity in data types and structures often leads to data being under-utilized due to the amount of manual processing required. To address these issues, `pvOps` provides a suite of data processing, cleaning, and visualization methods to leverage insights across a broad range of data types, including operations and maintenance records,  production timeseries, and IV curves. The functions within `pvOps` enable users to better parse available data to understand patterns in outages and production losses. 
 
 # Package Overview 
 The following table summarizes the four modules within `pvOps` by presenting: the type of data they analyze, example data features, and highlights of relevant functions. 
@@ -60,14 +60,15 @@ timeseries | Production data | *site*, *timestamp*, *power production*, *irradia
  | | | 
 text2time | O&M records and production data | see entries for `text` and  `timeseries` modules above | analyze overlaps between O&M and production (timeseries) records, visualize overlaps between O&M records and production data
  | | | 
-iv | IV records | *current*, *voltage*, *irradiance*, *temperature*  | *simulate* IV curves with physical faults, extract diode parameters from IV curves,. classify faults using IV curves
+iv | IV records | *current*, *voltage*, *irradiance*, *temperature*  | *simulate* IV curves with physical faults, extract diode parameters from IV curves, classify faults using IV curves
+
+The functions within each module can be used to build pipelines that integrate relevant data processing, fusion, and visualization capabilities to support user end goals. For example, a user with IV curve data could build a pipeline that leverages functions within the `iv` module to process and extract diode parameters within IV curves as well as train models to support classifications based on fault type. A pipeline could also be built that leverages functions across modules if a user has access to multiple types of data (e.g., both O&M and production records). A sample end-to-end workflow using `pvOps` modules could be:
 
-The functions within each module can be used to build pipelines that integrate relevant data processing, fusion, and visualization capabilities to support user end goals. For example, a user with IV curve data could build a pipeline that leverages functions within the `iv` module to process and extract diode parameters within IV curves as well as train models to support classifications based on fault type. A pipeline could also be built that leverages functions across modules if a user has access to multiple types of data (e.g., both O&M and production records). A sample end-to-end workflow using `pvOps` modules could be: 
 1. Use functions within the `text` module to systematically review data quality issues within O&M records, train a machine learning model on available records, and use the model to estimate possible labels for missing entries
 2. Use functions within the `timeseries` module to develop an expected energy model with machine learning for a given time series of irradiance and system size details, use a pre-trained expected energy model [@hopwood2022generation], or apply industry-standard equations as a basis for evaluating possible production losses
 3. Couple outputs from the above two analyses (using functions in the `text2time` module) based on timestamps to develop summaries and visualizations of production impacts observed during these periods
 
-The [package documentation] for `pvOps` provides thorough examples exploring the various capabilities of each module. Additional details about the `iv` module capabilities, are captured in [@hopwood2020neural; @hopwood2022physics] while more information about the design and development of the `text`, `timeseries`, and `text2time` modules is captured in [@mendoza2021pvops]. Key package dependencies of `pvOps` include `pandas` [@reback2020pandas], `sklearn` [@pedregosa2011sklearn], `nltk` [@bird2009nltk], and `keras` [@chollet2015keras] for analysis and `matplotlib` [@hunter2007matplotlib], `seaborn` [@waskom2021seaborn], and `plotly` [@plotly2015] for visualization.
+The [package documentation] for `pvOps` provides thorough examples exploring the various capabilities of each module. Additional details about the `iv` module capabilities are captured in [@hopwood2020neural; @hopwood2022physics] while more information about the design and development of the `text`, `timeseries`, and `text2time` modules is captured in [@mendoza2021pvops]. Key package dependencies of `pvOps` include `pandas` [@reback2020pandas], `sklearn` [@pedregosa2011sklearn], `nltk` [@bird2009nltk], and `keras` [@chollet2015keras] for analysis and `matplotlib` [@hunter2007matplotlib], `seaborn` [@waskom2021seaborn], and `plotly` [@plotly2015] for visualization.
 
 # Ongoing Development
 The `pvOps` functionality and documentation continue to be improved and updated as new empirical techniques are identified. For example, research efforts have demonstrated utility of natural language processing techniques (e.g., topic modeling) and survival analyses to support evaluation of patterns in O&M records [@gunda2020machine]. Additional statistical methods, such as Hidden Markov Modeling, have also been successfully used to support classification of failures within production data [@hopwood2022classification]. These and other capabilities will continue to be added to the package to improve its utility for supporting empirical analyses of field data.
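
Reviewer note: step 3 of the sample workflow in the paper (coupling O&M records with production data based on timestamps) may be easier to follow with a small illustration. The sketch below is plain standard-library Python and does not use or represent the `pvOps` API; the ticket list, the hourly production records, and the helper `loss_during` are all hypothetical and invented for this example. It assumes each O&M ticket has a start and end time and that each production record carries an expected and a measured energy value.

```python
from datetime import datetime, timedelta

# Hypothetical O&M tickets: (asset, event start, event end).
tickets = [
    ("inverter_1", datetime(2023, 6, 1, 10), datetime(2023, 6, 1, 13)),
    ("inverter_2", datetime(2023, 6, 2, 9), datetime(2023, 6, 2, 11)),
]

# Hypothetical hourly production records:
# timestamp -> (expected energy in kWh, measured energy in kWh).
first_hour = datetime(2023, 6, 1, 8)
production = {
    first_hour + timedelta(hours=h): (100.0, 100.0) for h in range(48)
}

# Inject a production dip that coincides with the first ticket.
for hour in (10, 11, 12):
    production[datetime(2023, 6, 1, hour)] = (100.0, 40.0)


def loss_during(event_start, event_end, records):
    """Sum (expected - measured) energy for timestamps in [event_start, event_end)."""
    return sum(
        expected - measured
        for ts, (expected, measured) in records.items()
        if event_start <= ts < event_end
    )


# Couple each ticket with the production shortfall observed during its window.
summary = {
    asset: loss_during(event_start, event_end, production)
    for asset, event_start, event_end in tickets
}

for asset, loss_kwh in summary.items():
    print(f"{asset}: {loss_kwh:.1f} kWh below expected during the ticket window")
```

In this toy data only the first ticket overlaps a shortfall, which is the kind of relationship the paper's inverter-failure example describes. A real analysis would also need to handle multiple sites, time zones, and tickets with missing end times, none of which this sketch attempts.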