Physics-Based Method for Generating Fully Synthetic IV Curve Training Datasets for Machine Learning Classification of PV Failures

Hopwood, Michael W.; Stein, Joshua S.; Braid, Jennifer L.; Seigneur, Hubert P.

Physics-Based Method for Generating Fully Synthetic IV Curve Training Datasets for Machine Learning Classification of PV Failures

Michael W. Hopwood, Joshua S. Stein, Jennifer L. Braid and Hubert P. Seigneur
Additional contact information
Michael W. Hopwood: Department of Statistics and Data Science, University of Central Florida, Orlando, FL 32816, USA
Joshua S. Stein: Sandia National Laboratories, Albuquerque, NM 87123, USA
Jennifer L. Braid: Sandia National Laboratories, Albuquerque, NM 87123, USA
Hubert P. Seigneur: Florida Solar Energy Center, University of Central Florida, Cocoa, FL 32922, USA

Energies, 2022, vol. 15, issue 14, 1-16

Abstract: Classification machine learning models require high-quality labeled datasets for training. Among the most useful datasets for photovoltaic array fault detection and diagnosis are module or string current-voltage (IV) curves. Unfortunately, such datasets are rarely collected due to the cost of high fidelity monitoring, and the data that is available is generally not ideal, often consisting of unbalanced classes, noisy data due to environmental conditions, and few samples. In this paper, we propose an alternate approach that utilizes physics-based simulations of string-level IV curves as a fully synthetic training corpus that is independent of the test dataset. In our example, the training corpus consists of baseline (no fault), partial soiling, and cell crack system modes. The training corpus is used to train a 1D convolutional neural network (CNN) for failure classification. The approach is validated by comparing the model’s ability to classify failures detected on a real, measured IV curve testing corpus obtained from laboratory and field experiments. Results obtained using a fully synthetic training dataset achieve identical accuracy to those obtained with use of a measured training dataset. When evaluating the measured data’s test split, a 100% accuracy was found both when using simulations or measured data as the training corpus. When evaluating all of the measured data, a 96% accuracy was found when using a fully synthetic training dataset. The use of physics-based modeling results as a training corpus for failure detection and classification has many advantages for implementation as each PV system is configured differently, and it would be nearly impossible to train using labeled measured data.

Keywords: photovoltaic systems; simulation; neural networks; IV curves (search for similar items in EconPapers)
JEL-codes: Q Q0 Q4 Q40 Q41 Q42 Q43 Q47 Q48 Q49 (search for similar items in EconPapers)
Date: 2022
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (2)

Downloads: (external link)
https://www.mdpi.com/1996-1073/15/14/5085/pdf (application/pdf)
https://www.mdpi.com/1996-1073/15/14/5085/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:gam:jeners:v:15:y:2022:i:14:p:5085-:d:861150

Access Statistics for this article

Energies is currently edited by Ms. Cassie Shen

More articles in Energies from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().