Clustering of customers using electricity consumption data for smart grid applications.

Author

Al-Qamshouiyah, Hind Mohammed Humaid.

Publisher

Sultan Qaboos University.

Year (Gregorian)

2021

Language

English

Subject

Electricity--Consumption--Data processing--Oman

Smart power grids--Oman

English abstract

Recent advances in technology and data science have driven rapid growth in large-scale data. A typical example is the smart grid streaming data produced by industrial smart energy meters. A sequence of electricity consumption measurements taken at specified time intervals constitutes the load profile of an industry over a given period. A set of load profiles is represented as a data matrix in which each row holds the sequence of measurements for one industry and each column holds the measurements from all industries within a particular time slot. Industrial power consumption data of this kind contain a large number of irrelevant features (columns) owing to factors such as break times, weather conditions, and production orders. Recovering robust clusters from a matrix with so many irrelevant features is a challenging task. Only a few feature selection algorithms are available for unsupervised streaming data. Additionally, the behaviour of industrial data streams differs from that of other data streams, e.g., stock exchange time series. In this work, we address this problem in order to define the business process operations, which are very useful for different smart grid applications. A density-based feature selection technique is used to remove the irrelevant features from the data matrix. The local densities in different spatial areas (single features) of the data are identified. The local densities are computed, and the densities of temporal regions are added, where a temporal region is the collection of the subsequent features. At this stage, finding a threshold value for the completed densities plays a significant role in improving the accuracy of the feature selection method. We find the threshold value using an advanced method based on the Minimum Description Length (MDL) principle.
The local densities are classified into two groups: one represents a high density value, and zero represents a low density value. The density classes of industries at distinct time slots are represented using a binary matrix. Then, a new similarity measure between density vectors is computed for each pair of consecutive time slots in the binary matrix, and the features identified as irrelevant are removed from the load profile data. Finally, the overall number of clusters is detected using a data visualization approach, and the filtered data are clustered using the k-means algorithm to produce industry segmentation results, where each segment represents one electricity consumption pattern.
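
The binarization and consecutive-slot comparison described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the thesis: the density values are made up, the threshold is a fixed hypothetical number rather than one derived with the MDL principle, and a simple matching coefficient stands in for the thesis's new similarity measure. The abstract does not state how similarities are turned into a decision about which features to drop, so the sketch stops at the similarity scores.

```python
# Illustrative sketch only: the threshold, the density values, and the
# similarity measure are assumptions, not the thesis's actual choices.

def binarize(densities, threshold):
    """Map each local density to 1 (high) or 0 (low) using a fixed threshold."""
    return [[1 if d >= threshold else 0 for d in row] for row in densities]

def column(matrix, j):
    """Return column j, i.e. one time slot across all industries."""
    return [row[j] for row in matrix]

def matching_similarity(u, v):
    """Fraction of positions where two binary vectors agree."""
    return sum(1 for a, b in zip(u, v) if a == b) / len(u)

def consecutive_similarities(binary):
    """Similarity between each pair of consecutive time-slot columns."""
    n_slots = len(binary[0])
    return [
        matching_similarity(column(binary, j), column(binary, j + 1))
        for j in range(n_slots - 1)
    ]

# Rows are industries, columns are time slots (hypothetical density values).
densities = [
    [0.9, 0.8, 0.1, 0.7],
    [0.2, 0.3, 0.9, 0.1],
    [0.8, 0.7, 0.2, 0.9],
    [0.1, 0.2, 0.8, 0.3],
]

binary = binarize(densities, threshold=0.5)  # 0.5 is a placeholder threshold
sims = consecutive_similarities(binary)
print(binary)
print(sims)
```

With these values, the first two time slots have identical density vectors, while the third slot disagrees with both of its neighbours in every row.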

Collection

Theses and dissertations

Source URL

https://hdl.handle.net/20.500.12408/9167

Item template

University theses and dissertations