Document
Clustering of customers using electricity consumption data for smart grid applications.
Publisher
Sultan Qaboos University.
Year (Gregorian)
2021
Language
English
English abstract
Recent advances in technology and data science have enabled the rapid growth of large-scale data. A typical example is the smart grid streaming data produced by industrial smart energy meters. A sequence of electricity consumption measurements taken at specified time intervals constitutes the load profile of an industry over a given period. A
data matrix represents a set of load profiles: each row holds the measurement sequence of one industry, and each column holds the measurements recorded in a particular time slot across all industries. Such industrial power consumption data contain a large number of irrelevant features (columns) because of various factors, e.g., break times, weather conditions, and production orders. Recovering robust clusters from a matrix with so many irrelevant features is a challenging task. Only a few feature selection algorithms are available
for unsupervised streaming data. Additionally, the behaviour of industrial data streams differs from that of other data streams, e.g., stock exchange time series. In this work, we address this problem in order to define the business process operations, which are useful for different smart grid applications. A density-based feature selection technique is
used to remove the irrelevant features from the data matrix. The local densities in different spatial areas (single features) of the data are identified. Once the local densities are computed, the densities of temporal regions are added, where a temporal region is the collection of the following features. At this stage, finding a threshold value for the computed densities plays a significant role in improving the accuracy of the feature selection method. We used an advanced method to find the threshold value, namely the
advanced Minimum Description Length (MDL) principle. The local densities are classified into two groups: one represents high density values, while zero represents low density values. The density classes of industries at distinct time slots are represented using a binary matrix. Then, a new similarity measure between density vectors is computed for each pair of consecutive time slots in the binary matrix, and the
irrelevant features identified from the density vectors are removed from the load profile data. Finally, the overall number of clusters is detected using a data visualization approach, and the filtered data are clustered using the k-means algorithm to produce industry segmentation results, where each segment represents one electricity consumption pattern.
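The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation, and several parts are assumptions made only for illustration: the local density is a neighbour count within a hypothetical `radius`, the median replaces the MDL-based threshold, the temporal-region term is omitted, the similarity is the fraction of matching binary entries, a time slot is dropped when it is dissimilar to every adjacent slot, and the number of clusters is passed in by hand rather than found by visualization. All function names and the toy data are hypothetical.

```python
from statistics import median

def local_densities(data, radius):
    """Count, per time slot, how many other industries lie within `radius`."""
    n_rows, n_cols = len(data), len(data[0])
    dens = [[0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        column = [row[j] for row in data]
        for i, v in enumerate(column):
            dens[i][j] = sum(abs(v - u) <= radius for u in column) - 1
    return dens

def binarize(dens):
    """1 = high density, 0 = low density (median threshold, NOT the MDL method)."""
    threshold = median(v for row in dens for v in row)
    return [[1 if v >= threshold else 0 for v in row] for row in dens]

def select_features(binary, min_similarity):
    """Keep slots whose density vector matches at least one adjacent slot."""
    n_rows, n_cols = len(binary), len(binary[0])
    # sims[j] compares time slot j with time slot j + 1.
    sims = [sum(row[j] == row[j + 1] for row in binary) / n_rows
            for j in range(n_cols - 1)]
    return [j for j in range(n_cols)
            if max(sims[max(j - 1, 0):j + 1]) >= min_similarity]

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Plain Lloyd iterations with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centre unchanged
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy load profiles: rows are industries, columns are time slots.
# Slot 2 is deliberately filled with unstructured values.
profiles = [
    [1.0, 1.1, 5.0, 1.0, 0.9],
    [1.1, 1.0, 9.0, 0.9, 1.0],
    [0.9, 1.0, 2.0, 1.1, 1.0],
    [10.0, 10.1, 7.0, 10.0, 9.9],
    [10.1, 10.0, 0.5, 9.9, 10.0],
    [5.0, 5.2, 3.5, 5.1, 4.9],
]

binary = binarize(local_densities(profiles, radius=0.5))
kept = select_features(binary, min_similarity=0.8)
filtered = [[row[j] for j in kept] for row in profiles]
labels = kmeans(filtered, k=3)
print(kept, labels)
```

On this toy matrix the unstructured slot is filtered out before clustering, so the segments are formed only from the time slots whose density pattern is stable from one slot to the next.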
Item template
University theses and dissertations