Influence of parametrized polling on convolutional neural networks performance.

Source

Master's thesis

Author

Farah, Inas Shadoul Mohamed.

Country

Oman

City

Muscat

Publisher

Sultan Qaboos University

Gregorian

2024

Language

English

Subject

Neural networks (Computer science)

Thesis Type

Master's thesis

English abstract

As part of, and as a result of, the Fourth Industrial Revolution, development in many sectors has been moving towards automation. Increasingly, applications and systems require less human intervention and have become capable of making unassisted decisions. These applications and systems have been labeled intelligent. This “intelligence” is the result of their ability to “learn” using appropriate machine learning (ML) techniques. Convolutional Neural Networks (CNNs) are one application of these techniques. Pooling is a major component of CNNs and has considerable influence on the learning process of classification models, and of CNNs in general. Pooling functions reduce the spatial dimensions of feature maps and, consequently, the number of parameters to be learnt by the CNN. This reduces the computational cost and helps prevent overfitting. Many pooling techniques have been proposed in the literature, such as Max Pooling (MaxPool), Average Pooling (AvgPool), Stochastic Pooling, and Spectral Pooling. Each of these techniques has its advantages and limitations. Depending on the type of data and the structure of the CNN, some pooling methods are better suited to certain situations than others. Two standard, popular pooling methods are MaxPool and AvgPool. MaxPool is very efficient at extracting the most important features within a receptive field and is robust to minor translations and rotations of the input image, while AvgPool tends to preserve background information and is robust to outliers. However, these methods have limitations. MaxPool, in choosing the maximum value, disregards the other, non-maximum values, while AvgPool gives every input value equal importance. This encouraged the development of further pooling methods to satisfy specific CNN needs.
This thesis proposes adding a parameter to the pooling process through a pooling kernel. The proposed pooling methods are adaptive versions of the two standard, popular methods. The thesis investigates how this addition affects the pooling process and the subsequent performance of the CNN model. The experiments and their results show that adding parameters to the pooling process produces better results than the standard pooling methods mentioned above.
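
The contrast the abstract draws between MaxPool and AvgPool can be illustrated with a minimal sketch in plain Python. The 4x4 feature map, the helper names (`pool2d`, `mean`, `blended`) and the mixing weight `alpha` are all assumptions made for illustration. In particular, `blended` is only one simple way a parameter could enter pooling; the abstract does not specify the thesis's pooling kernel, so this is not the author's formulation.

```python
def pool2d(feature_map, size, reduce_fn):
    """Apply reduce_fn to each non-overlapping size x size window."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        out_row = []
        for c in range(0, cols - size + 1, size):
            # Flatten the current window into a list of values.
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            out_row.append(reduce_fn(window))
        pooled.append(out_row)
    return pooled


def mean(window):
    """Average pooling: every value in the window counts equally."""
    return sum(window) / len(window)


def blended(alpha):
    """Hypothetical parametrized pooling (illustrative assumption only):
    alpha * max + (1 - alpha) * mean. alpha = 1 gives max pooling,
    alpha = 0 gives average pooling."""
    return lambda window: alpha * max(window) + (1 - alpha) * mean(window)


# Made-up 4x4 feature map, pooled with 2x2 windows down to 2x2.
feature_map = [
    [1, 3, 2, 0],
    [5, 7, 4, 2],
    [0, 2, 8, 6],
    [4, 2, 2, 4],
]

max_pooled = pool2d(feature_map, 2, max)           # [[7, 4], [4, 8]]
avg_pooled = pool2d(feature_map, 2, mean)          # [[4.0, 2.0], [2.0, 5.0]]
mix_pooled = pool2d(feature_map, 2, blended(0.5))  # [[5.5, 3.0], [3.0, 6.5]]
```

In a trainable setting, a weight such as `alpha` would be learned together with the other network parameters rather than fixed by hand. Whether the thesis does this, and in what form, is not stated in the abstract.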

Arabic abstract

كجزء من، ونتيجة للثورة الصناعية الرابعة، يتحرك التطور في العديد من القطاعات نحو الأتمتة. حيث تتطلب التطبيقات والأنظمة بشكل متزايد تدخلًا بشريًا أقل وأصبحت قادرة على اتخاذ قرارات بدون مساعدة. تم تصنيف هذه التطبيقات أو الأنظمة على أنها ذكية. هذا "الذكاء" هو نتيجة لقدرتها على "التعلم"، باستخدام تقنيات تعلم الآلة المناسبة (ML). تُعد الشبكات العصبية الالتفافية (CNNs) تطبيقات لهذه التقنيات في تعلم الآلة. يُعد التجميع (Pooling) مكونًا رئيسيًا في الشبكات العصبية الالتفافية (CNNs)، وله تأثير كبير على عملية التعلم لنماذج التصنيف وCNNs بشكل عام. تعمل وظائف التجميع على تقليل الأبعاد المكانية لخرائط الميزات وبالتالي تقليل عدد المعلمات التي يجب تعلمها بواسطة CNN. هذا يقلل من التكلفة الحسابية ويمنع فرط التكيف. تم اقتراح العديد من تقنيات التجميع في الأدبيات، مثل التجميع الأقصى (Max Pooling) أو (MaxPool)، والتجميع المتوسط (Average Pooling) أو (AvgPool)، والتجميع العشوائي (Stochastic Pooling)، والتجميع الطيفي (Spectral Pooling). لكل من هذه التقنيات مزاياها وحدودها. اعتمادًا على نوع البيانات وهيكل CNN، فإن بعض طرق التجميع تكون أكثر ملاءمة لبعض الحالات مقارنة بنظيراتها. بعض طرق التجميع القياسية أو الشائعة هي التجميع الأقصى (MaxPool) والتجميع المتوسط (AvgPool). يعتبر MaxPool فعالًا للغاية في استخراج الميزات الأهم داخل مجال الاستقبال ويكون متينًا ضد التحولات البسيطة ودوران الصورة المدخلة، بينما يميل AvgPool إلى الحفاظ على معلومات الخلفية ويكون قويًا ضد القيم المتطرفة. ومع ذلك، فإن هذه الأساليب تقدم بعض القيود. حيث أن MaxPool، عند اختيار القيمة القصوى، يتجاهل القيم الأخرى غير القصوى، بينما يعتبر AvgPool جميع المدخلات بنفس الأهمية. هذا ما شجع على تطوير المزيد من طرق التجميع لتلبية احتياجات CNN المحددة. تقترح هذه الأطروحة إضافة معلمة إلى عملية التجميع من خلال نواة التجميع. تمثل طرق التجميع المقترحة إصدارات تكيفية للطرق القياسية أو الشائعة الموجودة. ستجيب هذه الدراسة، وعلى مدار هذه الأطروحة، على كيفية تأثير هذه الإضافة على عملية التجميع وأداء نموذج CNN اللاحق.
من خلال هذه الأطروحة، التجارب، والنتائج المنتجة، وُجد أن إضافة معلمات إلى عملية التجميع ينتج عنه نتائج أفضل من طرق التجميع القياسية المذكورة أعلاه.