UNDERSTANDING DOUBLE DESCENT BEHAVIOR IN DEEP LEARNING NEURAL NETWORKS
Open Access
Author:
Dutta, Shubhangam
Area of Honors:
Computer Engineering
Degree:
Bachelor of Science
Document Type:
Thesis
Thesis Supervisors:
Mehrdad Mahdavi, Thesis Supervisor
John Morgan Sampson, Thesis Honors Advisor
Keywords:
Deep Learning; Neural Networks; Double Descent; Machine Learning
Abstract:
The classical statistical understanding of the U-shaped test-error curve in machine learning rests on the bias-variance tradeoff. However, many modern deep learning networks exhibit a double descent behavior: past an interpolation threshold, further increasing certain quantities such as model size or the number of training epochs improves performance again, departing from the U-shaped curve. The notion of Effective Model Complexity incorporates all of these factors and conjectures a generalized double descent with respect to them [1]. This research builds upon the notion of Effective Model Complexity and tests these factors on the ResNet v2 model. It also explains why this particular model was chosen to test the hypothesis and demonstrates the effect of external parameters such as label noise on the double descent behavior.