Memory-Efficient GPU-Based CNN Training Framework
October 2021-January 2022 Mentor: Dr. Vishwesh Jatala
Developed a CNN training framework in C++ and CUDA. Explored ways to make the framework memory efficient so that it can train large CNNs such as AlexNet on a single GPU with 12 GB of memory. Experimented with different methods of offloading CNN layers to disk once their computation for an epoch is done and prefetching them in the next epoch. Used priority-queue-based offloading to offload the largest layers first, freeing enough space for the next layers while reducing the number of offload operations performed per epoch.
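
The largest-first selection can be illustrated with a minimal C++ sketch. The names `Layer`, `BySize`, and `pick_layers_to_offload` are hypothetical, sizes are plain byte counts, and the actual disk writes and GPU transfers are left out, so this is an illustration of the idea rather than the framework's code:

```cpp
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// Hypothetical record for a layer currently resident in GPU memory.
struct Layer {
    std::string name;
    std::size_t bytes;
};

// Orders the heap so that the layer with the most bytes is on top.
struct BySize {
    bool operator()(const Layer& a, const Layer& b) const {
        return a.bytes < b.bytes;
    }
};

// Returns the names of the layers to offload, largest first, until at
// least bytes_needed have been freed (or no resident layers remain).
std::vector<std::string> pick_layers_to_offload(
        const std::vector<Layer>& resident, std::size_t bytes_needed) {
    std::priority_queue<Layer, std::vector<Layer>, BySize> heap;
    for (const Layer& layer : resident) {
        heap.push(layer);
    }

    std::vector<std::string> picked;
    std::size_t freed = 0;
    while (freed < bytes_needed && !heap.empty()) {
        picked.push_back(heap.top().name);
        freed += heap.top().bytes;
        heap.pop();
    }
    return picked;
}
```

Taking the largest layers first means a given amount of memory is usually freed with fewer offloads than evicting layers in network order, which is the reduction in offload operations described above.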