Memory-Efficient GPU-Based CNN Training Framework

October 2021 to January 2022. Mentor: Dr. Vishwesh Jatala

Developed a CNN training framework in C++ and CUDA. Explored ways to make the framework memory efficient so that it can train large CNNs such as AlexNet on a single GPU with 12 GB of memory. Experimented with methods that offload a CNN layer to disk once its computation is done for an epoch and prefetch it in the next epoch. Used priority-queue-based offloading to offload the largest layers first, which creates enough space for the following layers and reduces the number of offload operations performed in an epoch.
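The largest-first selection can be illustrated with a small C++ sketch. It is not the framework's actual code: the `Layer` struct, the `pick_offloads` function, and the byte counts are hypothetical, and the sketch covers only the choice of which layers to offload, not the CUDA copies or disk I/O that would follow.

```cpp
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// Hypothetical record for a layer whose buffers currently reside in GPU memory.
struct Layer {
    std::string name;
    std::size_t bytes;  // size of the layer's device buffers
};

// Comparator that makes std::priority_queue a max-heap on layer size.
struct SmallerBytes {
    bool operator()(const Layer& a, const Layer& b) const {
        return a.bytes < b.bytes;
    }
};

// Chooses layers to offload, largest first, until at least `needed` bytes
// are free. Returns the chosen layer names in offload order. If the resident
// layers cannot cover the request, every layer is returned and the caller
// has to handle the shortfall.
std::vector<std::string> pick_offloads(const std::vector<Layer>& resident,
                                       std::size_t free_bytes,
                                       std::size_t needed) {
    std::priority_queue<Layer, std::vector<Layer>, SmallerBytes> heap;
    for (const Layer& layer : resident) {
        heap.push(layer);
    }

    std::vector<std::string> chosen;
    while (free_bytes < needed && !heap.empty()) {
        free_bytes += heap.top().bytes;     // space reclaimed by this offload
        chosen.push_back(heap.top().name);  // copy the name before popping
        heap.pop();
    }
    return chosen;
}
```

With resident layers of 100, 400, and 200 bytes, 50 bytes free, and 500 bytes needed, this picks the 400-byte and 200-byte layers: two offloads, where evicting in smallest-first order would take three.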

GitHub Link


Dhruv Deshmukh
