Deep learning is a field with intense computational requirements, and your choice of GPU will fundamentally determine your deep learning experience. But what features are important if you want to buy a new GPU? GPU memory, cores, Tensor Cores? How do you make a cost-effective choice? This blog post explores these questions, addresses common misconceptions, gives you an intuitive understanding of how to think about GPUs, and offers advice to help you choose a GPU that is right for you. It is designed to give you several levels of understanding of GPUs and the new NVIDIA Ampere series. You have a choice: (1) if you are not interested in the details of how GPUs work, what makes a GPU fast, or what is unique about the new NVIDIA RTX 30 Ampere series, you can skip ahead to the performance-per-dollar charts and the recommendation section. These form the core of this blog post and provide the most value.
This blog post is organized as follows. First, I explain what makes a GPU fast. I discuss CPUs versus GPUs, Tensor Cores, memory bandwidth, and the GPU memory hierarchy, and how these relate to deep learning performance. These explanations will help you develop a more intuitive sense of what to look for in a GPU. Then I make theoretical estimates of GPU performance and align them with some NVIDIA marketing benchmarks to obtain reliable, unbiased performance data. In this section, I also discuss common misconceptions and miscellaneous issues such as cloud vs. desktop and cooling.
Read on for more information on the best GPU for deep learning.
How do GPUs work?
If you use GPUs regularly, it is helpful to understand how they work. This knowledge lets you understand the cases in which GPUs are fast or slow and, in turn, why you need a GPU in the first place and how other hardware options may compete in the future. You can skip this section if you just want useful performance numbers and arguments to help you decide which GPU to buy. The best high-level explanation of how GPUs work is my following Quora answer:
The most important GPU specifications for deep learning processing speed
This section will help you gain a more intuitive understanding of how to think about deep learning performance. This understanding will help you evaluate future GPUs on your own.
Tensor Cores reduce the number of cycles needed for multiply-and-add operations by a factor of 16 – in my example for 32×32 matrices, from 128 cycles to 8 cycles.
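To make the arithmetic concrete, here is a minimal sketch. The 128- and 8-cycle figures come from the example above; the FLOP count is the standard one for a matrix multiplication:

```python
# Cycle figures from the 32x32 matrix multiplication example above.
n = 32
flops = 2 * n**3               # each multiply-add counts as 2 FLOPs -> 65536

cycles_without_tc = 128        # plain CUDA cores, per the example
cycles_with_tc = 8             # Tensor Cores

speedup = cycles_without_tc // cycles_with_tc
print(speedup)                 # 16x fewer cycles
```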
Tensor Cores reduce repeated accesses to shared memory, saving additional memory access cycles.
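The saving from data reuse can be sketched with a simple counting argument. The matrix and tile sizes below are illustrative assumptions, not specific to any GPU, but the reuse factor they demonstrate is the general principle behind tiled matrix multiplication:

```python
# Counting slow-memory loads for an N x N matrix multiplication.
# Naive: every output element re-loads a full row of A and a column of B.
# Tiled: each T x T tile is loaded once into fast memory and reused T times.
N, T = 32, 8                   # illustrative sizes (assumptions)

naive_loads = 2 * N**3         # 2N loads per output element, N^2 outputs
tiled_loads = 2 * N**3 // T    # reuse factor of T from tiling

print(naive_loads // tiled_loads)  # tiling cuts slow-memory traffic T-fold
```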
Tensor Cores are so fast that computation is no longer the bottleneck. The only bottleneck is getting data to the Tensor Cores. There are now GPUs cheap enough that almost anyone can afford one with Tensor Cores, so I only recommend GPUs with Tensor Cores. It is useful to understand how they work to appreciate the importance of these computational units specialized for matrix multiplication.
Theoretical estimates of Ampere speed
Taking the above considerations together, we would expect the difference between two Tensor-Core GPUs to come mostly from memory bandwidth. Additional benefits come from more shared memory / L1 cache and better register usage with Tensor Cores.
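As a worked example of this bandwidth-driven estimate, the sketch below compares two Tensor-Core GPUs by memory bandwidth alone. The figures are NVIDIA's published specs as I recall them (760 GB/s for the RTX 3080, 616 GB/s for the RTX 2080 Ti); treat them as assumptions rather than benchmarked results:

```python
# Rough relative deep learning speed estimate from memory bandwidth alone,
# following the reasoning above: both GPUs have Tensor Cores, so memory
# bandwidth dominates. Bandwidth figures are assumed published specs.
bandwidth_gbs = {
    "RTX 2080 Ti": 616,   # GDDR6
    "RTX 3080": 760,      # GDDR6X
}

relative = bandwidth_gbs["RTX 3080"] / bandwidth_gbs["RTX 2080 Ti"]
print(f"Estimated speedup: {relative:.2f}x")  # before shared-memory and
                                              # register-reuse benefits
```

This kind of estimate is deliberately crude: it ignores the "additional benefits" mentioned above, so actual speedups can be somewhat higher than the bandwidth ratio alone suggests.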