An answer to: why is Nvidia making GPUs bigger when games don't need them?
Hacker News said:

I've had this discussion a few times recently, including with people who are pretty up-to-date engineers working in IT. It's true that the latest offerings from Nvidia, in both the gaming and pro ranges, are pretty mind-boggling in terms of power consumption, size, cost, total VRAM and so on... but there is a very real need for this sort of thing, driven by the ML community, perhaps more so than most people expect.

Here is an example from a project I'm working on today. This is the console output of a model I'm training at the moment:

-----------------------------
I1015 20:04:51.426224 139830830814976 supervisor.py:1050] Recording summary at step 107041.
INFO:tensorflow:global step 107050: loss = 1.0418 (0.453 sec/step)
I1015 20:04:55.421283 139841985250112 learning.py:506] global step 107050: loss = 1.0418 (0.453 sec/step)
INFO:tensorflow:global step 107060: loss = 0.9265 (0.461 sec/step)
I1015 20:04:59.865883 139841985250112 learning.py:506] global step 107060: loss = 0.9265 (0.461 sec/step)
INFO:tensorflow:global step 107070: loss = 0.7003 (0.446 sec/step)
I1015 20:05:04.328712 139841985250112 learning.py:506] global step 107070: loss = 0.7003 (0.446 sec/step)
INFO:tensorflow:global step 107080: loss = 0.9612 (0.434 sec/step)
I1015 20:05:08.808678 139841985250112 learning.py:506] global step 107080: loss = 0.9612 (0.434 sec/step)
INFO:tensorflow:global step 107090: loss = 1.7290 (0.444 sec/step)
I1015 20:05:13.288547 139841985250112 learning.py:506] global step 107090: loss = 1.7290 (0.444 sec/step)
-----------------------------

Check out that last line: with a batch size of 42 images (the maximum I can fit on my GPU with 24 GB of memory), I randomly get the occasional batch where the total loss is more than double the moving average over the last 100 batches!

There's nothing fundamentally wrong with this, but it will throw a wrench into the model's convergence for a number of iterations, and it probably won't help the model reach its ideal final state within the number of iterations I have planned.

This is partly because I have a fundamentally unbalanced dataset and need to apply some fairly large label-wise weight rebalancing in the loss function to account for it... but that imbalance is the best representation of reality in the case I am working on!
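As a rough illustration of the kind of label-wise weight rebalancing described above (a minimal sketch only, not the poster's actual code; the class weights are invented for the example), weighted cross-entropy in TensorFlow might look like this:

-----------------------------
import tensorflow as tf

# Hypothetical per-class weights for an imbalanced dataset:
# rare classes get larger weights so they are not drowned out.
class_weights = tf.constant([0.2, 1.0, 5.0, 8.0])  # invented values

def weighted_sparse_ce(labels, logits):
    """Sparse cross-entropy with each example scaled by its class weight."""
    per_example = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits)
    weights = tf.gather(class_weights, labels)
    return tf.reduce_mean(per_example * weights)

# Toy batch: 4 labels and random logits over 4 classes.
labels = tf.constant([0, 2, 3, 1])
logits = tf.random.normal([4, 4])
loss = weighted_sparse_ce(labels, logits)
-----------------------------

One side effect of large weights on rare classes is exactly what the log shows: a small batch that happens to contain several heavily weighted examples produces a loss well above the running average, which is part of why a bigger batch smooths things out.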
--> The ideal solution RIGHT NOW would be to use a larger batch size, to minimise the chance of one of these large outlier batches showing up during training.

To get the best results in the short term, I want to train this system (using this small backbone) with batches of 420 images instead of 42, which would require roughly 240 GB of memory... so 3x Nvidia A100 GPUs, for example!
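For a sense of how a batch like that would actually be spread across several cards (my own illustration; the poster does not describe their setup), TensorFlow's MirroredStrategy splits a global batch evenly across the visible GPUs, roughly like this:

-----------------------------
import tensorflow as tf

# Data parallelism: each replica gets an equal slice of the global batch
# and gradients are averaged across replicas after every step.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

GLOBAL_BATCH = 420  # e.g. 140 images per GPU on a 3-GPU machine

with strategy.scope():
    # Stand-in backbone; the poster's actual model is not specified.
    model = tf.keras.applications.MobileNetV2(weights=None, classes=10)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Dummy data just to show the batching; the real input pipeline is omitted.
images = tf.random.normal([GLOBAL_BATCH, 224, 224, 3])
labels = tf.random.uniform([GLOBAL_BATCH], maxval=10, dtype=tf.int32)
dataset = tf.data.Dataset.from_tensor_slices((images, labels)).batch(GLOBAL_BATCH)

model.fit(dataset, epochs=1)
-----------------------------

Each replica still has to hold its slice of the activations plus the model and optimizer state, which is where cards like the 80 GB A100 come in: three of them give the ~240 GB mentioned above.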
[QUOTE="Hacker News, post: 71881, member: 365"] I've had this discussion a few times recently including with people who are pretty up-to-date engineers working in IT. It's true that the latest offerings from Nvidia both in the gaming and pro range are pretty mind-boggling in terms of power consumption, size, cost, total VRAM etc... but there is actually a very real need for this sort of thing driven by the ML community, maybe more so than most people expect. Here is an example from a project I'm working on today. This is the console output of a model I'm training at the moment: ----------------------------- I1015 20:04:51.426224 139830830814976 supervisor.py:1050] Recording summary at step 107041. INFO:tensorflow:global step 107050: loss = 1.0418 (0.453 sec/step) I1015 20:04:55.421283 139841985250112 learning.py:506] global step 107050: loss = 1.0418 (0.453 sec/step) INFO:tensorflow:global step 107060: loss = 0.9265 (0.461 sec/step) I1015 20:04:59.865883 139841985250112 learning.py:506] global step 107060: loss = 0.9265 (0.461 sec/step) INFO:tensorflow:global step 107070: loss = 0.7003 (0.446 sec/step) I1015 20:05:04.328712 139841985250112 learning.py:506] global step 107070: loss = 0.7003 (0.446 sec/step) INFO:tensorflow:global step 107080: loss = 0.9612 (0.434 sec/step) I1015 20:05:08.808678 139841985250112 learning.py:506] global step 107080: loss = 0.9612 (0.434 sec/step) INFO:tensorflow:global step 107090: loss = 1.7290 (0.444 sec/step) I1015 20:05:13.288547 139841985250112 learning.py:506] global step 107090: loss = 1.7290 (0.444 sec/step) ----------------------------- Check out that last line: with a batch size of 42 images (maximum I can fit on my GPU with 24Gb memory) I randomly get the occasional batch where total loss is more than double the moving average over the last 100 batches! There's nothing fundamentally wrong with this, but it will throw a wrench in convergence of the model for a number of iterations, and it's probably not going to help in reaching the ideal ultimate state of the model within the number of iterations I have planned. This is in part due to the fact that I have a fundamentally unbalanced dataset, and need to apply some pretty large label-wise weight rebalancing in the loss function to account for that... but this is the best representation of reality in the case I am working on! --> The ideal solution RIGHT NOW would be to use a larger batch size, in order to minimise the possibility of getting these large outliers in the training set. To get the best results in the short term I want to train this system (using this small backbone) with batches of 420 images instead of 42, which would require 240Gb of memory... so 3x Nvidia A100 GPUs for example! --> Ultimately the next step is to make a version with a backbone that has 5x more parameters, and on the same dataset but scaled at 2x linear resolution... requiring probably around 500Gb of VRAM to have large enough batches to achieve good convergence on all classes! And bear in mind this is a relatively small model, which can be deployed on a stamp-size Intel Movidius TPU and run in real-time directly inside a tiny sensor. For non-realtime inference there are people out there working on models with 1000x more parameters! So if anyone is wondering why NVIDIA keeps making more and more powerful GPUs, and wondering who could possibly need that much power... 
So if anyone is wondering why Nvidia keeps making more and more powerful GPUs, and who could possibly need that much power... this is your answer: the design and development of these GPUs is now being pulled forward 90% by people who need this kind of hardware for ML workloads, and the gaming market accounts for the other 10% of the real-world "need" for this type of power; 10 years ago that ratio would have been reversed.

Comments URL: https://news.ycombinator.com/item?id=33213976

Points: 15

# Comments: 13

Continue reading: https://news.ycombinator.com/item?id=33213976