An answer to: why is Nvidia making GPUs bigger when games don't need them?
Hacker News said:

I've had this discussion a few times recently, including with people who are pretty up-to-date engineers working in IT. It's true that the latest offerings from Nvidia, in both the gaming and pro ranges, are pretty mind-boggling in terms of power consumption, size, cost, total VRAM and so on... but there is a very real need for this sort of thing, driven by the ML community, perhaps more so than most people expect.

Here is an example from a project I'm working on today. This is the console output of a model I'm training at the moment:

-----------------------------
I1015 20:04:51.426224 139830830814976 supervisor.py:1050] Recording summary at step 107041.
INFO:tensorflow:global step 107050: loss = 1.0418 (0.453 sec/step)
I1015 20:04:55.421283 139841985250112 learning.py:506] global step 107050: loss = 1.0418 (0.453 sec/step)
INFO:tensorflow:global step 107060: loss = 0.9265 (0.461 sec/step)
I1015 20:04:59.865883 139841985250112 learning.py:506] global step 107060: loss = 0.9265 (0.461 sec/step)
INFO:tensorflow:global step 107070: loss = 0.7003 (0.446 sec/step)
I1015 20:05:04.328712 139841985250112 learning.py:506] global step 107070: loss = 0.7003 (0.446 sec/step)
INFO:tensorflow:global step 107080: loss = 0.9612 (0.434 sec/step)
I1015 20:05:08.808678 139841985250112 learning.py:506] global step 107080: loss = 0.9612 (0.434 sec/step)
INFO:tensorflow:global step 107090: loss = 1.7290 (0.444 sec/step)
I1015 20:05:13.288547 139841985250112 learning.py:506] global step 107090: loss = 1.7290 (0.444 sec/step)
-----------------------------

Check out that last line: with a batch size of 42 images (the maximum I can fit on my GPU with 24 GB of memory), I randomly get the occasional batch where the total loss is more than double the moving average over the last 100 batches!

There's nothing fundamentally wrong with this, but it will throw a wrench into the model's convergence for a number of iterations, and it probably won't help the model reach its ideal final state within the number of iterations I have planned.

This is partly because I have a fundamentally unbalanced dataset and need to apply some fairly large label-wise weight rebalancing in the loss function to account for it... but that imbalance is the best representation of reality in the case I am working on!
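As a rough illustration of the kind of label-wise weight rebalancing described above (a minimal sketch only, not the poster's actual code; the class weights are invented for the example), weighted cross-entropy in TensorFlow might look like this:

-----------------------------
import tensorflow as tf

# Hypothetical per-class weights for an imbalanced dataset:
# rare classes get larger weights so they are not drowned out.
class_weights = tf.constant([0.2, 1.0, 5.0, 8.0])  # invented values

def weighted_sparse_ce(labels, logits):
    """Sparse cross-entropy with each example scaled by its class weight."""
    per_example = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits)
    weights = tf.gather(class_weights, labels)
    return tf.reduce_mean(per_example * weights)

# Toy batch: 4 labels and random logits over 4 classes.
labels = tf.constant([0, 2, 3, 1])
logits = tf.random.normal([4, 4])
loss = weighted_sparse_ce(labels, logits)
-----------------------------

One side effect of large weights on rare classes is exactly what the log shows: a small batch that happens to contain several heavily weighted examples produces a loss well above the running average, which is part of why a bigger batch smooths things out.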
--> The ideal solution RIGHT NOW would be to use a larger batch size, to minimise the chance of one of these large outlier batches showing up during training.

To get the best results in the short term, I want to train this system (using this small backbone) with batches of 420 images instead of 42, which would require roughly 240 GB of memory... so 3x Nvidia A100 GPUs, for example!
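For a sense of how a batch like that would actually be spread across several cards (my own illustration; the poster does not describe their setup), TensorFlow's MirroredStrategy splits a global batch evenly across the visible GPUs, roughly like this:

-----------------------------
import tensorflow as tf

# Data parallelism: each replica gets an equal slice of the global batch
# and gradients are averaged across replicas after every step.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

GLOBAL_BATCH = 420  # e.g. 140 images per GPU on a 3-GPU machine

with strategy.scope():
    # Stand-in backbone; the poster's actual model is not specified.
    model = tf.keras.applications.MobileNetV2(weights=None, classes=10)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Dummy data just to show the batching; the real input pipeline is omitted.
images = tf.random.normal([GLOBAL_BATCH, 224, 224, 3])
labels = tf.random.uniform([GLOBAL_BATCH], maxval=10, dtype=tf.int32)
dataset = tf.data.Dataset.from_tensor_slices((images, labels)).batch(GLOBAL_BATCH)

model.fit(dataset, epochs=1)
-----------------------------

Each replica still has to hold its slice of the activations plus the model and optimizer state, which is where cards like the 80 GB A100 come in: three of them give the ~240 GB mentioned above.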
[QUOTE="Hacker News, post: 71881, member: 365"] I've had this discussion a few times recently including with people who are pretty up-to-date engineers working in IT. It's true that the latest offerings from Nvidia both in the gaming and pro range are pretty mind-boggling in terms of power consumption, size, cost, total VRAM etc... but there is actually a very real need for this sort of thing driven by the ML community, maybe more so than most people expect. Here is an example from a project I'm working on today. This is the console output of a model I'm training at the moment: ----------------------------- I1015 20:04:51.426224 139830830814976 supervisor.py:1050] Recording summary at step 107041. INFO:tensorflow:global step 107050: loss = 1.0418 (0.453 sec/step) I1015 20:04:55.421283 139841985250112 learning.py:506] global step 107050: loss = 1.0418 (0.453 sec/step) INFO:tensorflow:global step 107060: loss = 0.9265 (0.461 sec/step) I1015 20:04:59.865883 139841985250112 learning.py:506] global step 107060: loss = 0.9265 (0.461 sec/step) INFO:tensorflow:global step 107070: loss = 0.7003 (0.446 sec/step) I1015 20:05:04.328712 139841985250112 learning.py:506] global step 107070: loss = 0.7003 (0.446 sec/step) INFO:tensorflow:global step 107080: loss = 0.9612 (0.434 sec/step) I1015 20:05:08.808678 139841985250112 learning.py:506] global step 107080: loss = 0.9612 (0.434 sec/step) INFO:tensorflow:global step 107090: loss = 1.7290 (0.444 sec/step) I1015 20:05:13.288547 139841985250112 learning.py:506] global step 107090: loss = 1.7290 (0.444 sec/step) ----------------------------- Check out that last line: with a batch size of 42 images (maximum I can fit on my GPU with 24Gb memory) I randomly get the occasional batch where total loss is more than double the moving average over the last 100 batches! There's nothing fundamentally wrong with this, but it will throw a wrench in convergence of the model for a number of iterations, and it's probably not going to help in reaching the ideal ultimate state of the model within the number of iterations I have planned. This is in part due to the fact that I have a fundamentally unbalanced dataset, and need to apply some pretty large label-wise weight rebalancing in the loss function to account for that... but this is the best representation of reality in the case I am working on! --> The ideal solution RIGHT NOW would be to use a larger batch size, in order to minimise the possibility of getting these large outliers in the training set. To get the best results in the short term I want to train this system (using this small backbone) with batches of 420 images instead of 42, which would require 240Gb of memory... so 3x Nvidia A100 GPUs for example! --> Ultimately the next step is to make a version with a backbone that has 5x more parameters, and on the same dataset but scaled at 2x linear resolution... requiring probably around 500Gb of VRAM to have large enough batches to achieve good convergence on all classes! And bear in mind this is a relatively small model, which can be deployed on a stamp-size Intel Movidius TPU and run in real-time directly inside a tiny sensor. For non-realtime inference there are people out there working on models with 1000x more parameters! So if anyone is wondering why NVIDIA keeps making more and more powerful GPUs, and wondering who could possibly need that much power... 
So if anyone is wondering why Nvidia keeps making more and more powerful GPUs, and who could possibly need that much power... this is your answer: the design and development of these GPUs is now being pulled forward 90% by people who need this kind of hardware for ML workloads, and the gaming market accounts for the other 10% of the real-world "need" for this type of power; 10 years ago that ratio would have been reversed.

Comments URL: https://news.ycombinator.com/item?id=33213976

Points: 15

# Comments: 13

Continue reading: https://news.ycombinator.com/item?id=33213976