
Analysis reveals why Nvidia graphics chips are power-saving performance beasts

Tile-based Rasterization in Nvidia GPUs with David Kanter of Real World Tech
A recent report from David Kanter of Real World Technologies investigates why graphics chips from Nvidia, namely those based on the company’s Maxwell and Pascal architectures, perform better than their theoretical specifications would suggest, and why they’re more efficient than competing graphics chips. In a nutshell, Nvidia GPUs buffer pixel output using what’s called a tile-based immediate-mode rasterizer, which is fast and power-efficient. Competing graphics chips rely on slower, conventional full-screen immediate-mode rasterizers.

According to Kanter, tile-based rasterization has been around since the 1990s, first appearing in the PowerVR architecture and later adopted by ARM and Qualcomm for the GPUs in their mobile processors. Until Nvidia brought the technique to its Maxwell GM20x architecture, however, tile-based rasterization had not been successfully implemented in a desktop graphics chip.

Tile-based rasterization essentially means that each triangle-based, three-dimensional scene is split into screen-space tiles, and each tile is broken down (rasterized) into pixels on the graphics chip itself before being “printed” to the two-dimensional screen. By contrast, conventional full-screen immediate-mode rasterizers use more memory bandwidth and more power, breaking the whole scene down into pixels in full-screen passes and writing the results out to memory as they go.
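To make the difference concrete, here is a minimal sketch in Python of the two approaches under a toy model. None of this is Nvidia’s or AMD’s actual pipeline; the framebuffer size, the 16-pixel tile, and the bounding-box coverage test are all assumptions chosen purely for illustration, and the only thing being counted is how often pixel data would have to travel to off-chip memory.

```python
# Toy model contrasting full-screen immediate-mode rasterization with
# tile-based rasterization. Triangles are represented only by their
# bounding boxes; "covers" is a stand-in for a real coverage test.

WIDTH, HEIGHT = 64, 64        # hypothetical framebuffer size
TILE = 16                     # hypothetical tile size (pixels per side)

def covers(tri, x, y):
    """Crude coverage test: is pixel (x, y) inside the triangle's bounding box?"""
    x0, y0, x1, y1 = tri
    return x0 <= x < x1 and y0 <= y < y1

def immediate_mode(triangles):
    """Each triangle is rasterized across the whole screen as it arrives;
    every covered pixel is written straight to off-chip memory."""
    dram_writes = 0
    for tri in triangles:
        for y in range(HEIGHT):
            for x in range(WIDTH):
                if covers(tri, x, y):
                    dram_writes += 1          # overlapping pixels hit DRAM repeatedly
    return dram_writes

def tile_based(triangles):
    """Triangles are binned into tiles; each tile is rasterized into a small
    on-chip buffer and flushed to off-chip memory once."""
    dram_writes = 0
    for ty in range(0, HEIGHT, TILE):
        for tx in range(0, WIDTH, TILE):
            tile_buffer = set()               # stands in for on-chip storage
            for tri in triangles:
                for y in range(ty, ty + TILE):
                    for x in range(tx, tx + TILE):
                        if covers(tri, x, y):
                            tile_buffer.add((x, y))   # overdraw stays on chip
            dram_writes += len(tile_buffer)   # one flush per tile
    return dram_writes

# Two overlapping "triangles" (bounding boxes) to show the difference.
tris = [(0, 0, 48, 48), (16, 16, 64, 64)]
print("immediate-mode DRAM writes:", immediate_mode(tris))
print("tile-based DRAM writes:   ", tile_based(tris))
```

With two overlapping objects, the tiled version writes each covered pixel out once per tile, while the immediate-mode version writes every covered pixel of every triangle, which is where the bandwidth savings come from in this toy model.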

“Using tiled regions and buffering the rasterizer data on-die reduces the memory bandwidth for rendering, improving performance and power-efficiency,” Kanter explains. “Consistent with this hypothesis, our testing shows that Nvidia GPUs change the tile size to ensure that the pixel output from rasterization fits within a fixed size on-chip buffer or cache.”
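Here is a small illustration of that idea in Python. The 128 KiB buffer and the per-pixel byte counts below are made-up assumptions, not figures Kanter measured; the point is simply that a heavier render-target format (or more anti-aliasing samples) forces a smaller tile if the pixel output has to fit a fixed on-chip budget.

```python
# Illustrative arithmetic only: the 128 KiB buffer size and per-pixel costs
# below are assumptions, not measurements from the video.

BUFFER_BYTES = 128 * 1024     # hypothetical on-chip buffer budget

def tile_dimensions(bytes_per_pixel, samples=1):
    """Pick the largest square, power-of-two tile whose pixel output
    still fits in the fixed on-chip buffer."""
    side = 256
    while side > 1 and side * side * bytes_per_pixel * samples > BUFFER_BYTES:
        side //= 2
    return side, side

# Heavier render-target formats (or more MSAA samples) force smaller tiles.
for bpp, samples in [(4, 1), (8, 1), (16, 1), (4, 4)]:
    w, h = tile_dimensions(bpp, samples)
    print(f"{bpp} B/pixel, {samples}x MSAA -> {w} x {h} tile")
```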

Kanter explains that mobile GPUs from the likes of Apple and other device makers use a method called tile-based deferred rendering, in which geometry and pixel work are handled in two separate passes. The scene is divided into tiles, all of the triangles touching each tile are processed first, and pixel shading for each tile happens afterward.

However, Nvidia is reportedly using a tile-based “immediate” technique in its desktop GPUs that divides the screen into tiles and then rasterizes small batches of triangles within each tile. Those batches of triangles are typically buffered or cached on-chip, he says, which in turn improves performance and saves power.
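The structural difference between the two tile-based schemes can be sketched in Python as follows. This is only an illustration of the scheduling described above, not vendor code: the triangles are simple bounding boxes, the screen and tile sizes are arbitrary, and the batch size is a made-up stand-in for whatever the hardware actually buffers on-chip.

```python
# Structural sketch only: triangles are just bounding boxes, tiles are
# 16x16 regions of a 64x64 screen, and the batch size is an assumption.

WIDTH, HEIGHT, TILE = 64, 64, 16
BATCH_SIZE = 4    # hypothetical number of triangles buffered on chip at a time

TILES = [(tx, ty) for ty in range(0, HEIGHT, TILE) for tx in range(0, WIDTH, TILE)]

def overlaps(tri, tile):
    """Does the triangle's bounding box touch this tile?"""
    x0, y0, x1, y1 = tri
    tx, ty = tile
    return x0 < tx + TILE and x1 > tx and y0 < ty + TILE and y1 > ty

def deferred_rendering(frame_triangles):
    """Tile-based deferred rendering (mobile style): two separate passes.
    Pass 1 bins all geometry for the frame; pass 2 shades tile by tile."""
    bins = {tile: [] for tile in TILES}
    for tri in frame_triangles:                     # pass 1: geometry
        for tile in TILES:
            if overlaps(tri, tile):
                bins[tile].append(tri)
    for tile in TILES:                              # pass 2: pixel shading
        if bins[tile]:
            print(f"TBDR  shade tile {tile} with {len(bins[tile])} triangles")

def tiled_immediate_rendering(frame_triangles):
    """Tile-based immediate rendering: small batches of triangles are binned
    and rasterized tile by tile as they arrive, with no full-frame pass."""
    for start in range(0, len(frame_triangles), BATCH_SIZE):
        batch = frame_triangles[start:start + BATCH_SIZE]    # stays on chip
        for tile in TILES:
            hits = [tri for tri in batch if overlaps(tri, tile)]
            if hits:
                print(f"TBIM  shade tile {tile} with {len(hits)} triangles")

tris = [(0, 0, 32, 32), (16, 16, 48, 48), (32, 32, 64, 64),
        (0, 32, 32, 64), (32, 0, 64, 32), (8, 8, 56, 56)]
deferred_rendering(tris)
tiled_immediate_rendering(tris)
```

The key difference in this sketch is that the deferred path needs the whole frame’s geometry before any pixel work starts, while the immediate path only ever holds a small batch of triangles at a time.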

In a demonstration using a tool called Triangles.HLSL running on an AMD Radeon HD 6670 GPU under Windows 10, he shows how AMD’s graphics chip renders twelve identical, flat objects one by one: each object is drawn line by line, moving across the screen from right to left and from the top of the screen to the bottom, overwriting the one before it. He reveals this behavior by moving a slider that limits the number of pixels that can be rendered to the screen. Just imagine an invisible printer going back and forth across the screen faster than the human eye can detect.

After revealing AMD’s current draw technique, the demonstration moves to a different system using the same tool, Windows 10, and an Nvidia GeForce GTX 970 graphics card. Here you’ll notice that when the rendering process is paused, the twelve stacked objects are being rendered simultaneously, with two completed tiles on the left and five more tiles in various states of completion forming a checkerboard pattern to the right. Overall, the rasterization path runs left to right and top to bottom.

In other words, Nvidia fully rasterizes one tile, containing a portion of every object, before moving on to the next tile. AMD, on the other hand, rasterizes each object in printer-like fashion from top to bottom before going back to the beginning and rendering the next object. Things get even more interesting when a different Nvidia card is dropped into the test bed, revealing even larger tiles and a different pattern.
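The two draw orders described in the demo can be sketched like this; the screen dimensions, tile size, and object count are tiny made-up values so the ordering is easy to print, not measurements from the video.

```python
# Sketch of the draw orders seen in the demo. The real screen and tile
# sizes are much larger and vary by GPU; these values are illustrative.

WIDTH, HEIGHT, TILE = 8, 4, 2    # tiny "screen" measured in coarse cells
OBJECTS = ["A", "B", "C"]        # stand-ins for the twelve stacked objects

def amd_style_order():
    """Full-screen immediate mode: each object is swept across the whole
    screen, line by line, before the next object starts."""
    for obj in OBJECTS:
        for y in range(HEIGHT):
            for x in range(WIDTH):
                yield (obj, x, y)

def nvidia_style_order():
    """Tile-based immediate mode: each tile is finished for all objects
    before the rasterizer moves on to the next tile."""
    for ty in range(0, HEIGHT, TILE):
        for tx in range(0, WIDTH, TILE):
            for obj in OBJECTS:
                for y in range(ty, ty + TILE):
                    for x in range(tx, tx + TILE):
                        yield (obj, x, y)

print("AMD-style, first 6 steps:   ", list(amd_style_order())[:6])
print("Nvidia-style, first 6 steps:", list(nvidia_style_order())[:6])
```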

To check out this latest investigation, be sure to hit the video embedded above for the full 19:45 demonstration.

Kevin Parrish
Former Digital Trends Contributor