Friday, 1 July 2016

Clash of Titans

Mark Pickavance looks at Nvidia’s new GPU dynasty and compares them with what AMD has on offer

For me, some of the best bits of the last 30 years of computing have involved the development of dedicated video hardware. From the lightbulb moment that Voodoo Graphics delivered to the advent of multi-GPU systems, they’ve often surprised and occasionally amazed.

That said, the range of performance available now is so huge that no game developer could hope to cater for every level of hardware in a single title.


Very often, they simply address the majority of their audience with a product that looks good with a £100 video card and proportionally more amazing with something better.

It’s into that context that Nvidia has launched its new 10 series cards, with specifications to humble the hard-core gamer and with a price to make them weep.

Are these the GPUs we’re looking for?

In The Green Corner


Nvidia has had a strong year, because in spite of some minor failings, gamers really loved the GTX 970 and bought large numbers of them.

According to the latest figures, 5% of all Steam gamers use one of these cards, even though they cost from about £240, with the vast majority in the £250-270 bracket. Its bigger brother, the GTX 980, costs north of £400, the Ti variant over £500, and above that sits the awesome GTX Titan X.

That very top rung will cost you an eye-watering £900 to £1,200, for a card with 12GB of GDDR5 RAM.

All these cards were based on a microarchitecture called Maxwell, were fabricated on a 28nm process, delivered between 3.5 and 6.2 teraflops of processing power and could consume 250W of power doing that.

For those who wanted something affordable, Nvidia offered the GTX 950 and 960, although compared with their bigger brothers, they offered modest numbers of shader cores and memory bandwidth.

Now comes the new 10 Series, using the new Pascal GPU design, built along the same lines as Maxwell but fabricated on TSMC's 16nm FinFET process for even greater possibilities.

The smaller tracks allow for more cores, higher clock speeds and better heat management.

Fig. 1 shows the complete specs of the GeForce GTX 1080 and GTX 1070, compared with the GTX 980 Ti and Titan X designs that preceded them.

Fig.1

Looking at these details, a few things stood out that can stand greater explanation, because in many ways, these new cards are an odd hybrid between the ones they replace and something entirely different.

With a memory pathway that's just 256 bits wide, compared with 384 bits on the top Maxwell cards, this is a much simpler design and therefore a less expensive board to build.

To achieve similar levels of bandwidth, the memory clocks have been increased from 1,753MHz on the GTX 980 to 2,000MHz on the GTX 1070 and a whopping 2,500MHz on the GTX 1080.

In addition, the GTX 1080 also employs GDDR5X, a modified version of GDDR5 that doubles the prefetch, theoretically doubling the memory speed while being almost identical in other respects.

That said, technically, these cards actually have less bandwidth than the Titan X and GTX 980 Ti (336GB/s), though both have more than the GTX 980.
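The arithmetic behind those figures is simple enough to sanity-check: peak bandwidth is the per-pin data rate multiplied by the bus width. Here's a quick sketch using the quoted clocks (GDDR5 moves four bits per pin per memory-clock cycle; the function name and layout are mine, not anything official):

```python
# Peak memory bandwidth = per-pin data rate x bus width.
# GDDR5 transfers four bits per pin per memory-clock cycle, so a quoted
# 2,000MHz clock equates to an 8Gb/s effective data rate per pin.
def bandwidth_gbs(mem_clock_mhz, bus_width_bits):
    data_rate_gbps = mem_clock_mhz * 4 / 1000   # Gbit/s per pin
    return data_rate_gbps * bus_width_bits / 8  # bytes per second, i.e. GB/s

print(bandwidth_gbs(1753, 256))  # GTX 980:              ~224 GB/s
print(bandwidth_gbs(2000, 256))  # GTX 1070:              256 GB/s
print(bandwidth_gbs(2500, 256))  # GTX 1080 (GDDR5X):     320 GB/s
print(bandwidth_gbs(1753, 384))  # Titan X / GTX 980 Ti: ~336 GB/s
```

The numbers line up with the spec sheets: the narrower bus on the new cards is compensated for by the higher clocks, though not quite enough to catch the 384-bit Maxwell flagships.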

What Nvidia appears to have done is to make this design marginally faster than the Titan X, delivering the performance crown. However, Nvidia also put a decent slice of that pie into reducing the power consumption by a whopping 28% compared to the Titan X and GTX 980 Ti.

With lower power consumption comes less heat or, if you prefer, headroom for greater overclocking, if you want that. Or you can run at stock speeds and stay cool.

Therefore, these cards should overclock easily or be able to game for many hours without overheating. At least that’s the theory.

With the first retail products hitting the shelves, I decided to acquire a couple and put them through their paces, because while technology might be one thing on a spec sheet, there’s no better acid test than actually giving them something difficult to render at a high resolution and frame-rate.

Retail Products


I’d like to thank Asus for providing two cards from its range for me to experience. At this time, it makes two baseline cards (called Founders Editions) and two Republic of Gamers Strix models, and it provided one of each level (see Fig. 2).

Fig.2

The first point to make is that neither of these products is an impulse purchase unless you’re very well off indeed. However, these are at the high end of the price scale, and some makers are offering the GTX 1070 for as little as £365.

What you don’t get at that price is the exotic triple-fan cooler that Asus designed, the Aura LED lighting or the not insignificant clock tweaks.

I’ve put the baseline clocks in brackets so you can see how much they’ve been adjusted, and these changes do make these Asus Strix designs some of the fastest 10 Series cards you’re likely to encounter.

Power consumption of these cards is much less than we’ve become accustomed to for flagship designs, with the GTX 1070 having only a single eight-pin PCIe power connector and a maximum TDP of just 150W. The GTX 1080 needs both an eight- and a six-pin connector, but it only uses another 30W, even if that configuration gives it 250W to play with.

Accordingly, the recommended PSU for the GTX 1070 is just 500W, and the GTX 1080 will work with a 550W PSU, making for a modest configuration unless you intend to use SLI and more than one card.
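Those recommendations follow from a common rule of thumb: add up the worst-case draw of the major components, then leave comfortable headroom so the supply runs near its efficiency sweet spot. A rough sketch, with an assumed 130W high-end CPU and 75W for the rest of the system (both illustrative figures of mine, not Nvidia's):

```python
# Rough PSU sizing: sum worst-case component draws, then allow ~40% headroom.
# The CPU and "other" figures are illustrative assumptions.
def recommended_psu_watts(gpu_tdp, cpu_tdp=130, other=75, headroom=1.4):
    return (gpu_tdp + cpu_tdp + other) * headroom

print(recommended_psu_watts(150))  # GTX 1070 system: ~497W, so a 500W PSU
print(recommended_psu_watts(180))  # GTX 1080 system: ~539W, so a 550W PSU
```

Run that way, the quoted 500W and 550W recommendations look about right for a single-card build, with SLI obviously demanding more.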

At this price, you get what you pay for, and Asus put these beauties together with a high degree of precision and finesse. I wonder how they run?

In The Red Corner


For whatever reason, I haven’t seen an AMD Fury card before now, a technology that first appeared in June 2015.

AMD’s special sauce was boiled down into four video cards that use the Fiji-cored GPU: the R9 Fury, R9 Fury X, R9 Nano and the dual-GPU Radeon Pro Duo.

By way of context, XFX loaned me an R9 Nano, which in almost every respect has an identical profile to the Fury X, AMD’s best single-GPU card.

A typical price for this card would be just over £400, making it a good bit cheaper than either the GTX 1080 or 1070.

It’s also much smaller, able to fit into any ATX case, and it has a TDP of just 175W. Fig. 3 shows the critical specs compared with the Fury X and AMD’s previous performance king, the R9 390X.

Fig.3

What’s interesting about the Fury designs in particular is their use of HBM memory rather than GDDR5, empowering them with huge amounts of bandwidth and a massive memory bus width.

I wouldn’t read too much into the number of shaders compared with those on the Nvidia cards, because the way they’re organised is entirely different, and they’re not ‘oranges-to-oranges’ comparable.

Probably the most important aspect of AMD’s hardware is the relatively low clock speed of the GPUs, mostly due to them being hamstrung by the TSMC 28nm fabrication.

For my testing, XFX kindly loaned me one of its Nano models. This isn’t the very quickest AMD card, not being a Fury X, but it is representative of what AMD has to offer until a new flagship design comes along.

However, based on AMD’s recent announcements, the Fiji-based cards aren’t likely to be directly replaced by their new Polaris GPU designs – at least not initially.

Benchmarks


I’m sure other people will be able to test just about any title you can game on this hardware, but I just wanted to get a flavour of what sort of performance the Pascal GPU has on offer.

For this, I used my trusty test rig. Built on an Asus Sabertooth X79, it has a Core i7-3960X CPU, 16GB DDR3 quad-channel memory, Crucial MX100 SSD and Windows 10 Build 14361.

While there are bigger computing beasts available (the new Xeons, for example), there is still plenty of power in this platform to push a video card or even multiple ones.

The selection of tests I used isn’t massive, but it gives us a flavour of what the Pascal designs can do when compared with what AMD currently offers. However, it’s worth considering from the outset that these are pre-overclocked Pascal cards and therefore a decent chunk quicker than stock items.

On the flipside of that thinking, this hardware is so new, it isn’t fully optimised on the driver side of the process, so these same tests performed six months down the line might yield different results.

Bioshock Infinite


I bought this title recently when it could be purchased for 99p, and that was still probably too much. However, it does have an integrated benchmark, and because it uses lots of atmospheric effects and water, it’s worth firing up.

To make this more challenging, I moved all the quality settings to ultra and ran it in 1080p. Given the results on the GTX 1080 and GTX 1070, it would probably be perfectly playable at 4K resolution at these settings.

The stepping in this graph suggests that the GTX 1070 is about 15% slower than the GTX 1080 and the Nano is about the same ratio below the GTX 1070. That probably suggests that AMD’s Fury X would straddle the Nano and 1070, making both Asus’s cards quicker than any single GPU that AMD had in its previous generation.

That the Nano delivers about 75% of the performance of a GTX 1080 here probably flatters the AMD chip, as we’ll see in later benchmarks.

Unigine Heaven And Valley


I like these tests because I can tell by the GPU temperatures that they really make the video card work. Both were run at 1080p on DX11; on Valley I upped the ante with 8x anti-aliasing, and on Heaven I used normal tessellation.

If I’d had more time, perhaps I would have run in 1440p or maybe in stereo 3D, but as you can see from the average framerates, these are already tough tests.

Of these two benchmarks, Heaven is the easier, or at least it is for the Pascal cards, because they’re achieving between 49% and 77% higher frame-rates, depending on the card model. Those margins are smaller on Valley, at just 26% and 42%, but the message is the same: a significant victory for Nvidia.

My interpretation of these results is that the 8GB of memory that the Pascal cards come with is ideal for computational graphics with high complexity, because it gives lots of very high speed working room for tessellation and geometry manipulation. While in theory the Nano has more bandwidth, having 4GB of HBM might not be enough in these scenarios.

3DMark Fire Strike


This is probably the best-known benchmark, and while it’s synthetic, it’s worth running if only so anyone wanting to know how their system compares can make a quick and easy comparison.

With such power on hand, I ran the standard ‘Performance’ test, the Extreme version and the new Ultra mode, all in 1080p. I could have run them in 1440p or even 4K, but the scores show that even the Pascal cards are riding uphill in Ultra settings.

As an extra bonus in these tests, I had some data I’d recently collected on the XFX R9 390X to throw into the mix.

It doesn’t really matter how you slice and dice this test; the new GTX 1080 is at least 50% quicker than the Nano, and in the Ultra mode is double the performance of the R9 390X. Those are scary numbers, and when I did overclock the GTX 1080 to 2,011MHz, I managed a score of over 10,000 in the Extreme mode.

However, there is some light at the end of AMD’s tunnel in the next test.

3DMark API Overhead


Technically, this isn’t a true benchmark, and Futuremark is careful to point that out in its notes. Yet it does provide some insight into how well some video hardware will perform in respect of DX12 and its superior handling of API draw calls.

I’ve stuck to the DX12 part of this test, because the Pascal cards can’t do AMD’s Mantle API.

Again, the Asus Pascal cards win, or rather the GTX 1080 does, and the GTX 1070 gets pipped at the post by the XFX R9 390X, curiously. Why would this be?

If you go back and look at this card, while it doesn’t have some of the things the Nano gets, it does have a couple of cards up its sleeve. One of these is 8GB of RAM, and the other is excellent double-precision performance.

I’m not sure which of these comes into play here, but however it works, the AMD cards can be very good in call handling under DX12 – something the Pascal cards achieve mostly through brute force, it seems.

What isn’t shown here is that under Mantle, the Nano achieved 20 million draw calls and the 390X a whopping 21.3 million. So in terms of API efficiency, Nvidia still has some work to do.

World of Tanks 9.15.0.1


Benchmarking this game isn’t easy, but I managed to create a method myself using FRAPS and a game replay I made. In the sequence, I tear around a town level in my Cromwell Berlin, making mincemeat out of three enemy tanks before suffering an unfortunate engine fire. The recording lasts 200 seconds, so there are plenty of frames to render and average.
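For anyone wanting to replicate this method, the averaging step is straightforward: FRAPS can log per-frame times during a replay, and the mean frame rate is total frames divided by total time (averaging the per-frame fps values instead would skew the result towards the fast frames). A minimal sketch, with made-up frametimes rather than my actual log:

```python
# Mean fps from a FRAPS-style frametime log: total frames over total time,
# not the mean of the per-frame fps values. Frametimes are in milliseconds.
def average_fps(frametimes_ms):
    total_seconds = sum(frametimes_ms) / 1000.0
    return len(frametimes_ms) / total_seconds

# Three illustrative frames; a 200-second run would hold thousands.
print(average_fps([16.7, 16.7, 33.3]))  # -> ~45 fps
```

A 200-second recording gives plenty of samples, so the odd stutter from an engine fire doesn't dominate the average.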

I set the game to the very highest quality settings possible and ran it in both 1080p and 1440p.

I’m glad I ran the 1440p tests on this game, because frankly the 1080p ones are all within a margin of error and are probably capped by this title’s rather poor use of my multi-core processor.

I suspect, although I didn’t test it, that I’d be getting around 116fps on average even at 4K resolutions on the Asus GTX 1080, though the GTX 1070 is affected by the move up in detail.

At 1080p, the Nano does well enough, but at 1440p it suffers and registers about 70% of the GTX 1070 and 60% of the GTX 1080 levels.

Cobbled together as this test was, it’s the one that gives the best impression of what the real gaming experience is likely to be for GTX 1080 and GTX 1070 purchasers.

Tweaktown


I didn’t really have a chance to explore what overclocking powers these cards had, but the hints are that they’re immense. In fact, what exploration I did manage made me wonder why Asus had been so reserved with what enhancements it allowed itself, given the substantial headroom in this design.

All I had time to do was whack the GPU clock up from a boost of 1,936MHz to 2,011MHz on the GTX 1080, and with zero voltage added it ran 3DMark Fire Strike Extreme flawlessly. That took the score to over 10,000, and the card only peaked at 74°C, which is hardly very warm.

However, the temperature limit is a relatively low 82°C, so those wanting to really make these cards sing might want to consider water cooling at some point.

According to Asus, it’s had one card up to 2145MHz, though frankly how high any one will go without problems isn’t predictable given how thin the tracks on these chips are. These pre-overclocked designs are on average 15% faster than the Founders Edition out of the box, and there might be at least another 5-10% in there for the ardent tweaker.

My only concern would be that given how much just one of these costs, destroying it through experimentation would be the last thing on my mind. Frankly, what blessings these designs bestow are probably enough for most people and the vast majority of current games.

Two Small Weaknesses


While I was testing these cards, I noticed a couple of aspects to them that aren’t ideal, so I need to point those out.

The first is that when the new 10 Series cards were first announced, Nvidia made much of the multi-GPU capabilities, and it was reasonably concluded that the ultimate gaming system could be created by getting an X99 platform with four PCIe x16 slots and a PSU the size of a dumpster, and dividing and conquering pixels on a massive scale.

However, it turns out that with this design, the most you can effectively combine is two; you can physically install more, but they won’t work with the majority of titles. What complicates this story is that initially Nvidia talked about three- and four-way configurations, using a bridge connector that is unique to these cards.

With such powerful cards, it was determined very early on that the bandwidth of the PCIe bus wasn’t enough to keep several of them fed, so Nvidia designed a special ‘high bandwidth’ pathway to enable them to share data effectively.

Before launch, it said that while three- and four-way configurations weren’t what it was recommending, it would release a special driver Enthusiast Key (a software code) to unlock this functionality.

And then it decided that it wouldn’t do that, even if it had promised to do so, and its latest statements on the subject suggest that it’s done with more than two-card SLI.

The rub here for Nvidia was between implicit and explicit SLI: under implicit mode, the driver divided the workload between cards, while in explicit mode the game developer controlled the allocation.

Just not enough game developers would use explicit mode, and, often due to inefficiencies, implicit mode actually reduced the performance of multiple cards or at best didn’t justify the extra expense.

So are multi-GPU systems, the likes of which we’ve seen before, gone from Nvidia’s pathway forever? Not entirely, no.

For example, it’s possible to use more than two cards in a number of contexts, especially those that aren’t games. And you can use two Pascal-cored cards in SLI and have a third in 100% PhysX mode.

And Microsoft has promised a special multi-GPU mode for DX12 that will enable you to mix and match any GPU (even AMD and Nvidia) for some extra horsepower.

In fairness to Nvidia, the number of people using these types of system is remarkably small, and the effort to make them work reliably huge, so I fully understand why it’s chosen to put a stop to it.

The other issue is one that isn’t a problem now but could become one further down the road: DX12 performance. At this time there are virtually no DX12 games around, because the vast majority of PC owners aren’t on Windows 10, the only place this API is available.

What’s critical to realise about DX12 is that when AMD presented its Mantle API, it turned many heads at Microsoft, which embraced the idea and folded it into its next version of DirectX.

AMD built its Fury architecture to work best with Mantle, and unsurprisingly Mantle runs great on its latest hardware, as does DX12 for much the same reason.

Nvidia’s Maxwell architecture just isn’t set up for working in this way, and while it can perform well, it doesn’t sparkle like AMD’s can.

And Pascal is based on Maxwell, and the numbers suggest that it isn’t built for DX12 in the way that AMD GPUs are. Or they’re not optimised yet.

The thing is, AMD was thinking about Mantle probably four or five years ago at least and tailored its development to what it saw as the most effective API change.

While Mantle ended up with modest support, what it did was get people thinking about exploiting more GPU power and reducing the bottleneck that existed between the CPU and GPU in a modern PC.

It may be that Nvidia will be able to adapt Pascal in the next iteration or the one after that, and those changes will probably sync better with the rise of DX12 titles. But at this time it isn’t ideally positioned to exploit Windows 10 and those demos and games that use its gaming API.

What About The AMD RX 480?


What isn’t here, because it wasn’t yet available for me to test, is the new AMD RX 480, a card that has attracted many people’s interest.

AMD has outlined that this card isn’t meant to replace the Fury X or R9 390X as a flagship design, and the intention is that this will be a $200 card for the 4GB model and $225 for the 8GB variant.

What will be fascinating to see is if it’s anywhere near the performance of the Nano, because two of these working in CrossFireX mode for £100 less than the GTX 1070 would be a clear and present danger for Nvidia.

Until those tests are run, this is all conjecture, though I think most gamers would be delighted with an affordable card that can run 1440p games and has enough power for VR applications.

Final Thoughts


If you want the fastest possible single video card, then the answer is the GTX 1080, at this time. And based on the limited experience I’ve had so far, the Asus pre-overclocked design looks superior. That was so simple a conclusion that I can catch that early bus!

But life is never that straightforward, is it? And the GTX 1080 and 1070 invite as many questions as they answer, the first of those being what those retailers holding stock of the Titan X cards are going to do with them.

For the majority of my readers, the really good news here isn’t that they can blow another massive chunk of cash on something that will be superseded a year from now, but the downward pressure these launches will put on the GTX 970 and, correspondingly, the GTX 960 below it.

Being realistic, these £100-£150 options are the cards people actually buy most of the time, not ones that cost more than a typical computer.

With no 10 Series cards available other than the two announced so far, the 9 Series will still be a major factor for some time to come and probably the major contributor to Nvidia’s profitability.

Another factor in those numbers is undoubtedly what AMD has planned, because the new RX 480 is about to arrive (as I write this) and that could be a game changer from an entirely different perspective.

Instead of coming to fight Nvidia’s best toe to toe, it looks like AMD is much more interested in hitting its competitor’s bottom line with a card that is affordable yet powerful enough. And given the multi-GPU restrictions that slightly soured Nvidia’s new products, AMD might well take the ultimate performance crown by weight of numbers, or just by being more cost effective.

Until I test the RX 480, it’s impossible to make that call, but the prices both sides have been charging for video cards have become silly, and AMD looks determined to bring them back to a level where more gamers can afford something that can handle all the eye candy at high resolution.

As for the GTX 1080 and GTX 1070, they’re wonderful cards if your case can handle them, your system has a CPU that can exploit them and your bank account can suffer the consequences of that purchasing decision.

But if you don’t have a Core i7 CPU, at least a 1440p monitor and an SSD, then they probably shouldn’t be at the top of your shopping list until you do.