Tuesday, 9 December 2014

Inside Intel’s Broadwell architecture

Intel Broadwell

Intel’s latest architecture forms the basis of its fastest, most power-efficient processors yet. Mike Bedford explores the technical tricks that provide us with ever-faster chips.

If you haven’t yet come across Broadwell, you soon will: it’s Intel’s codename for the technology that will power next year’s fifth-generation Core i3, i5 and i7 processors, as well as the new Core M series. According to Intel’s now famous “Tick-Tock” roadmap, it’s a “Tick” – that is, a die shrink of its predecessor Haswell, from 22nm to 14nm.

In the grand scheme of things, a die shrink may not seem significant: if you’re looking for major new features, you’ll have to wait until the next “Tock”, expected late next year in the form of the all-new Skylake core. With the latest consumer devices, though, it’s increasingly the case that performance and features are less of an issue than battery life – so improving efficiency is becoming more and more important in CPUs.


THE 14NM PROCESS


In semiconductor manufacture, the process size officially represents half the distance between identical features in an array of memory cells. It’s also roughly the length of a single gate within one of the billions of microscopic transistors that make up a CPU. The move from 22nm to 14nm is a big proportional drop – more so than it may appear, because the gate length represents only one dimension of a two-dimensional design. In the case of the move to Broadwell, the silicon die is scaled down to a relatively paltry 63% of the area of a comparable Haswell chip.
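In an idealised shrink, every feature would scale by the same linear factor, so the area would fall to (14/22)² ≈ 40% of the original; in practice not every structure shrinks equally, which is why the reported figure for Broadwell is closer to 63%. A quick sketch of the arithmetic, in C and purely for illustration:

/* Idealised area scaling for a 22nm -> 14nm die shrink, compared with
   the 63% figure quoted for Broadwell versus Haswell. */
#include <stdio.h>

int main(void)
{
    double old_node = 22.0;                                /* Haswell process, nm */
    double new_node = 14.0;                                /* Broadwell process, nm */

    double linear_ratio = new_node / old_node;             /* roughly 0.64 */
    double ideal_area_ratio = linear_ratio * linear_ratio; /* roughly 0.40 */
    double reported_area_ratio = 0.63;                     /* figure quoted for Broadwell */

    printf("Linear shrink:          %.0f%% of the original dimension\n", 100 * linear_ratio);
    printf("Ideal area scaling:     %.0f%% of the original die area\n", 100 * ideal_area_ratio);
    printf("Reported for Broadwell: %.0f%% of the original die area\n", 100 * reported_area_ratio);
    return 0;
}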

The benefits of a smaller die are well known. Smaller transistors lose less power to leakage, and don’t heat up as much, so they draw less power and need less in the way of cooling. Practically speaking, this means they can power slimmer, quieter devices, with longer battery life, without compromising performance.

Using smaller transistors also means that more can be packed onto a given area of silicon: Intel’s first commercial processor, the 4004, released in 1971, used a 10µm process to build a comparatively modest 2,300 transistors onto the die. To implement today’s CPUs at that scale, you’d need a die the size of a door. Quite apart from the practical problems of building such a creation, there’s a cost issue: ultra-pure silicon wafers aren’t cheap, so shrinking the die lets the manufacturer get more out of its raw materials.

Needless to say, it’s not easy to keep continually shrinking the process, which is why the Tick-Tock model sees Intel spend fully half of its time focusing on improving efficiency. Even so, Intel CEO Brian Krzanich admitted back in May that it was taking longer than anticipated to get Broadwell into full production. That’s why the company has decided to hold back the mainstream chips until early 2015 and launch Broadwell before Christmas in the form of the new Core M processor.

THE CORE M PROCESSOR


Since the key advantage of Broadwell is power efficiency, it makes sense for Intel to debut it in a design that plays to that strength. The Core M is effectively a lightweight Core i3, designed for compact fanless devices such as ultra-thin laptops and Windows tablets.

The first Core M devices are already starting to appear, such as the Lenovo Yoga 3 Pro. Although the processor is being launched in three models, all are quite similar, as you can see opposite.

In terms of processing power, the Core M isn’t much to write home about: Intel claims that for everyday tasks it delivers around twice the performance of “your old laptop” – based on a four-year-old Core i5-520UM. We expect it to fall short of the performance of a modern Core i3.

That’s not the point of the Core M, though. The real news is that these processors are designed with a nominal thermal design power (TDP) of only 4.5W – a figure Intel has identified as the “sweet spot”, in terms of heat generation, for an 11.6in device that’s 8mm thick. Optionally, the 5Y10a can be configured down to 4W. For comparison, the lightest Haswell parts – the Y-series models, designed for Ultrabooks and tablets – are designed to draw up to 11.5W. That’s a huge drop in maximum power consumption – and Intel claims a 60% reduction in power consumption when the processor is idle. The company has also worked on improving the efficiency of the supporting hardware, including a new digital signal processor (DSP) called “SmartSound”.

Such power reductions don’t tell the whole story, however. The screen, hard disk and other parts of the system all consume energy, too: in our light-use battery test, the Lenovo Yoga 3 Pro lasted only around eight hours, placing it on a par with last-generation Ultrabooks. Hopefully future designs will manage to eke out a longer life from the architecture.

OTHER BROADWELL MODELS


Since Broadwell is primarily a die shrink, we’re not expecting the next generation of high-end processors to bring major performance advantages over Haswell; two years ago, when the 22nm Ivy Bridge architecture replaced the 32nm Sandy Bridge, we saw an overall improvement of only 4% in benchmark scores between top-of-the-range chips.

This may explain why Intel is in less of a hurry to get new Broadwell-based Core i5 and i7 models out of the door. Even when such chips become available, we probably won’t be advising high-end desktop users to rush out and upgrade to Broadwell. Not only is doing so unlikely to yield worthwhile benefits, it will be an upheaval, since you’ll need a motherboard with a new 9 Series chipset (although Intel has indicated that some 8 Series boards might be usable after a BIOS update, as the sockets are expected to be physically the same). For lower-end models, the switch makes even less sense: it’s rumoured that Intel may not bother producing low-end Broadwell CPUs for the desktop, and will instead stick with Haswell-based chips for this market.

This doesn’t mean power efficiency is Broadwell’s only benefit, however. As we’ve mentioned overleaf, the GPU has been substantially beefed up, promising slicker performance, especially in games. Broadwell also introduces a handful of instruction-set extensions: the new ADX extensions improve performance in arbitrary-precision calculations, while the RDSEED instruction generates a non-deterministic series of random bits that can be used to seed a software random-number generator.

This latter feature does nothing for performance, but it makes it even less likely that encryption systems can be broken by analysing the pseudorandom numbers used to generate the encryption key.
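By way of illustration, here is a minimal C sketch showing how software might use the two new instructions through the compiler intrinsics Intel documents for them: _rdseed64_step() to pull a hardware seed, and _addcarryx_u64() for the add-with-carry chains that arbitrary-precision libraries rely on. It assumes a Broadwell-class CPU and a compiler with RDSEED and ADX support enabled (for example, gcc -mrdseed -madx).

/* Illustrative only: Broadwell's RDSEED and ADX extensions via intrinsics.
   Build with a compiler that enables these instruction sets, e.g.
   gcc -mrdseed -madx example.c */
#include <immintrin.h>
#include <stdio.h>

/* Pull one 64-bit hardware seed. RDSEED can temporarily run out of
   conditioned entropy, so retry a few times before giving up. */
static int get_seed(unsigned long long *seed)
{
    for (int tries = 0; tries < 10; tries++) {
        if (_rdseed64_step(seed))
            return 1;                    /* success */
    }
    return 0;                            /* no entropy available right now */
}

/* Add two 128-bit numbers, each held as two 64-bit words, using the
   ADX add-with-carry intrinsic: the building block of big-number maths. */
static void add128(const unsigned long long a[2],
                   const unsigned long long b[2],
                   unsigned long long out[2])
{
    unsigned char carry = 0;
    carry = _addcarryx_u64(carry, a[0], b[0], &out[0]);
    carry = _addcarryx_u64(carry, a[1], b[1], &out[1]);
    (void)carry;                         /* final carry ignored in this sketch */
}

int main(void)
{
    unsigned long long seed;
    if (get_seed(&seed))
        printf("Hardware seed: 0x%016llx\n", seed);
    else
        printf("RDSEED returned no entropy; try again later\n");

    unsigned long long a[2] = { 0xffffffffffffffffULL, 1 };  /* 2^65 - 1 */
    unsigned long long b[2] = { 1, 0 };                      /* 1 */
    unsigned long long sum[2];
    add128(a, b, sum);                                       /* expect 2^65 */
    printf("128-bit sum: 0x%016llx%016llx\n", sum[1], sum[0]);
    return 0;
}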

WHAT BROADWELL CHIPS (PROBABLY) WON’T DO


Intel hasn’t yet shared technical details of next year’s mainstream processors, so these are still open to speculation. However, one thing we probably won’t see is an increased number of cores. It’s true that Intel’s Core i7-5960X, which was released in August, offers for the first time eight physical cores (servicing up to 16 threads, courtesy of Hyper-Threading). But despite its fifth-generation model number, this isn’t actually a Broadwell chip, but an enthusiast-class Haswell-E 22nm processor with a huge 140W TDP and a retail price of around $1300.

Broadwell is much more about efficiency, and that’s why we expect Intel won’t be throwing in extra cores. Doing so increases the size, complexity and heat generation of the chip. It also doesn’t benefit performance as much as you’d hope; although programmers are getting better at writing multi-threaded code, there are still plenty of processes that simply don’t benefit from additional cores.

Even when a task can be spread across a large number of cores, all of the cores have to share a single interface to other circuitry on the motherboard, as well as sharing some of the on-chip cache memory. This can be beneficial when it comes to passing data from one core to another, as on-chip communication is much faster than it would be between two separate chips. But adding cores increases the contention for resources, so you’ll see diminishing returns. This is why, almost a decade after the first dual-core chips arrived for desktop PCs, the typical personal computer still has only two or four cores, while processors with dozens of cores are found only in specially designed server and workstation configurations.
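One common way to put a number on those diminishing returns is Amdahl's law: if only a fraction p of a task can run in parallel, the best possible speedup on n cores is 1 / ((1 - p) + p/n). The sketch below, using a made-up workload that is 80% parallel, shows how quickly the gains flatten out.

/* Illustrative Amdahl's-law calculation: theoretical speedup versus
   core count when only part of a task can be parallelised. */
#include <stdio.h>

static double amdahl(double parallel_fraction, int cores)
{
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores);
}

int main(void)
{
    double p = 0.80;   /* assume 80% of the work parallelises: an invented figure */
    int core_counts[] = { 1, 2, 4, 8, 16, 64 };

    for (int i = 0; i < (int)(sizeof core_counts / sizeof core_counts[0]); i++)
        printf("%2d cores: %.2fx speedup\n", core_counts[i], amdahl(p, core_counts[i]));

    return 0;
}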

We also probably won’t see higher clock speeds. For many years in the 1980s and 1990s, Intel continually increased performance by dialling up clock speeds. In the past decade, however, the upper range of clock speeds has barely changed. That’s because increasing the clock speed causes the processor to consume more power. When microprocessors had only a few million transistors on the chip, and were normally fed directly from the mains, it didn’t matter a great deal if the transistors were inefficient. But as transistor counts have approached the billion mark, and battery-powered computing has overtaken the desktop, the amounts of power being consumed – and heat generated – have become significant. Now that battery life is such a priority, it seems very unlikely that we’ll ever see speeds ramp up in the way they did back in the 1990s.
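The relationship at work here is the standard dynamic-power equation: power is roughly proportional to capacitance × voltage² × frequency. Because a higher clock usually also needs a higher supply voltage, and power rises with the square of voltage, a modest frequency bump can mean a much larger jump in power. A sketch with invented, purely illustrative operating points:

/* Dynamic power scales roughly as C * V^2 * f. The capacitance term is
   constant for a given chip, so it cancels when comparing two operating
   points. Voltages and frequencies below are invented for illustration. */
#include <stdio.h>

static double relative_power(double voltage, double freq_ghz)
{
    return voltage * voltage * freq_ghz;
}

int main(void)
{
    double base = relative_power(1.00, 3.5);   /* hypothetical 3.5GHz at 1.00V */
    double fast = relative_power(1.20, 4.5);   /* hypothetical 4.5GHz at 1.20V */

    printf("Frequency increase: %.0f%%\n", 100.0 * (4.5 / 3.5 - 1.0));
    printf("Power increase:     %.0f%%\n", 100.0 * (fast / base - 1.0));
    return 0;
}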

This doesn’t mean the 4GHz of the current top-end Haswell Core i7 is necessarily the best we can expect. Since Sandy Bridge in 2011, Intel has offered unlocked top-end processor models, so the user can drive the Turbo Boost multiplier as high as they like – the only limits being the chip’s power consumption and temperature. As Broadwell reduces both, we won’t be surprised to see high-end, unlocked models running in excess of 5GHz.

WE’LL HAVE TO WAIT FOR DDR4


Another rumour that’s been floating around is that Broadwell might introduce support for DDR4 memory. It’s a sensible idea, given Broadwell’s focus on efficiency: DDR4 is less power-hungry than DDR3, running at 1.2V rather than the 1.5V of the older technology. It’s also a technology that Intel has already embraced at the upper end of its range, again with Haswell-E. However, it’s been widely leaked that Broadwell Core i3, i5 and i7 processors will stick with DDR3, while server-class Broadwell Xeons move up to DDR4.

At any rate, the advantages of DDR4 are, again, not as great as they may sound. The headline benefit of DDR4 is a faster interface, which lets the CPU fetch instructions and data more quickly: where DDR3 supports effective speeds between 800 and 2,133MHz, DDR4 establishes 1,600MHz as a minimum, rising to a maximum of 3,200MHz with a compatible chipset and suitable DIMMs.
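Those speed grades translate into peak bandwidth fairly directly: each memory channel is 64 bits (8 bytes) wide, so the theoretical maximum is simply the transfer rate multiplied by 8 bytes. A quick sketch using the figures above:

/* Peak theoretical bandwidth of one 64-bit memory channel:
   transfers per second multiplied by 8 bytes per transfer. */
#include <stdio.h>

static double peak_gb_per_s(double megatransfers)
{
    return megatransfers * 8.0 / 1000.0;   /* MT/s * 8 bytes -> GB/s */
}

int main(void)
{
    printf("DDR3-1600: %.1f GB/s per channel\n", peak_gb_per_s(1600));
    printf("DDR3-2133: %.1f GB/s per channel\n", peak_gb_per_s(2133));
    printf("DDR4-2133: %.1f GB/s per channel\n", peak_gb_per_s(2133));
    printf("DDR4-3200: %.1f GB/s per channel\n", peak_gb_per_s(3200));
    return 0;
}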

In practice, though, ramping up the speed of the memory bus doesn’t automatically translate into doubled memory throughput: it helps, but the latency of the memory chips is also a big factor – it determines how long the processor spends waiting around before the data it’s requested starts to arrive.

On today’s DDR4 modules, that latency is comparable to that of DDR3 – and, currently, DDR4 DIMMs also tend to cost around 20% more than DDR3.
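To see why the latencies end up comparable, convert the CAS latency from clock cycles into nanoseconds: DDR4 runs its bus faster, but it also needs more cycles per access. A sketch using representative timings of the period (DDR3-1600 at CL11 and DDR4-2133 at CL15; typical published figures rather than measurements):

/* Convert CAS latency (counted in memory-clock cycles) into nanoseconds.
   The memory clock is half the quoted DDR transfer rate, since DDR makes
   two transfers per clock. Timings are typical figures, not measurements. */
#include <stdio.h>

static double cas_ns(double transfer_rate_mt_s, double cas_cycles)
{
    double clock_mhz = transfer_rate_mt_s / 2.0;
    return cas_cycles * 1000.0 / clock_mhz;    /* cycles -> nanoseconds */
}

int main(void)
{
    printf("DDR3-1600 CL11: %.1f ns\n", cas_ns(1600, 11));
    printf("DDR4-2133 CL15: %.1f ns\n", cas_ns(2133, 15));
    return 0;
}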

Still, there’s no reason for gloom: DDR4 support is expected to arrive in Skylake next year. By then, latencies should have improved and prices should have fallen.

ALL YOU NEED IS CACHE


Although system memory may not be getting a speed boost, one variable that Intel could tweak to improve performance is the amount of on-chip cache. This is superfast memory integrated within the processor, used to store copies of recently used data and instructions – and those in nearby memory locations – so the CPU can access them right away if they’re needed.
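To see why keeping nearby memory locations in cache matters, consider the classic example of walking through a large two-dimensional array. Both loops below touch exactly the same data, but the first visits it in the order it sits in memory, so most accesses are served from cache, while the second jumps around and keeps the CPU waiting on main memory. An illustrative sketch, not a benchmark:

/* Illustrative only: the same work done cache-friendly and cache-hostile.
   C stores arrays row by row, so walking along a row touches neighbouring
   memory locations that the cache has already pulled in. */
#include <stdio.h>

#define N 2048
static double grid[N][N];   /* 32MB: far larger than any on-chip cache */

int main(void)
{
    double sum = 0.0;

    /* Cache-friendly: consecutive accesses fall on the same cache lines. */
    for (int row = 0; row < N; row++)
        for (int col = 0; col < N; col++)
            sum += grid[row][col];

    /* Cache-hostile: each access lands on a different cache line, so the
       processor spends most of its time waiting for main memory. */
    for (int col = 0; col < N; col++)
        for (int row = 0; row < N; row++)
            sum += grid[row][col];

    printf("sum = %f\n", sum);
    return 0;
}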

In recent history, Intel has avoided tinkering with the ultra-fast L1 and L2 caches – every Core i3, i5 and i7, dating back to the very first Nehalem chip in 2008, has featured 64KB per core of L1 and 256KB per core of L2.

Shared L3 cache, however, has varied between chip models: desktop Core i7 models have tended to have 8MB of cache, while more lightweight and mobile parts have had as little as 3MB – and some Celeron-branded models based on the same core have cut this right down to 1MB.

BOOSTED GRAPHICS


As onboard graphics have become more powerful – something that’s particularly important in mobile devices – the importance of the L3 cache has increased, as it can serve both the CPU and GPU cores. Notably, the Core M models, despite nominally sitting at the bottom of the stack, have a 4MB cache – larger than the 3MB found on low-end Haswell Core i3s. Intel might be planning a similar upgrade in other models to assist graphics performance – a move that could also help desktop computing when the GPU isn’t working too hard.

It’s worth noting that versions of Haswell using Iris Pro graphics also feature a new 128MB L4 cache, implemented as eDRAM (embedded DRAM). Like the L3 cache, this is shared between the CPU and the GPU, so it can speed up both processing and visual tasks: gaming benchmarks have shown a performance increase of 20% to 60%, thanks to the presence of eDRAM.

We expect Broadwell will follow suit by incorporating this L4 cache in the high-end models using Iris Pro-branded graphics.