A state of Zen: AMD unveils new architectural details on its latest CPU core
AMD unveiled a great deal of information at Hot Chips about its upcoming "Zen" CPU core and architecture. The new chip has been the subject of an enormous amount of speculation for more than a year, but things have heated up over the past few weeks as leaked benchmarks surfaced and AMD conducted its own public test.
Today's information dump is the most detail AMD has shared to date. In fact, it's significantly more information than I expected the company to share until Zen actually launched. Let's get started.
Zen's design goals
Zen is best understood as a response to the problems that plagued Bulldozer. AMD's original goal with that architecture was to intelligently share resources between CPU cores, while simultaneously hitting higher frequencies and higher execution efficiencies than AMD's previous CPU core, K10. Bulldozer's failure to deliver left AMD in an ugly position: Should it attempt to repair its old core or return to the drawing board and build something completely new?
Sources we've spoken to at AMD suggest that the difficulty of repairing Bulldozer was significant enough that AMD opted to build a new core from scratch with none of Bulldozer's baggage. That doesn't mean there's no Bulldozer DNA in Zen. In fact, AMD has stated that the expertise it gained from improving Steamroller and Excavator's energy efficiency was put to good use for its newest architecture. Say instead that what design elements AMD does borrow from its previous architectures will be the components of the chip that actually worked well rather than the problematic ones that dominated its performance.
Cache architecture
Much of what went wrong with Bulldozer was linked to its cache subsystem and overall architecture, so that's a good place to start diving into Zen.
Where Bulldozer used the concept of a CPU module (defined as a pair of cores that shared resources), Zen uses complexes. One CPU complex (CCX) contains four cores, 2MB of L2 cache (512KB per core), and 8MB of L3 cache. That means AMD's highest-end consumer Zen contains eight cores and 16MB of L3 cache in total, split into 2x8MB chunks. AMD has stated that the two CCXs on an eight-core chip can communicate with each other via the on-chip fabric, though there's likely a performance penalty for doing so.
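To make the cache arithmetic concrete, here is a minimal Python sketch that totals cores and cache for a chip built from CCXs. The per-CCX figures come from the paragraph above; the class and function names (`CoreComplex`, `chip_totals`) are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

KB = 1024
MB = 1024 * KB


@dataclass(frozen=True)
class CoreComplex:
    """One CCX as described in the text: 4 cores, 512KB L2 per core, 8MB shared L3."""
    cores: int = 4
    l2_per_core: int = 512 * KB
    l3_shared: int = 8 * MB

    @property
    def l2_total(self) -> int:
        # L2 is private to each core, so the CCX total is cores * per-core size.
        return self.cores * self.l2_per_core


def chip_totals(ccx_count: int, ccx: CoreComplex = CoreComplex()) -> dict:
    """Total cores and cache (in MB) for a chip made of `ccx_count` complexes."""
    return {
        "cores": ccx_count * ccx.cores,
        "l2_mb": ccx_count * ccx.l2_total // MB,
        "l3_mb": ccx_count * ccx.l3_shared // MB,
    }


# The eight-core consumer part described above uses two CCXs.
print(chip_totals(2))
```

Note that the 16MB of L3 is two separate 8MB pools, one per CCX, rather than a single unified cache, which is why cross-CCX traffic over the fabric may carry a penalty.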
Zen's L3 cache operates as a victim cache for the L1 and L2, meaning data evicted from those caches is stored in the L3 instead. It's also 16-way associative, which is a significant change from Bulldozer's 64-way associative L3. A cache with a higher set associativity has a greater likelihood of containing the information the CPU is looking for, but takes longer to search, and one of the issues that crippled Bulldozer was its cache latency at nearly every stage.
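The two ideas in this paragraph, victim caching and set associativity, can be sketched with a toy model. This is not AMD's implementation: the LRU replacement policy, the tiny capacities, the 64-byte line size, and the choice to move a line back to the upper level on a victim hit are all assumptions made for illustration. The 64-way comparison reuses the same 8MB capacity purely to isolate the effect of associativity.

```python
from collections import OrderedDict

MB = 1024 * 1024


def num_sets(cache_bytes, ways, line_bytes=64):
    """Number of sets in a set-associative cache (64-byte lines are an assumption)."""
    return cache_bytes // (ways * line_bytes)


class LRUCache:
    """A tiny fully associative cache with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def touch(self, addr):
        """Return True on a hit and mark the line as most recently used."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True
        return False

    def insert(self, addr):
        """Insert a line and return the evicted address, or None if nothing was evicted."""
        self.lines[addr] = True
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            evicted, _ = self.lines.popitem(last=False)
            return evicted
        return None

    def remove(self, addr):
        self.lines.pop(addr, None)


def access(addr, upper, victim):
    """Look up `addr`; the victim cache is filled only by evictions from `upper`."""
    if upper.touch(addr):
        return "upper hit"
    result = "victim hit" if addr in victim.lines else "miss"
    victim.remove(addr)            # the line moves back up on a victim hit
    evicted = upper.insert(addr)
    if evicted is not None:
        victim.insert(evicted)     # evicted data lands in the victim cache
    return result


upper = LRUCache(capacity=2)   # stands in for L1/L2
victim = LRUCache(capacity=4)  # stands in for the L3
trace = [access(a, upper, victim) for a in ["A", "B", "C", "A", "C"]]
print(trace)

# Same capacity, different associativity: more ways means fewer sets,
# but more tags to compare on every lookup.
print(num_sets(8 * MB, ways=16), num_sets(8 * MB, ways=64))
```

In the trace, the fourth access to "A" misses the upper level but is caught by the victim cache because "A" was evicted there when "C" arrived.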
We don't know anything about clock speeds on either the L3 cache or the integrated memory controller. Historically, AMD's Bulldozer-derived CPUs and APUs have used a clock between 1.8 and 2.2GHz for the L3 cache and IMC.
AMD has stated that L1 and L2 bandwidth is nearly 2x Excavator while L3 bandwidth is supposedly 5x higher. These changes should keep the core fed and support higher performance. The L1 cache is write-back instead of write-through, a significant change that should improve performance and reduce cache contention (Bulldozer's write-through cache meant that L1 performance could be constrained by L2 cache write speed in some cases).
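The difference between the two write policies is easy to see by counting how many writes reach the L2. The following is a minimal sketch under stated assumptions: stores are tracked at cache-line granularity, the L1 uses LRU replacement, and any dirty lines left at the end are flushed. None of these details come from AMD's disclosure.

```python
from collections import OrderedDict


def l2_writes_write_through(stores):
    """Write-through: every store to the L1 is also sent to the L2."""
    return len(stores)


def l2_writes_write_back(stores, l1_lines):
    """Write-back: the L2 sees a write only when a dirty line leaves the L1."""
    dirty = OrderedDict()  # line address -> True, kept in LRU order
    l2_writes = 0
    for line in stores:
        if line in dirty:
            dirty.move_to_end(line)  # repeated store to a resident line: no L2 traffic
            continue
        dirty[line] = True
        if len(dirty) > l1_lines:
            dirty.popitem(last=False)  # evict the least recently used dirty line
            l2_writes += 1
    return l2_writes + len(dirty)      # final flush of the lines still in the L1


# 200 stores that repeatedly hit just two cache lines.
stores = [0] * 100 + [1] * 100
print(l2_writes_write_through(stores))           # every store reaches the L2
print(l2_writes_write_back(stores, l1_lines=2))  # only the final flush does
```

When stores have good locality, as in this example, the write-back L1 absorbs almost all of the write traffic, which is the kind of L2 pressure the paragraph above describes.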
The CPU core
We've already tackled caches, so let's check out the CPU core itself.
Here's Zen's high-level core diagram. There are several significant differences compared with AMD's older Bulldozer core, including the addition of an op cache, a micro-op queue, and a larger number of integer pipelines per core.
Here's an expanded view of how the core gets fed. This was another major problem area with Bulldozer: Bulldozer and Piledriver's shared logic meant that the dispatch unit could only send work to one core or the other every clock cycle. Steamroller later fixed this issue by doubling up dispatch units, but this only resulted in a modest performance improvement.
AMD has taken a page from Intel's book and implemented an op cache with Zen, even if we don't know much about the specifics of the feature. This allows the CPU to cache decoded operations that it may need to dispatch repeatedly rather than requiring it to repeatedly decode and dispatch the same instructions. Each Zen core can decode four instructions per clock cycle, but the micro-op queue can dispatch six instructions per cycle. Clearly AMD anticipates that its cache will relieve pressure on the decode units and help keep the core fed while reducing power consumption. Steamroller had a macro-op queue that could hold up to 40 macro-ops but its usefulness was limited to tiny loops.
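A toy model shows why an op cache matters when dispatch is wider than decode. The decode and dispatch widths below are the figures quoted above; everything else is an assumption for illustration, including the op cache capacity (AMD hasn't disclosed it), the fill-until-full policy, and the simplification that one instruction equals one op.

```python
import math

DECODE_WIDTH = 4    # instructions decoded per cycle (from the text)
DISPATCH_WIDTH = 6  # ops dispatched per cycle from the micro-op queue (from the text)


def decodes_needed(loop_body, iterations, op_cache_entries):
    """Count instructions that must go through the decoders.

    `op_cache_entries` is a hypothetical capacity; entries are never evicted here.
    """
    cached = set()
    decoded = 0
    for _ in range(iterations):
        for pc in loop_body:
            if pc in cached:
                continue  # served from the op cache, no decode needed
            decoded += 1
            if len(cached) < op_cache_entries:
                cached.add(pc)
    return decoded


def front_end_cycles(loop_body, iterations, op_cache_entries):
    """Crude lower bound: limited by either decode or dispatch throughput."""
    total_ops = len(loop_body) * iterations
    decoded = decodes_needed(loop_body, iterations, op_cache_entries)
    return max(math.ceil(decoded / DECODE_WIDTH),
               math.ceil(total_ops / DISPATCH_WIDTH))


loop = list(range(8))  # an 8-instruction loop body, run 1000 times
print(decodes_needed(loop, 1000, 0), front_end_cycles(loop, 1000, 0))
print(decodes_needed(loop, 1000, 64), front_end_cycles(loop, 1000, 64))
```

Without an op cache the four-wide decoders set the pace. Once the loop fits in the cache, the six-wide dispatch becomes the limit and the decoders sit nearly idle, which is also where the power savings would come from.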
Like the Bulldozer family, Zen can theoretically fetch 32 bytes of data at a time, though CPU analyst Agner Fog found that the Bulldozer family of cores was practically limited to 21 bytes of data when both cores were in use or 16 bytes if one core was used. He theorized that this limit may have been why doubling up on Steamroller's dispatch units yielded relatively limited results. Resolving this in Zen could be part of why AMD has significantly improved its IPC.
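To see why fetch width can cap performance, divide the bytes fetched per cycle by an average instruction length. The sketch below uses the three fetch figures from the paragraph above; the 4-byte average instruction length is purely an assumption (x86 instructions vary from 1 to 15 bytes), and the function name is hypothetical.

```python
def fetch_limited_ipc(fetch_bytes_per_cycle, avg_instruction_bytes):
    """Upper bound on instructions per cycle imposed by fetch bandwidth alone."""
    return fetch_bytes_per_cycle / avg_instruction_bytes


AVG_LEN = 4.0  # assumed average x86 instruction length in bytes

cases = [
    ("theoretical fetch", 32),
    ("Bulldozer family, both cores active", 21),
    ("Bulldozer family, one core active", 16),
]
for label, nbytes in cases:
    print(f"{label}: {fetch_limited_ipc(nbytes, AVG_LEN):.2f} instructions/cycle")
```

Under that assumption, a 16-byte practical limit delivers only about four instructions per cycle, so extra dispatch units downstream would have little additional work to send out, which is consistent with Fog's theory.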
The integer cores have been rebalanced from the Bulldozer family. Prior to Bulldozer, AMD's K10 paired three ALUs with three AGUs (address generation units). Bulldozer trimmed this to two ALUs and two AGUs per core. This, combined with the limited dispatch ability in the BD/PD cores, was thought to be a major performance bottleneck until Steamroller added additional dispatch capabilities and slashed the penalty Kaveri took when scaling across multiple cores. (Piledriver and Bulldozer achieved roughly 1.8x of the scaling you'd expect from a "true" dual-core, while Steamroller hit approximately 1.9x.) Four ALUs and two AGUs could boost overall performance compared with Bulldozer's narrow design, but we'll have to see how the chip performs in benchmarks.
AMD's floating point unit will still use 128-bit registers for AVX and AVX2, but latency on some FP operations has been decreased and there are now four pipes instead of three to feed the FPU. The CPU isn't capable of executing 256-bit AVX instructions in a single cycle. Whether this will prove a detriment in real-world code is an open question, but AVX/AVX2 haven't boosted general application performance the way SSE2 once did.
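One way to picture a 256-bit operation on 128-bit hardware is as two independent half-width operations. The sketch below illustrates that idea only; how Zen actually splits and schedules 256-bit AVX instructions is not described in AMD's disclosure, so the halving scheme and function names here are assumptions.

```python
import math


def ops_for_vector(vector_bits, datapath_bits=128):
    """Number of datapath-wide operations needed to cover one vector instruction."""
    return math.ceil(vector_bits / datapath_bits)


def add_256_as_two_128(a, b):
    """Add two 8-lane float32-style vectors as two independent 4-lane halves."""
    assert len(a) == len(b) == 8
    low = [x + y for x, y in zip(a[:4], b[:4])]    # lower 128 bits
    high = [x + y for x, y in zip(a[4:], b[4:])]   # upper 128 bits
    return low + high


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [10.0] * 8
print(ops_for_vector(256), add_256_as_two_128(a, b))
```

The result is identical to a full-width add; the cost is in throughput, since each 256-bit instruction occupies the 128-bit hardware twice.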
Putting it all together:
If you want a single high-level slide that captures what AMD has disclosed about Zen to date, this is it:
There are still some areas of the chip I haven't touched on, like SMT, because I want to research how AMD's SMT implementation differs from Intel's but haven't had time to examine the topic in-depth. AMD hasn't stated that Zen will use features like Carrizo's AVFS, but given that they've extended that approach across both Polaris and their APU lines it's a safe bet they will.
Still, there's a lot here to suggest that Zen will deliver substantially better performance than any Bulldozer core ever did. The devil, as always, will be in the details. How much performance does AMD gain with SMT? What clock speeds can it hit? How will it price the core against Intel's current products? Will it deliver "enough" of a performance improvement and how will its chipset features compare with what Intel brings to market?
These are important questions that will ultimately determine whether Zen can reignite competition in the CPU market. Speaking strictly for myself, I'm cautiously optimistic about Zen. Bulldozer, in retrospect, was almost perfectly ill-positioned for the realities of the CPU and foundry business from 2011 to 2016. It was a CPU designed for high frequencies at a time when CPU frequency had slammed face-first into fundamental scaling limits. AMD improved the core's performance and power efficiency but couldn't fix the problems that broke it in the first place. It's not ridiculous to think that the company could spin a chip with 40% improved IPC given where they started from.
Zen doesn't need to match Intel clock-for-clock or core-for-core to be a huge improvement over where AMD is today. It needs to offer improved efficiency, power efficiency, and much more competitive performance at a relevant price point. Based on what AMD has disclosed to date, I think they've got a real chance of pulling it off. And while we thought much the same thing about Bulldozer five years ago, Zen isn't trying to create a new type of shared-resource CPU. That should count for something in the final analysis.
Zen is expected to debut in Q1 2017 in wide volume. The current smart money is on a CES debut and launch, though that's just a guess based on previous schedules and product cycles.
Source: https://www.extremetech.com/computing/234354-a-state-of-zen-amd-unveils-new-architectural-details-on-its-latest-cpu-core
Posted by: coxouthad.blogspot.com