Steamroller (microarchitecture)

Microarchitecture by AMD From Wikipedia, the free encyclopedia

AMD Steamroller Family 15h is a microarchitecture developed by AMD for AMD APUs, which succeeded Piledriver in the beginning of 2014 as the third-generation Bulldozer-based microarchitecture.[2] Steamroller APUs continue to use two-core modules as their predecessors, while aiming at achieving greater levels of parallelism.

Quick Facts General information, Launched ...
Steamroller - Family 15h (3rd-gen)
General information
LaunchedJanuary 14, 2014; 11 years ago (January 14, 2014)
Common manufacturer
Architecture and classification
Technology node28 nm SHP[1]
Instruction setAMD64 (x86-64)
Physical specifications
Sockets
Products, models, variants
Core name
History
PredecessorPiledriver - Family 15h (2nd-gen)
SuccessorExcavator - Family 15h (4th-gen)
Support status
iGPU unsupported
Close

Microarchitecture

Summarize
Perspective

Steamroller still features two-core modules found in Bulldozer and Piledriver designs called clustered multi-thread (CMT), meaning that one module is marketed as a dual-core processor.[3] The focus of Steamroller is for greater parallelism.[4] Improvements center on independent instruction decoders for each core within a module, 25% more of the maximum width dispatches per thread, better instruction schedulers, improved perceptron branch predictor, larger and smarter caches, up to 30% fewer instruction cache misses, branch misprediction rate reduced by 20%, dynamically resizable L2 cache, micro-operations queue,[5] more internal register resources and improved memory controller.

AMD estimated that these improvements will increase instructions per cycle (IPC) up to 30% compared to the first-generation Bulldozer core while maintaining Piledriver's high clock rates with decreased power consumption.[3] The final result was a 9% single-threaded IPC improvement, and 18% multi-threaded IPC improvement over Piledriver.[6]

Steamroller, the microarchitecture for CPUs, as well as Graphics Core Next, the microarchitecture for GPUs, are paired together in the APU lines to support features specified in Heterogeneous System Architecture.

History

In 2011, AMD announced a third-generation Bulldozer-based line of processors for 2013,[7] with Next Generation Bulldozer as the working title, using the 28 nm manufacturing process.[8]

On 21 September 2011, leaked AMD slides indicated that this third generation of Bulldozer core was codenamed Steamroller.[9][10]

In January 2014, the first Kaveri APUs became available.[11]

Starting from May 2015 till March 2016 new APUs were launched as Kaveri-refresh (codenamed Godavari).[12]

Features

The following table shows features of AMD's processors with 3D graphics, including APUs (see also: List of AMD processors with 3D graphics).

More information Platform, High, standard and low power ...
Platform High, standard and low power Low and ultra-low power
CodenameServer Basic Toronto
Micro Kyoto
Desktop Performance Raphael Phoenix
Mainstream Llano Trinity Richland Kaveri Kaveri Refresh (Godavari) Carrizo Bristol Ridge Raven Ridge Picasso Renoir Cezanne
Entry
Basic Kabini Dalí
MobilePerformance Renoir Cezanne Rembrandt Dragon Range
Mainstream Llano Trinity Richland Kaveri Carrizo Bristol Ridge Raven Ridge Picasso Renoir
Lucienne
Cezanne
Barceló
Phoenix
Entry Dalí Mendocino
Basic Desna, Ontario, Zacate Kabini, Temash Beema, Mullins Carrizo-L Stoney Ridge Pollock
Embedded Trinity Bald Eagle Merlin Falcon,
Brown Falcon
Great Horned Owl Grey Hawk Ontario, Zacate Kabini Steppe Eagle, Crowned Eagle,
LX-Family
Prairie Falcon Banded Kestrel River Hawk
ReleasedAug 2011Oct 2012Jun 2013Jan 2014 2015Jun 2015Jun 2016Oct 2017Jan 2019Mar 2020 Jan 2021Jan 2022Sep 2022Jan 2023Jan 2011May 2013Apr 2014May 2015Feb 2016Apr 2019Jul 2020Jun 2022Nov 2022
CPU microarchitecture K10 Piledriver Steamroller Excavator "Excavator+"[13] Zen Zen+ Zen 2 Zen 3 Zen 3+ Zen 4 Bobcat Jaguar Puma Puma+[14] "Excavator+" Zen Zen+ "Zen 2+"
ISAx86-64 v1x86-64 v2x86-64 v3x86-64 v4x86-64 v1x86-64 v2x86-64 v3
Socket Desktop Performance AM5
Mainstream AM4
Entry FM1 FM2 FM2+ FM2+[a], AM4 AM4
Basic AM1 FP5
Other FS1 FS1+, FP2 FP3 FP4 FP5 FP6 FP7 FL1 FP7
FP7r2
FP8
FT1 FT3 FT3b FP4 FP5 FT5 FP5 FT6
PCI Express version 2.0 3.0 4.0 5.0 4.0 2.0 3.0
CXL
Fab. (nm) GF 32SHP
(HKMG SOI)
GF 28SHP
(HKMG bulk)
GF 14LPP
(FinFET bulk)
GF 12LP
(FinFET bulk)
TSMC N7
(FinFET bulk)
TSMC N6
(FinFET bulk)
CCD: TSMC N5
(FinFET bulk)

cIOD: TSMC N6
(FinFET bulk)
TSMC 4nm
(FinFET bulk)
TSMC N40
(bulk)
TSMC N28
(HKMG bulk)
GF 28SHP
(HKMG bulk)
GF 14LPP
(FinFET bulk)
GF 12LP
(FinFET bulk)
TSMC N6
(FinFET bulk)
Die area (mm2)228246245245250210[15]156 180210CCD: (2x) 70
cIOD: 122
17875 (+ 28 FCH)107 ?125149~100
Min TDP (W)351712101565354.543.95106128
Max APU TDP (W)10095654517054182565415
Max stock APU base clock (GHz)33.84.14.13.73.83.63.73.84.03.34.74.31.752.222.23.22.61.23.352.8
Max APUs per node[b]11
Max core dies per CPU1211
Max CCX per core die1211
Max cores per CCX482424
Max CPU[c] cores per APU481682424
Max threads per CPU core1212
Integer pipeline structure3+32+24+24+2+11+3+3+1+21+1+1+12+24+24+2+1
i386, i486, i586, CMOV, NOPL, i686, PAE, NX bit, CMPXCHG16B, AMD-V, RVI, ABM, and 64-bit LAHF/SAHFYes Yes
IOMMU[d]v2v1v2
BMI1, AES-NI, CLMUL, and F16C YesYes
MOVBEYes
AVIC, BMI2, RDRAND, and MWAITX/MONITORX Yes
SME[e], TSME[e], ADX, SHA, RDSEED, SMAP, SMEP, XSAVEC, XSAVES, XRSTORS, CLFLUSHOPT, CLZERO, and PTE CoalescingYes Yes
GMET, WBNOINVD, CLWB, QOS, PQE-BW, RDPID, RDPRU, and MCOMMITYes Yes
MPK, VAESYes
SGX
FPUs per core10.5110.51
Pipes per FPU22
FPU pipe width128-bit256-bit80-bit128-bit256-bit
CPU instruction set SIMD levelSSE4a[f]AVX AVX2AVX-512SSSE3AVXAVX2
3DNow!3DNow!+
PREFETCH/PREFETCHWYes Yes
GFNIYes
AMX
FMA4, LWP, TBM, and XOPYes Yes
FMA3Yes Yes
AMD XDNAYes
L1 data cache per core (KiB)64163232
L1 data cache associativity (ways)2488
L1 instruction caches per core10.51 10.51
Max APU total L1 instruction cache (KiB)256128192256512256 64128 96 128
L1 instruction cache associativity (ways)2348 2 3 4 8
L2 caches per core10.5110.51
Max APU total L2 cache (MiB)424161212
L2 cache associativity (ways)168168
Max on-die L3 cache per CCX (MiB)416324
Max 3D V-Cache per CCD (MiB)64
Max total in-CCD L3 cache per APU (MiB)4816644
Max. total 3D V-Cache per APU (MiB)64
Max. board L3 cache per APU (MiB)
Max total L3 cache per APU (MiB)48161284
APU L3 cache associativity (ways)1616
L3 cache schemeVictimVictim
Max. L4 cache
Max stock DRAM supportDDR3-1866DDR3-2133DDR3-2133, DDR4-2400DDR4-2400DDR4-2933DDR4-3200, LPDDR4-4266DDR5-4800, LPDDR5-6400DDR5-5200DDR5-5600, LPDDR5x-7500DDR3L-1333DDR3L-1600DDR3L-1866DDR3-1866, DDR4-2400DDR4-2400DDR4-1600DDR4-3200LPDDR5-5500
Max DRAM channels per APU21212
Max stock DRAM bandwidth (GB/s) per APU29.86634.13238.40046.93268.256102.40083.200120.000 10.66612.80014.93319.20038.40012.80051.20088.000
GPU microarchitectureTeraScale 2 (VLIW5)TeraScale 3 (VLIW4)GCN 2nd genGCN 3rd genGCN 5th gen[16]RDNA 2RDNA 3TeraScale 2 (VLIW5)GCN 2nd genGCN 3rd gen[16]GCN 5th genRDNA 2
GPU instruction setTeraScale instruction setGCN instruction setRDNA instruction setTeraScale instruction setGCN instruction setRDNA instruction set
Max stock GPU base clock (MHz)60080084486611081250140021002400400 538600 ?847900120060013001900
Max stock GPU base GFLOPS[g]480614.4648.1886.71134.517601971.22150.43686.4102.4 86 ? ? ?345.6460.8230.41331.2486.4
3D engine[h]Up to 400:20:8Up to 384:24:6Up to 512:32:8Up to 704:44:16[17]Up to 512:32:8768:48:8128:8:480:8:4128:8:4Up to 192:12:8Up to 192:12:4192:12:4Up to 512:?:?128:?:?
IOMMUv1IOMMUv2IOMMUv1 ?IOMMUv2
Video decoderUVD 3.0UVD 4.2UVD 6.0VCN 1.0[18]VCN 2.1[19] VCN 2.2[19]VCN 3.1 ?UVD 3.0UVD 4.0UVD 4.2UVD 6.2VCN 1.0VCN 3.1
Video encoderVCE 1.0VCE 2.0VCE 3.1VCE 2.0VCE 3.4
AMD Fluid Motion No Yes No No Yes No
GPU power savingPowerPlayPowerTunePowerPlayPowerTune[20]
TrueAudioYes[21] ? Yes
FreeSync1
2
1
2
HDCP[i] ?1.42.22.3 ?1.42.22.3
PlayReady[i]3.0 not yet3.0 not yet
Supported displays[j]2–32–433 (desktop)
4 (mobile, embedded)
42344
/drm/radeon[k][23][24]Yes Yes
/drm/amdgpu[k][25]Yes[26] Yes[26]
Close
  1. For FM2+ Excavator models: A8-7680, A6-7480 & Athlon X4 845.
  2. A PC would be one node.
  3. An APU combines a CPU and a GPU. Both have cores.
  4. Requires firmware support.
  5. Requires firmware support.
  6. No SSE4. No SSSE3.
  7. Single-precision performance is calculated from the base (or boost) core clock speed based on a FMA operation.
  8. To play protected video content, it also requires card, operating system, driver, and application support. A compatible HDCP display is also needed for this. HDCP is mandatory for the output of certain audio formats, placing additional constraints on the multimedia setup.
  9. To feed more than two displays, the additional panels must have native DisplayPort support.[22] Alternatively active DisplayPort-to-DVI/HDMI/VGA adapters can be employed.
  10. DRM (Direct Rendering Manager) is a component of the Linux kernel. Support in this table refers to the most current version.

Processors

Summarize
Perspective

APU lines

  1. Kaveri A-series APU
  2. Berlin APU - canceled
    • Announced in 2013 by AMD[36] the Berlin APU were targeted at the enterprise and server markets featuring four Steamroller cores, up to 512 stream processors and support for ECC memory.

FX lines (discontinued)

In November 2013 AMD confirmed it would not update the FX series in 2014, neither its Socket AM3+ version, nor will it receive a Steamroller version with a new socket.[37][38]

AMD however, released a Kaveri based FX-770K for desktop and FX-7600P for mobile which are basically APUs with their integrated graphics disabled similar to the Athlon X4 FM2+ line. Those APUs were released for OEMs only.

Server lines (canceled)

AMD's server roadmaps for 2014 showed:[39][40]

  • Berlin APU - quad-core x86 Steamroller architecture (as described above) for 1 Processor (1P) compute and media clusters
  • Berlin CPU - quad-core x86 Steamroller architecture for 1P web and enterprise services clusters
  • Seattle CPU - 4/8 core AArch64 Cortex-A57 architecture (Opteron A1100) for 1P web and enterprise services clusters [41]
  • Warsaw CPU - up to 16 core x86 Piledriver (2nd gen Bulldozer) architecture (Opteron 6338P and 6370P) for 2P/4P servers [42]

However, plans for Steamroller Opteron products were cancelled, likely due to the poor energy efficiency achieved in this generation of the Bulldozer architecture. Energy efficiency was greatly increased in the following generation, Excavator, which exceeded Jaguar in performance per watt, and approximately doubled performance/watt over Steamroller (for example 20.74 pt/W vs 10.85 pt/W when comparing similar mobile APUs using rough arbitrary metrics).[43][44]

References

Loading related searches...

Wikiwand - on

Seamless Wikipedia browsing. On steroids.