Roughly what is the latency of CPU access to cache and main memory?

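Tools like AIDA64 measure roughly: L1 1-2 ns (3-5 clocks), L2 2-3.5 ns (6-10 clocks), L3 10-15 ns (20-50 clocks), main memory 30-50 ns (60-150 clocks). But some books, for example Modern Operating Systems, say that L1 is accessed with no delay and L2 takes two or three clock cycles, while other books give figures closer to the ones above.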

xenme

2019-01-02 13:33:19 +08:00

https://stackoverflow.com/questions/4087280/approximate-cost-to-access-various-caches-and-main-memory

Core i7 Xeon 5500 Series Data Source Latency (approximate) [Pg. 22]

local L1 CACHE hit, ~4 cycles ( 2.1 - 1.2 ns )
local L2 CACHE hit, ~10 cycles ( 5.3 - 3.0 ns )
local L3 CACHE hit, line unshared ~40 cycles ( 21.4 - 12.0 ns )
local L3 CACHE hit, shared line in another core ~65 cycles ( 34.8 - 19.5 ns )
local L3 CACHE hit, modified in another core ~75 cycles ( 40.2 - 22.5 ns )

remote L3 CACHE (Ref: Fig.1 [Pg. 5]) ~100-300 cycles ( 160.7 - 30.0 ns )

local DRAM ~60 ns
remote DRAM ~100 ns
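
The nanosecond ranges in the list above follow from dividing the cycle count by the clock frequency. Below is a minimal sketch of that conversion in C. The two frequencies (about 1.87 GHz and 3.33 GHz) are assumptions, picked because they roughly reproduce the quoted ranges; the source does not state them, and the function names are made up for illustration.

```c
#include <stdio.h>

/* Convert a latency in core clock cycles to nanoseconds.
   A clock of f GHz ticks f times per nanosecond, so ns = cycles / f. */
double cycles_to_ns(double cycles, double ghz) {
    return cycles / ghz;
}

/* Print one row in the style of the list above.
   Both frequencies are assumed, not taken from the source. */
void print_row(const char *name, double cycles) {
    const double slow_ghz = 1.87; /* assumed slowest clock */
    const double fast_ghz = 3.33; /* assumed fastest clock */
    printf("%-32s ~%.0f cycles ( %.1f - %.1f ns )\n", name, cycles,
           cycles_to_ns(cycles, slow_ghz), cycles_to_ns(cycles, fast_ghz));
}
```

Calling print_row("local L1 CACHE hit,", 4) gives a range close to the first row. This also shows why the same cache can look "slower" in nanoseconds on a lower-clocked part even though the cycle count is unchanged.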

yanaraika

2019-01-02 13:41:55 +08:00

MemLatX64 at http://instlatx64.atw.hu/ has more precise data.

ryd994

2019-01-02 13:45:18 +08:00

Only registers are accessed directly by instructions.

29EtwXn6t5wgM3fD

2019-01-02 13:57:59 +08:00

https://www.7-cpu.com/cpu/Cortex-A57.html

AMD Opteron A1170 (ARM Cortex-A57), 2.0 GHz, 28 nm. RAM: 16 GB. (Probably it's SoftIron Overdrive 3000 server, DDR3 RDIMM).

L1 Data cache = 32 KB, 64 B/line, 2-WAY.
L1 Instruction cache = 48 KB, 64 B/line, 3-WAY.
L2 Cache = 1 MB (per 2 cores), 64 B/line, 16-WAY.
L3 Cache = 8 MB (per 8 cores), 64 B/line, ?-WAY.

L1 Data Cache Latency = 4 cycles for simple access via pointer
L1 Data Cache Latency = 5 cycles for access with complex address calculation (size_t n, *p; n = p[n]).
L2 Cache Latency = 18 cycles
L3 Cache Latency = 60 cycles
RAM Latency = 60 cycles + 124 ns
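
Latency figures like these are commonly measured with a pointer-chasing loop of the form n = p[n], where each load's address depends on the result of the previous load. The following is a minimal sketch of that idea in C, not the actual test code behind the numbers above; build_chain and chase_ns_per_load are hypothetical names, and timing uses the wall clock via C11 timespec_get, so results are only approximate.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Fill p with a random single-cycle permutation (Sattolo's algorithm),
   so following p[n] visits every slot once before repeating. The random
   order is meant to defeat simple hardware prefetchers. */
void build_chain(size_t *p, size_t count) {
    for (size_t i = 0; i < count; i++) p[i] = i;
    if (count < 2) return;
    for (size_t i = count - 1; i > 0; i--) {
        /* j in [0, i-1]; rand() is coarse but enough for a sketch */
        size_t j = (size_t)rand() % i;
        size_t tmp = p[i];
        p[i] = p[j];
        p[j] = tmp;
    }
}

/* Average nanoseconds per dependent load over `steps` loads. */
double chase_ns_per_load(const size_t *p, size_t steps) {
    struct timespec t0, t1;
    size_t n = 0;
    timespec_get(&t0, TIME_UTC);
    for (size_t i = 0; i < steps; i++) n = p[n]; /* serialized loads */
    timespec_get(&t1, TIME_UTC);
    volatile size_t sink = n; /* keep the loop from being optimized away */
    (void)sink;
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```

Running it on a chain that fits in L1 (for example 16 KB) and then on one much larger than L3 (for example 256 MB) should show the step from a few cycles up to DRAM latency. Multiplying the result by the core clock in GHz converts it to cycles.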

29EtwXn6t5wgM3fD

2019-01-02 14:00:07 +08:00

https://www.7-cpu.com/cpu/Skylake_X.html

Intel i7-7820X (Skylake X), 8 cores, 4.3 GHz (Turbo Boost), Mesh 2.4 GHz, 14 nm. RAM: 4x 8 GB DDR4-3400 16-18-18-36.

L1 Data cache = 32 KB, 64 B/line, 8-WAY
L1 Instruction cache = 32 KB, 64 B/line, 8-WAY.
L2 cache = 1024 KB, 64 B/line, 16-WAY
L3 cache = 11 MB, 64 B/line, 11-WAY

L1 Data Cache Latency = 4 cycles for simple access via pointer
L1 Data Cache Latency = 5 cycles for access with complex address calculation (size_t n, *p; n = p[n]).
L2 Cache Latency = 14 cycles
L3 Cache Latency = 68 cycles (3.6 GHz)
L3 Cache Latency = 79 cycles (4.3 GHz) (77-81 cycles for different cores)
RAM Latency = 79 cycles + 50 ns
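
A figure written as "N cycles + M ns" mixes two units: the cycle part scales with the core clock, while the nanosecond part does not. A small sketch in C of how to turn it into a single number, assuming the cycle part runs at the quoted core frequency; mixed_latency_ns is a made-up helper name.

```c
/* Total latency in ns for a figure quoted as "<cycles> cycles + <ns> ns".
   Assumes the cycle part is counted at core_ghz. */
double mixed_latency_ns(double cycles, double core_ghz, double fixed_ns) {
    return cycles / core_ghz + fixed_ns;
}
```

Under that assumption, 79 cycles + 50 ns at 4.3 GHz comes to about 68 ns, and the Cortex-A57 figure of 60 cycles + 124 ns at 2.0 GHz comes to 154 ns.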

Whether on ARM or x86, an L1 data cache access takes 4 to 5 clock cycles.


https://www.v2ex.com/t/523069
