The M1 now has native NumPy and SciPy.

https://github.com/conda-forge/miniforge

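First download the matching Miniforge3 build: OS X arm64 (Apple Silicon).

Once it is installed you have conda, and NumPy, SciPy and the rest installed through that conda are all native builds.

The performance gain is large, whether compared with Rosetta 2 or with an Intel i9.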
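To confirm that the interpreter itself is native, and not an x86_64 build running under Rosetta 2, a quick check like the following can help. This is a minimal sketch using only the standard library; the function name is hypothetical, and it assumes that an x86_64 Python translated by Rosetta 2 reports `x86_64` from `platform.machine()`.

```python
import platform
import sys


def running_natively_on_apple_silicon() -> bool:
    """Return True if this Python process runs as native arm64 code on macOS.

    This inspects the running interpreter, not the hardware: an x86_64 Python
    under Rosetta 2 should report "x86_64" even on an M1 machine.
    """
    return sys.platform == "darwin" and platform.machine() == "arm64"


print("platform.machine():", platform.machine())
print("native Apple Silicon Python:", running_natively_on_apple_silicon())
```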

YUX

2020-12-09 20:38:09 +08:00

@IgniteWhite Way ahead of everyone 😂 It really is a great thing.

Tilie

2020-12-09 20:54:48 +08:00

Mac mini with 8th-gen i7
Dotted two 4096x4096 matrices in 0.76 s.
Dotted two vectors of length 524288 in 0.09 ms.
SVD of a 2048x1024 matrix in 0.56 s.
Cholesky decomposition of a 2048x2048 matrix in 0.09 s.
Eigendecomposition of a 2048x2048 matrix in 5.20 s.
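The thread does not include the benchmark script itself. The sketch below is a hypothetical reconstruction that times the same five operations at the sizes named in the printed labels; it is not the script the posters ran, so its numbers will not match theirs exactly. The repeat count of 3, the random seed and the way the Cholesky input is made positive definite are all assumptions.

```python
import time

import numpy as np


def mean_seconds(fn, *args, repeats=3):
    """Call fn(*args) `repeats` times and return the mean wall-clock time in seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) / repeats


def run_benchmark(n=4096, vec_len=524288, svd_shape=(2048, 1024), m=2048, seed=0):
    """Time the five operations whose labels appear in the thread's output."""
    rng = np.random.default_rng(seed)
    a, b = rng.random((n, n)), rng.random((n, n))
    x, y = rng.random(vec_len), rng.random(vec_len)
    s = rng.random(svd_shape)
    c = rng.random((m, m))
    # Cholesky needs a symmetric positive-definite input; c @ c.T + m * I is one.
    spd = c @ c.T + m * np.eye(m)

    return {
        "matmul_s": mean_seconds(np.dot, a, b),
        "vector_dot_ms": mean_seconds(np.dot, x, y) * 1e3,
        "svd_s": mean_seconds(lambda arr: np.linalg.svd(arr, full_matrices=False), s),
        "cholesky_s": mean_seconds(np.linalg.cholesky, spd),
        "eig_s": mean_seconds(np.linalg.eig, c),
    }
```

Calling `run_benchmark()` with its defaults uses the thread's sizes; smaller arguments such as `run_benchmark(n=512, m=256, svd_shape=(256, 128))` make a quick smoke test. There is no warm-up run, so the first call of each operation is included in the average.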

YUX

2020-12-09 21:03:39 +08:00

Google Colab - 2 Intel(R) Xeon(R) CPU @ 2.20GHz

Dotted two 4096x4096 matrices in 4.16 s.
Dotted two vectors of length 524288 in 0.25 ms.
SVD of a 2048x1024 matrix in 1.49 s.
Cholesky decomposition of a 2048x2048 matrix in 0.23 s.
Eigendecomposition of a 2048x2048 matrix in 13.11 s.

zr86

2020-12-09 21:14:01 +08:00

M1 Mac mini

Dotted two 4096x4096 matrices in 0.69 s.
Dotted two vectors of length 524288 in 0.25 ms.
SVD of a 2048x1024 matrix in 0.68 s.
Cholesky decomposition of a 2048x2048 matrix in 0.08 s.
Eigendecomposition of a 2048x2048 matrix in 4.82 s.

wydinhk

2020-12-09 22:21:48 +08:00

M1 MacBook Pro

Dotted two 4096x4096 matrices in 0.68 s.
Dotted two vectors of length 524288 in 0.25 ms.
SVD of a 2048x1024 matrix in 0.71 s.
Cholesky decomposition of a 2048x2048 matrix in 0.08 s.
Eigendecomposition of a 2048x2048 matrix in 5.03 s.

I also measured power with powermetrics at the same time: about 26 W for the first two tests and about 16 W for the last three.
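Multiplying the reported power by the reported time gives a rough energy figure per test. A minimal sketch, assuming the powermetrics reading held steady for the whole run (the post does not say so); the times are this post's M1 MacBook Pro results.

```python
# Times (s) from the M1 MacBook Pro post; watts are the approximate
# powermetrics readings quoted there (~26 W first two, ~16 W last three).
results = {
    "matmul 4096x4096": (0.68, 26.0),
    "vector dot 524288": (0.25e-3, 26.0),
    "SVD 2048x1024": (0.71, 16.0),
    "Cholesky 2048x2048": (0.08, 16.0),
    "eig 2048x2048": (5.03, 16.0),
}


def energy_joules(seconds: float, watts: float) -> float:
    """Energy = average power x time (J = W x s)."""
    return seconds * watts


for name, (seconds, watts) in results.items():
    print(f"{name}: {energy_joules(seconds, watts):.3f} J")
```

With these inputs the 4096x4096 multiply comes out at roughly 18 J and the eigendecomposition at roughly 80 J.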

lovestudykid

2020-12-10 03:17:17 +08:00

This test doesn't really separate the machines.
MF839: only about twice as slow as the OP's M1.
Dotted two 4096x4096 matrices in 2.33 s.
Dotted two vectors of length 524288 in 0.54 ms.
SVD of a 2048x1024 matrix in 1.05 s.
Cholesky decomposition of a 2048x2048 matrix in 0.20 s.
Eigendecomposition of a 2048x2048 matrix in 8.38 s.
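The OP's own M1 numbers are not part of this excerpt, so the sketch below assumes the M1 Mac mini results posted above as the baseline and computes a per-test slowdown ratio. The dictionary keys are hypothetical short names for the five tests.

```python
# Benchmark times (seconds) copied from the thread.
m1_mac_mini = {"matmul": 0.69, "vecdot": 0.25e-3, "svd": 0.68, "cholesky": 0.08, "eig": 4.82}
mf839 = {"matmul": 2.33, "vecdot": 0.54e-3, "svd": 1.05, "cholesky": 0.20, "eig": 8.38}


def slowdown(baseline: dict, other: dict) -> dict:
    """Per-test ratio other/baseline; 2.0 means `other` took twice as long."""
    return {k: other[k] / baseline[k] for k in baseline}


ratios = slowdown(m1_mac_mini, mf839)
for test, ratio in ratios.items():
    print(f"{test}: {ratio:.2f}x")
```

The ratios range from about 1.5x (SVD) to about 3.4x (matrix multiply), so "twice as slow" is a rough average over quite different per-test gaps.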

Intel(R) Xeon(R) Gold 6134
Dotted two 4096x4096 matrices in 0.32 s.
Dotted two vectors of length 524288 in 0.05 ms.
SVD of a 2048x1024 matrix in 0.89 s.
Cholesky decomposition of a 2048x2048 matrix in 0.15 s.
Eigendecomposition of a 2048x2048 matrix in 8.19 s.
The NumPy that Anaconda installs by default isn't built with MKL and doesn't enable AVX-512, so this CPU is being wasted.
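Which BLAS library a NumPy build uses (MKL, OpenBLAS, Apple's Accelerate) can be read from `np.show_config()`. The helper below is a hypothetical heuristic: it captures that printed text and searches it for well-known library names, skipping section headers such as `blas_mkl_info:` (which appear even when the library is absent) and `NOT AVAILABLE` lines. The output format of `show_config()` differs between NumPy versions, so treat the result as a hint rather than a guarantee; it also says nothing about AVX-512.

```python
import contextlib
import io

import numpy as np

KNOWN_BACKENDS = ("mkl", "openblas", "accelerate", "blis")


def blas_backend_hint() -> str:
    """Guess the BLAS library from np.show_config() output (heuristic)."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        np.show_config()
    body_lines = []
    for line in buf.getvalue().splitlines():
        stripped = line.strip().lower()
        # Skip headers like "blas_mkl_info:" (they name libraries even when
        # absent) and the "NOT AVAILABLE" placeholders under them.
        if not stripped or stripped.endswith(":") or stripped == "not available":
            continue
        body_lines.append(stripped)
    text = "\n".join(body_lines)
    for name in KNOWN_BACKENDS:
        if name in text:
            return name
    return "unknown"


print("BLAS backend (heuristic):", blas_backend_hint())
```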

pubby

2020-12-10 10:01:09 +08:00

3700X Hackintosh

Dotted two 4096x4096 matrices in 0.46 s.
Dotted two vectors of length 524288 in 0.08 ms.
SVD of a 2048x1024 matrix in 7.37 s.
Cholesky decomposition of a 2048x2048 matrix in 0.82 s.
Eigendecomposition of a 2048x2048 matrix in 49.05 s.

This was obtained using the following Numpy configuration:
atlas_threads_info:
  NOT AVAILABLE
blas_opt_info:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    extra_compile_args = ['-msse3', '-I/AppleInternal/BuildRoot/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.0.Internal.sdk/System/Library/Frameworks/vecLib.framework/Headers']
    define_macros = [('NO_ATLAS_INFO', 3)]
atlas_blas_threads_info:
  NOT AVAILABLE
openblas_info:
  NOT AVAILABLE
lapack_opt_info:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    extra_compile_args = ['-msse3']
    define_macros = [('NO_ATLAS_INFO', 3)]
atlas_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
atlas_blas_info:
  NOT AVAILABLE
mkl_info:
  NOT AVAILABLE

Looks like I'm not using it the right way....

bnuliujing

2020-12-10 10:18:09 +08:00

i7-6950X results

Dotted two 4096x4096 matrices in 0.35 s.
Dotted two vectors of length 524288 in 0.03 ms.
SVD of a 2048x1024 matrix in 0.27 s.
Cholesky decomposition of a 2048x2048 matrix in 0.10 s.
Eigendecomposition of a 2048x2048 matrix in 3.39 s.

NoobX

2020-12-10 11:05:02 +08:00

Mac mini (i5) results

Dotted two 4096x4096 matrices in 0.58 s.
Dotted two vectors of length 524288 in 0.08 ms.
SVD of a 2048x1024 matrix in 0.32 s.
Cholesky decomposition of a 2048x2048 matrix in 0.08 s.
Eigendecomposition of a 2048x2048 matrix in 3.30 s.

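The M1 results don't impress me much either...
But 16 GB of RAM is still a big problem. The system alone usually takes about 4 GB, which leaves only 12 GB of the 16 GB for datasets, and honestly that isn't enough for me.
A somewhat slower processor isn't a big deal, but once swap fills up, the slowdown is a real nightmare.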
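For a sense of scale, a dense NumPy array needs rows × columns × bytes-per-element of memory. A minimal sketch, assuming float64 data (8 bytes per value) and treating the 12 GB left over as 12 GiB; the function names are hypothetical. Real workloads usually need extra headroom, because many NumPy operations allocate temporary result arrays.

```python
BYTES_PER_FLOAT64 = 8
GIB = 1024 ** 3


def dense_matrix_gib(rows: int, cols: int, itemsize: int = BYTES_PER_FLOAT64) -> float:
    """Memory for one dense rows x cols array, in GiB (temporaries not counted)."""
    return rows * cols * itemsize / GIB


def max_float64_elements(budget_gib: float) -> int:
    """How many float64 values fit in `budget_gib` GiB, ignoring overhead."""
    return int(budget_gib * GIB // BYTES_PER_FLOAT64)


print(dense_matrix_gib(4096, 4096))  # one 4096x4096 benchmark matrix
print(max_float64_elements(12))      # the ~12 GB left for data
```

One 4096x4096 float64 matrix takes 0.125 GiB, and 12 GiB holds about 1.6 billion float64 values.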

MisakaTian

2020-12-10 11:58:25 +08:00

Speaking as a data grunt: I'll switch as soon as Anaconda supports it.

Goldilocks

2020-12-10 12:06:11 +08:00

Processor Intel(R) Xeon(R) W-2123 CPU @ 3.60GHz, 3600 Mhz, 4 Core

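Dotted two 4096x4096 matrices in 0.33 s, twice as fast as the M1. But the M1 has 8 cores. So at the same clock speed and the same core count, Intel is still roughly 3-4 times faster than the M1, and this is a product from 3 years ago.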
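The normalization in this post divides throughput by core count and clock speed. A minimal sketch of that arithmetic, assuming perfect scaling with both and assuming a 3.2 GHz M1 clock (the post gives no M1 clock); the M1 time is the M1 Mac mini result above, and the function name is hypothetical. Because the M1 has 4 performance cores and 4 efficiency cores, the answer depends heavily on which cores are counted.

```python
def per_core_per_ghz_advantage(t_a, cores_a, ghz_a, t_b, cores_b, ghz_b):
    """How many times more work per core per GHz machine A does than machine B.

    Throughput is taken as 1 / time, then divided by cores x GHz.
    Assumes perfect scaling with both core count and clock speed.
    """
    per_unit_a = 1.0 / (t_a * cores_a * ghz_a)
    per_unit_b = 1.0 / (t_b * cores_b * ghz_b)
    return per_unit_a / per_unit_b


# Xeon W-2123: 0.33 s, 4 cores, 3.6 GHz (from the post).
# M1: 0.69 s (M1 Mac mini result); 3.2 GHz is an assumed clock.
all_cores = per_core_per_ghz_advantage(0.33, 4, 3.6, 0.69, 8, 3.2)
p_cores_only = per_core_per_ghz_advantage(0.33, 4, 3.6, 0.69, 4, 3.2)
print(f"counting 8 M1 cores: {all_cores:.2f}x")
print(f"counting 4 M1 performance cores: {p_cores_only:.2f}x")
```

Counting all 8 M1 cores gives about 3.7x, in line with the post's 3-4x; counting only the 4 performance cores gives about 1.9x.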

YUX

2020-12-10 12:12:50 +08:00

@MisakaTian Just use mamba.

Goldilocks

2020-12-10 12:18:45 +08:00

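It's 2020. If Intel put out a 2-core 3.6 GHz CPU, you certainly wouldn't think much of its performance. What you should be thinking about is Intel's 10-core and 20-core parts. AMD is about to release a 64-core desktop CPU, while Apple is still at the 2-core level.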

meloyang05

2020-12-10 13:35:48 +08:00

@Goldilocks

“Mac mini with 8th-gen i7
Dotted two 4096x4096 matrices in 0.76 s.
Dotted two vectors of length 524288 in 0.09 ms.
SVD of a 2048x1024 matrix in 0.56 s.
Cholesky decomposition of a 2048x2048 matrix in 0.09 s.
Eigendecomposition of a 2048x2048 matrix in 5.20 s.

M1 Mac mini

Dotted two 4096x4096 matrices in 0.69 s.
Dotted two vectors of length 524288 in 0.25 ms.
SVD of a 2048x1024 matrix in 0.68 s.
Cholesky decomposition of a 2048x2048 matrix in 0.08 s.
Eigendecomposition of a 2048x2048 matrix in 4.82 s.”

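Are you selectively ignoring the other results? Timings at the millisecond level can have a large error to begin with, and NumPy for the M1 may also have a bug right now. What does singling out the vector result actually prove?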
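Whether millisecond-level timings are noisy can be checked directly by repeating the measurement and looking at the spread. A minimal sketch using `timeit.repeat` on a dot product of two 524288-element vectors; the repeat and loop counts are arbitrary assumptions, and the spread will depend on the machine and on background load.

```python
import statistics
import timeit

import numpy as np


def timing_spread(stmt, repeat=10, number=50):
    """Time `stmt` (repeat x number calls); return (min, median, max) per call in ms."""
    runs = timeit.repeat(stmt, repeat=repeat, number=number)
    per_call_ms = [t / number * 1e3 for t in runs]
    return min(per_call_ms), statistics.median(per_call_ms), max(per_call_ms)


rng = np.random.default_rng(0)
x = rng.random(524288)
y = rng.random(524288)
low, med, high = timing_spread(lambda: np.dot(x, y))
print(f"vector dot: min {low:.3f} ms, median {med:.3f} ms, max {high:.3f} ms")
```

Reporting the minimum or the median of several runs is less sensitive to one-off interruptions than a single measurement.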

Goldilocks

2020-12-10 13:38:09 +08:00

The error won't be large, generally within 1%, because matrix multiplication is limited by only two things:

1. CPU FLOPS
2. Memory bandwidth
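Both limits can be put into numbers for the 4096x4096 test. A minimal sketch, assuming float64 data and counting about 2n^3 floating-point operations for an n x n product; the byte count is a lower bound (each matrix moved once), so a real blocked BLAS kernel moves more. With roughly 341 FLOPs per byte at best, a well-tuned large matrix multiply is usually limited by compute rather than bandwidth.

```python
def matmul_flops(n: int) -> int:
    """About 2n^3 operations for an n x n by n x n product (n^3 multiplies, ~n^3 adds)."""
    return 2 * n ** 3


def matmul_min_bytes(n: int, itemsize: int = 8) -> int:
    """Lower bound on memory traffic: read A and B once, write C once."""
    return 3 * n * n * itemsize


n = 4096
flops = matmul_flops(n)
intensity = flops / matmul_min_bytes(n)  # best-case FLOPs per byte moved
print(f"arithmetic intensity: {intensity:.0f} FLOP/byte")

# Reported 4096x4096 matmul times from the thread, in seconds.
reported = {"M1 Mac mini": 0.69, "Xeon W-2123": 0.33, "MF839": 2.33}
for machine, seconds in reported.items():
    print(f"{machine}: {flops / seconds / 1e9:.0f} GFLOP/s")
```

The reported times work out to roughly 199 GFLOP/s for the M1 Mac mini, 416 GFLOP/s for the Xeon W-2123 and 59 GFLOP/s for the MF839.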

Goldilocks

2020-12-10 13:45:33 +08:00

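Numerical computation like matrix multiplication is a very mature field that everyone has studied thoroughly. See: https://en.wikichip.org/wiki/flops

Assuming memory bandwidth can keep up with the CPU, the only ways to run faster are:
1. More cores
2. Longer SIMD vectors

For example, Skylake can do 64 FLOPs/cycle, while AMD CPUs of the same generation manage only 16 FLOPs/cycle. Clock speeds are about the same, so this 4x factor accounts for most of the gap. And that gap is very hard to close; you could say there's no hope in this lifetime.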
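The peak-throughput argument is the product cores × clock × FLOPs per cycle. A minimal sketch with hypothetical 4-core, 3.6 GHz parts and the per-cycle figures from the post. Those figures appear to be single-precision numbers (Skylake-SP with two AVX-512 FMA units reaches 32 double-precision FLOPs per cycle, first-generation Zen 8), while NumPy's random arrays default to float64; the 4x ratio is the same either way. The sketch also assumes the nominal clock is sustained under heavy vector load, which is not guaranteed.

```python
def peak_gflops(cores: int, ghz: float, flops_per_cycle_per_core: int) -> float:
    """Theoretical peak GFLOP/s = cores x clock (GHz) x FLOPs per cycle per core."""
    return cores * ghz * flops_per_cycle_per_core


# Hypothetical 4-core, 3.6 GHz parts using the per-cycle figures from the post.
skylake_like = peak_gflops(4, 3.6, 64)
older_amd_like = peak_gflops(4, 3.6, 16)
print(f"{skylake_like:.1f} vs {older_amd_like:.1f} GFLOP/s "
      f"({skylake_like / older_amd_like:.1f}x)")
```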

Harry1993

2020-12-10 14:08:58 +08:00

Tried it with Apple's NumPy (https://github.com/apple/tensorflow_macos):

Dotted two 4096x4096 matrices in 0.84 s.
Dotted two vectors of length 524288 in 0.11 ms.
SVD of a 2048x1024 matrix in 0.54 s.
Cholesky decomposition of a 2048x2048 matrix in 0.06 s.
Eigendecomposition of a 2048x2048 matrix in 6.29 s.

FurN1

2020-12-10 23:07:30 +08:00

@MisakaTian Isn't Miniforge's package manager just conda? The only difference is that the default channel is conda-forge.

lly0514

2020-12-11 15:35:01 +08:00

@Goldilocks In practice the error is very large: in my own tests the performance gap between MKL and OpenBLAS was more than 2x.

Richardyyz

2020-12-13 09:58:14 +08:00

@Goldilocks Zen 2 already does 32 FLOPs/cycle. Is your lifetime really that short? AVX-512, with its heavy downclocking, doesn't have much of an advantage over Zen 3.

https://www.v2ex.com/t/733777