The M1 now has native numpy and scipy

2020-12-09 15:37:56 +08:00
 YUX

https://github.com/conda-forge/miniforge

First, download the matching Miniforge3 installer: OS X arm64 (Apple Silicon)

Once it's installed you have conda, and numpy, scipy, and the like installed through conda are all native builds.

The performance improvement is huge, whether compared against Rosetta 2 or an Intel i9.
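To confirm the install really is native rather than running through Rosetta 2, here is a minimal sketch, assuming only a working Python with numpy inside the conda environment; it prints the machine architecture and the BLAS/LAPACK configuration numpy was linked against:

```python
# Minimal sanity check that the interpreter and numpy are native arm64 builds
# rather than x86_64 binaries running under Rosetta 2.
import platform

import numpy as np

# A native build on Apple Silicon reports "arm64"; under Rosetta 2 this
# would report "x86_64".
print("machine:", platform.machine())
print("python :", platform.python_version())
print("numpy  :", np.__version__)

# Prints the BLAS/LAPACK libraries numpy was built against.
np.show_config()
```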

7304 views · Node: macOS · 42 replies
pb941129
2020-12-09 15:39:45 +08:00
Curious how much of an improvement this is over MKL numpy on an Intel i9...
NoobX
2020-12-09 16:42:16 +08:00
But it caps out at 16 GB of RAM...
Goldilocks
2020-12-09 16:45:04 +08:00
Looking forward to benchmarks; I expect it gets crushed by AVX-512.
felixcode
2020-12-09 19:43:51 +08:00
A GPU's VRAM is bigger than your RAM.
YUX
2020-12-09 19:49:07 +08:00
@pb941129
@NoobX
@Goldilocks
@felixcode



Found a numpy benchmark script and ran it: https://gist.github.com/markus-beuckelmann/8bc25531b11158431a5b09a45abd6276

```
Dotted two 4096x4096 matrices in 0.53 s.
Dotted two vectors of length 524288 in 0.25 ms.
SVD of a 2048x1024 matrix in 0.59 s.
Cholesky decomposition of a 2048x2048 matrix in 0.08 s.
Eigendecomposition of a 2048x2048 matrix in 4.74 s.

This was obtained using the following Numpy configuration:
blas_info:
    libraries = ['cblas', 'blas', 'cblas', 'blas']
    library_dirs = ['/Users/yux/miniforge3/envs/maths/lib']
    include_dirs = ['/Users/yux/miniforge3/envs/maths/include']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    define_macros = [('NO_ATLAS_INFO', 1), ('HAVE_CBLAS', None)]
    libraries = ['cblas', 'blas', 'cblas', 'blas']
    library_dirs = ['/Users/yux/miniforge3/envs/maths/lib']
    include_dirs = ['/Users/yux/miniforge3/envs/maths/include']
    language = c
lapack_info:
    libraries = ['lapack', 'blas', 'lapack', 'blas']
    library_dirs = ['/Users/yux/miniforge3/envs/maths/lib']
    language = f77
lapack_opt_info:
    libraries = ['lapack', 'blas', 'lapack', 'blas', 'cblas', 'blas', 'cblas', 'blas']
    library_dirs = ['/Users/yux/miniforge3/envs/maths/lib']
    language = c
    define_macros = [('NO_ATLAS_INFO', 1), ('HAVE_CBLAS', None)]
    include_dirs = ['/Users/yux/miniforge3/envs/maths/include']
```




P.S. Python version 3.9.1 (arm64); all background apps were closed during the run.
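For context, here is a rough sketch of the kinds of operations the linked gist times, reconstructed from the printed output above rather than copied from the gist, so details such as warm-up and repetition counts may differ:

```python
# Rough sketch of the operations the linked benchmark gist times; this is
# not the gist's own code, and sizes are taken from the output shown above.
import time

import numpy as np

rng = np.random.default_rng(0)

A = rng.standard_normal((4096, 4096))
B = rng.standard_normal((4096, 4096))
v = rng.standard_normal(524288)
C = rng.standard_normal((2048, 1024))
D = rng.standard_normal((2048, 2048))
# Symmetric positive definite matrix so the Cholesky factorization succeeds.
E = D @ D.T + 2048 * np.eye(2048)

t = time.perf_counter()
A @ B
print(f"Dotted two 4096x4096 matrices in {time.perf_counter() - t:.2f} s.")

t = time.perf_counter()
v @ v
print(f"Dotted two vectors of length {v.size} in {(time.perf_counter() - t) * 1000:.2f} ms.")

t = time.perf_counter()
np.linalg.svd(C, full_matrices=False)
print(f"SVD of a 2048x1024 matrix in {time.perf_counter() - t:.2f} s.")

t = time.perf_counter()
np.linalg.cholesky(E)
print(f"Cholesky decomposition of a 2048x2048 matrix in {time.perf_counter() - t:.2f} s.")

t = time.perf_counter()
np.linalg.eig(D)
print(f"Eigendecomposition of a 2048x2048 matrix in {time.perf_counter() - t:.2f} s.")
```

Single-run wall-clock timings fluctuate, so treat numbers from a sketch like this as rough indicators only.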
pb941129
2020-12-09 19:58:15 +08:00
@YUX Thx. Here are the results from my 16-inch MBP (i9). Background apps not closed. Environment: Anaconda, Python 3.8. It looks like it's still a bit faster than the M1. (Otherwise Intel would really have reason to cry.)

```
Dotted two 4096x4096 matrices in 0.45 s.
Dotted two vectors of length 524288 in 0.05 ms.
SVD of a 2048x1024 matrix in 0.32 s.
Cholesky decomposition of a 2048x2048 matrix in 0.08 s.
Eigendecomposition of a 2048x2048 matrix in 3.53 s.

This was obtained using the following Numpy configuration:
blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/Users/xxx/anaconda/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/Users/xxx/anaconda/include']
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/Users/xxx/anaconda/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/Users/xxx/anaconda/include']
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/Users/xxx/anaconda/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/Users/xxx/anaconda/include']
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/Users/xxx/anaconda/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/Users/xxx/anaconda/include']

```
changepc90
2020-12-09 20:12:20 +08:00
M1: Dotted two vectors of length 524288 in 0.25 ms.
MBP16: Dotted two vectors of length 524288 in 0.05 ms.
The gap on this one is huge.
YUX
2020-12-09 20:13:27 +08:00
@pb941129 Not bad, the i9 is still stronger 😂 Were all 8 cores / 16 threads fully loaded during the run?
YUX
2020-12-09 20:15:42 +08:00
@changepc90 That is probably down to the difference in instruction sets.
Aspector
2020-12-09 20:19:41 +08:00
i7-8550U in a T480s, library is mkl_rt

Dotted two 4096x4096 matrices in 1.07 s.
Dotted two vectors of length 524288 in 0.13 ms.
SVD of a 2048x1024 matrix in 0.53 s.
Cholesky decomposition of a 2048x2048 matrix in 0.15 s.
Eigendecomposition of a 2048x2048 matrix in 5.07 s.

HWMonitor reports the 8550U drawing roughly 40-45 W in real time; the M1 is probably only around 20 W (sigh)
YUX
2020-12-09 20:21:59 +08:00
Sharing a friend's results: 16-inch, 2.6 GHz 6-core Intel Core i7

Dotted two 4096x4096 matrices in 0.49 s.
Dotted two vectors of length 524288 in 0.05 ms.
SVD of a 2048x1024 matrix in 0.32 s.
Cholesky decomposition of a 2048x2048 matrix in 0.07 s.
Eigendecomposition of a 2048x2048 matrix in 3.16 s.
YUX
2020-12-09 20:24:36 +08:00
@Aspector The M1 in the Air is capped at 10 W 😂
pb941129
2020-12-09 20:25:33 +08:00
@YUX I didn't check the task monitor, but knowing how numpy usually behaves, probably not. We could wait until LightGBM is ported and then benchmark the CPU version together (a small project of mine once kept an entire 8700K maxed out for three hours searching for optimal parameters).
rock_cloud
2020-12-09 20:25:53 +08:00
2017 iMac, 3.4 GHz Intel i5
Dotted two 4096x4096 matrices in 1.04 s.
Dotted two vectors of length 524288 in 0.17 ms.
SVD of a 2048x1024 matrix in 0.58 s.
Cholesky decomposition of a 2048x2048 matrix in 0.12 s.
Eigendecomposition of a 2048x2048 matrix in 5.37 s.
No background apps were closed.
YUX
2020-12-09 20:26:54 +08:00
@pb941129 Three hours of roasting? Can I run the test in the fridge 😂 With no fan I'm afraid it'll cook itself.
sxd96
2020-12-09 20:31:25 +08:00
2018 13-inch MBP, i5-8259U

Dotted two 4096x4096 matrices in 0.80 s.
Dotted two vectors of length 524288 in 0.11 ms.
SVD of a 2048x1024 matrix in 0.35 s.
Cholesky decomposition of a 2048x2048 matrix in 0.09 s.
Eigendecomposition of a 2048x2048 matrix in 3.39 s.
sxd96
2020-12-09 20:35:06 +08:00
@sxd96 That makes me feel a little better. Also ran without closing background apps, on the MKL library. I did notice that with the cores fully loaded the MBP gives off a bit of coil whine. Even if ARM is slightly behind here for now, its efficiency is probably not bad at all; for mobile devices I think performance per watt is what matters most.
Gandum
2020-12-09 20:35:15 +08:00
It's still an early version. But it's winter now, so there's no rush and the fan isn't too noisy. I'll buy one next summer.
IgniteWhite
2020-12-09 20:35:29 +08:00
Haha, I made a post about this five months ago: /t/688402
rock_cloud
2020-12-09 20:36:02 +08:00
Intel Xeon Silver 4114, 2.2 GHz
Dotted two 4096x4096 matrices in 0.60 s.
Dotted two vectors of length 524288 in 0.04 ms.
SVD of a 2048x1024 matrix in 0.66 s.
Cholesky decomposition of a 2048x2048 matrix in 0.26 s.
Eigendecomposition of a 2048x2048 matrix in 6.67 s.
