V2EX  ›  Apple

Can a MacBook Air M5 with 32GB + 1TB run local LLMs?

    TGOcc · PRO · 1h 10m ago · 87 views

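    Bottom line first: yes, it can run them, but not for sustained periods. The main problem is cooling, and an external fan stand doesn't really solve it either. Under heavy load the temperature climbs quickly, and once it stays high the machine throttles. If you want portability plus real productivity, I'd recommend going for a MacBook Pro instead.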

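    I installed two platforms, Ollama and oMLX. In my testing oMLX was the faster of the two. Given the machine's 32GB of memory, keep the models you run to no more than about 22GB.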

    Below are the download sizes of some popular models, plus my oMLX benchmark results, for reference.

    Qwen3.5-4B-MLX-4bit 2.85GB

    gemma-4-26b-a4b-it-4bit 14.57GB

    Qwen3.6-35B-A3B-4bit 15.13GB

    GLM-4.7-Flash-4bit 15.71GB

    gpt-oss-20b-MXFP4-Q8 11.27GB
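A minimal sketch of the sizing rule above, checking each listed download size against a 22GB budget. The 22GB figure is the post's rule of thumb for a 32GB machine, not an official macOS limit, and the helper names `headroom` and `fits` are hypothetical. Download size is only a rough proxy: the KV cache and runtime overhead add to it at inference time, and the KV cache grows with context length.

```python
"""Check listed model download sizes against a memory budget (sketch)."""

# Download sizes in GB, as listed in the post.
MODEL_SIZES_GB = {
    "Qwen3.5-4B-MLX-4bit": 2.85,
    "gemma-4-26b-a4b-it-4bit": 14.57,
    "Qwen3.6-35B-A3B-4bit": 15.13,
    "GLM-4.7-Flash-4bit": 15.71,
    "gpt-oss-20b-MXFP4-Q8": 11.27,
}

# The post's rule of thumb for a 32GB MacBook Air; an assumption, not a hard limit.
BUDGET_GB = 22.0


def headroom(sizes: dict[str, float], budget_gb: float) -> dict[str, float]:
    """GB left under the budget for each model; negative means over budget."""
    return {name: round(budget_gb - size, 2) for name, size in sizes.items()}


def fits(sizes: dict[str, float], budget_gb: float) -> list[str]:
    """Sorted names of models whose download size is within the budget."""
    return sorted(name for name, room in headroom(sizes, budget_gb).items() if room >= 0)


if __name__ == "__main__":
    # Print the tightest fit first.
    for name, room in sorted(headroom(MODEL_SIZES_GB, BUDGET_GB).items(), key=lambda kv: kv[1]):
        print(f"{name:28s} {room:6.2f} GB spare")
```

All five listed models pass with at least about 6GB to spare, GLM-4.7-Flash-4bit being the tightest, which is consistent with all of them appearing in the benchmarks.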

    oMLX - LLM inference, optimized for your Mac
    
    Benchmark Model: Qwen3.5-4B-MLX-4bit
    ================================================================================
    Single Request Results
    --------------------------------------------------------------------------------
    Test             TTFT(ms)    TPOT(ms)        pp TPS        tg TPS    E2E(s)    Throughput    Peak Mem
    pp1024/tg128       1001.6       22.74  1022.4 tok/s    44.3 tok/s     3.889   296.2 tok/s     3.29 GB
    pp4096/tg128       3540.9       23.76  1156.8 tok/s    42.4 tok/s     6.558   644.1 tok/s     3.90 GB
    
    Continuous Batching
    pp1024 / tg128
    --------------------------------------------------------------------------------
    Batch         tg TPS    Speedup          pp TPS    pp TPS/req    TTFT(ms)      E2E(s)
    1x        44.3 tok/s      1.00x    1022.4 tok/s  1022.4 tok/s      1001.6       3.889
    2x        88.3 tok/s      1.99x     407.6 tok/s   203.8 tok/s      3040.1       7.924
    4x       175.1 tok/s      3.95x     322.7 tok/s    80.7 tok/s      6833.9      15.617
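The single-request columns are not independent. The numbers in the table are consistent with the relations below, which are inferred from this output rather than taken from oMLX's documentation: pp TPS = prompt tokens / TTFT, E2E = TTFT + (generated tokens - 1) × TPOT, tg TPS = generated tokens / (E2E - TTFT), and Throughput = (prompt + generated tokens) / E2E. A small sketch that recomputes the Qwen3.5-4B pp1024/tg128 row from TTFT and TPOT alone (the `SingleRun` class is hypothetical):

```python
"""Recompute derived single-request benchmark metrics from TTFT and TPOT (sketch)."""
from dataclasses import dataclass


@dataclass
class SingleRun:
    prompt_tokens: int  # the "pp" part of the test name, e.g. pp1024
    gen_tokens: int     # the "tg" part of the test name, e.g. tg128
    ttft_ms: float      # time to first token, in milliseconds
    tpot_ms: float      # time per output token after the first, in milliseconds

    @property
    def e2e_s(self) -> float:
        # Prefill (TTFT) plus the remaining gen_tokens - 1 decode steps.
        return (self.ttft_ms + (self.gen_tokens - 1) * self.tpot_ms) / 1000

    @property
    def pp_tps(self) -> float:
        # Prompt tokens processed per second during prefill.
        return self.prompt_tokens / (self.ttft_ms / 1000)

    @property
    def tg_tps(self) -> float:
        # Generated tokens per second over the decode phase.
        return self.gen_tokens / (self.e2e_s - self.ttft_ms / 1000)

    @property
    def throughput(self) -> float:
        # All tokens (prompt + generated) over the whole request.
        return (self.prompt_tokens + self.gen_tokens) / self.e2e_s


if __name__ == "__main__":
    # Qwen3.5-4B-MLX-4bit, pp1024/tg128 row from the table above.
    run = SingleRun(prompt_tokens=1024, gen_tokens=128, ttft_ms=1001.6, tpot_ms=22.74)
    print(f"E2E {run.e2e_s:.3f}s  pp {run.pp_tps:.1f} tok/s  "
          f"tg {run.tg_tps:.1f} tok/s  throughput {run.throughput:.1f} tok/s")
```

Because TTFT and TPOT are themselves rounded in the output, recomputed values can differ from the table in the last digit.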
    
    
    Benchmark Model: gemma-4-26b-a4b-it-4bit
    ================================================================================
    Single Request Results
    --------------------------------------------------------------------------------
    Test             TTFT(ms)    TPOT(ms)        pp TPS        tg TPS    E2E(s)    Throughput    Peak Mem
    pp1024/tg128       1500.5       24.21   682.4 tok/s    41.6 tok/s     4.575   251.8 tok/s    14.23 GB
    pp4096/tg128       4863.4       25.14   842.2 tok/s    40.1 tok/s     8.056   524.3 tok/s    14.91 GB
    
    Continuous Batching
    pp1024 / tg128
    --------------------------------------------------------------------------------
    Batch         tg TPS    Speedup          pp TPS    pp TPS/req    TTFT(ms)      E2E(s)
    1x        41.6 tok/s      1.00x     682.4 tok/s   682.4 tok/s      1500.5       4.575
    2x        82.5 tok/s      1.98x     361.6 tok/s   180.8 tok/s      3495.8       8.767
    4x       166.1 tok/s      3.99x     283.4 tok/s    70.8 tok/s      7840.6      17.536
    
    
    Benchmark Model: Qwen3.6-35B-A3B-4bit
    ================================================================================
    Single Request Results
    --------------------------------------------------------------------------------
    Test             TTFT(ms)    TPOT(ms)        pp TPS        tg TPS    E2E(s)    Throughput    Peak Mem
    pp1024/tg128       1676.1       17.20   610.9 tok/s    58.6 tok/s     3.860   298.4 tok/s    18.80 GB
    pp4096/tg128       5046.3       17.93   811.7 tok/s    56.2 tok/s     7.323   576.8 tok/s    19.24 GB
    
    Continuous Batching
    pp1024 / tg128
    --------------------------------------------------------------------------------
    Batch         tg TPS    Speedup          pp TPS    pp TPS/req    TTFT(ms)      E2E(s)
    1x        58.6 tok/s      1.00x     610.9 tok/s   610.9 tok/s      1676.1       3.860
    2x       116.2 tok/s      1.98x     435.5 tok/s   217.8 tok/s      2973.7       6.907
    4x       230.7 tok/s      3.94x     352.0 tok/s    88.0 tok/s      6445.2      13.855
    
    
    Benchmark Model: GLM-4.7-Flash-4bit
    ================================================================================
    Single Request Results
    --------------------------------------------------------------------------------
    Test             TTFT(ms)    TPOT(ms)        pp TPS        tg TPS    E2E(s)    Throughput    Peak Mem
    pp1024/tg128       1985.0       21.78   515.9 tok/s    46.3 tok/s     4.752   242.4 tok/s    16.27 GB
    pp4096/tg128       6839.2       27.31   598.9 tok/s    36.9 tok/s    10.307   409.8 tok/s    17.34 GB
    
    Continuous Batching
    pp1024 / tg128
    --------------------------------------------------------------------------------
    Batch         tg TPS    Speedup          pp TPS    pp TPS/req    TTFT(ms)      E2E(s)
    1x        46.3 tok/s      1.00x     515.9 tok/s   515.9 tok/s      1985.0       4.752
    2x        91.5 tok/s      1.98x     362.7 tok/s   181.3 tok/s      3549.9       8.445
    4x       174.9 tok/s      3.78x     321.2 tok/s    80.3 tok/s      6393.9      15.679
    
    
    Benchmark Model: gpt-oss-20b-MXFP4-Q8
    ================================================================================
    Single Request Results
    --------------------------------------------------------------------------------
    Test             TTFT(ms)    TPOT(ms)        pp TPS        tg TPS    E2E(s)    Throughput    Peak Mem
    pp1024/tg128       1687.6       24.70   606.8 tok/s    40.8 tok/s     4.824   238.8 tok/s    11.67 GB
    pp4096/tg128       4088.8       26.44  1001.8 tok/s    38.1 tok/s     7.446   567.3 tok/s    11.75 GB
    
    Continuous Batching
    pp1024 / tg128
    --------------------------------------------------------------------------------
    Batch         tg TPS    Speedup          pp TPS    pp TPS/req    TTFT(ms)      E2E(s)
    1x        40.8 tok/s      1.00x     606.8 tok/s   606.8 tok/s      1687.6       4.824
    2x        82.1 tok/s      2.01x     359.0 tok/s   179.5 tok/s      3489.1       8.822
    4x       159.5 tok/s      3.91x     293.2 tok/s    73.3 tok/s      7335.0      17.180
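The Continuous Batching tables follow a similar pattern: Speedup is consistent with aggregate tg TPS divided by the 1x tg TPS, and pp TPS/req with pp TPS divided by the batch size (again inferred from the numbers, not from oMLX documentation). A sketch over the 4x rows of all five models, with hypothetical helper names, shows the trade-off: aggregate generation scales close to linearly, while each request's prompt-processing rate falls sharply.

```python
"""Compare 4x continuous-batching results across models (sketch)."""

BATCH = 4

# Per model: (tg TPS at 1x, tg TPS at 4x, aggregate pp TPS at 4x),
# copied from the pp1024/tg128 Continuous Batching tables.
RESULTS = {
    "Qwen3.5-4B-MLX-4bit": (44.3, 175.1, 322.7),
    "gemma-4-26b-a4b-it-4bit": (41.6, 166.1, 283.4),
    "Qwen3.6-35B-A3B-4bit": (58.6, 230.7, 352.0),
    "GLM-4.7-Flash-4bit": (46.3, 174.9, 321.2),
    "gpt-oss-20b-MXFP4-Q8": (40.8, 159.5, 293.2),
}


def speedup(tg_1x: float, tg_nx: float) -> float:
    """Aggregate generation speedup relative to a single request."""
    return tg_nx / tg_1x


def per_request_pp(pp_tps: float, batch: int) -> float:
    """Prompt-processing rate seen by each request in the batch."""
    return pp_tps / batch


def report(results: dict[str, tuple[float, float, float]],
           batch: int) -> list[tuple[str, float, float]]:
    """(model, speedup, scaling efficiency) sorted from best to worst scaling."""
    rows = []
    for name, (tg_1x, tg_nx, _pp) in results.items():
        s = speedup(tg_1x, tg_nx)
        rows.append((name, round(s, 2), round(s / batch, 3)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, s, eff in report(RESULTS, BATCH):
        pp_req = per_request_pp(RESULTS[name][2], BATCH)
        print(f"{name:28s} {s:.2f}x  efficiency {eff:.0%}  pp/req {pp_req:.1f} tok/s")
```

On these numbers GLM-4.7-Flash-4bit scales least well (3.78x at 4x), and every model's per-request prompt rate drops to roughly 70-90 tok/s at 4x, while TTFT climbs to roughly 6-8 seconds in those rows.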
    