Buyer's remorse hit after shelling out 8 grand for an M2 Ultra

Need more RAM for Llama 3 70B inference; can't afford A100s, they cost more than my fricking car. Anyway, ordered an M2 Ultra from Apple for 8 grand to get the 192 GB of unified RAM.

I'm sure it's suboptimal compared to Nvidia builds, e.g. 2x4090, in terms of performance, no CUDA training and whatnot, but the 2-3k markup for a custom build makes no sense to me. On the other hand, renting a GPU server with enough memory for Llama 3 seems to run around $5/hour, and I'm fed up with BS AWS bills of hundreds of dollars per month for instances that aren't even running.

Thinking about canceling the order and looking further. What would you get with 8 grand for a local LLM?


  1. 2 weeks ago
    Anonymous

    iToddlers BTFO

  2. 2 weeks ago
    Anonymous

    what are you doing with a local llm that requires this?

    • 2 weeks ago
      Anonymous

      Code gen

  3. 2 weeks ago
    Anonymous

    What is your business model?

    • 2 weeks ago
      Anonymous

      More AI, faster prototyping, less code to write/maintain

      • 2 weeks ago
        Anonymous

        >*slowly writing 'Bankruptcy' in comically large letters on a clipboard*
        mhm yes i see

  4. 2 weeks ago
    Anonymous

    itodeleres butfo

  5. 2 weeks ago
    Anonymous

    why are you running llama 3 local?

    • 2 weeks ago
      Anonymous

      Lower error rates
      Tasks whose context includes proprietary stuff

  6. 2 weeks ago
    Anonymous

    I have one. Inference speed sucks but MoE models make up for that while still benefiting from the large amount of memory. It’s nice that it can sit on a shelf and be quiet and cool and sip power. Whether or not that’s still worth $8k is up to you.

    • 2 weeks ago
      Anonymous

      Thanks for sharing your thoughts, yeah the RAM seems generous enough for quite a few good models at f16.

      Speed-wise, some report it gets 2 t/s on larger models, which crunches through about 100k tokens/day worth of task queue.

      Not sure if yours saw better speeds than that.
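
      A quick back-of-envelope check on those numbers, as a sketch rather than a benchmark: the bytes-per-weight figures are rule-of-thumb assumptions (KV cache and runtime overhead ignored), and the 2 t/s rate is just the figure reported above.

      ```python
      # Rough sizing against 192 GB of unified memory (assumptions, not measurements):
      # ~2 bytes/weight at f16, ~0.6 bytes/weight at a 4-bit quant; KV cache,
      # context buffers and OS overhead are ignored.

      def weights_gb(params_billion: float, bytes_per_weight: float) -> float:
          """Approximate weight footprint in GB."""
          return params_billion * bytes_per_weight  # 1e9 params * bytes / 1e9 bytes-per-GB

      print(f"Llama 3 70B @ f16   : ~{weights_gb(70, 2.0):.0f} GB")  # ~140 GB, leaves ~50 GB headroom
      print(f"Llama 3 70B @ ~4-bit: ~{weights_gb(70, 0.6):.0f} GB")  # ~42 GB

      # Daily throughput at the ~2 t/s reported above, if the task queue keeps the
      # box busy around the clock (so ~100k t/day implies plenty of idle time):
      print(f"~{2 * 86_400 / 1000:.0f}k tokens/day ceiling")  # ~173k
      ```

      That also lines up with the MoE point above: only the active experts' weights get read per generated token, so generation runs faster, even though the full model still has to fit in memory.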

  7. 2 weeks ago
    Anonymous

    Write it up as a company/department expense.

  8. 2 weeks ago
    Anonymous

    Wouldn't a modern server CPU be faster and cheaper? The main bottleneck is RAM speed when doing it on the CPU, and I'm pretty sure you can buy some very fast RAM for 8k.

    • 2 weeks ago
      Anonymous

      unlikely, because gpu offloading

      >he spent 8k on iShit for an LLM build
      >only 192
      >could have bought a server rack with 30 P40s and 720GB of VRAM

      Damn, a 30x P40 grid would make quite the sight to behold, and presumably purrs like a Ford GT.

      >multiple mac studios connected via Thunderbolt ethernet tunneling could theoretically provide a way larger vram pool
      >https://x.com/ronaldmannak/status/1784087769817756138
      Twitter guy is the embodiment of cyberpunk

      • 2 weeks ago
        Anonymous

        >unlikely, because gpu offloading
        On the M2 Ultra? I doubt that. I want to see a benchmark comparing a Threadripper with the M2 Ultra GPU for matrix multiplication. Modern processors should get quite close when competing against something integrated.

        • 2 weeks ago
          Anonymous

          >Modern processors should get quite close when competing against something integrated.
          bullshit.
          a GPU has like an order of magnitude more ALUs than a CPU.
          https://github.com/ggerganov/llama.cpp/discussions/4167
          >m2 ultra prompt processing llama 7b f16
          >1401.85 tokens/second
          >5800X3D prompt processing llama 7b f16
          >30.69 tokens/second
          m2 ultra is 46x as fast as a 5800X3D for matmul.

          • 2 weeks ago
            Anonymous

            This doesn't make much sense to me. Why the frick would modern CPUs be that much slower? I was expecting maybe 10x as fast, not 46x.
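
            Partly because the quoted benchmark measures prompt processing, which is compute-bound; token generation is a different story. A sketch using only the numbers quoted above plus published peak memory-bandwidth specs (the bandwidth figures and the weights-read-per-token rule of thumb are assumptions, not measurements):

            ```python
            # Prompt processing is compute-bound, so the GPU's ALU advantage shows in full:
            pp_m2_ultra = 1401.85   # t/s, llama 7B f16, from the llama.cpp discussion above
            pp_5800x3d  = 30.69     # t/s, same benchmark
            print(f"prompt processing gap: ~{pp_m2_ultra / pp_5800x3d:.0f}x")  # ~46x

            # Token *generation* is mostly memory-bandwidth-bound, so there the gap shrinks
            # to roughly the bandwidth ratio. Peak-bandwidth figures below are taken from
            # published specs, not benchmarked here:
            bandwidth_gb_s = {
                "M2 Ultra (unified memory)":  800,
                "desktop, 2ch DDR5-5200":      83,
                "server, 12ch DDR5-4800":     461,
            }
            weights_gb = 40  # ~4-bit 70B quant, read roughly once per generated token
            for name, bw in bandwidth_gb_s.items():
                print(f"{name:>27}: ~{bw / weights_gb:.0f} t/s generation ceiling")
            ```

            So a fat multi-channel server CPU can get within ~2x on generation speed (which is what the CPUMAXX guides below exploit), while still losing badly on prompt processing, which is what the 46x is showing.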

  9. 2 weeks ago
    Anonymous

    >he spent 8k on iShit for an LLM build
    >only 192
    >could have bought a server rack with 30 P40s and 720GB of VRAM

    • 2 weeks ago
      Anonymous

      >30 used p40s
      useless. will fail < 1 year

  10. 2 weeks ago
    Anonymous

    I think OP is just low-key trolling

  11. 2 weeks ago
    Anonymous

    It's not all bad, you could spend another $10K, get your dual 4090 system and another mac studio. I saw that multiple mac studios connected via Thunderbolt ethernet tunneling could theoretically provide a way larger vram pool...but the software isn't there yet. lol.

    https://x.com/ronaldmannak/status/1784087769817756138

  12. 2 weeks ago
    Anonymous

    Sometimes less is more. Why exactly can't you just get:
    >a desktop PC with a single 4090
    >a macbook
    >a thinkpad
    If you are on the move, just SSH into your desktop, or however that works with LLMs; I'm sure you can mount it remotely or whatever technique people use
    https://twitter.com/kellerjordan0/status/1765649009891328253

    • 2 weeks ago
      Anonymous

      Already have the rest.
      The Twitter post sounds promising, except that as of now 100GB of weights still take 100GB of RAM to load, until they publish the million-dollar software to fit it into a 24GB 4090 (rough offload math sketched below).

      >Did you consult /llm/ local ml build autist?
      >https://rentry.org/V100MAXXING#pcie
      >https://rentry.org/miqumaxx
      Interesting guides, thanks, might spend 1-2k playing around with retired server parts in the future, but can't count on them for production.
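
      On the 24GB 4090 point: strictly speaking the whole model doesn't have to fit in VRAM today, since llama.cpp-style layer offloading can put part of the weights on the GPU and leave the rest in system RAM, at reduced speed. A rough sketch of the split; the 80-layer count for Llama 3 70B and the ~40 GB 4-bit footprint are ballpark assumptions:

      ```python
      # Rough partial-offload split for a ~4-bit Llama 3 70B (assumptions: ~40 GB of
      # weights spread roughly evenly over 80 transformer layers; KV cache and
      # compute buffers get a few GB of the card to themselves).
      total_gb, n_layers = 40.0, 80
      vram_budget_gb = 20.0                          # ~24 GB card minus headroom

      gb_per_layer = total_gb / n_layers             # ~0.5 GB per layer
      layers_on_gpu = min(int(vram_budget_gb // gb_per_layer), n_layers)

      print(f"offload ~{layers_on_gpu}/{n_layers} layers to the 4090 "
            f"(~{layers_on_gpu * gb_per_layer:.0f} GB VRAM), keep "
            f"~{total_gb - layers_on_gpu * gb_per_layer:.0f} GB in system RAM")
      ```

      The layers left in system RAM still run at CPU/RAM speed, so this is a way to run the model at all, not a way to make it fast.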

  13. 2 weeks ago
    Anonymous

    Did you consult /llm/ local ml build autist?

    https://rentry.org/V100MAXXING#pcie

    He has multiple build guides that perform better but for like $1k to $2k.

    It's a lot of running around eBay and obscure vendors a lot of the time.

    https://rentry.org/miqumaxx
    This is his CPUMAXX'ing guide to essentially build what you'd buy from Apple, for cheaper.

  14. 2 weeks ago
    Anonymous

    i mean, an amd/inhell laptop with so much ram could cost you 2-3k... so, yeah, you probably did frick up.

    • 2 weeks ago
      Anonymous

      The difference is that on a Mac it's also VRAM; there is nothing with this much VRAM in the consumer space.

  15. 2 weeks ago
    Anonymous
  16. 2 weeks ago
    Anonymous

    Macs are ultra shit for LLMs because of dogshit prompt processing speed. Inference speed might as well be instant, but you're still gonna wait a few minutes for your 10k-context prompt to process first. Also, in case you decide you want something more than LLMs: Macs can't do anything else. Image gen, audio gen, text to speech, nothing works with Apple GPUs, it's all CUDA or bust.
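
    To put a rough number on that wait: time to first token is roughly prompt length divided by prompt-processing rate. The 7B f16 rate below is the one quoted upthread from the llama.cpp discussion; the 70B-class rate is an assumed ballpark for illustration, not a benchmark:

    ```python
    # Wall-clock wait to ingest a 10k-token prompt at different prompt-processing
    # rates. Only the 7B f16 figure comes from the llama.cpp discussion linked
    # upthread; the 70B-class rate is an assumption for illustration.
    prompt_tokens = 10_000
    pp_rate = {
        "M2 Ultra, llama 7B f16 (quoted)": 1401.85,
        "M2 Ultra, 70B-class (assumed)":    100.0,
    }
    for name, rate in pp_rate.items():
        print(f"{name}: ~{prompt_tokens / rate:.0f} s before the first token")
    ```

    So whether "a few minutes" is fair depends on the model and quant, but long prompts are where the Mac hurts most.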

  17. 2 weeks ago
    Anonymous

    You are a fricking moron.

  18. 2 weeks ago
    Anonymous

    Why is it so fricking hard to build your own 2xGPU rig? Are you gay? Did your dad not buy you Legos? Did you even have a dad?

  19. 2 weeks ago
    Anonymous

    https://pcpartpicker.com/list/zFHxMV

    • 2 weeks ago
      Anonymous

      Thanks, not OP but I've been looking for a consumercuckmaxx build and this is it

  20. 2 weeks ago
    Anonymous

    what's the actual use case of the top end expensive macs? like the 10k+ ones?

    • 2 weeks ago
      Anonymous

      Always thought it was for people that make tons of content. Doesn't make sense to me, but I don't do that work, and for some people money is no object

  21. 2 weeks ago
    Anonymous

    >the 2-3k markup for a custom build makes no sense to me
    and the 6k markup for the Apple logo did?
