Need more RAM for Llama 3 70B inference; can't afford A100s, they cost more than my fking car. Anyway, ordered an M2 Ultra from Apple for 8 grand to get the 192 GB of unified RAM.
I'm sure it's suboptimal compared to Nvidia builds (e.g. 2x4090) in terms of performance, no CUDA training and whatnot, but the 2-3k markup for a custom build makes no sense to me. On the other hand, renting a GPU server with enough RAM for Llama 3 runs around $5/hour, and I'm fed up with BS AWS bills of hundreds of dollars a month for instances that aren't even running.
Thinking about canceling the order and looking further. What would you get with 8 grand for a local LLM?
iToddlers BTFO
what are you doing with a local llm that requires this?
Code gen
What is your business model?
More AI, faster prototyping, less code to write/maintain
>*slowly writing 'Bankruptcy' in comically large letters on a clipboard*
mhm yes i see
itodeleres butfo
why are you running llama 3 local?
Lower error rates
Tasks with contexts of proprietary stuff
I have one. Inference speed sucks but MoE models make up for that while still benefiting from the large amount of memory. It’s nice that it can sit on a shelf and be quiet and cool and sip power. Whether or not that’s still worth $8k is up to you.
Thanks for sharing your thoughts, yeah, the RAM seems generous enough for quite a few good models at f16.
Speed-wise, some report it gets 2 t/s on larger models, which crunches through roughly 100k t/day worth of task queue (~170k at full utilization).
Not sure if yours saw better speed than that
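For what it's worth, a quick sanity check on that queue-throughput figure, assuming the reported 2 t/s held nonstop:

```python
# Daily token throughput at a sustained generation rate.
tok_per_s = 2.0                      # reported M2 Ultra speed on large models
seconds_per_day = 24 * 60 * 60
daily = tok_per_s * seconds_per_day
print(f"{daily:,.0f} tokens/day")    # 172,800 at 100% duty cycle
```

So ~100k/day implies roughly 60% utilization, which is plausible once prompt processing and idle time are counted.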
Write it up as a company/department expense.
Wouldn't a modern server CPU be faster and cheaper? The main bottleneck is RAM speed when doing it on the CPU; I'm pretty sure you can buy a very high-bandwidth RAM setup for 8k
unlikely, because gpu offloading
damn, a 30x P40 grid would make quite the sight to behold, which presumably purrs like a Ford GT
Twitter guy is the embodiment of cyberpunk
>unlikely, because gpu offloading
on the m2 ultra? I doubt that. I want to see a benchmark of someone comparing a Threadripper with the M2 Ultra GPU for matrix multiplication. Modern processors should get quite close when competing against something integrated.
>Modern processors should get quite close if competing against something integrated.
bullshit.
a GPU will have like a magnitude more ALUs than a CPU.
https://github.com/ggerganov/llama.cpp/discussions/4167
>m2 ultra prompt processing llama 7b f16
>1401.85 tokens/second
>5800X3D prompt processing llama 7b f16
>30.69 tokens/second
m2 ultra is 46x as fast as a 5800X3D for matmul.
this doesn't make much sense to me, why the frick would modern CPUs be that much slower? I was expecting maybe 10x, not 46x
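The gap is less mysterious once you split the two workloads: prompt processing is one big batched matmul, so it's compute-bound and the GPU's order-of-magnitude ALU advantage shows up directly; single-token generation instead streams every weight once per token, so memory bandwidth caps it. A back-of-envelope sketch (the bandwidth and model-size numbers below are rough assumptions, not benchmarks):

```python
def max_tok_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    # Generating one token reads (roughly) every weight once,
    # so memory bandwidth is a hard ceiling on generation speed.
    return bandwidth_gb_s / model_gb

# Rough figures: M2 Ultra ~800 GB/s unified memory; dual-channel
# DDR4 ~50 GB/s; Llama 3 70B at f16 is ~140 GB of weights.
print(max_tok_per_s(800, 140))  # ~5.7 t/s ceiling
print(max_tok_per_s(50, 140))   # ~0.36 t/s ceiling
```

That ceiling is why a many-channel server CPU (several hundred GB/s) can be competitive for generation, while prompt processing still favors whatever has the most ALUs.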
>he spent 8k on iShit for an LLM build
>only 192
>could have bought a server rack with 30 P40s and 720GB of VRAM
>30 used p40s
useless. will fail < 1 year
I think OP is just low key trolling
It's not all bad, you could spend another $10K, get your dual-4090 system and another Mac Studio. I saw that multiple Mac Studios connected via Thunderbolt ethernet tunneling could theoretically provide a way larger VRAM pool... but the software isn't there yet. lol.
https://x.com/ronaldmannak/status/1784087769817756138
Sometimes less is more. Why exactly can't you just get:
>a desktop PC with a single 4090
>a macbook
>a thinkpad
If you are on the move, just ssh into your desktop, or however that works with LLMs; I'm sure you can mount it or whatever technique people use
https://twitter.com/kellerjordan0/status/1765649009891328253
Already have the rest.
The Twitter post sounds promising, except as of now 100GB of weights still takes 100GB of RAM to load, at least until they publish the million-dollar software to fit it into a 24GB 4090
Interesting guide, thanks. Might spend 1-2k playing around with retired server parts in the future, but can't count on them for production
Did you consult /llm/ local ml build autist?
https://rentry.org/V100MAXXING#pcie
He has multiple build guides that perform better but for like $1k to $2k
It's a lot of running around eBay and obscure vendors a lot of the time
https://rentry.org/miqumaxx
This is his CPUMAXX'ing guide to essentially build what you'd buy from Apple, for cheaper
i mean, an amd/inhell laptop with that much ram would cost you 2-3k... so, yeah, you probably did frick up.
the difference is that on Mac it's VRAM too; there is nothing with this much VRAM in the consumer space.
Macs are ultra shit for LLMs because of dogshit prompt processing speed. Inference speed might as well be instant, but you're still gonna wait a few minutes for your 10k-context prompt to process first. Also, in case you decide you want something more than an LLM: Macs can't do anything else. Image gen, audio gen, text-to-speech, nothing works with Apple GPUs; it's all CUDA or bust.
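To put numbers on that wait: time-to-first-token is just prompt length divided by prompt-processing speed. The rates below are hypothetical round figures for a large model, not benchmarks:

```python
def time_to_first_token(prompt_tokens: int, pp_tok_per_s: float) -> float:
    # The whole context must be processed before any output appears.
    return prompt_tokens / pp_tok_per_s

print(time_to_first_token(10_000, 60))    # ~167 s at 60 t/s prompt processing
print(time_to_first_token(10_000, 1500))  # ~7 s at 1500 t/s
```

So even with fast generation, a slow prompt-processing rate dominates the experience on long contexts.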
You are a fricking moron.
why is it so fricking hard to build your own 2xGPU rig? Are you gay? Did your dad not buy you legos? Did you even have a dad?
https://pcpartpicker.com/list/zFHxMV
Thanks, not OP, but I've been looking for a consumercuckmaxx build and this is it
what's the actual use case of the top end expensive macs? like the 10k+ ones?
always thought it was for people who make tons of content. doesn't make sense to me, but i don't do that work, and for some people money is no object
>the 2-3k markup for custom build making no sense to me
and the 6k markup for apple logo did?