I received the smaller configuration with 14 CPU cores, 20 GPU cores, 64 GB of RAM, and 2 TB of storage. I'm very glad I didn't go with less RAM, because I wouldn't have been able to run larger language models locally.
I can now use LM Studio to run both the standard and coder versions of Qwen2.5, each with 32 billion parameters. Inference speed is quite good, around 11-12 tokens per second, which is fast enough for real-world tasks. Best of all, I can keep both models loaded in memory at all times, so there's no loading delay between responses.
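If it helps anyone, here's a minimal sketch of how you can hit the local server from a script. LM Studio exposes an OpenAI-compatible endpoint (port 1234 by default), so the standard OpenAI client works; the model identifier below is an assumption, so check what your LM Studio install actually calls the Qwen2.5 Coder model.

```python
# Minimal sketch: talking to LM Studio's local OpenAI-compatible server.
# Assumes the server is running on the default port (1234); the model
# identifier is a guess -- use whatever name appears in your model list.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's local server
    api_key="lm-studio",                  # placeholder; no real key needed locally
)

response = client.chat.completions.create(
    model="qwen2.5-coder-32b-instruct",   # assumed identifier
    messages=[{"role": "user", "content": "Write a function that reverses a string."}],
)
print(response.choices[0].message.content)
```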
I use LLMs frequently, and I now prefer local models for everyday tasks and coding—I even switched away from GitHub Copilot.
I only fall back to paid models on OpenRouter when the Qwen models don't give me the results I need.
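Since OpenRouter also speaks the OpenAI API, the fallback is just a second client pointed at their endpoint. A rough sketch of that local-first workflow (the model names and the env var are placeholders, not necessarily what I run):

```python
# Rough sketch of a local-first workflow: local Qwen via LM Studio by
# default, with a manual switch to a hosted model on OpenRouter when the
# local answer isn't good enough. Model names and OPENROUTER_API_KEY are
# assumptions -- substitute your own.
import os
from openai import OpenAI

local = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
remote = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

def ask(prompt: str, use_remote: bool = False) -> str:
    """Local Qwen by default; set use_remote=True when it falls short."""
    client, model = (
        (remote, "anthropic/claude-3.5-sonnet")   # example OpenRouter model slug
        if use_remote
        else (local, "qwen2.5-32b-instruct")      # assumed LM Studio identifier
    )
    r = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return r.choices[0].message.content
```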
Does anyone else run LLMs locally on Apple Silicon Macs?