LM Studio acts as the AI engine for ApexSpriteAI. It loads the language model you choose, exposes a local HTTP server, and accepts requests in the Anthropic messages format. Before Claude Code can make useful requests, you need to configure a handful of server and model parameters in LM Studio. This page covers each setting, explains why it matters, and shows you how to test your configuration end-to-end.
Documentation Index
Fetch the complete documentation index at: https://docs-apexspriteai.reliatrack.org/llms.txt
Use this file to discover all available pages before exploring further.
Key settings overview
| Setting | Recommended value | Where to change |
|---|---|---|
| Server port | 1234 | LM Studio → Local Server |
| Bind address | 0.0.0.0 | LM Studio → Local Server |
| Context window | 32,000 – 64,000 tokens | LM Studio → Model Settings |
| Temperature | 0.2 – 0.4 for coding | LM Studio → Model Settings |
Server port
LM Studio’s default server port is 1234. ApexSpriteAI expects this port unless you override ANTHROPIC_BASE_URL in your config. If you change the port, update ANTHROPIC_BASE_URL to match.
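For example, if you moved the server off the default port, the override is a single environment variable. A minimal sketch, assuming an illustrative port of 8080 and that you export the variable in the shell that launches Claude Code:

```shell
# Point Claude Code at a non-default LM Studio port.
# Replace 8080 with the port you set in LM Studio → Local Server.
export ANTHROPIC_BASE_URL="http://127.0.0.1:8080"
```

If you keep the default port 1234, no override is needed.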
Bind address
Set the bind address to 0.0.0.0 so that LM Studio accepts requests from any network interface, including Tailscale. If you leave it at 127.0.0.1, only processes on the same machine can reach the server. See Network configuration for more detail on when this matters.
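To confirm the bind address took effect, you can probe the server from another machine on your tailnet. A sketch, assuming 100.x.y.z stands in for the LM Studio host's Tailscale IP and that the server exposes an OpenAI-compatible model listing:

```shell
# From a remote machine: ask the server which models are loaded.
# 100.x.y.z is a placeholder — substitute the host's Tailscale IP.
curl http://100.x.y.z:1234/v1/models
# A JSON model list means the 0.0.0.0 bind is working; a refused
# connection suggests the server is still bound to 127.0.0.1.
```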
Context window
The context window is the maximum number of tokens — roughly words and punctuation marks — that the model can hold in memory at once. It includes your prompt, the conversation history, any tool call results, and the model’s reply.
Why context window size matters
A larger context window lets the model read more of your codebase at once, retain longer conversation histories, and process large tool outputs without truncation. However, larger windows consume more GPU memory and increase the time needed to process each token.
Recommended sizes
For Qwen2.5-Coder-32B on a 128 GB GPU:
- 32,000 tokens — Fast responses, suitable for most coding sessions.
- 64,000 tokens — Slower but handles large files and long conversations without truncation.
Increasing the context window beyond what your hardware can comfortably hold causes LM Studio to offload layers to CPU RAM, which significantly reduces throughput. Start at 32k and increase only if you find responses are being cut off.
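Most of that extra memory goes to the KV cache, which grows linearly with context length. A back-of-envelope sketch of the effect, assuming illustrative Qwen2.5-32B-like dimensions (64 layers, 8 grouped-query KV heads, head size 128, fp16 cache) — check the model card for the real values:

```python
def kv_cache_gib(context_tokens: int,
                 layers: int = 64,       # assumed transformer depth
                 kv_heads: int = 8,      # assumed grouped-query KV heads
                 head_dim: int = 128,    # assumed per-head dimension
                 bytes_per_value: int = 2) -> float:  # fp16 = 2 bytes
    """Rough KV-cache size in GiB for a given context length."""
    # Two tensors per layer (K and V), one vector per cached token.
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return context_tokens * per_token / 2**30

print(kv_cache_gib(32_000))  # ~7.8 GiB under these assumptions
print(kv_cache_gib(64_000))  # ~15.6 GiB — doubling the window doubles the cache
```

The linear growth is why starting at 32k and raising the limit only when needed is a safe default.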
How to set it
In LM Studio, load your model and open Model Settings. Find Context Length and enter your target value. Changes take effect the next time the model is loaded.
Loading a model
Open the Models tab in LM Studio
Use the search bar to find the model you want. ApexSpriteAI works best with models in the 32B–70B range. See Optimize speed and performance for a full comparison.
Download the model
Click Download. Large models (32B at Q4 quantization) are roughly 18–20 GB. Ensure you have sufficient disk space before starting.
Load the model
Click Load after the download completes. LM Studio allocates GPU memory and displays a green status indicator when the model is ready.
Testing the server with a direct request
Before connecting Claude Code, verify that LM Studio is responding correctly by sending a test request from your terminal. ApexSpriteAI uses the Anthropic messages format at the /v1/messages endpoint.
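A minimal test from the same machine might look like the following. The model identifier is a placeholder — use the name LM Studio shows for your loaded model — and whether your setup requires an authentication header may vary:

```shell
# Send a minimal Anthropic-format request to the local server.
# "qwen2.5-coder-32b-instruct" is a placeholder model name.
curl http://127.0.0.1:1234/v1/messages \
  -H "content-type: application/json" \
  -d '{
    "model": "qwen2.5-coder-32b-instruct",
    "max_tokens": 64,
    "messages": [{"role": "user", "content": "Reply with the word ready."}]
  }'
```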
If the response is JSON containing a content array with your expected text, the server is working and you can proceed to configure Claude Code. If you receive a connection error, check that the server is started and that the bind address and port match your request URL.