We currently only support the following model for serverless training. We are actively adding support for both larger and smaller models. If there’s a particular model you’d like to see serverless support for, please send a request to support@wandb.com.
ART has broad support for models supported by vLLM. However, not all models support all features. For instance, if a model's chat template does not include tool call support, you won't be able to use tools with it natively. And if a model's architecture doesn't support LoRA layers, it won't be compatible with our LoRA-based backend, though it may still work with our full-fine-tuning backend.

Here are additional models that we've tested and found to work well with ART:
Additionally, the Qwen 3 family of models is well supported for single-turn workflows. For multi-turn workflows, however, the Qwen 3 chat template removes the <think> tokens from previous turns, which makes training more complicated. You can still use Qwen 3 for multi-turn workflows by splitting each turn into a separate message history with our additional_histories trajectory parameter (see Additional Histories), as sketched below.
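For illustration, here is a minimal sketch of how a two-turn Qwen 3 rollout might be split with `additional_histories`. The `art.History` class name and the `messages_and_choices` field used below are assumptions for the sake of the sketch; check the Additional Histories docs for the exact API. In a real rollout the assistant entries would typically be the Choice objects returned by the model rather than plain dicts.

```python
import art

# Turn 1 is kept as its own history so the <think> tokens the model produced
# on that turn aren't stripped by the Qwen 3 chat template on later turns.
# NOTE: art.History and messages_and_choices are assumed names here; see the
# Additional Histories docs for the exact API.
turn_1 = art.History(
    messages_and_choices=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2 + 2?"},
        # In practice this would be the Choice object returned by the model.
        {"role": "assistant", "content": "<think>2 + 2 = 4</think>4"},
    ],
)

# The final turn goes in the trajectory's main message history, with every
# earlier turn attached via additional_histories.
trajectory = art.Trajectory(
    messages_and_choices=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Now double it."},
        {"role": "assistant", "content": "<think>4 * 2 = 8</think>8"},
    ],
    additional_histories=[turn_1],
    reward=1.0,
)
```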
If you’re curious about a model that is not listed above, ask in the Discord #support channel.