Two core AI workload types
Agent instances and inference instances together cover the full AI application stack
Agent instances

Containerized: Linux containers with an Ubuntu desktop; one container per Agent instance.

Ready to use: Full Ubuntu desktop and isolated workspace, with a browser and common tools preinstalled.

Secure isolation: Dedicated storage and network; CPU and memory quotas supported.

Self-service: Access the desktop in a browser and configure agents, models, and IM channels on your own.

Inference instances

Containerized: Run LLM stacks such as Ollama and vLLM as standard container instances.

GPU / NPU: Single GPU, multi-GPU, or shared GPUs with MPS / HAMi partitioning; supports NVIDIA, AMD, and Huawei Ascend.

Fast model delivery: Model artifacts pre-staged on hosts for sub-second attach; curated open-weight model datasets built in.

Frameworks: vLLM, Ollama, and ComfyUI for inference, serving, and image generation.

Agent + LLM deployment patterns
Mix and match by cost, data sovereignty, and model capability
No-GPU pattern
Deploy Agents at scale on regular servers; all model calls go to hosted LLM APIs via tokens.

Benefits: no GPU capital expense; pay only for token usage.
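
The no-GPU pattern amounts to sending every model call to a hosted, OpenAI-compatible endpoint with a bearer token. A minimal stdlib sketch, assuming a hypothetical provider URL and model name (substitute your provider's values):

```python
import json
import os
import urllib.request

# Hypothetical hosted endpoint and model name -- not part of the platform;
# replace with your LLM provider's actual values.
API_BASE = "https://api.example.com/v1"
MODEL = "provider/some-model"
API_KEY = os.environ.get("LLM_API_KEY", "")

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Assemble an OpenAI-compatible chat completion request.

    No local GPU is involved: the only cost is the provider's
    per-token usage billing.
    """
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# To actually send it:
#   urllib.request.urlopen(build_request("Hello", API_KEY))
```

Agents deployed on regular servers need nothing beyond this token and outbound network access.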

On-premises pattern
Agents on standard servers; GPUs host Ollama / vLLM with open models inside the network. Agents consume private inference tokens only.

Benefits: data stays on the LAN; no public API token charges.
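
Because both Ollama and vLLM expose OpenAI-compatible APIs under /v1, the on-premises pattern is largely a base-URL change on the agent side. A sketch with a hypothetical in-network hostname and model name:

```python
import json
import urllib.request

# Hypothetical LAN endpoint -- Ollama defaults to port 11434,
# vLLM's OpenAI-compatible server to port 8000.
LOCAL_BASE = "http://ollama.internal:11434/v1"

def chat_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a chat completion request that never leaves the LAN."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{LOCAL_BASE}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # Private inference token issued inside the network,
            # not a public API key.
            "Authorization": "Bearer <private-token>",
        },
    )
```

Traffic, prompts, and model weights all stay inside the network perimeter.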

Hybrid pattern
Use both on-prem inference and online LLM APIs; intelligent routing sends easy prompts to local models and escalates complex ones to the cloud.

Benefits: balance cost against the best available model quality.
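
The routing step can be as simple as a heuristic that keeps cheap prompts local and escalates hard ones. A toy sketch; the endpoints, keywords, and length threshold are illustrative assumptions, not part of the platform:

```python
# Hypothetical endpoints for the two tiers.
LOCAL = "http://vllm.internal:8000/v1"   # on-prem open-weight model
CLOUD = "https://api.example.com/v1"     # hosted frontier model

# Keywords that suggest a complex, multi-step task (illustrative only).
HARD_HINTS = ("prove", "refactor", "multi-step", "analyze")

def route(prompt: str) -> str:
    """Return the base URL this prompt should be sent to.

    Long prompts or prompts containing a 'hard' keyword escalate
    to the hosted API; everything else stays on the local model.
    """
    hard = len(prompt) > 500 or any(h in prompt.lower() for h in HARD_HINTS)
    return CLOUD if hard else LOCAL
```

Production routers typically replace the keyword check with a small classifier model, but the cost/quality trade-off works the same way.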

Platform strengths

Instant availability

Pre-staged models + containers

Instances ready in seconds

Secure & governed

Network and storage isolation

Layered resource quotas

Cost efficient

GPU sharing and partitioning

Maximize utilization

Open & compatible

Leading LLM frameworks

Full API surface

Technical support

Scan to join the technical support WeChat group


Official account

Scan the code to follow and get the latest updates