Two core AI workload types
Agent instances and inference instances together cover the full AI application stack
Agent instances

Containerized: Linux containers with an Ubuntu desktop; one container per Agent instance.

Ready to use: Full Ubuntu desktop and isolated workspace, with a browser and common tools preinstalled.

Secure isolation: Dedicated storage and network; CPU and memory quotas supported.

Self-service: Access the desktop in a browser and configure agents, models, and IM channels on your own.

Inference instances

Containerized: Run LLM stacks such as Ollama and vLLM as standard container instances.

GPU / NPU: Single GPU, multi-GPU, or shared GPUs with MPS / HAMi partitioning; supports NVIDIA, AMD, and Huawei Ascend.

Fast model delivery: Model artifacts pre-staged on hosts for sub-second attach; curated open-weight model datasets built in.

Frameworks: vLLM, Ollama, and ComfyUI for inference, serving, and image generation.

Agent + LLM deployment patterns
Mix and match by cost, data sovereignty, and model capability
No-GPU pattern
Deploy Agents at scale on regular servers; all model calls go to hosted LLM APIs via tokens.

Benefits: no GPU capital expense; pay only for token usage.
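
The no-GPU pattern amounts to sending every model call to a hosted, OpenAI-compatible endpoint with a bearer token. A minimal stdlib sketch, assuming a hypothetical provider URL and model name (substitute your provider's values):

```python
import json
import os
import urllib.request

# Hypothetical hosted endpoint and model name -- not part of the platform;
# replace with your LLM provider's actual values.
API_BASE = "https://api.example.com/v1"
MODEL = "provider/some-model"
API_KEY = os.environ.get("LLM_API_KEY", "")

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Assemble an OpenAI-compatible chat completion request.

    No local GPU is involved: the only cost is the provider's
    per-token usage billing.
    """
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# To actually send it:
#   urllib.request.urlopen(build_request("Hello", API_KEY))
```

Agents deployed on regular servers need nothing beyond this token and outbound network access.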

On-premises pattern
Agents on standard servers; GPUs host Ollama / vLLM with open models inside the network. Agents consume private inference tokens only.

Benefits: data stays on the LAN; no public API token charges.
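
Because both Ollama and vLLM expose OpenAI-compatible APIs under /v1, the on-premises pattern is largely a base-URL change on the agent side. A sketch with a hypothetical in-network hostname and model name:

```python
import json
import urllib.request

# Hypothetical LAN endpoint -- Ollama defaults to port 11434,
# vLLM's OpenAI-compatible server to port 8000.
LOCAL_BASE = "http://ollama.internal:11434/v1"

def chat_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a chat completion request that never leaves the LAN."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{LOCAL_BASE}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # Private inference token issued inside the network,
            # not a public API key.
            "Authorization": "Bearer <private-token>",
        },
    )
```

Traffic, prompts, and model weights all stay inside the network perimeter.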

Hybrid pattern
Use both on-prem inference and online LLM APIs; intelligent routing sends easy prompts to local models and escalates complex ones to the cloud.

Benefits: balance cost against the best available model quality.
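
The routing step can be as simple as a heuristic that keeps cheap prompts local and escalates hard ones. A toy sketch; the endpoints, keywords, and length threshold are illustrative assumptions, not part of the platform:

```python
# Hypothetical endpoints for the two tiers.
LOCAL = "http://vllm.internal:8000/v1"   # on-prem open-weight model
CLOUD = "https://api.example.com/v1"     # hosted frontier model

# Keywords that suggest a complex, multi-step task (illustrative only).
HARD_HINTS = ("prove", "refactor", "multi-step", "analyze")

def route(prompt: str) -> str:
    """Return the base URL this prompt should be sent to.

    Long prompts or prompts containing a 'hard' keyword escalate
    to the hosted API; everything else stays on the local model.
    """
    hard = len(prompt) > 500 or any(h in prompt.lower() for h in HARD_HINTS)
    return CLOUD if hard else LOCAL
```

Production routers typically replace the keyword check with a small classifier model, but the cost/quality trade-off works the same way.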

Platform strengths

Instant availability

Pre-staged models + containers

Instances ready in seconds

Secure & governed

Network and storage isolation

Layered resource quotas

Cost efficient

GPU sharing and partitioning

Maximize utilization

Open & compatible

Leading LLM frameworks

Full API surface

Technical support

Scan to join the technical support WeChat group


Official account

Scan the code to follow and get the latest updates