FusionInfer

🔄

Unified Inference Platform

Single CRD to manage inference workloads with automatic LeaderWorkerSet, PodGroup, InferencePool, and HTTPRoute generation.

🧠

Intelligent Routing

Built-in support for prefix-cache, KV-cache utilization, queue-size, LoRA affinity, and P/D disaggregation routing strategies.

⚡

Gang Scheduling

Automatic Volcano PodGroup integration ensures all pods for multi-node inference start together or not at all.

🌐

Multi-Node Inference

Native support for distributed inference with Ray integration and per-replica LeaderWorkerSet management.

🔀

P/D Disaggregation

First-class support for prefill/decode disaggregated architectures with automatic component separation and routing.

🚀

Gateway API Native

Seamless integration with Kubernetes Gateway API and Gateway API Inference Extension for production-grade traffic management.

Architecture Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                            InferenceService CR                              │
│  roles:                                                                     │
│    - router (HTTPRoute, InferencePool, EPP)                                 │
│    - prefiller/decoder/worker (LeaderWorkerSet, PodGroup)                   │
└───────────────────────────────────┬─────────────────────────────────────────┘
                                    │
                    ┌───────────────┴───────────────┐
                    │  InferenceService Controller  │
                    └───────────────┬───────────────┘
                                    │
        ┌───────────────┬───────────┼───────────────┬───────────────┐
        ▼               ▼           ▼               ▼               ▼
┌───────────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────────┐
│ HTTPRoute     │ │ Inference │ │    EPP    │ │  Leader   │ │   PodGroup    │
│ (Gateway API) │ │   Pool    │ │ Deployment│ │ WorkerSet │ │   (Volcano)   │
└───────────────┘ └───────────┘ └───────────┘ └───────────┘ └───────────────┘