Configuration
This page is the authoritative reference for every knob the pack exposes.
It covers the Helm chart values you set at install time, the LLMModel custom resource you create per model, and the NebariApp fields the chart uses to wire the key-manager UI into the Nebari platform.
All tables are derived directly from the source files:
- charts/nebari-llm-serving/values.yaml: chart values
- charts/nebari-llm-serving/crds/llmmodel-crd.yaml: CRD schema
See the Architecture page for how these pieces fit together, and Installation for initial setup. The Quickstart shows a minimal working example.
Helm values reference #
Pass these with --set key=value or in a values.yaml override file.
Platform #
| Key | Default | Description |
|---|---|---|
| createNamespace | true | Whether the chart provisions the operator namespace and labels it nebari.dev/managed=true. Set to false when something else owns the namespace (ArgoCD, Terraform, etc.). |
| platform.baseDomain | "" | Required. Base domain for the Nebari deployment (e.g. your-cluster.example.com). |
| platform.gateway.external.name | nebari-gateway | Name of the external Envoy Gateway resource. |
| platform.gateway.external.namespace | envoy-gateway-system | Namespace of the external Envoy Gateway. |
| platform.gateway.internal.name | nebari-internal-gateway | Name of the internal Envoy Gateway resource. |
| platform.gateway.internal.namespace | envoy-gateway-system | Namespace of the internal Envoy Gateway. |
| platform.gateway.manageSharedListeners | true | When true, the operator patches HTTPS listeners for `llm.<baseDomain>` and `llm-internal.<baseDomain>` onto the shared Gateways. Set to false if a cluster admin owns the listeners out-of-band. |
| platform.tls.clusterIssuer | letsencrypt-production | The cert-manager ClusterIssuer used to issue the shared TLS certificate. Override when using a staging issuer. |
| platform.tls.secretName | "" | Bring-your-own-certificate mode. When set, no cert-manager Certificate is created; the operator expects a pre-provisioned kubernetes.io/tls Secret of this name in the operator namespace. Use for air-gapped or private-CA clusters where ACME issuance is not possible. cert-manager is not required when this is set. |
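
As an illustration of the platform keys above, a values override file might look like the sketch below. The file name, the staging issuer name, and the TLS Secret name are assumptions made for the example, not values shipped with the chart.

```yaml
# values-override.yaml (hypothetical file name)
createNamespace: false              # assumes the namespace is owned elsewhere, e.g. by ArgoCD
platform:
  baseDomain: your-cluster.example.com   # required; placeholder domain
  gateway:
    manageSharedListeners: true     # chart default, repeated here for clarity
  tls:
    # Option A: cert-manager issues the shared certificate.
    clusterIssuer: letsencrypt-staging   # assumed issuer name; must exist in the cluster
    # Option B: bring your own certificate. No Certificate is created and
    # cert-manager is not required. The Secret name below is hypothetical.
    # secretName: llm-shared-tls
```

Only one of the two TLS options is normally needed; leaving secretName empty keeps the cert-manager path.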
Auth / OIDC #
| Key | Default | Description |
|---|---|---|
| auth.oidc.issuerURL | "" | OIDC issuer URL. When empty, read from a Secret via secretKeyRef (the default Nebari integration path). |
| auth.oidc.existingSecret | "" | Name of the Secret containing the issuer-url key. Defaults to `<release-fullname>-oidc-client` when empty. |
| auth.oidc.groupsClaim | groups | JWT claim name used to extract group memberships. |
| auth.oidc.audience | "" | Expected audience value in JWT tokens. |
| auth.oidc.cookiePrefix | IdToken | Cookie name prefix used by the OAuth2 proxy sidecar. |
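
The two ways of supplying the issuer described above could be sketched as follows. The issuer URL, Secret name, and audience value are placeholders chosen for illustration.

```yaml
auth:
  oidc:
    # Option A: set the issuer explicitly (placeholder Keycloak realm URL).
    issuerURL: https://keycloak.your-cluster.example.com/realms/nebari
    # Option B: leave issuerURL empty and point at a Secret that holds the
    # issuer-url key. The Secret name below is hypothetical.
    # existingSecret: my-oidc-client
    groupsClaim: groups         # chart default
    audience: llm-serving       # hypothetical audience value
```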
Envoy AI Gateway #
| Key | Default | Description |
|---|---|---|
| envoyAIGateway.install | false | Install the Envoy AI Gateway. Not yet implemented; reserved for future use. |
Key Manager #
| Key | Default | Description |
|---|---|---|
| keyManager.enabled | true | Deploy the key-manager service. |
| keyManager.image.repository | ghcr.io/nebari-dev/nebari-llm-serving-pack/key-manager | Container image repository. |
| keyManager.image.tag | "" | Image tag. Defaults to .Chart.AppVersion when empty, keeping chart and image versions in sync. Override only when testing a specific build. |
| keyManager.image.pullPolicy | Always | Image pull policy. |
| keyManager.auditInterval | 5m | How often the scaffolded audit loop runs (only when oidcUserinfoURL is set). Group-change revocation is not yet implemented (see below), so this loop performs no revocation today. |
| keyManager.oidcUserinfoURL | "" | OIDC userinfo endpoint for the audit loop. Empty (the default) means the loop never starts. Setting it does not enable group-change revocation in v0.1: the userinfo lookup is a stub that always errors pending OIDC token exchange, so the auditor skips revocation. Group-change revocation is planned but not yet implemented. For Keycloak: `https://<keycloak>/realms/<realm>/protocol/openid-connect/userinfo`. |
| keyManager.nebariApp.enabled | true | Create the NebariApp CR that registers the key-manager UI with the Nebari platform. Set to false for standalone installations without a Nebari cluster. |
| keyManager.nebariApp.hostname | "" | Fully qualified hostname for the key-manager UI. Required when nebariApp.enabled=true (e.g. llm-keys.your-cluster.example.com). |
| keyManager.nebariApp.gateway | public | Which Nebari shared gateway to attach the HTTPRoute to. Valid values: public or internal. |
| keyManager.nebariApp.routing.routes[0].pathPrefix | / | Path prefix for the key-manager UI route. |
| keyManager.nebariApp.routing.routes[0].pathType | PathPrefix | Path match type. |
| keyManager.nebariApp.auth.provisionClient | true | Create a dedicated Keycloak OIDC client for the UI. |
| keyManager.nebariApp.auth.scopes | [openid, profile, email, groups] | OIDC scopes requested during login. |
| keyManager.nebariApp.landingPage.enabled | true | Show the key-manager on the Nebari landing page. |
| keyManager.nebariApp.landingPage.displayName | "LLM API Keys" | Display name on the landing page tile. |
| keyManager.nebariApp.landingPage.description | "Manage your personal API keys for LLM inference." | Tile description. |
| keyManager.nebariApp.landingPage.category | "Platform" | Tile category grouping. |
| keyManager.nebariApp.landingPage.icon | "keycloak" | Tile icon. One of the built-in keys (jupyter, grafana, prometheus, keycloak, argocd, kubernetes) or a URL to a custom image. |
| keyManager.nebariApp.landingPage.priority | 100 | Sort order on the landing page (lower = higher priority). |
| keyManager.nebariApp.landingPage.healthCheck.enabled | true | Enable active health probing for the landing page tile. |
| keyManager.nebariApp.landingPage.healthCheck.path | / | Probe path. The key-manager serves its SPA at / (HTTP 200, unauthenticated). |
| keyManager.nebariApp.landingPage.healthCheck.intervalSeconds | 30 | Probe interval in seconds. |
| keyManager.nebariApp.landingPage.healthCheck.timeoutSeconds | 5 | Probe timeout in seconds. |
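
A possible key-manager override, using only keys from the table above, is sketched here. The hostname is a placeholder and the non-default values (internal gateway, priority, probe interval) are arbitrary choices made for the example.

```yaml
keyManager:
  enabled: true
  nebariApp:
    enabled: true
    hostname: llm-keys.your-cluster.example.com   # required when nebariApp.enabled=true
    gateway: internal            # attach the HTTPRoute to the internal shared gateway
    landingPage:
      enabled: true
      displayName: "LLM API Keys"
      priority: 50               # lower value sorts earlier than the default of 100
      healthCheck:
        enabled: true
        path: /
        intervalSeconds: 60      # probe less often than the 30 s default
        timeoutSeconds: 5

# For a standalone installation without a Nebari cluster, the table suggests
# disabling the NebariApp instead:
# keyManager:
#   nebariApp:
#     enabled: false
```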
Operator #
| Key | Default | Description |
|---|---|---|
| operator.image.repository | ghcr.io/nebari-dev/nebari-llm-serving-pack/operator | Operator container image repository. |
| operator.image.tag | "" | Image tag. Defaults to .Chart.AppVersion when empty. Override only when testing a specific build. |
| operator.image.pullPolicy | Always | Image pull policy. |
Defaults (cluster-wide fallbacks for LLMModel resources) #
| Key | Default | Description |
|---|---|---|
| defaults.serving.image | ghcr.io/llm-d/llm-d-cuda:v0.7.0 | Default vLLM container image applied to all LLMModel resources that do not set serving.image. |
| defaults.storage.storageClassName | "" | Default storageClassName for PVC-backed model storage. Empty means use the cluster default storage class. |
| defaults.monitoring.enabled | true | Default monitoring (PodMonitor) enablement for all LLMModel resources. |
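
To make the fallback behaviour concrete, a cluster-wide defaults block might be written as below. The storage class name is hypothetical; any LLMModel that sets the corresponding spec field overrides these values.

```yaml
defaults:
  serving:
    image: ghcr.io/llm-d/llm-d-cuda:v0.7.0   # chart default, pinned explicitly
  storage:
    storageClassName: fast-ssd   # hypothetical class; empty means the cluster default
  monitoring:
    enabled: false               # e.g. on clusters without the PodMonitor CRD installed
```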
LLMModel CRD reference #
LLMModel is a namespaced custom resource (CRD llmmodels.llm.nebari.dev, API version llm.nebari.dev/v1alpha1). Each instance deploys one model via llm-d and wires it into the pack’s routing and access control.
Required fields: spec.access, spec.model, spec.resources.
spec.model #
Specifies which LLM to serve.
| Field | Type | Required | Description |
|---|---|---|---|
| spec.model.name | string | Yes | Model identifier (e.g. mistralai/Devstral-Small-2505). |
| spec.model.source | string | Yes | Where to load the model from. Enum: huggingface or oci. |
| spec.model.authSecretName | string | No | Name of a Kubernetes Secret containing HF_TOKEN. Required for gated HuggingFace models. |
| spec.model.image | string | No | OCI image containing the model. Used when source: oci. |
| spec.model.preload | boolean | No | When true, uses an init container to download the model before vLLM starts. |
| spec.model.revision | string | No | HuggingFace commit hash or tag to pin for reproducible deployments. |
spec.model.storage #
| Field | Type | Required | Description |
|---|---|---|---|
| spec.model.storage.type | string | No | Volume type. Enum: pvc (default) or emptyDir. |
| spec.model.storage.size | string | No | Storage size (e.g. 200Gi). Required when type: pvc. |
| spec.model.storage.storageClassName | string | No | Overrides defaults.storage.storageClassName from the chart values. |
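
The model and storage fields above combine roughly as in this fragment. It is a partial spec rather than a complete manifest, and the Secret name and storage class are hypothetical.

```yaml
spec:
  model:
    name: mistralai/Devstral-Small-2505
    source: huggingface
    authSecretName: hf-token     # hypothetical Secret holding HF_TOKEN
    preload: true                # download in an init container before vLLM starts
    storage:
      type: pvc
      size: 200Gi                # required because type is pvc
      storageClassName: fast-ssd # hypothetical; overrides defaults.storage.storageClassName
    # Alternative: node-local scratch space instead of a PVC.
    # storage:
    #   type: emptyDir
```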
spec.resources #
Specifies compute requirements. spec.resources.gpu is required.
| Field | Type | Required | Description |
|---|---|---|---|
| spec.resources.gpu.count | integer | Yes | Number of GPUs required per replica. |
| spec.resources.gpu.type | string | Yes | GPU type. Default: nvidia. |
| spec.resources.requests | map | No | CPU/memory resource requests (standard Kubernetes quantity map). |
| spec.resources.limits | map | No | CPU/memory resource limits (standard Kubernetes quantity map). |
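
A resources block following the table above could look like this fragment. The CPU and memory figures are arbitrary illustrations, not sizing recommendations.

```yaml
spec:
  resources:
    gpu:
      count: 2        # GPUs per replica
      type: nvidia
    requests:         # standard Kubernetes quantities
      cpu: "8"
      memory: 64Gi
    limits:
      cpu: "16"
      memory: 96Gi
```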
spec.access #
Controls who can call the model’s API.
| Field | Type | Required | Description |
|---|---|---|---|
| spec.access.public | boolean | No (defaults false) | When true, all authenticated users can access the model regardless of group membership. |
| spec.access.groups | []string | No | OIDC group names that are allowed access. Ignored when public: true. |
The spec.access object itself is required, but neither public nor groups is individually required by the schema. Set public: true or list groups (or both); otherwise no one but cluster admins can reach the model.
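
A group-restricted model might be declared as in the fragment below. The group names are hypothetical and are assumed to match the values your identity provider emits in the configured groups claim.

```yaml
spec:
  access:
    public: false
    groups:
      - ml-engineers    # hypothetical OIDC group
      - data-science    # hypothetical OIDC group

# Alternative: open the model to every authenticated user.
# spec:
#   access:
#     public: true      # groups is ignored when public is true
```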
spec.serving #
Configures the vLLM serving layer. All fields optional.
| Field | Type | Default | Description |
|---|---|---|---|
| spec.serving.replicas | integer | 1 | Number of serving replicas. |
| spec.serving.image | string | chart default | Container image for the vLLM server. Overrides defaults.serving.image. |
| spec.serving.updateStrategy | string | Recreate | Deployment rollout strategy: Recreate (default) or RollingUpdate. Recreate is the default because model pods hold exclusive GPUs and a ReadWriteOnce PVC; on clusters without spare GPU capacity a rolling update deadlocks until the old pod is removed. Set to RollingUpdate only when the cluster has enough free GPUs to run old and new pods simultaneously. |
| spec.serving.tensorParallelism | integer | gpu.count | Tensor parallelism degree. Defaults to the GPU count when not set. |
| spec.serving.dataParallelism | integer | 1 | Data parallelism degree. |
| spec.serving.vllmArgs | []string | - | Additional CLI arguments passed directly to vLLM. |
| spec.serving.monitoring.enabled | boolean | chart default | Create a PodMonitor for Prometheus scraping. Overrides defaults.monitoring.enabled. |
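
The serving options interact with spec.resources.gpu.count, so this sketch shows both together. The GPU count and the two vLLM flags are illustrative choices; check the flags against the vLLM version in your serving image.

```yaml
spec:
  resources:
    gpu:
      count: 4
      type: nvidia
  serving:
    replicas: 1
    tensorParallelism: 4        # same as gpu.count, which is also the default
    dataParallelism: 1
    updateStrategy: Recreate    # avoids needing spare GPUs during a rollout
    vllmArgs:
      - --max-model-len=32768
      - --gpu-memory-utilization=0.90
    monitoring:
      enabled: true             # overrides defaults.monitoring.enabled
```

Switching updateStrategy to RollingUpdate with these numbers would need four additional free GPUs for the duration of the rollout.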
spec.endpoints #
Controls which network endpoints are created.
| Field | Type | Default | Description |
|---|---|---|---|
| spec.endpoints.external.enabled | boolean | true | Create the external (API-key-authenticated) endpoint at `llm.<baseDomain>`. |
| spec.endpoints.external.subdomain | string | auto | Override the auto-generated subdomain for the external endpoint. |
| spec.endpoints.internal.enabled | boolean | true | Create the internal (JWT-authenticated) endpoint at `llm-internal.<baseDomain>`. |
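
For a model that should only be reachable from inside the platform, the endpoint flags could be set as in this fragment (a sketch based on the table above):

```yaml
spec:
  endpoints:
    external:
      enabled: false   # no API-key-authenticated endpoint
    internal:
      enabled: true    # keep the JWT-authenticated internal endpoint
```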
spec.advanced #
Escape hatches for power users. All fields optional.
| Field | Type | Description |
|---|---|---|
| spec.advanced.vllm.nodeSelector | map | Node selector labels for targeting specific node pools. |
| spec.advanced.vllm.tolerations | []Toleration | Kubernetes tolerations for GPU node taints. |
| spec.advanced.vllm.affinity | Affinity | Full Kubernetes affinity rules for pod scheduling. |
| spec.advanced.vllm.extraArgs | []string | Additional CLI arguments appended after serving.vllmArgs. |
| spec.advanced.vllm.extraEnv | []EnvVar | Additional environment variables on the vLLM container (supports value, valueFrom, secretKeyRef, etc.). |
| spec.advanced.inferencePool.schedulerConfig | object | EPP scheduler plugin configuration (free-form, x-kubernetes-preserve-unknown-fields). |
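
The scheduling and environment escape hatches might be combined as sketched below. The node label, taint key, Secret name, and the second environment variable name are assumptions about a particular cluster, not requirements of the pack.

```yaml
spec:
  advanced:
    vllm:
      nodeSelector:
        node-pool: gpu-a100            # hypothetical node label
      tolerations:
        - key: nvidia.com/gpu          # assumed taint key on the GPU nodes
          operator: Exists
          effect: NoSchedule
      extraArgs:
        - --max-num-seqs=64            # appended after serving.vllmArgs
      extraEnv:
        - name: VLLM_LOGGING_LEVEL
          value: DEBUG
        - name: EXAMPLE_API_TOKEN      # hypothetical variable read from a Secret
          valueFrom:
            secretKeyRef:
              name: example-secret     # hypothetical Secret
              key: token
```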
Status fields #
The operator writes these fields to status; they are read-only.
| Field | Type | Description |
|---|---|---|
| status.phase | string | Lifecycle phase: Pending, Downloading, Starting, Ready, Degraded, or Error. |
| status.replicas.ready | integer | Current number of ready replicas. |
| status.replicas.desired | integer | Desired number of replicas. |
| status.endpoints.external | string | Actual external endpoint URL once provisioned. |
| status.endpoints.internal | string | Actual internal endpoint URL once provisioned. |
| status.modelSize | string | Actual model size after download. |
| status.conditions | []Condition | Standard Kubernetes conditions (type, status, reason, message). |
NebariApp fields (key-manager) #
The chart renders a NebariApp CR (API version reconcilers.nebari.dev/v1) when keyManager.nebariApp.enabled=true. The nebari-operator consumes this CR to provision an HTTPRoute, a cert-manager Certificate, a Keycloak OIDC client, and a landing page tile.
The fields below map directly to the chart values in the Key Manager section above.
| NebariApp field | Source value | Description |
|---|---|---|
| spec.hostname | keyManager.nebariApp.hostname | Fully qualified hostname. Required. |
| spec.gateway | keyManager.nebariApp.gateway | public or internal. |
| spec.service.name | (chart-derived) | Service name for the key-manager backend. |
| spec.service.port | 8080 | Fixed service port. |
| spec.routing | keyManager.nebariApp.routing | Passed through verbatim. Without this block the operator sets RoutingReady=False and skips HTTPRoute and Certificate provisioning. |
| spec.auth.enabled | true (fixed) | Authentication always enabled for the key-manager. |
| spec.auth.provider | keycloak (fixed) | Auth provider. |
| spec.auth.provisionClient | keyManager.nebariApp.auth.provisionClient | When true, creates a dedicated Keycloak client and writes credentials to `<nebariapp-name>-oidc-client` Secret. |
| spec.auth.scopes | keyManager.nebariApp.auth.scopes | OIDC scopes requested. |
| spec.landingPage.enabled | keyManager.nebariApp.landingPage.enabled | Show tile on Nebari landing page. |
| spec.landingPage.displayName | keyManager.nebariApp.landingPage.displayName | Tile display name. Required when enabled: true. |
| spec.landingPage.description | keyManager.nebariApp.landingPage.description | Tile description. |
| spec.landingPage.category | keyManager.nebariApp.landingPage.category | Tile category grouping. |
| spec.landingPage.icon | keyManager.nebariApp.landingPage.icon | Tile icon key or URL. |
| spec.landingPage.priority | keyManager.nebariApp.landingPage.priority | Sort order. |
| spec.landingPage.healthCheck.enabled | keyManager.nebariApp.landingPage.healthCheck.enabled | Enable active health probing. |
| spec.landingPage.healthCheck.path | keyManager.nebariApp.landingPage.healthCheck.path | Probe path. |
| spec.landingPage.healthCheck.intervalSeconds | keyManager.nebariApp.landingPage.healthCheck.intervalSeconds | Probe interval. |
| spec.landingPage.healthCheck.timeoutSeconds | keyManager.nebariApp.landingPage.healthCheck.timeoutSeconds | Probe timeout. |
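
Putting the mapping above together, the rendered resource with chart defaults would look approximately like the sketch below. The metadata and service names are hypothetical because the chart derives them from the release, and the hostname is a placeholder; treat this as an approximation of the template output rather than a copy of it.

```yaml
apiVersion: reconcilers.nebari.dev/v1
kind: NebariApp
metadata:
  name: llm-serving-key-manager        # hypothetical; the real name is chart-derived
  namespace: llm-serving               # hypothetical operator namespace
spec:
  hostname: llm-keys.your-cluster.example.com
  gateway: public
  service:
    name: llm-serving-key-manager      # hypothetical; chart-derived
    port: 8080
  routing:
    routes:
      - pathPrefix: /
        pathType: PathPrefix
  auth:
    enabled: true
    provider: keycloak
    provisionClient: true
    scopes: [openid, profile, email, groups]
  landingPage:
    enabled: true
    displayName: "LLM API Keys"
    description: "Manage your personal API keys for LLM inference."
    category: "Platform"
    icon: "keycloak"
    priority: 100
    healthCheck:
      enabled: true
      path: /
      intervalSeconds: 30
      timeoutSeconds: 5
```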
For the full NebariApp CRD schema, see the nebari-operator repository.
Minimal LLMModel example #
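The example below is a small working manifest: it sets the three required blocks (spec.access, spec.model, spec.resources) plus a few common optional fields. See Quickstart for an annotated walkthrough and Shared Storage for PVC options.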
```yaml
apiVersion: llm.nebari.dev/v1alpha1
kind: LLMModel
metadata:
  name: my-model
  namespace: llm-serving
spec:
  access:
    public: true
  model:
    name: mistralai/Devstral-Small-2505
    source: huggingface
    authSecretName: hf-token
    preload: true
    storage:
      type: pvc
      size: 200Gi
  resources:
    gpu:
      count: 1
      type: nvidia
  serving:
    replicas: 1
```