Configuration

This page is the authoritative reference for every knob the pack exposes. It covers the Helm chart values you set at install time, the LLMModel custom resource you create per model, and the NebariApp fields the chart uses to wire the key-manager UI into the Nebari platform.

All tables are derived directly from the source files:

  • charts/nebari-llm-serving/values.yaml - chart values
  • charts/nebari-llm-serving/crds/llmmodel-crd.yaml - CRD schema

See the Architecture page for how these pieces fit together, and Installation for initial setup. The Quickstart shows a minimal working example.


Helm values reference

Pass these with --set key=value or in a values.yaml override file.

Platform

| Key | Default | Description |
| --- | --- | --- |
| createNamespace | true | Whether the chart provisions the operator namespace and labels it nebari.dev/managed=true. Set to false when something else owns the namespace (ArgoCD, Terraform, etc.). |
| platform.baseDomain | "" | Required. Base domain for the Nebari deployment (e.g. your-cluster.example.com). |
| platform.gateway.external.name | nebari-gateway | Name of the external Envoy Gateway resource. |
| platform.gateway.external.namespace | envoy-gateway-system | Namespace of the external Envoy Gateway. |
| platform.gateway.internal.name | nebari-internal-gateway | Name of the internal Envoy Gateway resource. |
| platform.gateway.internal.namespace | envoy-gateway-system | Namespace of the internal Envoy Gateway. |
| platform.gateway.manageSharedListeners | true | When true, the operator patches HTTPS listeners for llm.<baseDomain> and llm-internal.<baseDomain> onto the shared Gateways. Set to false if a cluster admin owns the listeners out-of-band. |
| platform.tls.clusterIssuer | letsencrypt-production | The cert-manager ClusterIssuer used to issue the shared TLS certificate. Override when using a staging issuer. |
| platform.tls.secretName | "" | Bring-your-own-certificate mode. When set, no cert-manager Certificate is created; the operator expects a pre-provisioned kubernetes.io/tls Secret of this name in the operator namespace. Use for air-gapped or private-CA clusters where ACME issuance is not possible. cert-manager is not required when this is set. |
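
As a rough illustration of the platform keys above, an override file for a cluster where the namespace, listeners, and certificate are all managed outside the chart might look like the sketch below. The file name, domain, and Secret name are placeholders chosen for this example, not values shipped with the chart.

```yaml
# values-override.yaml (hypothetical file name)
createNamespace: false              # the namespace is owned elsewhere (ArgoCD, Terraform, etc.)
platform:
  baseDomain: your-cluster.example.com   # required; placeholder domain
  gateway:
    manageSharedListeners: false    # a cluster admin manages the listeners out-of-band
  tls:
    # Bring-your-own-certificate mode: the operator expects this
    # kubernetes.io/tls Secret to already exist in the operator namespace.
    secretName: llm-shared-tls      # hypothetical Secret name
```

With platform.tls.secretName set, platform.tls.clusterIssuer can be left at its default, since no cert-manager Certificate is created in that mode.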

Auth / OIDC

| Key | Default | Description |
| --- | --- | --- |
| auth.oidc.issuerURL | "" | OIDC issuer URL. When empty, read from a Secret via secretKeyRef (the default Nebari integration path). |
| auth.oidc.existingSecret | "" | Name of the Secret containing the issuer-url key. Defaults to <release-fullname>-oidc-client when empty. |
| auth.oidc.groupsClaim | groups | JWT claim name used to extract group memberships. |
| auth.oidc.audience | "" | Expected audience value in JWT tokens. |
| auth.oidc.cookiePrefix | IdToken | Cookie name prefix used by the OAuth2 proxy sidecar. |
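
The two ways of supplying the issuer can be sketched as follows. The issuer URL, audience, and Secret name are hypothetical; only the key names come from the table above.

```yaml
auth:
  oidc:
    # Option A: set the issuer explicitly (hypothetical Keycloak realm URL).
    issuerURL: https://keycloak.your-cluster.example.com/realms/nebari
    # Option B: leave issuerURL empty and point at a Secret that holds
    # an issuer-url key (hypothetical Secret name).
    # existingSecret: my-oidc-client
    groupsClaim: groups          # chart default
    audience: llm-serving        # hypothetical audience value
```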

Envoy AI Gateway

| Key | Default | Description |
| --- | --- | --- |
| envoyAIGateway.install | false | Install the Envoy AI Gateway. Not yet implemented; reserved for future use. |

Key Manager

| Key | Default | Description |
| --- | --- | --- |
| keyManager.enabled | true | Deploy the key-manager service. |
| keyManager.image.repository | ghcr.io/nebari-dev/nebari-llm-serving-pack/key-manager | Container image repository. |
| keyManager.image.tag | "" | Image tag. Defaults to .Chart.AppVersion when empty, keeping chart and image versions in sync. Override only when testing a specific build. |
| keyManager.image.pullPolicy | Always | Image pull policy. |
| keyManager.auditInterval | 5m | How often the scaffolded audit loop runs (only when oidcUserinfoURL is set). Group-change revocation is not yet implemented (see below), so this loop performs no revocation today. |
| keyManager.oidcUserinfoURL | "" | OIDC userinfo endpoint for the audit loop. Empty (the default) means the loop never starts. Setting it does not enable group-change revocation in v0.1: the userinfo lookup is a stub that always errors pending OIDC token exchange, so the auditor skips revocation. Group-change revocation is planned but not yet implemented. For Keycloak: https://<keycloak>/realms/<realm>/protocol/openid-connect/userinfo. |
| keyManager.nebariApp.enabled | true | Create the NebariApp CR that registers the key-manager UI with the Nebari platform. Set to false for standalone installations without a Nebari cluster. |
| keyManager.nebariApp.hostname | "" | Fully qualified hostname for the key-manager UI. Required when nebariApp.enabled=true (e.g. llm-keys.your-cluster.example.com). |
| keyManager.nebariApp.gateway | public | Which Nebari shared gateway to attach the HTTPRoute to. Valid values: public or internal. |
| keyManager.nebariApp.routing.routes[0].pathPrefix | / | Path prefix for the key-manager UI route. |
| keyManager.nebariApp.routing.routes[0].pathType | PathPrefix | Path match type. |
| keyManager.nebariApp.auth.provisionClient | true | Create a dedicated Keycloak OIDC client for the UI. |
| keyManager.nebariApp.auth.scopes | [openid, profile, email, groups] | OIDC scopes requested during login. |
| keyManager.nebariApp.landingPage.enabled | true | Show the key-manager on the Nebari landing page. |
| keyManager.nebariApp.landingPage.displayName | "LLM API Keys" | Display name on the landing page tile. |
| keyManager.nebariApp.landingPage.description | "Manage your personal API keys for LLM inference." | Tile description. |
| keyManager.nebariApp.landingPage.category | "Platform" | Tile category grouping. |
| keyManager.nebariApp.landingPage.icon | "keycloak" | Tile icon. One of the built-in keys (jupyter, grafana, prometheus, keycloak, argocd, kubernetes) or a URL to a custom image. |
| keyManager.nebariApp.landingPage.priority | 100 | Sort order on the landing page (lower = higher priority). |
| keyManager.nebariApp.landingPage.healthCheck.enabled | true | Enable active health probing for the landing page tile. |
| keyManager.nebariApp.landingPage.healthCheck.path | / | Probe path. The key-manager serves its SPA at / (HTTP 200, unauthenticated). |
| keyManager.nebariApp.landingPage.healthCheck.intervalSeconds | 30 | Probe interval in seconds. |
| keyManager.nebariApp.landingPage.healthCheck.timeoutSeconds | 5 | Probe timeout in seconds. |
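
A values fragment for the key-manager might look like the sketch below. It assumes a Nebari cluster where the UI should be reachable only through the internal gateway; the hostname is the placeholder from the table, and the changed priority and probe interval are arbitrary illustrative values.

```yaml
keyManager:
  enabled: true
  nebariApp:
    enabled: true
    hostname: llm-keys.your-cluster.example.com   # required when nebariApp.enabled=true
    gateway: internal            # attach the HTTPRoute to the internal shared gateway
    landingPage:
      displayName: "LLM API Keys"
      priority: 50               # illustrative; lower values sort earlier
      healthCheck:
        intervalSeconds: 60      # illustrative; the default is 30
```

For a standalone installation without a Nebari cluster, setting keyManager.nebariApp.enabled to false skips the NebariApp CR, and the hostname is then no longer required.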

Operator

| Key | Default | Description |
| --- | --- | --- |
| operator.image.repository | ghcr.io/nebari-dev/nebari-llm-serving-pack/operator | Operator container image repository. |
| operator.image.tag | "" | Image tag. Defaults to .Chart.AppVersion when empty. Override only when testing a specific build. |
| operator.image.pullPolicy | Always | Image pull policy. |

Defaults (cluster-wide fallbacks for LLMModel resources)

| Key | Default | Description |
| --- | --- | --- |
| defaults.serving.image | ghcr.io/llm-d/llm-d-cuda:v0.7.0 | Default vLLM container image applied to all LLMModel resources that do not set serving.image. |
| defaults.storage.storageClassName | "" | Default storageClassName for PVC-backed model storage. Empty means use the cluster default storage class. |
| defaults.monitoring.enabled | true | Default monitoring (PodMonitor) enablement for all LLMModel resources. |
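
As a sketch of how these fallbacks might be set, the fragment below pins a storage class and turns off PodMonitor creation cluster-wide. The storage class name is hypothetical and must match a class that exists in the cluster.

```yaml
defaults:
  serving:
    image: ghcr.io/llm-d/llm-d-cuda:v0.7.0   # chart default, shown for completeness
  storage:
    storageClassName: fast-ssd   # hypothetical storage class name
  monitoring:
    enabled: false               # individual LLMModel resources can still opt in
```

Per the CRD tables, an individual LLMModel can override each of these through spec.serving.image, spec.model.storage.storageClassName, and spec.serving.monitoring.enabled.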

LLMModel CRD reference

LLMModel is a namespaced custom resource (CRD llmmodels.llm.nebari.dev, API version llm.nebari.dev/v1alpha1). Each instance deploys one model via llm-d and wires it into the pack’s routing and access control.

Required fields: spec.access, spec.model, spec.resources.

spec.model

Specifies which LLM to serve.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| spec.model.name | string | Yes | Model identifier (e.g. mistralai/Devstral-Small-2505). |
| spec.model.source | string | Yes | Where to load the model from. Enum: huggingface or oci. |
| spec.model.authSecretName | string | No | Name of a Kubernetes Secret containing HF_TOKEN. Required for gated HuggingFace models. |
| spec.model.image | string | No | OCI image containing the model. Used when source: oci. |
| spec.model.preload | boolean | No | When true, uses an init container to download the model before vLLM starts. |
| spec.model.revision | string | No | HuggingFace commit hash or tag to pin for reproducible deployments. |

spec.model.storage

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| spec.model.storage.type | string | No | Volume type. Enum: pvc (default) or emptyDir. |
| spec.model.storage.size | string | No | Storage size (e.g. 200Gi). Required when type: pvc. |
| spec.model.storage.storageClassName | string | No | Overrides defaults.storage.storageClassName from the chart values. |
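
The model and storage fields combine roughly as in the fragment below, which pins a revision and requests PVC-backed storage on a specific class. The revision placeholder and the storage class name are assumptions for illustration; substitute a real commit hash or tag and a class that exists in your cluster.

```yaml
spec:
  model:
    name: mistralai/Devstral-Small-2505
    source: huggingface
    authSecretName: hf-token                # Secret containing HF_TOKEN (needed for gated models)
    revision: "<commit-hash-or-tag>"        # placeholder; pin for reproducible deployments
    preload: true                           # download in an init container before vLLM starts
    storage:
      type: pvc
      size: 200Gi                           # required when type is pvc
      storageClassName: fast-ssd            # hypothetical; overrides the chart-level default
```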

spec.resources

Specifies compute requirements. spec.resources.gpu is required.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| spec.resources.gpu.count | integer | Yes | Number of GPUs required per replica. |
| spec.resources.gpu.type | string | Yes | GPU type. Default: nvidia. |
| spec.resources.requests | map | No | CPU/memory resource requests (standard Kubernetes quantity map). |
| spec.resources.limits | map | No | CPU/memory resource limits (standard Kubernetes quantity map). |
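
A resources block with explicit CPU and memory settings might look like the sketch below. The quantities are arbitrary illustrative values, not sizing recommendations for any particular model.

```yaml
spec:
  resources:
    gpu:
      count: 2          # GPUs per replica
      type: nvidia
    requests:
      cpu: "8"          # illustrative values in standard Kubernetes quantity syntax
      memory: 64Gi
    limits:
      memory: 64Gi
```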

spec.access

Controls who can call the model’s API.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| spec.access.public | boolean | No (defaults false) | When true, all authenticated users can access the model regardless of group membership. |
| spec.access.groups | []string | No | OIDC group names that are allowed access. Ignored when public: true. |

The spec.access object itself is required, but the schema requires neither public nor groups individually. Set public: true, list groups, or both; otherwise no one but cluster admins can reach the model.
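
For group-restricted access, a fragment along these lines should work. The group names are hypothetical and would need to match the values your identity provider emits in the configured groups claim.

```yaml
spec:
  access:
    public: false        # the default; shown for clarity
    groups:
      - ml-engineers     # hypothetical OIDC group
      - data-science     # hypothetical OIDC group
```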

spec.serving

Configures the vLLM serving layer. All fields optional.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| spec.serving.replicas | integer | 1 | Number of serving replicas. |
| spec.serving.image | string | chart default | Container image for the vLLM server. Overrides defaults.serving.image. |
| spec.serving.updateStrategy | string | Recreate | Deployment rollout strategy: Recreate (default) or RollingUpdate. Recreate is the default because model pods hold exclusive GPUs and a ReadWriteOnce PVC; on clusters without spare GPU capacity a rolling update deadlocks until the old pod is removed. Set to RollingUpdate only when the cluster has enough free GPUs to run old and new pods simultaneously. |
| spec.serving.tensorParallelism | integer | gpu.count | Tensor parallelism degree. Defaults to the GPU count when not set. |
| spec.serving.dataParallelism | integer | 1 | Data parallelism degree. |
| spec.serving.vllmArgs | []string | - | Additional CLI arguments passed directly to vLLM. |
| spec.serving.monitoring.enabled | boolean | chart default | Create a PodMonitor for Prometheus scraping. Overrides defaults.monitoring.enabled. |
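
The fragment below sketches a serving block that opts into rolling updates and passes one extra flag to vLLM. It assumes the cluster has enough spare GPUs to run old and new pods side by side; --max-model-len is a standard vLLM option, and the value is only an example.

```yaml
spec:
  serving:
    replicas: 1
    updateStrategy: RollingUpdate   # only safe with spare GPU capacity
    # tensorParallelism is omitted, so it falls back to spec.resources.gpu.count
    vllmArgs:
      - --max-model-len=32768       # illustrative value
    monitoring:
      enabled: true                 # create a PodMonitor regardless of the chart default
```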

spec.endpoints

Controls which network endpoints are created.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| spec.endpoints.external.enabled | boolean | true | Create the external (API-key-authenticated) endpoint at llm.<baseDomain>. |
| spec.endpoints.external.subdomain | string | auto | Override the auto-generated subdomain for the external endpoint. |
| spec.endpoints.internal.enabled | boolean | true | Create the internal (JWT-authenticated) endpoint at llm-internal.<baseDomain>. |
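
To expose a model only inside the cluster network, a fragment like the following could be used. It is a sketch built from the fields in the table above.

```yaml
spec:
  endpoints:
    external:
      enabled: false    # no API-key-authenticated endpoint at llm.<baseDomain>
    internal:
      enabled: true     # keep the JWT-authenticated endpoint at llm-internal.<baseDomain>
```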

spec.advanced

Escape hatches for power users. All fields optional.

| Field | Type | Description |
| --- | --- | --- |
| spec.advanced.vllm.nodeSelector | map | Node selector labels for targeting specific node pools. |
| spec.advanced.vllm.tolerations | []Toleration | Kubernetes tolerations for GPU node taints. |
| spec.advanced.vllm.affinity | Affinity | Full Kubernetes affinity rules for pod scheduling. |
| spec.advanced.vllm.extraArgs | []string | Additional CLI arguments appended after serving.vllmArgs. |
| spec.advanced.vllm.extraEnv | []EnvVar | Additional environment variables on the vLLM container (supports value, valueFrom, secretKeyRef, etc.). |
| spec.advanced.inferencePool.schedulerConfig | object | EPP scheduler plugin configuration (free-form, x-kubernetes-preserve-unknown-fields). |
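
A scheduling-oriented use of these escape hatches might look like the sketch below. The node label, the taint key, and the Secret reference are assumptions about a hypothetical cluster; --enable-prefix-caching and VLLM_LOGGING_LEVEL are existing vLLM options used here purely as examples.

```yaml
spec:
  advanced:
    vllm:
      nodeSelector:
        node-pool: gpu                  # hypothetical node label
      tolerations:
        - key: nvidia.com/gpu           # assumed taint key on the GPU nodes
          operator: Exists
          effect: NoSchedule
      extraArgs:
        - --enable-prefix-caching       # appended after serving.vllmArgs
      extraEnv:
        - name: VLLM_LOGGING_LEVEL
          value: DEBUG
        - name: EXAMPLE_TOKEN           # hypothetical variable sourced from a Secret
          valueFrom:
            secretKeyRef:
              name: example-secret      # hypothetical Secret name
              key: token
```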

Status fields

The operator writes these fields to status; they are read-only.

| Field | Type | Description |
| --- | --- | --- |
| status.phase | string | Lifecycle phase: Pending, Downloading, Starting, Ready, Degraded, or Error. |
| status.replicas.ready | integer | Current number of ready replicas. |
| status.replicas.desired | integer | Desired number of replicas. |
| status.endpoints.external | string | Actual external endpoint URL once provisioned. |
| status.endpoints.internal | string | Actual internal endpoint URL once provisioned. |
| status.modelSize | string | Actual model size after download. |
| status.conditions | []Condition | Standard Kubernetes conditions (type, status, reason, message). |

NebariApp fields (key-manager)

The chart renders a NebariApp CR (API version reconcilers.nebari.dev/v1) when keyManager.nebariApp.enabled=true. The nebari-operator consumes this CR to provision an HTTPRoute, a cert-manager Certificate, a Keycloak OIDC client, and a landing page tile.

The fields below map directly to the chart values in the Key Manager section above.

| NebariApp field | Source value | Description |
| --- | --- | --- |
| spec.hostname | keyManager.nebariApp.hostname | Fully qualified hostname. Required. |
| spec.gateway | keyManager.nebariApp.gateway | public or internal. |
| spec.service.name | (chart-derived) | Service name for the key-manager backend. |
| spec.service.port | 8080 | Fixed service port. |
| spec.routing | keyManager.nebariApp.routing | Passed through verbatim. Without this block the operator sets RoutingReady=False and skips HTTPRoute and Certificate provisioning. |
| spec.auth.enabled | true (fixed) | Authentication always enabled for the key-manager. |
| spec.auth.provider | keycloak (fixed) | Auth provider. |
| spec.auth.provisionClient | keyManager.nebariApp.auth.provisionClient | When true, creates a dedicated Keycloak client and writes credentials to <nebariapp-name>-oidc-client Secret. |
| spec.auth.scopes | keyManager.nebariApp.auth.scopes | OIDC scopes requested. |
| spec.landingPage.enabled | keyManager.nebariApp.landingPage.enabled | Show tile on Nebari landing page. |
| spec.landingPage.displayName | keyManager.nebariApp.landingPage.displayName | Tile display name. Required when enabled: true. |
| spec.landingPage.description | keyManager.nebariApp.landingPage.description | Tile description. |
| spec.landingPage.category | keyManager.nebariApp.landingPage.category | Tile category grouping. |
| spec.landingPage.icon | keyManager.nebariApp.landingPage.icon | Tile icon key or URL. |
| spec.landingPage.priority | keyManager.nebariApp.landingPage.priority | Sort order. |
| spec.landingPage.healthCheck.enabled | keyManager.nebariApp.landingPage.healthCheck.enabled | Enable active health probing. |
| spec.landingPage.healthCheck.path | keyManager.nebariApp.landingPage.healthCheck.path | Probe path. |
| spec.landingPage.healthCheck.intervalSeconds | keyManager.nebariApp.landingPage.healthCheck.intervalSeconds | Probe interval. |
| spec.landingPage.healthCheck.timeoutSeconds | keyManager.nebariApp.landingPage.healthCheck.timeoutSeconds | Probe timeout. |
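
Putting the mapping together, the rendered CR with chart defaults would look roughly like the sketch below. This is reconstructed from the tables on this page rather than copied from the chart templates; the metadata name, namespace, and service name are hypothetical stand-ins for chart-derived values.

```yaml
apiVersion: reconcilers.nebari.dev/v1
kind: NebariApp
metadata:
  name: key-manager            # hypothetical; the chart derives the real name
  namespace: llm-serving       # assumed operator namespace
spec:
  hostname: llm-keys.your-cluster.example.com   # from keyManager.nebariApp.hostname
  gateway: public
  service:
    name: key-manager          # hypothetical; chart-derived in practice
    port: 8080                 # fixed
  routing:
    routes:
      - pathPrefix: /
        pathType: PathPrefix
  auth:
    enabled: true              # fixed
    provider: keycloak         # fixed
    provisionClient: true
    scopes: [openid, profile, email, groups]
  landingPage:
    enabled: true
    displayName: "LLM API Keys"
    description: "Manage your personal API keys for LLM inference."
    category: "Platform"
    icon: "keycloak"
    priority: 100
    healthCheck:
      enabled: true
      path: /
      intervalSeconds: 30
      timeoutSeconds: 5
```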

For the full NebariApp CRD schema, see the nebari-operator repository.


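Minimal LLMModel example

The example below is a compact, complete manifest: it sets the three required blocks (spec.access, spec.model, spec.resources) plus a few optional fields. See Quickstart for an annotated walkthrough and Shared Storage for PVC options.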

apiVersion: llm.nebari.dev/v1alpha1
kind: LLMModel
metadata:
  name: my-model
  namespace: llm-serving
spec:
  access:
    public: true
  model:
    name: mistralai/Devstral-Small-2505
    source: huggingface
    authSecretName: hf-token
    preload: true
    storage:
      type: pvc
      size: 200Gi
  resources:
    gpu:
      count: 1
      type: nvidia
  serving:
    replicas: 1