# Mimir 6.x Upgrade Guide
This document covers the breaking changes and required actions when upgrading Grafana Mimir from 5.x to 6.x within Big Bang. It is organized into two sections: upstream breaking changes (with references to the authoritative Grafana documentation) and Big Bang-specific changes that are unique to this package’s implementation.
## Upstream Breaking Changes
The upstream Mimir 6.0 release introduces significant architectural changes. This section highlights the areas most relevant to Big Bang operators, but is not a replacement for the full upstream migration guide.
**Required reading:** Migrate from Helm chart 5.x to 6.0
### rollout-operator CRD Schema Changes
Mimir 6.0 ships with a new version of the rollout-operator that includes breaking schema changes to two CRDs:
- `replicatemplates.rollout-operator.grafana.com`
- `zoneawarepoddisruptionbudgets.rollout-operator.grafana.com`
Because Kubernetes does not allow in-place updates to CRD schemas when existing data cannot be migrated automatically, these CRDs must be deleted and re-applied before the Helm upgrade completes. Failure to do so will cause the upgrade to fail.
See the upstream migration guide for the full context on why these CRDs changed and what the new schema introduces.
Big Bang handles the CRD delete/re-apply automatically via the upgradeJob pre-upgrade Helm hook — see Automated CRD Upgrade Job for details and the manual fallback procedure.
### New Ingest Storage Architecture (Kafka)
Mimir 6.0 introduces a native ingest storage path backed by Kafka, replacing the classic gRPC push path between the distributor and the ingester. This is now the upstream-preferred architecture.
Key implications:
- The new architecture requires a Kafka broker (or Kafka-compatible endpoint) to be available.
- The classic gRPC push path remains supported but must be explicitly re-enabled if you are not adopting Kafka.
- The two modes are mutually exclusive: `ingest_storage.enabled: true` and `ingester.push_grpc_method_enabled: true` cannot both be active.
See upstream docs: Ingest storage overview
Big Bang defaults to the classic architecture rather than the upstream default — see Classic Architecture Default and Ingest Storage Adoption for details.
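If you want to pin the classic architecture explicitly in your overrides rather than relying on chart defaults, a minimal sketch using the two keys named above (key placement under `structuredConfig` mirrors the Big Bang package layout shown later in this guide; verify exact key names against the upstream configuration reference for your Mimir version):

```yaml
mimir:
  values:
    upstream:
      mimir:
        structuredConfig:
          ingest_storage:
            enabled: false            # stay on the classic path
          ingester:
            push_grpc_method_enabled: true  # keep the gRPC push path active
```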
### NGINX Deprecation
The standalone NGINX deployment model, deprecated in Mimir 5.x, is fully removed in 6.0 in favor of the unified gateway.
- Big Bang already defaults to the unified gateway (`upstream.gateway.enabled: true`). Operators who have not customized this are unaffected.
- If you have any explicit `nginx:` configuration in your overrides, you must complete the migration to the unified gateway before upgrading to 6.0.
See upstream migration guide: Migrate to unified proxy deployment
### Additional Upstream Changes
The following upstream changes are covered in detail in the Grafana migration guide linked above. Operators should review each section for applicability to their deployment:
- Zone-aware replication topology changes
- `store_gateway.sharding_ring` stability tuning recommendations for upgrades
- Removal of previously deprecated configuration keys
## Big Bang-Specific Changes
The following changes are unique to the Big Bang Mimir package and are not addressed in upstream documentation.
### 1. Automated CRD Upgrade Job
To eliminate the manual CRD deletion step required by the upstream migration guide, Big Bang includes a pre-upgrade Helm hook job (`upgradeJob`) that automatically handles the rollout-operator CRD lifecycle during a `helm upgrade`.
**What it does:**
- Detects whether the upgrade is transitioning from a chart version prior to `6.0.0-bb.0`.
- Deletes the old `replicatemplates` and `zoneawarepoddisruptionbudgets` CRDs.
- Applies the updated CRD definitions (bundled in the chart under `files/crds/`).
- Waits up to 60 seconds for both CRDs to reach the `Established` state before allowing the upgrade to proceed.
The job only fires when all three conditions are true:
- `helm upgrade` is being run (not an install)
- `upgradeJob.enabled: true` (the default)
- `upstream.rollout_operator.enabled: true` (the default)
This job is a one-time operation. It only triggers when upgrading from a chart version prior to `6.0.0-bb.0`. Subsequent upgrades within the 6.x line will not fire the job.
**Upgrade job network policy** (when `networkPolicies.enabled: true`):
When Big Bang network policies are enabled, the `upgradeJob` hook also creates a NetworkPolicy (`api-egress-upgrade-job`) scoped to the upgrade job pod, permitting it to reach the Kubernetes API. This policy is automatically deleted after the hook completes regardless of success or failure — it will not appear in steady-state.
By default, Big Bang sets `networkPolicies.controlPlaneCidr: 0.0.0.0/0`, resulting in a permissive egress rule (all destinations except the AWS metadata IP `169.254.169.254/32`). No action is required for standard deployments.
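For reference, the rendered hook policy is approximately the following sketch (the pod selector labels here are assumptions for illustration; the actual rendered labels come from the chart):

```yaml
# Approximation of the api-egress-upgrade-job policy rendered by the hook.
# Selector labels below are placeholders, not the chart's exact labels.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-egress-upgrade-job
  namespace: mimir
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: upgrade-job  # placeholder label
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0              # networkPolicies.controlPlaneCidr default
            except:
              - 169.254.169.254/32       # AWS metadata IP always excluded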
For hardened environments (GovCloud, IL4/IL5), it is strongly recommended to scope this to your cluster’s specific control plane IP before upgrading:
```yaml
networkPolicies:
  controlPlaneCidr: "172.16.0.1/32"  # replace with your actual control plane IP
```
Use `kubectl get endpoints -n default kubernetes` to find the correct value. If `networkPolicies.enabled: false`, this NetworkPolicy is not rendered and no action is needed.
**Disabling the job and upgrading manually:**
If you prefer to manage the CRD lifecycle yourself — for example, in an air-gapped environment where the job cannot reach the Kubernetes API — disable the job and follow the upstream manual procedure:
```yaml
# In your Big Bang values override
mimir:
  values:
    upgradeJob:
      enabled: false
```
Then perform the CRD deletion and re-application manually before running `helm upgrade`:
```shell
# Delete the old CRDs (safe — no CR data is stored in these CRDs)
kubectl delete crd replicatemplates.rollout-operator.grafana.com --ignore-not-found
kubectl delete crd zoneawarepoddisruptionbudgets.rollout-operator.grafana.com --ignore-not-found

# Apply the updated CRDs from the chart source
kubectl apply -f chart/files/crds/replica-templates.yaml
kubectl apply -f chart/files/crds/zone-aware-pod-disruption-budget.yaml

# Verify both CRDs are established before proceeding
kubectl wait --for=condition=established --timeout=60s \
  crd/replicatemplates.rollout-operator.grafana.com \
  crd/zoneawarepoddisruptionbudgets.rollout-operator.grafana.com
```
Upstream reference for the manual CRD procedure: Migrate from Helm chart 5.x to 6.0 — CRD upgrade steps
### 2. Classic Architecture Default and Ingest Storage Adoption
Unlike upstream, which enables ingest storage and Kafka by default in 6.0, Big Bang defaults to the classic gRPC push architecture (`ingest_storage.enabled: false`, `ingester.push_grpc_method_enabled: true`, `kafka.enabled: false`). This is an intentional deviation that preserves continuity for existing deployments upgrading from 5.x, giving operators the time to plan and provision a production Kafka backend at their own pace.
When you are ready to adopt ingest storage, note that the kafka-native image bundled in the chart is intended for demonstration and testing only — upstream explicitly states it is not suitable for production. For production deployments, use a cloud-managed Kafka service such as Amazon MSK, Confluent Cloud, or Azure Event Hubs, and configure Mimir to connect to it externally. Refer to the upstream ingest storage documentation for production Kafka configuration guidance.
For hardened environments (GovCloud, IL4/IL5), TLS encryption of the Kafka broker connection is required. Configure `ingest_storage.kafka.tls` in your `structuredConfig` and ensure your Kafka backend (e.g. Amazon MSK) has TLS/SASL enabled before pointing Mimir at it.
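Putting the pieces above together, adopting ingest storage against an external broker is a `structuredConfig` change. A hedged sketch — the broker endpoint and topic name are placeholders, and the exact key names under `ingest_storage.kafka` should be verified against the upstream configuration reference for your Mimir version:

```yaml
mimir:
  values:
    upstream:
      mimir:
        structuredConfig:
          ingest_storage:
            enabled: true
            kafka:
              address: b-1.example-msk.amazonaws.com:9094  # placeholder broker endpoint
              topic: mimir-ingest                          # placeholder topic name
              tls:
                enabled: true  # required for hardened environments
          ingester:
            push_grpc_method_enabled: false  # mutually exclusive with ingest storage
```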
### 3. rollout-operator Admission Webhook Impact on MinIO Tenant
Mimir 6.0 enables the rollout-operator by default. The rollout-operator installs namespace-scoped admission webhooks (`no-downscale`, `prepare-downscale`, `pod-eviction`) that intercept all StatefulSet UPDATE operations within the Mimir release namespace with `failurePolicy: Fail`.
**The problem:**
When using the Mimir-package MinIO Tenant (`minio-tenant.enabled: true`), the MinIO Tenant runs in the `mimir` namespace alongside Mimir. The rollout-operator’s admission webhooks intercept all StatefulSet UPDATE operations in that namespace, including those issued by the MinIO Operator when reconciling the Tenant. This causes:
- MinIO Operator reconciliation failures silently blocked at the webhook
- `Tenant` CR spec changes (scaling, configuration updates) that appear to apply but are never actuated
- Initial bucket creation that may fail or time out if MinIO does not fully initialize
**Option A — Same-namespace mitigation (Mimir-package MinIO Tenant):**
The package includes a `minio-tenant.bucketInit` Job that mitigates the initial bucket creation race by polling MinIO’s health endpoint and only creating buckets once MinIO is ready. Enable it alongside the Mimir-package MinIO Tenant:
```yaml
mimir:
  values:
    minio-tenant:
      enabled: true
      bucketInit:
        enabled: true
```
**Important:** The `bucketInit` job only addresses initial bucket creation. Subsequent `Tenant` spec changes made after initial deployment may still be silently blocked by the rollout-operator webhooks as long as MinIO and Mimir share a namespace. For production deployments requiring ongoing MinIO Tenant management, use Option B below.
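The mitigation the job performs amounts to a simple readiness gate: retry a health probe until it succeeds, then create the buckets. A minimal illustration of that pattern — the probe URL, service DNS name, and retry budget here are assumptions for demonstration, not the packaged job’s actual values:

```shell
#!/bin/sh
# wait_until MAX_ATTEMPTS DELAY_SECONDS CMD...
# Retries CMD until it succeeds or the attempt budget is exhausted.
# Returns 0 on success, 1 on timeout.
wait_until() {
  max_attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$max_attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example: gate bucket creation on MinIO's readiness endpoint
# (the service DNS name below is a placeholder for a Tenant in the mimir namespace):
# wait_until 30 2 curl -sf http://minio.mimir.svc.cluster.local:9000/minio/health/ready \
#   && mc mb myminio/mimir-blocks
```

The real job runs in-cluster with credentials mounted; this sketch only shows the poll-then-act ordering that avoids the race.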
**Option B — Big Bang MinIO Tenant (separate namespace, recommended for production):**
The recommended long-term solution is to use the Big Bang MinIO Tenant — the `minio` and `minioOperator` addons — which deploys MinIO in its own dedicated `minio` namespace. Because the rollout-operator webhooks are scoped to the `mimir` namespace, the MinIO Operator can reconcile the Tenant freely without interference.
**Credential management:** The `access_key_id` and `secret_access_key` values below must not be stored as plaintext in Git. Manage them via SOPS-encrypted values or an external secrets provider (e.g. Vault, AWS Secrets Manager via External Secrets Operator).
1. Enable the `minioOperator` and `minio` addons in your Big Bang values:
```yaml
addons:
  minioOperator:
    enabled: true
  minio:
    enabled: true
```
2. Disable the Mimir-package MinIO Tenant and point Mimir’s object storage at the Big Bang MinIO service:
```yaml
mimir:
  values:
    minio-tenant:
      enabled: false
    upstream:
      mimir:
        structuredConfig:
          common:
            storage:
              backend: s3
              s3:
                endpoint: minio.minio.svc.cluster.local:9000
                # Recommended: manage credentials via SOPS or an external secrets provider.
                # Plaintext values are acceptable for development/testing only — do not commit to Git.
                access_key_id: <your-access-key>
                secret_access_key: <your-secret-key>
                insecure: true
          blocks_storage:
            s3:
              bucket_name: mimir-blocks
          ruler_storage:
            s3:
              bucket_name: mimir-ruler
          alertmanager_storage:
            s3:
              bucket_name: mimir-alertmanager
```
When Istio hardening is enabled, you will also need an Istio `ServiceEntry` and a `NetworkPolicy` egress rule to allow Mimir pods to reach the MinIO service across namespaces. See `overview.md` for guidance on configuring Istio egress for external storage endpoints.
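As a starting point, a cross-namespace egress rule for this case can be sketched as a standard Kubernetes NetworkPolicy. The namespace label and port below assume a default Big Bang MinIO deployment listening on 9000; adjust the pod selector to match your Mimir labels:

```yaml
# Illustrative sketch — allow Mimir pods egress to the MinIO namespace on port 9000.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mimir-egress-minio
  namespace: mimir
spec:
  podSelector: {}        # all pods in the mimir namespace; narrow if desired
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: minio
      ports:
        - protocol: TCP
          port: 9000
```

The corresponding Istio `ServiceEntry` (if required by your hardening configuration) is deployment-specific; follow `overview.md` for its exact shape.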
**Option C — Disable the rollout-operator:**
**Important:** This option is only safe if zone-aware replication is not enabled for your ingesters (`upstream.ingester.zoneAwareReplication.enabled: false`) and store-gateways (`upstream.store_gateway.zoneAwareReplication.enabled: false`), which is the default. If zone-aware replication is active, disabling the rollout-operator removes the `no-downscale`, `prepare-downscale`, and `pod-eviction` safeguards against data loss during scale-down and is not recommended.
If neither Option A nor Option B fits your deployment, and zone-aware replication is not in use, you can disable the rollout-operator entirely. This removes the admission webhooks from the namespace, eliminating the interference with MinIO Operator reconciliation.
```yaml
mimir:
  values:
    upstream:
      rollout_operator:
        enabled: false
```