Enterprise Distribution

Kubernetes Packaging 101: How We Turned a SaaS Into Enterprise Software (And Only Cried Twice)

A real-world guide to packaging your cloud-native SaaS for on-prem deployment. Based on the 14 times we got it wrong before getting it right.

Oikonex Team · Jan 14, 2026 · 16 min read

"Nothing Touches Our Cloud."

Picture this: We're in a glass-walled conference room on the 38th floor of a bank's headquarters. The kind of room where the coffee is terrible but the chairs cost more than your car. We'd just delivered the most polished SaaS demo of our lives. The VP of Engineering was nodding. The product team was smiling. We were mentally high-fiving.

Then the CISO leaned forward.

"This is great. But nothing touches our cloud. We need this running in our data center, behind our firewall, on hardware we control. Can you do that?"

We said yes. We had thirty days to figure out what "yes" actually meant.

What followed was a brutal, humbling, and ultimately rewarding crash course in turning a cloud-native SaaS into something that could ship in a box. Metaphorically. The box was a Helm chart. And it took us 14 significant iterations before we got it right.

This is everything we learned.

The "Containerize Everything" Montage

You know that scene in every Rocky movie where he trains in increasingly ridiculous ways, and it's compressed into three minutes of inspirational music? Containerizing our stack was like that, except the music was docker build output scrolling by at 2 AM, and instead of running up the steps of the Philadelphia Museum of Art, we were staring at a 47-layer Dockerfile wondering where we went wrong.

The goal is simple: every component of your application runs as a container image you control. No managed services. No "oh, Lambda handles that." Everything in a box.

Here's what our service-replacement map looked like:

Cloud Service         What We Replaced It With                           Pain Level (1-10)
Amazon RDS            PostgreSQL via CloudNativePG                       6
Amazon SQS            NATS JetStream                                     4
AWS Lambda            Regular containers (Knative was overkill for us)   7
Amazon S3             MinIO                                              3
AWS Secrets Manager   Kubernetes Secrets + Sealed Secrets                5
Amazon ElastiCache    Redis (Bitnami Helm chart)                         2
Amazon Cognito        Keycloak                                           9 (we don't talk about those two weeks)

The Lambda migration was the one that surprised us. We had 23 Lambda functions. Some were trivial -- a webhook handler here, an image resizer there. But three of them had tentacles deep into Step Functions, EventBridge, and DynamoDB Streams. Extracting those was like removing a load-bearing wall. You can do it, but you'd better have a structural engineer (or at least a whiteboard and a lot of patience).

Our rule of thumb: if a service replacement takes more than a week per function, it's a "Phase 2" item. Ship the core without it and iterate. Your first enterprise customer doesn't need feature parity on day one. They need your core value proposition running behind their firewall.

# Real Dockerfile from our API service
# We learned the hard way: use specific tags, not :latest
# "latest" in production is like "surprise me" at a restaurant
# that also serves sushi and barbecue
FROM node:20.11-alpine AS builder

WORKDIR /app
COPY package*.json ./
# Install all deps (dev included) -- the TypeScript build needs them --
# then prune to production-only so the runtime stage copies a lean node_modules.
RUN npm ci

COPY . .
RUN npm run build && npm prune --omit=dev && npm cache clean --force

# ---
FROM node:20.11-alpine AS runtime

RUN addgroup -g 1001 -S appgroup && \
    adduser -S appuser -u 1001 -G appgroup

WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./

# Don't run as root. Seriously.
# Enterprise security teams WILL scan your images
# and they WILL file a P1 ticket about this.
USER appuser

EXPOSE 8080
# Note: busybox wget (what Alpine ships) doesn't support GNU-style --tries/--no-verbose
HEALTHCHECK --interval=30s --timeout=3s \
  CMD wget -q --spider http://localhost:8080/healthz || exit 1

CMD ["node", "dist/server.js"]

The Helm Chart: Your Installation API

Here's something nobody tells you about enterprise software: your values.yaml file is more important than your README. It's the contract between your software and your customer's infrastructure team. Every decision you make here will either prevent a support ticket or generate one.

Our chart structure after 14 iterations:

oikonex-platform/
├── Chart.yaml
├── Chart.lock
├── values.yaml              # The big one
├── values.schema.json       # Validates values (add this early, thank us later)
├── templates/
│   ├── _helpers.tpl         # Template helpers
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── configmap.yaml
│   ├── secret.yaml
│   ├── ingress.yaml
│   ├── hpa.yaml             # Horizontal Pod Autoscaler
│   ├── pdb.yaml             # Pod Disruption Budget
│   ├── networkpolicy.yaml   # Enterprise customers LOVE network policies
│   ├── serviceaccount.yaml
│   └── tests/
│       └── test-connection.yaml
├── charts/                  # Subcharts (postgres, redis, etc.)
└── ci/
    ├── test-values.yaml     # Values for CI testing
    └── production-values.yaml
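
One entry in that tree deserves a callout: values.schema.json. Helm validates the supplied values against it on install, upgrade, and lint, which turns "why is my pod crash-looping" into "your values file has a typo" before anything hits the cluster. A minimal illustrative sketch -- the fields mirror the values file below, but this is nowhere near our full schema:

{
  "$schema": "https://json-schema.org/draft-07/schema#",
  "description": "Illustrative excerpt, not the full Oikonex schema",
  "type": "object",
  "required": ["image"],
  "properties": {
    "replicaCount": { "type": "integer", "minimum": 1 },
    "image": {
      "type": "object",
      "required": ["repository"],
      "properties": {
        "repository": { "type": "string" },
        "tag": { "type": "string" },
        "pullPolicy": { "type": "string", "enum": ["Always", "IfNotPresent", "Never"] }
      }
    },
    "auth": {
      "type": "object",
      "properties": {
        "provider": { "type": "string", "enum": ["builtin", "oidc", "saml", "ldap"] }
      }
    }
  }
}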

And here's our battle-tested values.yaml, annotated with the scars of experience:

# =============================================================================
# Image Configuration
# =============================================================================
image:
  repository: registry.oikonex.io/platform
  tag: ""  # Defaults to Chart.appVersion
  pullPolicy: IfNotPresent

# LESSON LEARNED: You WILL have customers with private registries.
# Make pull secrets a first-class citizen, not an afterthought.
imagePullSecrets: []
#  - name: my-registry-secret

# =============================================================================
# Replica Configuration
# =============================================================================
replicaCount: 2

# =============================================================================
# Resource Limits
# Your customer's nodes might be completely different from yours.
# Be generous with defaults but let them tune.
# =============================================================================
resources:
  requests:
    memory: "512Mi"
    cpu: "250m"
  limits:
    memory: "1Gi"
    cpu: "1000m"

# =============================================================================
# Proxy Configuration
# STORY TIME: We shipped v1.0 without proxy settings.
# A Fortune 100 customer ran everything through a corporate proxy.
# They filed 11 support tickets in the first week.
# Eleven. All variations of "your app can't reach the internet."
# We now expose proxy settings for EVERYTHING.
# =============================================================================
proxy:
  httpProxy: ""
  httpsProxy: ""
  noProxy: ""

# =============================================================================
# Ingress
# Don't assume they use nginx. Don't assume they use Traefik.
# Don't assume anything about their ingress.
# =============================================================================
ingress:
  enabled: true
  className: ""  # Let THEM specify
  annotations: {}
  hosts:
    - host: platform.example.com
      paths:
        - path: /
          pathType: Prefix
  tls: []
  #  - secretName: platform-tls
  #    hosts:
  #      - platform.example.com

# =============================================================================
# TLS Configuration
# Enterprise customers manage their own PKI.
# They will NOT use Let's Encrypt. They have an internal CA.
# =============================================================================
tls:
  enabled: true
  existingSecret: ""  # Name of a pre-existing TLS secret
  certManager:
    enabled: false
    issuerRef:
      name: ""
      kind: ClusterIssuer

# =============================================================================
# Authentication Backend
# "Just use our built-in auth" said no enterprise customer ever.
# =============================================================================
auth:
  provider: "builtin"  # builtin | oidc | saml | ldap
  oidc:
    issuerUrl: ""
    clientId: ""
    clientSecretRef:
      name: ""
      key: ""
  ldap:
    host: ""
    port: 636
    baseDN: ""
    bindDN: ""
    bindPasswordRef:
      name: ""
      key: ""
    userSearchFilter: "(uid={0})"
    groupSearchFilter: "(member={0})"
  saml:
    metadataUrl: ""
    entityId: ""
    assertionConsumerServiceUrl: ""

# =============================================================================
# Database
# Default to bundled, but always support external.
# =============================================================================
postgresql:
  enabled: true  # Set to false to use external database
  auth:
    postgresPassword: ""  # MUST be set
    database: oikonex
  primary:
    persistence:
      size: 20Gi
      storageClass: ""  # Let the customer decide

externalDatabase:
  host: ""
  port: 5432
  database: "oikonex"
  existingSecret: ""  # Secret containing 'password' key
  username: "oikonex"

# =============================================================================
# Feature Flags
# Let customers turn off things they didn't pay for
# or things that don't work in their environment
# =============================================================================
features:
  analytics: true
  advancedReporting: false
  aiAssistant: false  # Requires internet access -- not available air-gapped

That proxy section? It took exactly one bad deployment to a major financial institution to teach us that lesson. Eleven tickets. We now have a Slack emoji named :proxy-trauma:.
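
For reference, the template side of that proxy block is unglamorous. Roughly this -- a sketch, not our literal deployment.yaml:

# templates/deployment.yaml (excerpt) -- illustrative sketch
# Only inject the proxy env vars when the customer actually set them.
containers:
  - name: {{ .Chart.Name }}
    env:
      {{- if .Values.proxy.httpProxy }}
      - name: HTTP_PROXY
        value: {{ .Values.proxy.httpProxy | quote }}
      {{- end }}
      {{- if .Values.proxy.httpsProxy }}
      - name: HTTPS_PROXY
        value: {{ .Values.proxy.httpsProxy | quote }}
      {{- end }}
      {{- if .Values.proxy.noProxy }}
      - name: NO_PROXY
        value: {{ .Values.proxy.noProxy | quote }}
      {{- end }}

It's also worth setting the lowercase variants (http_proxy, https_proxy, no_proxy), since some runtimes and CLI tools only read one casing.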

The Private Registry Saga

Setting up a private registry sounds straightforward. It is not.

We went with Harbor. It's open source, it does image scanning with Trivy, it handles replication, and it has a UI that doesn't make you want to throw your laptop. Here's the abbreviated version of our journey:

Week 1: Set up Harbor. Everything works. We feel smart.

Week 2: Someone pushes a 4.2 GB image because they included the entire build context in their Docker image. Our registry storage costs double overnight. We add image size limits and a pre-push hook that rejects anything over 500MB.

Week 3: Discover that our Trivy vulnerability scanner is flagging CVEs in the base OS layer of Alpine. We have 200+ "critical" vulnerabilities, most of which are false positives for our use case. We spend two days configuring an allowlist.

Week 4: A customer asks for images signed with cosign. We didn't know what cosign was three days ago. Now we do.
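
If you're about to take the same crash course, the key-based flow is mercifully short. A sketch (keyless signing against an OIDC identity is the other common route):

# One-time: generate a key pair (cosign.key / cosign.pub)
cosign generate-key-pair

# Sign the image and push the signature to the registry
# (newer cosign versions prompt about the transparency log; pass --yes in CI)
cosign sign --key cosign.key registry.oikonex.io/platform/api:v2.5.0

# What the customer's pipeline runs to verify
cosign verify --key cosign.pub registry.oikonex.io/platform/api:v2.5.0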

# Harbor configuration excerpt (helmfile)
# This is the config we actually ship with
harbor:
  expose:
    type: ingress
    tls:
      enabled: true
      certSource: secret
      secret:
        secretName: harbor-tls
  persistence:
    persistentVolumeClaim:
      registry:
        size: 100Gi  # We started at 20Gi. LOL.
      database:
        size: 5Gi
  trivy:
    enabled: true
    severity: "CRITICAL,HIGH"  # Medium and below = noise
  # Image signing with cosign
  notary:
    enabled: false  # Deprecated in favor of cosign

Air-Gapped Image Bundling: The Real Script

For air-gapped customers -- and you will have them -- you need to ship your entire image set as a transferable bundle. Here's the script we actually use in production. It's not pretty, but it works, and that's what matters when someone's operations team is loading your software onto a USB drive in a SCIF.

#!/usr/bin/env bash
# bundle-images.sh - Create an air-gapped image bundle
# Usage: ./bundle-images.sh v2.5.0

set -euo pipefail

VERSION="${1:?Usage: $0 <version>}"
BUNDLE_DIR="oikonex-airgap-${VERSION}"
REGISTRY="registry.oikonex.io"

# Every image needed for a complete installation
IMAGES=(
  "${REGISTRY}/platform/api:${VERSION}"
  "${REGISTRY}/platform/web:${VERSION}"
  "${REGISTRY}/platform/worker:${VERSION}"
  "${REGISTRY}/platform/migrations:${VERSION}"
  "docker.io/bitnami/postgresql:15.4.0"
  "docker.io/bitnami/redis:7.2.3"
  "docker.io/minio/minio:RELEASE.2024-01-18T22-51-28Z"
  "quay.io/keycloak/keycloak:23.0.3"
  "docker.io/library/busybox:1.36"  # For init containers
)

echo "=== Oikonex Air-Gap Bundle Builder ==="
echo "Version: ${VERSION}"
echo "Images: ${#IMAGES[@]}"
echo ""

mkdir -p "${BUNDLE_DIR}/images"

# Pull and save each image
for img in "${IMAGES[@]}"; do
  echo "[pull] ${img}"
  docker pull "${img}" --quiet

  # Create a filesystem-safe filename
  safe_name=$(echo "${img}" | tr '/:' '_')
  echo "[save] ${img} -> ${safe_name}.tar"
  docker save "${img}" -o "${BUNDLE_DIR}/images/${safe_name}.tar"
done

# Include the Helm chart
echo "[helm] Packaging chart..."
helm package ./charts/oikonex-platform \
  --version "${VERSION}" \
  --destination "${BUNDLE_DIR}/"

# Include the loader script
cat > "${BUNDLE_DIR}/load-images.sh" << 'LOADER'
#!/usr/bin/env bash
# load-images.sh - Load images into a customer's private registry
# Usage: ./load-images.sh <target-registry>

set -euo pipefail

TARGET_REGISTRY="${1:?Usage: $0 <target-registry>}"

echo "Loading images into ${TARGET_REGISTRY}..."

for tarball in images/*.tar; do
  echo "[load] ${tarball}"
  # Load the image and capture its name
  loaded=$(docker load -i "${tarball}" | grep "Loaded image" | awk '{print $NF}')

  # Retag for the target registry
  original_name=$(echo "${loaded}" | cut -d'/' -f2-)
  new_tag="${TARGET_REGISTRY}/${original_name}"

  echo "[tag]  ${loaded} -> ${new_tag}"
  docker tag "${loaded}" "${new_tag}"

  echo "[push] ${new_tag}"
  docker push "${new_tag}"
done

echo ""
echo "All images loaded. Update your Helm values:"
echo "  image.repository: ${TARGET_REGISTRY}/platform/api"
echo ""
LOADER
chmod +x "${BUNDLE_DIR}/load-images.sh"

# Include checksums for verification
echo "[checksum] Generating SHA256 checksums..."
cd "${BUNDLE_DIR}"
sha256sum images/*.tar > checksums.sha256
cd ..

# Create the final tarball
echo "[bundle] Creating archive..."
tar czf "${BUNDLE_DIR}.tar.gz" "${BUNDLE_DIR}/"

FINAL_SIZE=$(du -h "${BUNDLE_DIR}.tar.gz" | cut -f1)
echo ""
echo "=== Bundle Complete ==="
echo "File: ${BUNDLE_DIR}.tar.gz"
echo "Size: ${FINAL_SIZE}"
echo ""
echo "Ship this to the customer. Preferably not via email."

That last line isn't a joke. Someone tried to email a 2 GB tarball to a customer once. Outlook was... unhappy.
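
For completeness, the receiving side is just a few commands once the tarball makes it inside (the target registry hostname below is a placeholder):

# Run inside the air-gapped environment, wherever the bundle landed
tar xzf oikonex-airgap-v2.5.0.tar.gz
cd oikonex-airgap-v2.5.0

# Verify nothing was corrupted in transit (USB drives have opinions)
sha256sum -c checksums.sha256

# Retag and push everything into the customer's internal registry
./load-images.sh registry.internal.customer.example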

The Documentation Tax

Here's the part nobody warns you about: enterprise software requires enterprise documentation. And we're not talking about a nice README with some ASCII art.

Your enterprise buyer's Change Advisory Board (CAB) needs to approve every piece of software that enters their environment. To get that approval, they need:

  1. Architecture diagrams -- not your "boxes and arrows in Excalidraw" diagrams. Real diagrams with data flow, network boundaries, and security zones.
  2. A 20-40 page installation runbook -- step-by-step, screenshot-by-screenshot, "click the button that says OK" level detail.
  3. Security documentation -- SBOM (Software Bill of Materials), vulnerability scan results, encryption-at-rest and in-transit details, network policy descriptions.
  4. Disaster recovery procedures -- backup frequency, RTO, RPO, tested restore procedures.
  5. Capacity planning guide -- "for 1,000 users you need X nodes with Y CPU and Z RAM."

We spent more time writing documentation for our first enterprise customer than we spent writing the actual Helm chart. This is normal. Budget for it.

Pro tip: Generate as much as you can. We auto-generate our SBOM from docker sbom, our config reference from values.schema.json, and our architecture diagrams from our actual Kubernetes manifests using a custom script that outputs Mermaid diagrams. The less you hand-write, the less gets out of date.
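
Concretely, "generate as much as you can" is mostly a handful of CI one-liners. A sketch of the SBOM half (exact flags vary by plugin version):

# SPDX-format SBOM for the API image (docker sbom is a wrapper around Syft)
docker sbom registry.oikonex.io/platform/api:v2.5.0 --format spdx-json > sbom-api.spdx.json

# Equivalent with Syft directly
syft registry.oikonex.io/platform/api:v2.5.0 -o spdx-json > sbom-api.spdx.json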

Health Checks and Observability: The /healthz Incident

The first time we shipped to an enterprise customer without a /healthz endpoint, their monitoring team filed a P1 incident at 3 AM. The incident title was: "New vendor application: status unknown."

They weren't wrong. They literally could not tell if our application was alive or dead. Their Nagios dashboard showed a gray box where our service should have been. Gray means "unknown." Unknown, in enterprise operations, means "assume the worst."

We now ship with three distinct health endpoints:

// health.controller.ts
// Three endpoints. Three purposes. No ambiguity.

// /healthz -- Liveness probe
// "Is the process alive?" -- If this fails, Kubernetes restarts the pod.
// Keep it simple. Don't check dependencies here.
app.get('/healthz', (req, res) => {
  res.status(200).json({
    status: 'ok',
    timestamp: new Date().toISOString(),
    version: process.env.APP_VERSION || 'unknown',
  });
});

// /ready -- Readiness probe
// "Can this pod handle traffic?" -- If this fails, it's removed from
// the Service's endpoint list. Check your critical dependencies here.
app.get('/ready', async (req, res) => {
  const checks = {
    database: await checkDatabase(),
    redis: await checkRedis(),
    migrations: await checkMigrationsApplied(),
  };

  const allHealthy = Object.values(checks).every(c => c.status === 'ok');

  res.status(allHealthy ? 200 : 503).json({
    status: allHealthy ? 'ok' : 'degraded',
    checks,
    timestamp: new Date().toISOString(),
  });
});

// /metrics -- Prometheus metrics endpoint
// Enterprise monitoring teams expect this. Don't make them ask.
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});
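
The dependency checks themselves are intentionally boring. A sketch of what checkDatabase could look like -- the pg Pool and the 2-second timeout are assumptions, not our exact code:

// Hypothetical helper -- adapt to your own DB client.
import { Pool } from 'pg';

const pool = new Pool(); // reads the standard PG* env vars

async function checkDatabase(): Promise<{ status: 'ok' | 'error'; latency_ms?: number; error?: string }> {
  const start = Date.now();
  try {
    // Cheap round-trip; fail fast so a slow database doesn't hang the probe
    await Promise.race([
      pool.query('SELECT 1'),
      new Promise((_, reject) => setTimeout(() => reject(new Error('db check timeout')), 2000)),
    ]);
    return { status: 'ok', latency_ms: Date.now() - start };
  } catch (err) {
    return { status: 'error', error: (err as Error).message };
  }
}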

And in the Helm chart:

# templates/deployment.yaml (excerpt)
containers:
  - name: {{ .Chart.Name }}
    image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
    ports:
      - name: http
        containerPort: 8080
        protocol: TCP
    livenessProbe:
      httpGet:
        path: /healthz
        port: http
      initialDelaySeconds: 30
      periodSeconds: 10
      timeoutSeconds: 3
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /ready
        port: http
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 3
      failureThreshold: 3
    # Startup probe -- gives slow-starting apps time to initialize
    # without being killed by the liveness probe
    startupProbe:
      httpGet:
        path: /healthz
        port: http
      failureThreshold: 30
      periodSeconds: 10

Structured logging is the other thing enterprise ops teams will ask about on day one. If your logs look like [INFO] thing happened, you will get a support ticket that's politely worded but translates to "we can't feed your logs into Splunk."

// Use structured JSON logging. Every field queryable. No regex required.
logger.info({
  event: 'request.completed',
  method: req.method,
  path: req.path,
  statusCode: res.statusCode,
  duration_ms: Date.now() - startTime,
  requestId: req.headers['x-request-id'],
  userId: req.user?.id,
  traceId: req.headers['x-trace-id'],
});

The Battle-Tested Packaging Checklist

After 14 iterations, 3 enterprise customers, and one incident involving a RHEL 7 cluster that we do not discuss in polite company (they wanted us to support RHEL 7. In 2026. We negotiated them up to RHEL 8 and considered it a diplomatic victory), here's our checklist:

Container Images

  • All components containerized with minimal base images (Alpine or distroless)
  • Specific image tags, never :latest
  • Images run as non-root user
  • No secrets baked into images (you'd be surprised how often this happens)
  • SBOM generated for every image
  • Images scanned for CVEs with Trivy or Grype
  • Multi-architecture builds if needed (amd64 + arm64)

Helm Chart

  • Comprehensive values.yaml with comments explaining every option
  • values.schema.json for validation (catches misconfigurations at install time, not runtime)
  • All images configurable (repository, tag, pull policy, pull secrets)
  • Resource requests and limits configurable
  • Proxy settings exposed (HTTP_PROXY, HTTPS_PROXY, NO_PROXY)
  • Ingress fully configurable (className, annotations, TLS)
  • External database support (not just bundled)
  • Auth backend configurable (OIDC, SAML, LDAP)
  • Network policies included
  • Pod Disruption Budgets for HA
  • Helm test included (helm test my-release) -- see the sketch after this list
  • Tested on: EKS, GKE, AKS, OpenShift, RKE2, k3s
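
The Helm test from that last item is about a dozen lines and saves real back-and-forth during customer installs. Roughly what ours looks like -- a sketch; the fullname helper and port are assumptions:

# templates/tests/test-connection.yaml -- illustrative sketch
apiVersion: v1
kind: Pod
metadata:
  name: "{{ include "oikonex-platform.fullname" . }}-test-connection"
  annotations:
    "helm.sh/hook": test
spec:
  restartPolicy: Never
  containers:
    - name: healthcheck
      image: busybox:1.36
      command: ['wget']
      # Assumes the Service exposes port 8080, as in the probes shown earlier
      args: ['-q', '--spider', 'http://{{ include "oikonex-platform.fullname" . }}:8080/healthz']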

Observability

  • /healthz liveness endpoint
  • /ready readiness endpoint
  • Prometheus /metrics endpoint
  • Structured JSON logging with request IDs
  • Grafana dashboard JSON included (customers love this)

Documentation

  • Architecture overview with diagrams
  • Prerequisites (K8s version, node specs, storage classes)
  • Step-by-step installation guide
  • Configuration reference (generated from values.schema.json)
  • Upgrade procedures (including breaking changes)
  • Backup and restore procedures
  • Troubleshooting guide with common issues
  • Capacity planning guide
  • Security documentation and SBOM

Distribution

  • Private registry with access controls
  • Air-gapped bundle with loader script
  • SHA256 checksums for all artifacts
  • Signed images (cosign)
  • Release notes with every version

What We Didn't Cover (But You'll Need Eventually)

This guide gets you from "we're a SaaS" to "we can ship to enterprises." But there's a whole second mountain after this one:

  • Licensing and entitlements -- How do you enforce seat limits when the software runs on their hardware? (Hint: it's harder than you think, and we wrote a whole separate post about it.)
  • Update delivery -- How do customers get new versions? How do you handle breaking changes?
  • Remote support -- How do you debug issues in an environment you can't access?
  • Multi-tenancy -- Can multiple teams share one installation? Should they?

The Honest Truth

Turning a SaaS into enterprise software is not a side project. It's not a hackathon. It's a strategic bet that your software is valuable enough that large organizations will go through procurement, security review, and change advisory boards to deploy it.

The good news? If you're reading this, it probably is. Those Fortune 500 CISOs don't waste time evaluating software they don't want to buy.

The first time you see your application running inside a customer's data center, monitored by their NOC, serving their users on their infrastructure -- it's a genuinely great feeling. Like watching your kid ride a bike for the first time, except the kid is a Kubernetes pod and the bike is a hardened RHEL 8 node with SELinux enforcing.

Start with one customer. Build just enough to close that deal. Then do it all over again, slightly better, for the next one. That's the whole game.

Enterprise Distribution · Kubernetes · SaaS
