Zee - Blog

Visualization service - Setup Superset on K8s

Author: Zee Lu

Deploying Apache Superset on Kubernetes with PostgreSQL Support

This tutorial covers the complete process of deploying Apache Superset on a Kubernetes cluster with PostgreSQL database driver support. We'll walk through the challenges we faced and how we solved them.

NOTE

This is a continuation of the previous post Visualization service - Setup K8s on Digital Ocean.

Prerequisites

  • Kubernetes cluster (we used DigitalOcean Kubernetes)
  • kubectl configured to access your cluster
  • Domain name with DNS access
  • NGINX Ingress Controller installed
  • cert-manager for SSL certificates

Overview

We'll deploy Superset using custom Kubernetes manifests instead of the official Helm chart, which gives us more control over the configuration and allows us to add PostgreSQL support.

Step 1: Create the Superset Namespace and Secret

First, let's create a dedicated namespace and secret for Superset:

yaml
# superset-deployment.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: superset
---
apiVersion: v1
kind: Secret
metadata:
  name: superset-secret
  namespace: superset
type: Opaque
stringData:
  # Plain-text value; Kubernetes base64-encodes it on write. Change this in production.
  SUPERSET_SECRET_KEY: "your-secret-key-change-this-in-production"
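
If you use a Secret's data: field rather than stringData:, every value has to be base64-encoded. The following is a minimal sketch for producing a random key and its encoded form. The variable names are illustrative, and it assumes head, tr, and a GNU-style base64 are available on your machine:

```bash
# Generate 42 random bytes and print them as a single-line base64 string.
# This string is the actual secret key Superset will use.
secret_key="$(head -c 42 /dev/urandom | base64 | tr -d '\n')"

# Values under `data:` are stored base64-encoded, so encode the key once more.
encoded_key="$(printf '%s' "$secret_key" | base64 | tr -d '\n')"

# Print the line to paste under `data:` in the Secret manifest.
printf 'SUPERSET_SECRET_KEY: %s\n' "$encoded_key"
```

Keep the generated key out of version control; anyone holding it can forge Superset session cookies.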

Step 2: Create the Superset Deployment

The deployment is the most complex part. Here's our configuration:

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: superset
  namespace: superset
spec:
  replicas: 1
  selector:
    matchLabels:
      app: superset
  template:
    metadata:
      labels:
        app: superset
    spec:
      containers:
      - name: superset
        image: apache/superset:latest
        ports:
        - containerPort: 8088
        env:
        - name: SUPERSET_SECRET_KEY
          valueFrom:
            secretKeyRef:
              name: superset-secret
              key: SUPERSET_SECRET_KEY
        - name: X_FRAME_OPTIONS
          value: "ALLOWALL"
        command: ["/bin/bash"]
        args:
          - -c
          - |
            # Install PostgreSQL driver in user directory
            pip install --user psycopg2-binary
            # Add user site-packages to Python path
            export PYTHONPATH="/app/superset_home/.local/lib/python3.10/site-packages:$PYTHONPATH"
            # Initialize Superset
            superset db upgrade
            superset fab create-admin \
              --username admin \
              --firstname Admin \
              --lastname User \
              --email your-email@example.com \
              --password your-password-change-this-in-production
            superset init
            # Start Superset
            superset run -h 0.0.0.0 -p 8088 --with-threads --reload --debugger
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "512Mi"
            cpu: "200m"

Step 3: Create the Service

yaml
apiVersion: v1
kind: Service
metadata:
  name: superset-service
  namespace: superset
spec:
  selector:
    app: superset
  ports:
  - port: 8088
    targetPort: 8088
  type: ClusterIP

Step 4: Create the Ingress

yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: superset-ingress
  namespace: superset
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - superset.do.zeelu.me
    secretName: superset-tls
  rules:
  - host: superset.do.zeelu.me
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: superset-service
            port:
              number: 8088

Step 5: Apply the Configuration

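Save the four manifests above in superset-deployment.yaml, separated by --- lines, and apply them in one go:

bash
kubectl apply -f superset-deployment.yaml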
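
Applying the file returns immediately, while the container still has to install the driver and initialize Superset. A small wrapper like the one below can apply the manifest and then wait for the rollout. The function name and the 300s default timeout are assumptions for illustration, not part of the original setup:

```bash
# Apply a manifest file, then block until the superset Deployment has rolled out.
# $1 = manifest path, $2 = rollout timeout (defaults to 300s, an assumed value).
# Usage: deploy_superset superset-deployment.yaml
deploy_superset() {
  kubectl apply -f "$1" || return 1
  kubectl rollout status deployment/superset --namespace superset --timeout="${2:-300s}"
}
```

Because the Deployment defines no readiness probe, a successful rollout only tells you the container started, not that Superset has finished initializing, so check the logs as well.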

Challenges and Solutions

Challenge 1: PostgreSQL Driver Missing

Problem: When trying to connect to PostgreSQL databases, Superset showed the error:

text
ERROR: Could not load database driver: PostgresEngineSpec

Root Cause: The default Superset Docker image doesn't include the PostgreSQL driver (psycopg2).

Solution: We modified the deployment to install the PostgreSQL driver during container startup:

bash
# Install PostgreSQL driver in user directory
pip install --user psycopg2-binary
# Add user site-packages to Python path
export PYTHONPATH="/app/superset_home/.local/lib/python3.10/site-packages:$PYTHONPATH"

Challenge 2: Permission Issues

Problem: When trying to install the PostgreSQL driver, we encountered permission errors:

text
PermissionError: [Errno 13] Permission denied: '/app/.venv/lib/python3.10/site-packages/psycopg2_binary.libs'

Root Cause: The Superset container runs as a non-root user (superset) for security, but the virtual environment directory is owned by root.

Solution: We installed the driver in the user directory using pip install --user and added the user site-packages to the Python path.
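
The export used above hardcodes the python3.10 directory and appends ":$PYTHONPATH" even when that variable is empty, which can leave an empty path entry behind. A slightly more defensive variant might look like this sketch; prepend_pythonpath is a hypothetical helper name, and the commented usage line assumes the image's python reports the same user site-packages directory that pip install --user writes to:

```bash
# Prepend a directory to PYTHONPATH without leaving an empty entry
# when PYTHONPATH is unset or empty.
prepend_pythonpath() {
  PYTHONPATH="$1${PYTHONPATH:+:$PYTHONPATH}"
  export PYTHONPATH
}

# Hypothetical usage inside the container startup script: ask Python for the
# user site-packages directory instead of hardcoding the python3.10 path.
# prepend_pythonpath "$(python -m site --user-site)"
```

If the base image later moves to a different Python version, the lookup keeps working where the hardcoded path would silently point at a missing directory.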

Challenge 3: Resource Constraints

Problem: The initial deployment failed with insufficient resources:

text
Insufficient cpu, 1 Insufficient memory

Root Cause: The single-node DigitalOcean cluster had limited resources.

Solution: We reduced the resource requests and limits:

yaml
resources:
  requests:
    memory: "256Mi"
    cpu: "100m"
  limits:
    memory: "512Mi"
    cpu: "200m"
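
Before picking numbers, it helps to see what the node can actually offer and why the scheduler rejected the pod. The helpers below are a sketch with made-up function names; the kubectl subcommands and flags themselves are standard:

```bash
# List each node's allocatable CPU and memory, i.e. what pods can request in total.
show_node_capacity() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory'
}

# List scheduling failures recorded in the superset namespace.
show_scheduling_failures() {
  kubectl get events --namespace superset --field-selector reason=FailedScheduling
}
```

Requests decide whether the pod can be scheduled at all, while limits cap what the running container may use, so a limit that is too low tends to show up later as throttling or an out-of-memory restart rather than as a scheduling error.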

Challenge 4: SSL Certificate Issues

Problem: SSL certificates were taking a long time to be issued.

Root Cause: DNS propagation delays. The cluster's DNS resolver couldn't resolve the new domain immediately, so cert-manager could not complete the certificate challenge until the record became visible.

Solution: We waited for DNS propagation and used HTTP access temporarily while the certificate was being issued.

Challenge 5: Iframe Embedding Blocked

Problem: When trying to embed Superset in an iframe, browsers showed the error:

text
Refused to display 'https://superset.do.zeelu.me/' in a frame because it set 'X-Frame-Options' to 'sameorigin'.

Root Cause: By default, Superset sets X-Frame-Options: sameorigin, which prevents pages on other domains from embedding it in an iframe.

Solution: We added the environment variable to allow iframe embedding:

yaml
env:
- name: X_FRAME_OPTIONS
  value: "ALLOWALL"
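
To confirm what the server actually sends after a change like this, you can inspect the response headers directly. This is a minimal sketch; frame_options_header is a hypothetical helper, and it assumes the header is also present on a HEAD request to the URL you pass in:

```bash
# Print the X-Frame-Options header returned by a URL.
# Prints nothing and returns non-zero if the header is absent.
# Usage: frame_options_header https://superset.do.zeelu.me/
frame_options_header() {
  curl -sSI "$1" | tr -d '\r' | grep -i '^x-frame-options:'
}
```

Allowing embedding from any origin also removes clickjacking protection, so it is worth limiting this to deployments where embedding is really needed.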

DNS Configuration

Add an A record in your DNS provider:

  • Name: superset (for superset.do.zeelu.me)
  • Value: Your load balancer IP (e.g., 104.248.105.26)
  • TTL: Default
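
Since certificate issuance depends on the record being visible, it can save time to check resolution before looking at cert-manager. The sketch below assumes dig is installed; dns_points_to is an illustrative name, and the IP is the example value from the list above:

```bash
# Succeed only if the hostname currently resolves to the expected IPv4 address.
# $1 = hostname, $2 = expected IP.
# Usage: dns_points_to superset.do.zeelu.me 104.248.105.26 && echo "DNS is ready"
dns_points_to() {
  dig +short "$1" A | grep -Fqx "$2"
}
```

Your local resolver and the cluster's resolver may see the new record at different times, so a passing check here is a good sign rather than a guarantee.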

Accessing Superset

Once deployed, Superset is available at your domain or load balancer IP.

Or you can try out mine at:

  • URL: https://superset.do.zeelu.me
  • Username: guestuser
  • Password: guestuser

Verifying PostgreSQL Support

To verify that PostgreSQL support is working:

  1. Log into Superset
  2. Go to Settings → Database Connections
  3. Click + Database
  4. Select PostgreSQL from the database type dropdown
  5. You should now be able to configure PostgreSQL connections without the driver error

Production Considerations

For production deployments, consider:

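  1. Security: Change default passwords, use proper secret management, and remove the --reload and --debugger development flags from the superset run command
  2. Persistence: Use a proper metadata database (PostgreSQL/MySQL) instead of the default SQLite file, which lives inside the container in this setup and is lost when the pod is recreated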
  3. Scaling: Configure multiple replicas and proper resource limits
  4. Monitoring: Add health checks and monitoring
  5. Backup: Implement regular database backups

Troubleshooting

Check Pod Status

bash
kubectl get pods --namespace superset

View Logs

bash
kubectl logs <pod-name> --namespace superset

Check SSL Certificate

bash
kubectl get certificates --namespace superset
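
The checks above can be bundled into one helper so you do not have to look up the pod name first. This is a sketch: superset_diagnostics is a made-up name, and it relies on the app=superset label from the Deployment:

```bash
# Show pod status, recent logs, and recent events for the Superset namespace.
# $1 = namespace (defaults to superset).
# Usage: superset_diagnostics
superset_diagnostics() {
  local ns="${1:-superset}"
  kubectl get pods --namespace "$ns" -l app=superset
  kubectl logs --namespace "$ns" -l app=superset --tail=50
  kubectl get events --namespace "$ns" --sort-by=.lastTimestamp
}
```

Selecting by label instead of pod name keeps the command valid after the pod is recreated with a new name.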

Conclusion

Deploying Superset on Kubernetes with PostgreSQL support requires careful attention to container permissions, resource constraints, and DNS configuration. By using custom manifests instead of the Helm chart, we gained the flexibility to install additional packages and configure the deployment exactly as needed.

The key was recognizing that the Superset image runs as a non-root user, so the driver has to be installed into the user directory and the Python path has to include the user site-packages directory.