Zee - Blog
Published on

Visualization service - Design serverless function

Authors
  • avatar
    Name
    Zee Lu
    Twitter

Introduction

NOTE

The code is publicly available on github: cf-worker-superset

This blog documents the process of building the serverless function for the visualization service, including design, implementation and challenges. I'm building this not only for a simple backing service, but also for learning and experimenting the concepts of building cloud native components.

Design

The serverless function is deployed on cloudflare workers, about the reason, there are severals:

  • urltra low latency and cold start time comparing to other providers
  • very generous free tier, up to 100,000 requests per day
  • Binding other services this project requires, such as R2 buckets, AI Gateway, etc.

But also, those comes with trade-offs:

  • No native support for rate limitor, api keys, etc.
  • The web ui is not very mature yet.
  • Different url structure, for example, in azure, it comes with something like https://<appname>.azurewebsites.net/api/<function-name>, that we can have the different endpoints for different functions within one application. In cloudflare, things are different, the function name is at subdomain level, such as https://<function-name>.<username>.workers.dev.

This section will introduce the design for different functions, including the purpose, input and output.

Functions

We need only two functions at this early stage, and the rest will come after the MVP is done.

Execute SQL query

This function is used to execute the SQL query directly from the superset, which takes query and give the result accordingly.

endpoint: execute-sql

input:

  • database_id (number, required)
  • sql: must be a single SELECT statement
  • full: if falsey, a row limit is injected/clamped
  • limit: requested limit, clamped to max

output:

  • status: Execution status, e.g. "ok".
  • meta:
    • full: If falsey, a row limit is injected/clamped.
    • effectiveLimit: The applied limit (requested or default), clamped to max.
    • wasClamped: True if a provided limit exceeded the max and was reduced.
  • data:
    • columns: Column descriptors from Superset
    • rows: Result rows.

Visualize the data

This function is used to visualize the data from the superset, which first execute the sql query, create a virtual dataset, then create a chart from that dataset, and return the constructed permalink as an embeddable url.

endpoint: charts/create

input:

  • database_id: the id of the database to execute the query on
  • sql: must be a single SELECT statement
  • viz_type: the type of the visualization
  • slice_name: the name of the slice
  • params: additional explore form data

output:

  • status: Execution status, e.g. "ok".
  • datasource: Datasource slug used to explore, e.g. 12345__query.
  • viz_type: The visualization type.
  • explore_url: URL to open the chart in Explore with saved form data.
  • form_data_key: Key for the Explore form data state.
  • embed_url: Standalone embeddable URL (when a permalink key or full URL is returned, e.g. http://superset.do.zeelu.me/superset/explore/p/2948j2Z8YEo/).
  • meta:
    • full: If falsey, a row limit is injected/clamped.
    • effectiveLimit: The applied limit (requested or default), clamped to max.
    • wasClamped: True if a provided limit exceeded the max and was reduced.

Preperation

In order to achieve these two simple functions, and consider the future development and integration, we need to prepare the following:

Backing services

  • Cloudflare R2: Store logs, images, etc. Make sure setup automatic cleanup policy(by default it's 7 days).

CICD

  • Github Actions: Run tests, build and deploy onto different environments.
  • Set up secrets: API keys, database credentials, etc.

Check out the action history for current worker function.

Collaboration

Set up repository rulesets:

  • No deletion on main branch.
  • All push to main must be made through a pull request.
  • Code linting and vulunerable scan must be passed before merging.
  • dev branch's deployment and testing must be triggered before merging to main.

Implementation

Functions

The implementation follows a modular architecture with clear separation of concerns:

Core Structure

  • Entry Point (src/index.ts): Cloudflare Worker entrypoint that delegates to the router
  • Router (src/routes.ts): Handles request routing and CORS for different endpoints
  • Authentication (src/utils/auth.ts): Manages Superset session-based authentication
  • SQL Execution (src/utils/executeSql.ts): Handles SQL query execution with limit management
  • Chart Creation (src/utils/createChart.ts): Creates visualizations from SQL queries
  • Response Utils (src/utils/response.ts): Standardized response formatting

Key Features

  • Session-based Authentication: Implements web-based login flow to obtain CSRF tokens and session cookies
  • SQL Safety: Only allows single-statement SELECT queries with automatic limit injection/clamping
  • Error Handling: Comprehensive error handling with meaningful error messages
  • CORS Support: Full CORS headers for cross-origin requests

Tests

The testing strategy covers multiple aspects of the service:

Test Categories

  • Authentication Tests (tests/auth/): Validates Superset authentication flow
  • Health Checks (tests/health/): Ensures service availability
  • SQL Execution Tests (tests/execute-sql/): Tests query execution functionality
  • Browser Screenshot Tests (tests/browser-screenshot/): Visual validation of chart generation
  • Integration Tests (tests/index/): End-to-end API testing

Test Infrastructure

  • HTTP Client (tests/utils/http.ts): Reusable HTTP testing utilities
  • Browser Automation (tests/browser-screenshot/utils/browser.ts): Puppeteer-based screenshot testing
  • Test Runner (tests/runner.ts): Centralized test execution

CI/CD Integration

Tests run automatically on every push and pull request, ensuring code quality and preventing regressions.

Workflows

GitHub Actions Pipeline

The CI/CD pipeline includes several key workflows:

Browser Screenshot Testing

  • Trigger: Manual or scheduled runs
  • Purpose: Visual regression testing of chart generation
  • Features:
    • Screenshot comparison for temperature and weather code charts
    • Automatic cleanup of old screenshots
    • Integration with Cloudflare Workers deployment

Deployment Workflow

  • Environment Management: Separate deployments for dev and main branches
  • Secret Management: Secure handling of Superset credentials and API keys
  • Health Checks: Post-deployment validation

Environment Configuration

  • Development: dev branch with testing environment
  • Production: main branch with production Superset instance
  • Secrets: Environment-specific configuration for database connections and authentication

Monitoring and Logging

  • Cloudflare Analytics: Built-in request monitoring
  • R2 Storage: Log persistence and screenshot storage

Challenges and Solutions

Authentication Complexity

Challenge: Superset's session-based authentication requires multiple steps and CSRF token management.

Solution: Implemented a robust authentication flow that:

  • Handles login page extraction
  • Manages session cookies
  • Obtains CSRF tokens from API endpoints
  • Provides fallback mechanisms for token refresh

Cross-Origin Requests

Challenge: Enabling frontend applications to consume the API.

Solution: Comprehensive CORS headers and preflight request handling.

Future plan

  • Implement rate limiting and Api key generation/validation use Cloudflare KV
  • Add response caching for frequently accessed queries
  • Add data import via superset api

Conclusion

This serverless function provides a robust foundation for data visualization services, balancing simplicity with extensibility. The modular architecture and comprehensive testing ensure reliability while the Cloudflare Workers platform delivers excellent performance and cost-effectiveness.

The implementation demonstrates key principles of cloud-native development:

  • Stateless Design: No server-side state management
  • Event-Driven Architecture: Request-response pattern with clear interfaces
  • Security First: Authentication, input validation, and CORS protection
  • Observability: Comprehensive logging and monitoring
  • Testing: Multi-layered testing strategy from unit to integration tests