Beta Product: SQL Proxy is currently in beta. Features and APIs may change. Contact your Datafold representative for access.

What is SQL Proxy?

SQL Proxy is middleware that routes SQL queries to different compute resources based on query characteristics. It works with any tool that connects via ODBC/JDBC, including BI tools and dbt.

Supported platforms:
  • Databricks (available now)
  • Snowflake (coming soon)

The Problem

Without intelligent routing, each system typically connects to a dedicated warehouse sized for peak load. An oversized warehouse wastes compute on small queries, while an undersized one spills large queries to disk and hurts performance.

With Datafold SQL Proxy

Queries are routed to appropriately sized compute. Large warehouses only spin up for truly large queries, and small workloads run on cheaper compute.

Key points:
  • Supports passthrough auth (your Databricks credentials) or managed auth (proxy tokens)
  • Datafold uses a separate admin account to manage infrastructure

Routing Modes

Mode | Description
Explicit Routing | Control routing via @datafold: SQL comments
Smart Routing | ML-based automatic warehouse selection
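
To make Explicit Routing more concrete, here is a minimal sketch of how a `@datafold:` comment could be located in a SQL statement. The text after the prefix is treated as opaque, since the directive syntax is not specified in this overview. The function name `extract_directives`, the `<directive>` placeholder, and the assumption that both line and block comments are accepted are illustrative only and are not part of SQL Proxy's API.

```python
import re

# Matches "-- @datafold: ..." line comments and "/* @datafold: ... */" block comments.
# Assumption: both comment styles are accepted; the real proxy may differ.
_DIRECTIVE = re.compile(
    r"--[ \t]*@datafold:[ \t]*(?P<line>[^\n]*)"
    r"|/\*\s*@datafold:\s*(?P<block>.*?)\*/",
    re.DOTALL,
)


def extract_directives(sql: str) -> list[str]:
    """Return the raw text following each @datafold: marker found in the SQL.

    This is a naive scan: it does not skip markers that appear inside
    string literals.
    """
    found = []
    for match in _DIRECTIVE.finditer(sql):
        line_text = match.group("line")
        text = line_text if line_text is not None else match.group("block")
        found.append(text.strip())
    return found


if __name__ == "__main__":
    # "<directive>" is a placeholder, not a real directive name.
    query = "-- @datafold: <directive>\nSELECT 1"
    print(extract_directives(query))
```

Because the directive travels inside a comment, the query stays valid SQL for any tool that does not understand the annotation.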

Authentication

SQL Proxy supports two authentication modes, passthrough and managed, through three methods:
Method | Mode | Description
PAT | Passthrough | Databricks Personal Access Token forwarded to Databricks
M2M OAuth | Passthrough | Databricks service principal credentials forwarded to Databricks
Proxy Token | Managed | Token issued by proxy for registered principals

Passthrough mode forwards your credentials directly to Databricks. Managed mode stores Databricks credentials in the proxy, and users authenticate with proxy tokens. See Authentication for setup details.
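
The relationship between methods and modes can be summarized as a small lookup. This is only a restatement of the table in code form. The names `AuthMethod`, `AUTH_METHODS`, and `databricks_credentials_location` are hypothetical and do not correspond to any SQL Proxy API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthMethod:
    name: str
    mode: str             # "passthrough" or "managed"
    client_presents: str  # what the client sends to the proxy


AUTH_METHODS = [
    AuthMethod("PAT", "passthrough", "Databricks Personal Access Token"),
    AuthMethod("M2M OAuth", "passthrough", "Databricks service principal credentials"),
    AuthMethod("Proxy Token", "managed", "token issued by the proxy"),
]


def databricks_credentials_location(method: AuthMethod) -> str:
    """Say who holds the Databricks credentials for a given method.

    Passthrough: the client holds them and the proxy forwards them.
    Managed: the proxy stores them and the client only holds a proxy token.
    """
    return "client" if method.mode == "passthrough" else "proxy"


if __name__ == "__main__":
    for method in AUTH_METHODS:
        print(method.name, "->", databricks_credentials_location(method))
```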

Networking Requirements

SQL Proxy is deployed in Datafold’s infrastructure. Your data platform must allow inbound connections from Datafold’s IP ranges.

Required Databricks access:
  • SQL Warehouse connectivity (port 443)
  • Jobs API access (for jobs compute routing)
  • Unity Catalog access (if used)
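
When reviewing an allowlist, it can help to confirm that a given source address falls inside the ranges you have configured. The sketch below does this with Python's standard `ipaddress` module. The CIDR blocks are placeholders taken from the reserved documentation ranges and are not Datafold's actual IP ranges, which you would obtain from Datafold. The helper name `is_allowlisted` is hypothetical.

```python
import ipaddress

# Placeholder CIDR blocks from reserved documentation ranges.
# These are NOT Datafold's real IP ranges; substitute the ones Datafold provides.
ALLOWED_RANGES = ["203.0.113.0/24", "198.51.100.16/28"]


def is_allowlisted(ip: str, ranges: list[str] = ALLOWED_RANGES) -> bool:
    """Return True if `ip` falls inside any of the configured CIDR ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in ranges)


if __name__ == "__main__":
    print(is_allowlisted("203.0.113.7"))
    print(is_allowlisted("192.0.2.1"))
```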

Quick Start

  1. Configure networking - allowlist Datafold IPs in your Databricks workspace
  2. Register warehouses - add your Databricks warehouses via the Admin API, optionally configure jobs compute
  3. Register principals - create users/service principals with their Databricks credentials via the Admin API
  4. Generate tokens - create proxy tokens for principals via the Tokens API
  5. Update connection - point dbt/BI tools to SQL Proxy endpoint
  6. Test connectivity - run dbt debug or a simple query
  7. Add annotations - use @datafold: directives for routing control
# dbt profiles.yml
my_project:
  outputs:
    prod:
      type: databricks
      host: sqlproxy.your-company.datafold.com
      http_path: /sql/1.0/warehouses/proxy
      token: "{{ env_var('PAT_OR_PROXY_TOKEN') }}"  # Databricks PAT or proxy token
See dbt Integration for complete setup instructions.
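
Before running `dbt debug`, a quick TCP reachability check against the proxy endpoint can separate network problems from authentication problems. This is a minimal sketch using only the standard library. The hostname is the placeholder from the profile above, port 443 is an assumption based on the HTTPS-style endpoint, and `can_reach` is a hypothetical helper. It only confirms that a TCP connection opens and does not validate credentials or routing.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # Placeholder hostname from the example profile; replace with your endpoint.
    host = "sqlproxy.your-company.datafold.com"
    print(f"{host}:443 reachable: {can_reach(host)}")
```

If this check passes but `dbt debug` fails, the token or `http_path` in the profile is the more likely cause.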