> **Beta Product**: SQL Proxy is currently in beta. Features and APIs may change. Contact your Datafold representative for access.
## What is SQL Proxy?

SQL Proxy is middleware that routes SQL queries to different compute resources based on query characteristics. It works with any tool that connects via ODBC/JDBC, including BI tools and dbt.
Supported platforms:
- Databricks (available now)
- Snowflake (coming soon)
## The Problem
Without intelligent routing, each system typically connects to a dedicated warehouse sized for peak load. Oversized warehouses waste compute on small queries; undersized warehouses spill to disk on large ones, hurting performance.
## With Datafold SQL Proxy
Queries are routed to appropriately sized compute: large warehouses spin up only for truly large queries, while small workloads run on cheaper compute.
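The routing idea can be sketched as a simple decision function. The cost metric, byte thresholds, and warehouse names below are illustrative assumptions, not SQL Proxy's actual routing logic:

```python
# Illustrative sketch of size-based query routing. The thresholds and
# warehouse names are hypothetical; SQL Proxy's real logic differs.

def route_query(estimated_bytes_scanned: int) -> str:
    """Pick a warehouse tier from an estimated query cost."""
    if estimated_bytes_scanned < 1 * 1024**3:    # under ~1 GB: cheap compute
        return "small_warehouse"
    if estimated_bytes_scanned < 100 * 1024**3:  # under ~100 GB
        return "medium_warehouse"
    return "large_warehouse"                     # truly large queries only

print(route_query(10 * 1024**2))   # small scan -> small_warehouse
print(route_query(500 * 1024**3))  # large scan -> large_warehouse
```

The point of routing at this layer is that no single warehouse has to be sized for the worst-case query.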
Key points:
- Supports passthrough auth (your Databricks credentials) or managed auth (proxy tokens)
- Datafold uses a separate admin account to manage infrastructure
## Routing Modes
| Mode | Description |
|---|---|
| Explicit Routing | Control routing via `@datafold:` SQL comments |
| Smart Routing | ML-based automatic warehouse selection |
## Authentication
SQL Proxy supports two authentication modes:
| Method | Mode | Description |
|---|---|---|
| PAT | Passthrough | Databricks Personal Access Token forwarded to Databricks |
| M2M OAuth | Passthrough | Databricks service principal credentials forwarded to Databricks |
| Proxy Token | Managed | Token issued by the proxy for registered principals |
Passthrough mode forwards your credentials directly to Databricks. Managed mode stores Databricks credentials in the proxy, and users authenticate with proxy tokens.
See Authentication for setup details.
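The difference between the two modes can be sketched from the proxy's perspective. The token prefixes and credential lookup below are illustrative assumptions, not SQL Proxy internals:

```python
# Sketch of passthrough vs. managed auth resolution inside a proxy.
# The "dfp_" prefix and the credential store are hypothetical.

# Managed mode: proxy token -> stored Databricks credential
STORED_CREDENTIALS = {"dfp_abc123": "dapiSTORED"}

def resolve_databricks_token(client_token: str) -> str:
    """Decide what credential to send to Databricks for this client."""
    if client_token.startswith("dfp_"):          # managed: proxy-issued token
        return STORED_CREDENTIALS[client_token]  # swap for stored credential
    return client_token                          # passthrough: forward as-is

print(resolve_databricks_token("dapiMYPAT"))   # passthrough -> dapiMYPAT
print(resolve_databricks_token("dfp_abc123"))  # managed -> dapiSTORED
```

In passthrough mode the proxy never needs to store credentials; in managed mode users never see the underlying Databricks credential.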
## Networking Requirements
SQL Proxy is deployed in Datafold’s infrastructure. Your data platform must allow inbound connections from Datafold’s IP ranges.
Required Databricks Access:
- SQL Warehouse connectivity (port 443)
- Jobs API access (for jobs compute routing)
- Unity Catalog access (if used)
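A quick way to verify that port 443 is reachable is a plain TCP connection check. The hostname below is a placeholder, not a real endpoint:

```python
import socket

# Minimal reachability check for an HTTPS endpoint (port 443).
# Replace the hostname with your own workspace or proxy endpoint;
# "example.cloud.databricks.com" is a placeholder.

def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# can_reach("example.cloud.databricks.com")  # True once networking is configured
```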
## Quick Start
- **Configure networking** - allowlist Datafold IPs in your Databricks workspace
- **Register warehouses** - add your Databricks warehouses via the Admin API; optionally configure jobs compute
- **Register principals** - create users/service principals with their Databricks credentials via the Admin API
- **Generate tokens** - create proxy tokens for principals via the Tokens API
- **Update connection** - point dbt/BI tools to the SQL Proxy endpoint
- **Test connectivity** - run `dbt debug` or a simple query
- **Add annotations** - use `@datafold:` directives for routing control
```yaml
# dbt profiles.yml
my_project:
  target: prod
  outputs:
    prod:
      type: databricks
      host: sqlproxy.your-company.datafold.com
      http_path: /sql/1.0/warehouses/proxy
      token: "{{ env_var('PAT_OR_PROXY_TOKEN') }}"  # Databricks PAT or proxy token
```
See dbt Integration for complete setup instructions.