SQLMesh is a next-generation data transformation framework designed to ship data quickly, efficiently, and without error. Data teams can run and deploy data transformations written in SQL or Python with visibility and control at any size.
It is more than just a dbt alternative.
Core Features

Get instant SQL impact analysis and context for your changes, both in the CLI and in the SQLMesh VSCode extension
Virtual Data Environments
- See a full diagram of how Virtual Data Environments work
- Create isolated development environments without data warehouse costs
- Plan / Apply workflow like Terraform to understand potential impact of changes
- Easy to use CI/CD bot for true blue-green deployments
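The "virtual" part can be pictured as a layer of views over physical snapshot tables: each environment maps model names to already-built snapshots, and promoting to prod just repoints the views, so no data is copied or recomputed. A toy stdlib sketch of that idea (illustrative names only, not SQLMesh internals):

```python
# Toy model of the virtual layer: an environment is a mapping from
# view name -> physical snapshot table. "Applying a plan" to prod
# repoints views at snapshots that were already built in dev, so
# promotion is metadata-only and costs no warehouse compute.
physical = {
    "stg_payments__v1": "rows built from the old model code",
    "stg_payments__v2": "rows built from the new model code",
}

environments = {
    "prod": {"stg_payments": "stg_payments__v1"},
    # dev cloned prod's pointers, then only the changed model
    # received a newly built physical snapshot:
    "dev": {"stg_payments": "stg_payments__v2"},
}

def promote(env: str, target: str = "prod") -> None:
    """Blue-green deploy: swap the target's view pointers to env's snapshots."""
    environments[target] = dict(environments[env])

promote("dev")
assert environments["prod"]["stg_payments"] == "stg_payments__v2"
```

Because only pointers move, rolling back is the same operation in reverse.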
Efficiency and Testing
Running the command below generates a unit test file in the tests/ folder (test_stg_payments.yaml). It runs a live query against the warehouse to capture the expected output of the model:
# create the unit test
sqlmesh create_test tcloud_demo.stg_payments --query tcloud_demo.seed_raw_payments "select * from tcloud_demo.seed_raw_payments limit 5"
# run the unit test
sqlmesh test
MODEL (
  name tcloud_demo.stg_payments,
  cron '@daily',
  grain payment_id,
  audits (
    UNIQUE_VALUES(columns = (payment_id)),
    NOT_NULL(columns = (payment_id))
  )
);
SELECT
  id AS payment_id,
  order_id,
  payment_method,
  amount / 100 AS amount, /* `amount` is currently stored in cents, so we convert it to dollars */
  'new_column' AS new_column /* non-breaking change example */
FROM tcloud_demo.seed_raw_payments
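The UNIQUE_VALUES and NOT_NULL audits above boil down to two checks on the model's output column. SQLMesh runs them as SQL queries against the warehouse after each evaluation; a plain-Python sketch of what they assert (illustrative rows, not SQLMesh's implementation):

```python
# What the two audits assert about stg_payments, sketched over a
# list of rows represented as dicts.
rows = [
    {"payment_id": 66, "amount": 18.0},
    {"payment_id": 27, "amount": 26.0},
    {"payment_id": 30, "amount": 16.0},
]

ids = [r["payment_id"] for r in rows]

# NOT_NULL(columns = (payment_id)): no missing keys
assert all(i is not None for i in ids)

# UNIQUE_VALUES(columns = (payment_id)): no duplicate keys
assert len(ids) == len(set(ids))
```

If either audit fails, the run is flagged instead of silently publishing bad data.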
test_stg_payments:
  model: tcloud_demo.stg_payments
  inputs:
    tcloud_demo.seed_raw_payments:
      - id: 66
        order_id: 58
        payment_method: coupon
        amount: 1800
      - id: 27
        order_id: 24
        payment_method: coupon
        amount: 2600
      - id: 30
        order_id: 25
        payment_method: coupon
        amount: 1600
      - id: 109
        order_id: 95
        payment_method: coupon
        amount: 2400
      - id: 3
        order_id: 3
        payment_method: coupon
        amount: 100
  outputs:
    query:
      - payment_id: 66
        order_id: 58
        payment_method: coupon
        amount: 18.0
        new_column: new_column
      - payment_id: 27
        order_id: 24
        payment_method: coupon
        amount: 26.0
        new_column: new_column
      - payment_id: 30
        order_id: 25
        payment_method: coupon
        amount: 16.0
        new_column: new_column
      - payment_id: 109
        order_id: 95
        payment_method: coupon
        amount: 24.0
        new_column: new_column
      - payment_id: 3
        order_id: 3
        payment_method: coupon
        amount: 1.0
        new_column: new_column
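The fixtures above are simply the model's logic replayed over the five captured input rows. The cents-to-dollars conversion can be sanity-checked in a few lines (an illustrative check, not part of the SQLMesh workflow):

```python
# The stg_payments model renames id -> payment_id, divides amount by 100
# (cents -> dollars), and adds a constant column. Replaying that over the
# YAML inputs reproduces the YAML outputs.
inputs = [(66, 1800), (27, 2600), (30, 1600), (109, 2400), (3, 100)]

outputs = [
    {"payment_id": pid, "amount": cents / 100, "new_column": "new_column"}
    for pid, cents in inputs
]

assert [o["amount"] for o in outputs] == [18.0, 26.0, 16.0, 24.0, 1.0]
```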
- Never build a table more than once
- Track what data’s been modified and run only the necessary transformations for incremental models
- Run unit tests for free and configure automated audits
- Run table diffs between prod and dev based on tables/views impacted by a change
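"Run only the necessary transformations" comes down to interval bookkeeping: SQLMesh records which time ranges each incremental model has already processed and schedules backfills only for the gaps. A simplified stdlib sketch of that bookkeeping (the function and data here are illustrative, not the actual scheduler):

```python
from datetime import date, timedelta

def missing_days(processed: set, start: date, end: date) -> list:
    """Return only the days in [start, end) that were never processed."""
    n = (end - start).days
    return [
        start + timedelta(days=i)
        for i in range(n)
        if start + timedelta(days=i) not in processed
    ]

# Days 1-2 were already built; requesting days 1-4 only schedules 3-4.
processed = {date(2024, 1, 1), date(2024, 1, 2)}
todo = missing_days(processed, date(2024, 1, 1), date(2024, 1, 5))
assert todo == [date(2024, 1, 3), date(2024, 1, 4)]
```

Already-built intervals are skipped entirely, which is what makes reruns cheap.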
Level Up Your SQL
Write SQL in any dialect and SQLMesh will transpile it to your target SQL dialect on the fly before sending it to the warehouse.
- Debug transformation errors before you run them in your warehouse in 10+ different SQL dialects
- Definitions using simple SQL (no need for redundant and confusing Jinja + YAML)
- See the impact of changes before you run them in your warehouse with column-level lineage
For more information, check out the website and documentation.
Getting Started
Install SQLMesh from PyPI by running:
mkdir sqlmesh-example
cd sqlmesh-example
python -m venv .venv
source .venv/bin/activate
pip install 'sqlmesh[lsp]' # install the sqlmesh package with extensions to work with VSCode
source .venv/bin/activate # reactivate the venv to ensure you're using the right installation
sqlmesh init # follow the prompts to get started (choose DuckDB)
Note: You may need to run python3 or pip3 instead of python or pip, depending on your Python installation.
Windows Installation
mkdir sqlmesh-example
cd sqlmesh-example
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install 'sqlmesh[lsp]' # install the sqlmesh package with extensions to work with VSCode
.\.venv\Scripts\Activate.ps1 # reactivate the venv to ensure you're using the right installation
sqlmesh init # follow the prompts to get started (choose DuckDB)
Follow the quickstart guide to learn how to use SQLMesh. You already have a head start!
Follow the crash course to learn the core movesets and use the easy-to-reference cheat sheet.
Follow this example to learn how to use SQLMesh in a full walkthrough.
Join Our Community
Together, we want to build data transformation without the waste. Connect with us in the following ways:
- Join the Tobiko Slack Community to ask questions, or just to say hi!
- File an issue on our GitHub
- Send us an email at hello@tobikodata.com with your questions or feedback
- Read our blog
Contribution
Contributions in the form of issues or pull requests (from forks) are greatly appreciated.
Read more on how to contribute to SQLMesh open source.
Watch this video walkthrough to see how our team contributes a feature to SQLMesh.
# ruff: noqa: E402
"""
.. include:: ../README.md
"""

from __future__ import annotations

import glob
import logging
import os
import sys
import typing as t
from datetime import datetime
from enum import Enum
from pathlib import Path

from sqlmesh.core.dialect import extend_sqlglot

extend_sqlglot()

from sqlmesh.core import constants as c
from sqlmesh.core.config import Config as Config
from sqlmesh.core.context import Context as Context, ExecutionContext as ExecutionContext
from sqlmesh.core.engine_adapter import EngineAdapter as EngineAdapter
from sqlmesh.core.macros import SQL as SQL, macro as macro
from sqlmesh.core.model import Model as Model, model as model
from sqlmesh.core.signal import signal as signal
from sqlmesh.core.snapshot import Snapshot as Snapshot
from sqlmesh.core.snapshot.evaluator import (
    CustomMaterialization as CustomMaterialization,
)
from sqlmesh.core.model.kind import CustomKind as CustomKind
from sqlmesh.utils import (
    debug_mode_enabled as debug_mode_enabled,
    enable_debug_mode as enable_debug_mode,
    str_to_bool,
)
from sqlmesh.utils.date import DatetimeRanges as DatetimeRanges

try:
    from sqlmesh._version import __version__ as __version__, __version_tuple__ as __version_tuple__
except ImportError:
    pass


if t.TYPE_CHECKING:
    from sqlmesh.core.engine_adapter._typing import QueryOrDF  # noqa: F401


class RuntimeEnv(str, Enum):
    """Enum defining what environment SQLMesh is running in."""

    TERMINAL = "terminal"
    DATABRICKS = "databricks"
    GOOGLE_COLAB = "google_colab"  # Not currently officially supported
    JUPYTER = "jupyter"
    DEBUGGER = "debugger"
    CI = "ci"  # CI or other envs that shouldn't use emojis

    @classmethod
    def get(cls) -> RuntimeEnv:
        """Get the console class to use based on the environment that the code is running in
        Reference implementation: https://github.com/noklam/rich/blob/d3a1ae61a77d934844563514370084971bc3e143/rich/console.py#L511-L528

        Unlike the rich implementation we try to split out by notebook type instead of treating it all as Jupyter.
        """
        runtime_env_var = os.getenv("SQLMESH_RUNTIME_ENVIRONMENT")
        if runtime_env_var:
            try:
                return RuntimeEnv(runtime_env_var)
            except ValueError:
                valid_values = [f'"{member.value}"' for member in RuntimeEnv]
                raise ValueError(
                    f"Invalid SQLMESH_RUNTIME_ENVIRONMENT value: {runtime_env_var}. Must be one of {', '.join(valid_values)}."
                )

        try:
            shell = get_ipython()  # type: ignore
            if os.getenv("DATABRICKS_RUNTIME_VERSION"):
                return RuntimeEnv.DATABRICKS
            if "google.colab" in str(shell.__class__):  # type: ignore
                return RuntimeEnv.GOOGLE_COLAB
            if shell.__class__.__name__ == "ZMQInteractiveShell":  # type: ignore
                return RuntimeEnv.JUPYTER
        except NameError:
            pass

        if debug_mode_enabled():
            return RuntimeEnv.DEBUGGER

        if is_cicd_environment() or not is_interactive_environment():
            return RuntimeEnv.CI

        return RuntimeEnv.TERMINAL

    @property
    def is_terminal(self) -> bool:
        return self == RuntimeEnv.TERMINAL

    @property
    def is_databricks(self) -> bool:
        return self == RuntimeEnv.DATABRICKS

    @property
    def is_jupyter(self) -> bool:
        return self == RuntimeEnv.JUPYTER

    @property
    def is_google_colab(self) -> bool:
        return self == RuntimeEnv.GOOGLE_COLAB

    @property
    def is_ci(self) -> bool:
        return self == RuntimeEnv.CI

    @property
    def is_notebook(self) -> bool:
        return not self.is_terminal and not self.is_ci


def is_cicd_environment() -> bool:
    for key in ("CI", "GITHUB_ACTIONS", "TRAVIS", "CIRCLECI", "GITLAB_CI", "BUILDKITE"):
        if str_to_bool(os.environ.get(key, "false")):
            return True
    return False


def is_interactive_environment() -> bool:
    if sys.stdin is None or sys.stdout is None:
        return False
    return sys.stdin.isatty() and sys.stdout.isatty()


if RuntimeEnv.get().is_notebook:
    try:
        from sqlmesh.magics import register_magics

        register_magics()
    except ImportError:
        pass


LOG_FORMAT = "%(asctime)s - %(threadName)s - %(name)s - %(levelname)s - %(message)s (%(filename)s:%(lineno)d)"
LOG_FILENAME_PREFIX = "sqlmesh_"


# SO: https://stackoverflow.com/questions/384076/how-can-i-color-python-logging-output
class CustomFormatter(logging.Formatter):
    """Custom logging formatter."""

    grey = "\x1b[38;20m"
    yellow = "\x1b[33;20m"
    red = "\x1b[31;20m"
    bold_red = "\x1b[31;1m"
    reset = "\x1b[0m"

    FORMATS = {
        logging.DEBUG: grey + LOG_FORMAT + reset,
        logging.INFO: grey + LOG_FORMAT + reset,
        logging.WARNING: yellow + LOG_FORMAT + reset,
        logging.ERROR: red + LOG_FORMAT + reset,
        logging.CRITICAL: bold_red + LOG_FORMAT + reset,
    }

    def format(self, record: logging.LogRecord) -> str:
        log_fmt = self.FORMATS.get(record.levelno)
        formatter = logging.Formatter(log_fmt)
        return formatter.format(record)


def remove_excess_logs(
    log_file_dir: t.Optional[t.Union[str, Path]] = None,
    log_limit: int = c.DEFAULT_LOG_LIMIT,
) -> None:
    if log_limit <= 0:
        return

    log_file_dir = log_file_dir or c.DEFAULT_LOG_FILE_DIR
    log_path_prefix = Path(log_file_dir) / LOG_FILENAME_PREFIX

    for path in list(sorted(glob.glob(f"{log_path_prefix}*.log"), reverse=True))[log_limit:]:
        os.remove(path)


def configure_logging(
    force_debug: bool = False,
    write_to_stdout: bool = False,
    write_to_file: bool = True,
    log_file_dir: t.Optional[t.Union[str, Path]] = None,
    ignore_warnings: bool = False,
    log_level: t.Optional[t.Union[str, int]] = None,
) -> None:
    # Remove noisy grpc logs that are not useful for users
    os.environ["GRPC_VERBOSITY"] = os.environ.get("GRPC_VERBOSITY", "NONE")

    logger = logging.getLogger()
    debug = force_debug or debug_mode_enabled()

    if log_level is not None:
        if isinstance(log_level, str):
            level = logging._nameToLevel.get(log_level.upper()) or logging.INFO
        else:
            level = log_level
    else:
        # base logger needs to be the lowest level that we plan to log
        level = logging.DEBUG if debug else logging.INFO

    logger.setLevel(level)

    if debug:
        # Remove noisy snowflake connector logs that are not useful for users
        logging.getLogger("snowflake.connector").setLevel(logging.INFO)

    if write_to_stdout:
        stdout_handler = logging.StreamHandler(sys.stdout)
        stdout_handler.setFormatter(CustomFormatter())
        stdout_handler.setLevel(logging.ERROR if ignore_warnings else level)
        logger.addHandler(stdout_handler)

    log_file_dir = log_file_dir or c.DEFAULT_LOG_FILE_DIR
    log_path_prefix = Path(log_file_dir) / LOG_FILENAME_PREFIX

    if write_to_file:
        os.makedirs(str(log_file_dir), exist_ok=True)
        filename = f"{log_path_prefix}{datetime.now().strftime('%Y_%m_%d_%H_%M_%S')}.log"
        file_handler = logging.FileHandler(filename, mode="w", encoding="utf-8")

        # the log files should always log at least info so that users will always have
        # minimal info for debugging even if they specify "ignore_warnings"
        file_handler.setLevel(level)
        file_handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(file_handler)

    if debug:
        import faulthandler

        enable_debug_mode()

        # Enable thread dumps.
        faulthandler.enable()

        # Windows doesn't support register so we check for it here
        if hasattr(faulthandler, "register"):
            from signal import SIGUSR1

            faulthandler.register(SIGUSR1.value)
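remove_excess_logs keeps only the newest log_limit files matching the sqlmesh_ prefix; because the filenames embed a sortable timestamp, a reverse lexicographic sort is a chronological sort. The glob-sort-slice core can be exercised standalone (a sketch of the same logic in a temp directory, not an import of sqlmesh):

```python
import glob
import os
import tempfile
from pathlib import Path

# Same core logic as remove_excess_logs: sort log files by name
# (timestamped names sort chronologically), keep the newest N,
# delete the rest.
def trim_logs(log_dir: Path, log_limit: int) -> None:
    prefix = log_dir / "sqlmesh_"
    for path in sorted(glob.glob(f"{prefix}*.log"), reverse=True)[log_limit:]:
        os.remove(path)

tmp = Path(tempfile.mkdtemp())
for stamp in ("2024_01_01", "2024_01_02", "2024_01_03"):
    (tmp / f"sqlmesh_{stamp}.log").write_text("")

trim_logs(tmp, log_limit=2)  # only the two newest files survive
assert sorted(p.name for p in tmp.glob("*.log")) == [
    "sqlmesh_2024_01_02.log",
    "sqlmesh_2024_01_03.log",
]
```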