Expected Behavior
TrinoOfflineStore should handle valid Trino types returned by schema introspection, including char(10), varbinary, json, array(varchar(10)), array(decimal(10, 2)), row(...), map(...), and bare decimal.
Current Behavior
In sdk/python/feast/infra/offline_stores/contrib/trino_offline_store/trino_type_map.py, trino_to_pa_value_type() assumes arrays always look like array(<word>):
    if trino_type_as_str.startswith("array"):
        _is_list = True
        trino_type_as_str = re.search(r"^array\((\w+)\)$", trino_type_as_str).group(1)
The root cause is the (\w+) group, which cannot match parameterized types like varchar(10) or decimal(10, 2). For those inputs re.search() returns None, so the chained .group(1) call raises AttributeError.
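To make the mismatch concrete, here is a minimal, self-contained sketch that reuses the pattern from the snippet above. The helper name legacy_array_item is made up for illustration and is not part of Feast.

```python
import re

# Pattern copied from the snippet above: the element type must be one \w+ token.
LEGACY_ARRAY_PATTERN = re.compile(r"^array\((\w+)\)$")


def legacy_array_item(trino_type: str):
    """Return the element type the legacy pattern extracts, or None if it does not match."""
    match = LEGACY_ARRAY_PATTERN.search(trino_type)
    return match.group(1) if match else None


if __name__ == "__main__":
    for candidate in ("array(varchar)", "array(varchar(10))", "array(decimal(10, 2))"):
        print(candidate, "->", legacy_array_item(candidate))
```

Only the first candidate yields an element type. The two parameterized ones come back as None, which is the point where the original code calls .group(1) on a missing match.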
There are also missing normalizations/mappings:
- char(10) is not normalized like varchar(10)
- int, varbinary, and json are missing from _TRINO_TO_PA_TYPE_MAP
- bare decimal leaves pa_type unset
- row(...) and map(...) fall through to KeyError
Examples:
- trino_to_pa_value_type("array(varchar(10))")
- trino_to_pa_value_type("array(decimal(10, 2))")
- trino_to_pa_value_type("char(10)")
- trino_to_pa_value_type("varbinary")
- trino_to_pa_value_type("json")
- trino_to_pa_value_type("row(x bigint)")
- trino_to_pa_value_type("map(varchar, bigint)")
- trino_to_pa_value_type("decimal")
- trino_to_feast_value_type("char(10)")
- trino_to_feast_value_type("decimal")
Steps to reproduce
from feast.infra.offline_stores.contrib.trino_offline_store.trino_type_map import (
    trino_to_feast_value_type,
    trino_to_pa_value_type,
)
trino_to_pa_value_type("array(varchar(10))")
trino_to_pa_value_type("array(decimal(10, 2))")
trino_to_pa_value_type("char(10)")
trino_to_pa_value_type("varbinary")
trino_to_pa_value_type("json")
trino_to_pa_value_type("row(x bigint)")
trino_to_pa_value_type("map(varchar, bigint)")
trino_to_pa_value_type("decimal")
trino_to_feast_value_type("char(10)")
trino_to_feast_value_type("decimal")
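Run as a plain script, the calls above stop at the first exception, so the later cases are never exercised. A small harness along these lines can record what each input raises instead. This is only a sketch: collect_failures and legacy_array_parse are hypothetical names, and the stand-in function mimics just the array branch quoted in this issue so that the example runs without Feast installed.

```python
import re
from typing import Callable, Dict, Iterable


def collect_failures(fn: Callable[[str], object], inputs: Iterable[str]) -> Dict[str, str]:
    """Call fn on every input and map each failing input to its exception type name."""
    failures: Dict[str, str] = {}
    for value in inputs:
        try:
            fn(value)
        except Exception as exc:  # broad on purpose: we only record what was raised
            failures[value] = type(exc).__name__
    return failures


def legacy_array_parse(trino_type: str) -> str:
    """Hypothetical stand-in that mimics only the array branch quoted in this issue."""
    if trino_type.startswith("array"):
        # re.search() returns None for parameterized element types.
        return re.search(r"^array\((\w+)\)$", trino_type).group(1)
    return trino_type


if __name__ == "__main__":
    samples = ["array(varchar(10))", "array(bigint)", "json"]
    for trino_type, error in collect_failures(legacy_array_parse, samples).items():
        print(f"{trino_type}: {error}")
```

To check the real functions, pass trino_to_pa_value_type or trino_to_feast_value_type (imported as shown above) in place of the stand-in, together with the type strings listed in this issue.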
Specifications
- Version: 0.63.0
- Platform: FeatureStore CR / K8s
- Subsystem: TrinoOfflineStore
Possible Solution
- normalize char(...) like varchar(...)
- add missing aliases/types such as int, varbinary, and json
- replace the array(\w+) regex with recursive parsing
- handle bare decimal
- gracefully degrade row(...) and map(...) instead of raising low-level parser errors
The solution could look something like this:
# normalize char(...) and bare decimal in trino_to_feast_value_type()
if trino_type_as_str.startswith("decimal"):
    search_precision = re.search(
        r"^decimal\((\d+)(?>,\s?\d+)?\)$", trino_type_as_str
    )
    if search_precision:
        precision = int(search_precision.group(1))
        if precision > 32:
            trino_type_as_str = "decimal64"
        else:
            trino_type_as_str = "decimal32"
    else:
        trino_type_as_str = "decimal64"
elif trino_type_as_str.startswith("timestamp"):
    trino_type_as_str = "timestamp"
elif trino_type_as_str.startswith("varchar"):
    trino_type_as_str = "varchar"
elif trino_type_as_str.startswith("char"):
    trino_type_as_str = "char"
_TRINO_TO_PA_TYPE_MAP = {
    "null": pa.null(),
    "boolean": pa.bool_(),
    "date": pa.date32(),
    "tinyint": pa.int8(),
    "smallint": pa.int16(),
    "integer": pa.int32(),
    "int": pa.int32(),
    "bigint": pa.int64(),
    "double": pa.float64(),
    "binary": pa.binary(),
    "varbinary": pa.binary(),
    "char": pa.string(),
    "json": pa.string(),
    "real": pa.float32(),
}


def _trino_array_item_type(trino_type_as_str: str):
    if trino_type_as_str.startswith("array(") and trino_type_as_str.endswith(")"):
        return trino_type_as_str[6:-1].strip()
    return None


def trino_to_pa_value_type(trino_type_as_str: str) -> pa.DataType:
    trino_type_as_str = trino_type_as_str.lower().strip()

    array_item_type = _trino_array_item_type(trino_type_as_str)
    if array_item_type is not None:
        return pa.list_(trino_to_pa_value_type(array_item_type))

    if trino_type_as_str.startswith("decimal"):
        search_precision = re.search(
            r"^decimal\((\d+)(?>,\s?\d+)?\)$", trino_type_as_str
        )
        if search_precision:
            precision = int(search_precision.group(1))
            if precision > 32:
                return pa.float64()
            else:
                return pa.float32()
        return pa.float64()

    if trino_type_as_str.startswith("timestamp"):
        return pa.timestamp("us")

    if trino_type_as_str.startswith("varchar"):
        return pa.string()

    if trino_type_as_str.startswith("char"):
        return pa.string()

    if trino_type_as_str.startswith("row("):
        return pa.string()

    if trino_type_as_str.startswith("map("):
        return pa.string()

    return _TRINO_TO_PA_TYPE_MAP[trino_type_as_str]
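One portability note on the decimal pattern used above: Python's re module only accepts atomic groups, written (?>...), from Python 3.11 onward. If the module also needs to import on older interpreters, a non-capturing group should accept the same inputs for this particular pattern. The sketch below shows that variant in isolation. The name classify_decimal is hypothetical, and the bucket strings simply mirror the ones used in the proposal.

```python
import re

# Non-capturing group (?:...) swapped in for the atomic group (?>...), which
# the re module only accepts on Python 3.11 and later.
_DECIMAL_PATTERN = re.compile(r"^decimal\((\d+)(?:,\s?\d+)?\)$")


def classify_decimal(trino_type: str) -> str:
    """Hypothetical helper: pick the decimal bucket the way the proposal above does."""
    match = _DECIMAL_PATTERN.search(trino_type.lower().strip())
    if match is None:
        # Bare "decimal" (or anything the pattern rejects) falls back to the wide bucket.
        return "decimal64"
    precision = int(match.group(1))
    return "decimal64" if precision > 32 else "decimal32"


if __name__ == "__main__":
    for candidate in ("decimal", "decimal(10, 2)", "decimal(38,0)"):
        print(candidate, "->", classify_decimal(candidate))
```

The fallback branch is what covers bare decimal: no precision is available, so the sketch picks the wider bucket, as the proposal does.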