TrinoOfflineStore: trino_type_map fails on parameterized and complex types #6489

@dbbvitor

Expected Behavior

TrinoOfflineStore should handle valid Trino types returned by schema introspection, including char(10), varbinary, json, array(varchar(10)), array(decimal(10, 2)), row(...), map(...), and bare decimal.

Current Behavior

In sdk/python/feast/infra/offline_stores/contrib/trino_offline_store/trino_type_map.py, trino_to_pa_value_type() assumes arrays always look like array(<word>):

if trino_type_as_str.startswith("array"):
    _is_list = True
    trino_type_as_str = re.search(r"^array\((\w+)\)$", trino_type_as_str).group(1)

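The problem is the (\w+) group: it only matches word characters, so it cannot match parameterized element types like varchar(10) or decimal(10, 2). In that case re.search() returns None and the chained .group(1) call raises AttributeError.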
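A minimal standalone check of the pattern's behaviour, using only the standard library. The helper name array_item_type is made up for this illustration and is not part of Feast; it guards against a failed match instead of calling .group(1) directly:

```python
import re

# Same pattern as the array branch quoted above.
ARRAY_PATTERN = re.compile(r"^array\((\w+)\)$")


def array_item_type(trino_type: str):
    """Return the element type captured by the pattern, or None when it does not match."""
    match = ARRAY_PATTERN.search(trino_type)
    return match.group(1) if match else None


for trino_type in ("array(bigint)", "array(varchar(10))", "array(decimal(10, 2))"):
    # Only the unparameterized element type is captured; the others yield None.
    print(trino_type, "->", array_item_type(trino_type))
```

Only array(bigint) yields an element type here; the two parameterized cases yield None, which is the value the current code calls .group(1) on.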

There are also missing normalizations/mappings:

  • char(10) is not normalized like varchar(10)
  • int, varbinary, and json are missing from _TRINO_TO_PA_TYPE_MAP
  • bare decimal leaves pa_type unset
  • row(...) and map(...) fall through to KeyError

Examples:

  • trino_to_pa_value_type("array(varchar(10))")
  • trino_to_pa_value_type("array(decimal(10, 2))")
  • trino_to_pa_value_type("char(10)")
  • trino_to_pa_value_type("varbinary")
  • trino_to_pa_value_type("json")
  • trino_to_pa_value_type("row(x bigint)")
  • trino_to_pa_value_type("map(varchar, bigint)")
  • trino_to_pa_value_type("decimal")
  • trino_to_feast_value_type("char(10)")
  • trino_to_feast_value_type("decimal")

Steps to reproduce

from feast.infra.offline_stores.contrib.trino_offline_store.trino_type_map import (
    trino_to_feast_value_type,
    trino_to_pa_value_type,
)

trino_to_pa_value_type("array(varchar(10))")
trino_to_pa_value_type("array(decimal(10, 2))")
trino_to_pa_value_type("char(10)")
trino_to_pa_value_type("varbinary")
trino_to_pa_value_type("json")
trino_to_pa_value_type("row(x bigint)")
trino_to_pa_value_type("map(varchar, bigint)")
trino_to_pa_value_type("decimal")
trino_to_feast_value_type("char(10)")
trino_to_feast_value_type("decimal")

Specifications

  • Version: 0.63.0
  • Platform: FeatureStore CR / K8s
  • Subsystem: TrinoOfflineStore

Possible Solution

  • normalize char(...) like varchar(...)
  • add missing aliases/types such as int, varbinary, and json
  • replace the array(\w+) regex with recursive parsing
  • handle bare decimal
  • gracefully degrade row(...) and map(...) instead of raising low-level parser errors

The solution could be something like this:

# normalize char(...) and bare decimal in trino_to_feast_value_type()
if trino_type_as_str.startswith("decimal"):
    search_precision = re.search(
        # (?:...) instead of (?>...): atomic groups need Python 3.11+
        r"^decimal\((\d+)(?:,\s?\d+)?\)$", trino_type_as_str
    )
    if search_precision:
        precision = int(search_precision.group(1))
        if precision > 32:
            trino_type_as_str = "decimal64"
        else:
            trino_type_as_str = "decimal32"
    else:
        trino_type_as_str = "decimal64"

elif trino_type_as_str.startswith("timestamp"):
    trino_type_as_str = "timestamp"

elif trino_type_as_str.startswith("varchar"):
    trino_type_as_str = "varchar"

elif trino_type_as_str.startswith("char"):
    trino_type_as_str = "char"


_TRINO_TO_PA_TYPE_MAP = {
    "null": pa.null(),
    "boolean": pa.bool_(),
    "date": pa.date32(),
    "tinyint": pa.int8(),
    "smallint": pa.int16(),
    "integer": pa.int32(),
    "int": pa.int32(),
    "bigint": pa.int64(),
    "double": pa.float64(),
    "binary": pa.binary(),
    "varbinary": pa.binary(),
    "char": pa.string(),
    "json": pa.string(),
    "real": pa.float32(),
}


def _trino_array_item_type(trino_type_as_str: str):
    if trino_type_as_str.startswith("array(") and trino_type_as_str.endswith(")"):
        return trino_type_as_str[6:-1].strip()
    return None


def trino_to_pa_value_type(trino_type_as_str: str) -> pa.DataType:
    trino_type_as_str = trino_type_as_str.lower().strip()

    array_item_type = _trino_array_item_type(trino_type_as_str)
    if array_item_type is not None:
        return pa.list_(trino_to_pa_value_type(array_item_type))

    if trino_type_as_str.startswith("decimal"):
        search_precision = re.search(
            # (?:...) instead of (?>...): atomic groups need Python 3.11+
            r"^decimal\((\d+)(?:,\s?\d+)?\)$", trino_type_as_str
        )
        if search_precision:
            precision = int(search_precision.group(1))
            if precision > 32:
                return pa.float64()
            else:
                return pa.float32()
        return pa.float64()

    if trino_type_as_str.startswith("timestamp"):
        return pa.timestamp("us")

    if trino_type_as_str.startswith("varchar"):
        return pa.string()

    if trino_type_as_str.startswith("char"):
        return pa.string()

    if trino_type_as_str.startswith("row("):
        return pa.string()

    if trino_type_as_str.startswith("map("):
        return pa.string()

    return _TRINO_TO_PA_TYPE_MAP[trino_type_as_str]
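The recursive array handling above can be sanity-checked without pyarrow. The sketch below mirrors the same prefix/suffix stripping on plain strings; unwrap_array is a hypothetical name used only for this check, not an existing Feast function:

```python
def unwrap_array(trino_type: str) -> tuple[int, str]:
    """Peel nested array(...) wrappers; return (nesting depth, innermost type)."""
    trino_type = trino_type.lower().strip()
    if trino_type.startswith("array(") and trino_type.endswith(")"):
        # Strip one "array(" ... ")" layer and recurse on what is inside.
        depth, inner = unwrap_array(trino_type[len("array("):-1])
        return depth + 1, inner
    return 0, trino_type


for trino_type in ("bigint", "array(varchar(10))", "array(array(decimal(10, 2)))"):
    print(trino_type, "->", unwrap_array(trino_type))
```

Because the element type is whatever sits between the outer parentheses, parameterized and nested element types pass through unchanged, which is what the (\w+) regex could not do.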
