feast.infra.offline_stores.contrib.trino_offline_store package

Subpackages

Submodules

feast.infra.offline_stores.contrib.trino_offline_store.trino module

class feast.infra.offline_stores.contrib.trino_offline_store.trino.AuthConfig(*, type: Literal['kerberos', 'basic', 'jwt', 'oauth2', 'certificate'], config: Dict[StrictStr, Any] | None = None)[source]

Bases: FeastConfigBaseModel

config: Dict[StrictStr, Any] | None
classmethod config_only_nullable_for_oauth2(values)[source]
to_trino_auth()[source]
type: Literal['kerberos', 'basic', 'jwt', 'oauth2', 'certificate']
class feast.infra.offline_stores.contrib.trino_offline_store.trino.BasicAuthModel(*, username: StrictStr, password: SecretStr)[source]

Bases: FeastConfigBaseModel

password: SecretStr
username: StrictStr
class feast.infra.offline_stores.contrib.trino_offline_store.trino.CertificateAuthModel(**extra_data: Any)[source]

Bases: FeastConfigBaseModel

cert: FilePath
key: FilePath
class feast.infra.offline_stores.contrib.trino_offline_store.trino.JWTAuthModel(*, token: SecretStr)[source]

Bases: FeastConfigBaseModel

token: SecretStr
class feast.infra.offline_stores.contrib.trino_offline_store.trino.KerberosAuthModel(*, principal: StrictStr | None = None, delegate: StrictBool = False, **extra_data: Any)[source]

Bases: FeastConfigBaseModel

ca_bundle: FilePath | None
config: FilePath | None
delegate: StrictBool
force_preemptive: StrictBool
hostname_override: StrictStr | None
mutual_authentication: StrictBool
principal: StrictStr | None
sanitize_mutual_error_response: StrictBool
service_name: StrictStr | None
class feast.infra.offline_stores.contrib.trino_offline_store.trino.TrinoOfflineStore[source]

Bases: OfflineStore

static get_historical_features(config: RepoConfig, feature_views: List[FeatureView], feature_refs: List[str], entity_df: DataFrame | str, registry: Registry, project: str, full_feature_names: bool = False) TrinoRetrievalJob[source]

Retrieves the point-in-time correct historical feature values for the specified entity rows.

Parameters:
  • config – The config for the current feature store.

  • feature_views – A list containing all feature views that are referenced in the entity rows.

  • feature_refs – The features to be retrieved.

  • entity_df – A collection of rows containing all entity columns on which features need to be joined, as well as the timestamp column used for point-in-time joins. Either a pandas dataframe can be provided or a SQL query.

  • registry – The registry for the current feature store.

  • project – Feast project to which the feature views belong.

  • full_feature_names – If True, feature names will be prefixed with the corresponding feature view name, changing them from the format “feature” to “feature_view__feature” (e.g. “daily_transactions” changes to “customer_fv__daily_transactions”).

Returns:

A RetrievalJob that can be executed to get the features.

static pull_all_from_table_or_query(config: RepoConfig, data_source: DataSource, join_key_columns: List[str], feature_name_columns: List[str], timestamp_field: str, start_date: datetime, end_date: datetime) RetrievalJob[source]

Extracts all the entity rows (i.e. the combination of join key columns, feature columns, and timestamp columns) from the specified data source that lie within the specified time range.

All of the column names should refer to columns that exist in the data source. In particular, any mapping of column names must have already happened.

Parameters:
  • config – The config for the current feature store.

  • data_source – The data source from which the entity rows will be extracted.

  • join_key_columns – The columns of the join keys.

  • feature_name_columns – The columns of the features.

  • timestamp_field – The timestamp column.

  • start_date – The start of the time range.

  • end_date – The end of the time range.

Returns:

A RetrievalJob that can be executed to get the entity rows.

static pull_latest_from_table_or_query(config: RepoConfig, data_source: DataSource, join_key_columns: List[str], feature_name_columns: List[str], timestamp_field: str, created_timestamp_column: str | None, start_date: datetime, end_date: datetime) TrinoRetrievalJob[source]

Extracts the latest entity rows (i.e. the combination of join key columns, feature columns, and timestamp columns) from the specified data source that lie within the specified time range.

All of the column names should refer to columns that exist in the data source. In particular, any mapping of column names must have already happened.

Parameters:
  • config – The config for the current feature store.

  • data_source – The data source from which the entity rows will be extracted.

  • join_key_columns – The columns of the join keys.

  • feature_name_columns – The columns of the features.

  • timestamp_field – The timestamp column, used to determine which rows are the most recent.

  • created_timestamp_column – The column indicating when the row was created, used to break ties.

  • start_date – The start of the time range.

  • end_date – The end of the time range.

Returns:

A RetrievalJob that can be executed to get the entity rows.

class feast.infra.offline_stores.contrib.trino_offline_store.trino.TrinoOfflineStoreConfig(*, type: StrictStr = 'trino', host: StrictStr, port: int, catalog: StrictStr, user: StrictStr, source: StrictStr | None = 'trino-python-client', connector: Dict[str, str], dataset: StrictStr = 'feast', auth: AuthConfig | None = None, **extra_data: Any)[source]

Bases: FeastConfigBaseModel

Online store config for Trino

auth: AuthConfig | None

(optional) Authentication mechanism to use when connecting to Trino. Supported options are: - kerberos - basic - jwt - oauth2 - certificate

catalog: StrictStr

Catalog of the Trino cluster

connector: Dict[str, str]

Trino connector to use as well as potential extra parameters. Needs to contain at least the path, for example {“type”: “bigquery”} or {“type”: “hive”, “file_format”: “parquet”}

dataset: StrictStr

(optional) Trino Dataset name for temporary tables

extra_credential: StrictStr | None

Specifies the HTTP header X-Trino-Extra-Credential, e.g. user1=pwd1, user2=pwd2

host: StrictStr

Host of the Trino cluster

http_scheme: Literal['http', 'https']

HTTP scheme that should be used while establishing a connection to the Trino cluster

port: int

Port of the Trino cluster

source: StrictStr | None

ID of the feast’s Trino Python client, useful for debugging

type: StrictStr

Offline store type selector

user: StrictStr

User of the Trino cluster

verify: StrictBool

Whether the SSL certificate emited by the Trino cluster should be verified or not

class feast.infra.offline_stores.contrib.trino_offline_store.trino.TrinoRetrievalJob(query: str, client: Trino, config: RepoConfig, full_feature_names: bool, on_demand_feature_views: List[OnDemandFeatureView] | None = None, metadata: RetrievalMetadata | None = None)[source]

Bases: RetrievalJob

property full_feature_names: bool

Returns True if full feature names should be applied to the results of the query.

property metadata: RetrievalMetadata | None

Return metadata information about retrieval. Should be available even before materializing the dataset itself.

property on_demand_feature_views: List[OnDemandFeatureView]

Returns a list containing all the on demand feature views to be handled.

persist(storage: SavedDatasetStorage, allow_overwrite: bool | None = False, timeout: int | None = None)[source]

Run the retrieval and persist the results in the same offline store used for read.

to_sql() str[source]

Returns the SQL query that will be executed in Trino to build the historical feature table

to_trino(destination_table: str | None = None, timeout: int = 1800, retry_cadence: int = 10) str | None[source]

Triggers the execution of a historical feature retrieval query and exports the results to a Trino table. Runs for a maximum amount of time specified by the timeout parameter (defaulting to 30 minutes). :param timeout: An optional number of seconds for setting the time limit of the QueryJob. :param retry_cadence: An optional number of seconds for setting how long the job should checked for completion.

Returns:

Returns the destination table name.

feast.infra.offline_stores.contrib.trino_offline_store.trino_queries module

class feast.infra.offline_stores.contrib.trino_offline_store.trino_queries.Query(query_text: str, cursor: Cursor)[source]

Bases: object

cancel(*args)[source]
close()[source]
execute() Results[source]
class feast.infra.offline_stores.contrib.trino_offline_store.trino_queries.QueryStatus(value)[source]

Bases: Enum

An enumeration.

CANCELLED = 4
COMPLETED = 3
ERROR = 2
PENDING = 0
RUNNING = 1
class feast.infra.offline_stores.contrib.trino_offline_store.trino_queries.Results(data: List[List[Any]], columns: List[Dict])[source]

Bases: object

Class for keeping track of the results of a Trino query

columns: List[Dict]
property columns_names: List[str]
data: List[List[Any]]
property pyarrow_schema: Schema
property schema: Dict[str, str]
to_dataframe() DataFrame[source]
class feast.infra.offline_stores.contrib.trino_offline_store.trino_queries.Trino(host: str, port: int, user: str, catalog: str, source: str | None, http_scheme: str, verify: bool, extra_credential: str | None, auth: trino.Authentication | None)[source]

Bases: object

create_query(query_text: str) Query[source]

Create a Query object without executing it.

execute_query(query_text: str) Results[source]

Create a Query object and execute it.

feast.infra.offline_stores.contrib.trino_offline_store.trino_source module

class feast.infra.offline_stores.contrib.trino_offline_store.trino_source.SavedDatasetTrinoStorage(table: str | None = None, query: str | None = None)[source]

Bases: SavedDatasetStorage

static from_proto(storage_proto: SavedDatasetStorage) SavedDatasetStorage[source]
to_data_source() DataSource[source]
to_proto() SavedDatasetStorage[source]
trino_options: TrinoOptions
class feast.infra.offline_stores.contrib.trino_offline_store.trino_source.TrinoOptions(table: str | None, query: str | None)[source]

Bases: object

DataSource Trino options used to source features from Trino query

classmethod from_proto(trino_options_proto: TrinoOptions)[source]

Creates a TrinoOptions from a protobuf representation of a Trino option :param trino_options_proto: A protobuf representation of a DataSource

Returns:

Returns a TrinoOptions object based on the trino_options protobuf

property query

Returns the Trino SQL query referenced by this source

property table

Returns the table ref of this Trino table

to_proto() TrinoOptions[source]

Converts an TrinoOptionsProto object to its protobuf representation. :returns: TrinoOptionsProto protobuf

class feast.infra.offline_stores.contrib.trino_offline_store.trino_source.TrinoSource(*, name: str | None = None, timestamp_field: str | None = None, table: str | None = None, created_timestamp_column: str | None = '', field_mapping: Dict[str, str] | None = None, query: str | None = None, description: str | None = '', tags: Dict[str, str] | None = None, owner: str | None = '')[source]

Bases: DataSource

created_timestamp_column: str
date_partition_column: str
description: str
field_mapping: Dict[str, str]
static from_proto(data_source: DataSource)[source]

Converts data source config in protobuf spec to a DataSource class object.

Parameters:

data_source – A protobuf representation of a DataSource.

Returns:

A DataSource class object.

Raises:

ValueError – The type of DataSource could not be identified.

get_table_column_names_and_types(config: RepoConfig) Iterable[Tuple[str, str]][source]

Returns the list of column names and raw column types.

Parameters:

config – Configuration object used to configure a feature store.

get_table_query_string() str[source]

Returns a string that can directly be used to reference this table in SQL

name: str
owner: str
property query
static source_datatype_to_feast_value_type() Callable[[str], ValueType][source]

Returns the callable method that returns Feast type given the raw column type.

property table
tags: Dict[str, str]
timestamp_field: str
to_proto() DataSource[source]

Converts a DataSourceProto object to its protobuf representation.

property trino_options

Returns the Trino options of this data source

validate(config: RepoConfig)[source]

Validates the underlying data source.

Parameters:

config – Configuration object used to configure a feature store.

feast.infra.offline_stores.contrib.trino_offline_store.trino_type_map module

feast.infra.offline_stores.contrib.trino_offline_store.trino_type_map.pa_to_trino_value_type(pa_type_as_str: str) str[source]
feast.infra.offline_stores.contrib.trino_offline_store.trino_type_map.trino_to_feast_value_type(trino_type_as_str: str) ValueType[source]
feast.infra.offline_stores.contrib.trino_offline_store.trino_type_map.trino_to_pa_value_type(trino_type_as_str: str) DataType[source]

Module contents