feast.infra.offline_stores.contrib.athena_offline_store package
Subpackages
Submodules
feast.infra.offline_stores.contrib.athena_offline_store.athena module
- class feast.infra.offline_stores.contrib.athena_offline_store.athena.AthenaOfflineStore[source]
Bases: feast.infra.offline_stores.offline_store.OfflineStore
- static get_historical_features(config: feast.repo_config.RepoConfig, feature_views: List[feast.feature_view.FeatureView], feature_refs: List[str], entity_df: Union[pandas.core.frame.DataFrame, str], registry: feast.infra.registry.registry.Registry, project: str, full_feature_names: bool = False) feast.infra.offline_stores.offline_store.RetrievalJob [source]
Retrieves the point-in-time correct historical feature values for the specified entity rows.
- Parameters
config – The config for the current feature store.
feature_views – A list containing all feature views that are referenced in the entity rows.
feature_refs – The features to be retrieved.
entity_df – A collection of rows containing all entity columns on which features need to be joined, as well as the timestamp column used for point-in-time joins. Either a pandas DataFrame or a SQL query can be provided.
registry – The registry for the current feature store.
project – Feast project to which the feature views belong.
full_feature_names – If True, feature names will be prefixed with the corresponding feature view name, changing them from the format “feature” to “feature_view__feature” (e.g. “daily_transactions” changes to “customer_fv__daily_transactions”).
- Returns
A RetrievalJob that can be executed to get the features.
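The effect of full_feature_names on output column names can be sketched as follows. This is only an illustration: output_column_name is a hypothetical helper, not part of Feast, and it assumes feature references are written as "feature_view:feature".

```python
def output_column_name(feature_ref: str, full_feature_names: bool = False) -> str:
    """Hypothetical helper: map a "feature_view:feature" reference to a column name."""
    # Assumption: the reference uses a single colon to separate view and feature.
    feature_view, feature = feature_ref.split(":", 1)
    if full_feature_names:
        # Prefix with the feature view name, joined by a double underscore.
        return f"{feature_view}__{feature}"
    return feature


print(output_column_name("customer_fv:daily_transactions"))
print(output_column_name("customer_fv:daily_transactions", full_feature_names=True))
```

Prefixed names are mainly useful when two feature views expose a feature with the same name.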
- static pull_all_from_table_or_query(config: feast.repo_config.RepoConfig, data_source: feast.data_source.DataSource, join_key_columns: List[str], feature_name_columns: List[str], timestamp_field: str, start_date: datetime.datetime, end_date: datetime.datetime) feast.infra.offline_stores.offline_store.RetrievalJob [source]
Extracts all the entity rows (i.e. the combination of join key columns, feature columns, and timestamp columns) from the specified data source that lie within the specified time range.
All of the column names should refer to columns that exist in the data source. In particular, any mapping of column names must have already happened.
- Parameters
config – The config for the current feature store.
data_source – The data source from which the entity rows will be extracted.
join_key_columns – The columns of the join keys.
feature_name_columns – The columns of the features.
timestamp_field – The timestamp column.
start_date – The start of the time range.
end_date – The end of the time range.
- Returns
A RetrievalJob that can be executed to get the entity rows.
- static pull_latest_from_table_or_query(config: feast.repo_config.RepoConfig, data_source: feast.data_source.DataSource, join_key_columns: List[str], feature_name_columns: List[str], timestamp_field: str, created_timestamp_column: Optional[str], start_date: datetime.datetime, end_date: datetime.datetime) feast.infra.offline_stores.offline_store.RetrievalJob [source]
Extracts the latest entity rows (i.e. the combination of join key columns, feature columns, and timestamp columns) from the specified data source that lie within the specified time range.
All of the column names should refer to columns that exist in the data source. In particular, any mapping of column names must have already happened.
- Parameters
config – The config for the current feature store.
data_source – The data source from which the entity rows will be extracted.
join_key_columns – The columns of the join keys.
feature_name_columns – The columns of the features.
timestamp_field – The timestamp column, used to determine which rows are the most recent.
created_timestamp_column – The column indicating when the row was created, used to break ties.
start_date – The start of the time range.
end_date – The end of the time range.
- Returns
A RetrievalJob that can be executed to get the entity rows.
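The "latest row" semantics described above can be sketched in plain Python. This is not how the Athena store runs the query (it executes SQL in Athena); latest_rows and the sample column names are hypothetical, and treating both ends of the time range as inclusive is an assumption.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple


def latest_rows(
    rows: List[dict],
    join_key_columns: List[str],
    timestamp_field: str,
    created_timestamp_column: Optional[str],
    start_date: datetime,
    end_date: datetime,
) -> List[dict]:
    """Illustrative only: keep the most recent row per join key within a time range."""

    def recency(row: dict) -> tuple:
        # Event timestamp first; the created timestamp breaks ties when provided.
        if created_timestamp_column:
            return (row[timestamp_field], row[created_timestamp_column])
        return (row[timestamp_field],)

    latest: Dict[Tuple, dict] = {}
    for row in rows:
        # Assumption: both ends of the range are inclusive.
        if not (start_date <= row[timestamp_field] <= end_date):
            continue
        key = tuple(row[column] for column in join_key_columns)
        if key not in latest or recency(row) > recency(latest[key]):
            latest[key] = row
    return list(latest.values())


# Hypothetical sample data.
rows = [
    {"driver_id": 1, "trips": 5, "event_ts": datetime(2023, 1, 1), "created_ts": datetime(2023, 1, 2)},
    {"driver_id": 1, "trips": 7, "event_ts": datetime(2023, 1, 1), "created_ts": datetime(2023, 1, 3)},
    {"driver_id": 2, "trips": 3, "event_ts": datetime(2023, 1, 5), "created_ts": datetime(2023, 1, 5)},
    {"driver_id": 2, "trips": 9, "event_ts": datetime(2023, 2, 1), "created_ts": datetime(2023, 2, 1)},
]
result = latest_rows(
    rows, ["driver_id"], "event_ts", "created_ts",
    datetime(2023, 1, 1), datetime(2023, 1, 31),
)
```

In this sample, driver 1 has two rows with the same event timestamp, so the created timestamp decides between them; the February row for driver 2 falls outside the range and is ignored.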
- static write_logged_features(config: feast.repo_config.RepoConfig, data: Union[pyarrow.lib.Table, pathlib.Path], source: feast.feature_logging.LoggingSource, logging_config: feast.feature_logging.LoggingConfig, registry: feast.infra.registry.base_registry.BaseRegistry)[source]
Writes logged features to a specified destination in the offline store.
If the specified destination exists, data will be appended; otherwise, the destination will be created and data will be added. Thus this function can be called repeatedly with the same destination to flush logs in chunks.
- Parameters
config – The config for the current feature store.
data – An Arrow table or a path to a Parquet directory that contains the logs to write.
source – The logging source that provides a schema and some additional metadata.
logging_config – A LoggingConfig object that determines where the logs will be written.
registry – The registry for the current feature store.
- class feast.infra.offline_stores.contrib.athena_offline_store.athena.AthenaOfflineStoreConfig(*, type: Literal['athena'] = 'athena', data_source: pydantic.types.StrictStr, region: pydantic.types.StrictStr, database: pydantic.types.StrictStr, workgroup: pydantic.types.StrictStr, s3_staging_location: pydantic.types.StrictStr)[source]
Bases: feast.repo_config.FeastConfigBaseModel
Offline store config for AWS Athena
- data_source: pydantic.types.StrictStr
Athena data source, e.g. AwsDataCatalog
- database: pydantic.types.StrictStr
Athena database name
- region: pydantic.types.StrictStr
Athena’s AWS region
- s3_staging_location: pydantic.types.StrictStr
S3 path used for importing data into and exporting data from Athena
- type: Literal['athena']
Offline store type selector
- workgroup: pydantic.types.StrictStr
Athena workgroup name
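As a rough sketch, the settings accepted by AthenaOfflineStoreConfig can be written as a plain dictionary and checked before being handed to Feast. The helper check_offline_store_settings is hypothetical (Feast performs its own validation through pydantic), and all values other than the field names are placeholders.

```python
from typing import Dict

# Required string fields, taken from the AthenaOfflineStoreConfig signature above.
REQUIRED_FIELDS = ("data_source", "region", "database", "workgroup", "s3_staging_location")


def check_offline_store_settings(settings: Dict[str, object]) -> None:
    """Hypothetical pre-check; not part of Feast."""
    if settings.get("type", "athena") != "athena":
        raise ValueError("type must be 'athena'")
    for field in REQUIRED_FIELDS:
        value = settings.get(field)
        # StrictStr fields accept only real strings, with no coercion.
        if not isinstance(value, str):
            raise ValueError(f"{field} must be a string, got {type(value).__name__}")


# Placeholder values, not taken from any real deployment.
offline_store = {
    "type": "athena",
    "data_source": "AwsDataCatalog",
    "region": "us-east-1",
    "database": "example_db",
    "workgroup": "primary",
    "s3_staging_location": "s3://example-bucket/athena-staging",
}
check_offline_store_settings(offline_store)
```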
- class feast.infra.offline_stores.contrib.athena_offline_store.athena.AthenaRetrievalJob(query: Union[str, Callable[[], AbstractContextManager[str]]], athena_client, s3_resource, config: feast.repo_config.RepoConfig, full_feature_names: bool, on_demand_feature_views: Optional[List[feast.on_demand_feature_view.OnDemandFeatureView]] = None, metadata: Optional[feast.infra.offline_stores.offline_store.RetrievalMetadata] = None)[source]
Bases: feast.infra.offline_stores.offline_store.RetrievalJob
- property full_feature_names: bool
Returns True if full feature names should be applied to the results of the query.
- property metadata: Optional[feast.infra.offline_stores.offline_store.RetrievalMetadata]
Returns metadata about the retrieval job.
- property on_demand_feature_views: List[feast.on_demand_feature_view.OnDemandFeatureView]
Returns a list containing all the on demand feature views to be handled.
- persist(storage: feast.saved_dataset.SavedDatasetStorage, allow_overwrite: bool = False)[source]
Synchronously executes the underlying query and persists the result in the same offline store at the specified destination.
- Parameters
storage – The saved dataset storage object specifying where the result should be persisted.
allow_overwrite – If True, a pre-existing location (e.g. table or file) can be overwritten. Currently not all individual offline store implementations make use of this parameter.
feast.infra.offline_stores.contrib.athena_offline_store.athena_source module
- class feast.infra.offline_stores.contrib.athena_offline_store.athena_source.AthenaLoggingDestination(*, table_name: str)[source]
Bases: feast.feature_logging.LoggingDestination
- classmethod from_proto(config_proto: feast.core.FeatureService_pb2.LoggingConfig) feast.feature_logging.LoggingDestination [source]
- to_data_source() feast.data_source.DataSource [source]
Convert this object into a data source to read logs from an offline store.
- class feast.infra.offline_stores.contrib.athena_offline_store.athena_source.AthenaOptions(table: Optional[str], query: Optional[str], database: Optional[str], data_source: Optional[str])[source]
Bases: object
Configuration options for an Athena data source.
- classmethod from_proto(athena_options_proto: feast.core.DataSource_pb2.AthenaOptions)[source]
Creates an AthenaOptions object from its protobuf representation.
- Parameters
athena_options_proto – A protobuf representation of the Athena options.
- Returns
An AthenaOptions object based on the athena_options protobuf.
- class feast.infra.offline_stores.contrib.athena_offline_store.athena_source.AthenaSource(*, timestamp_field: Optional[str] = '', table: Optional[str] = None, database: Optional[str] = None, data_source: Optional[str] = None, created_timestamp_column: Optional[str] = None, field_mapping: Optional[Dict[str, str]] = None, date_partition_column: Optional[str] = None, query: Optional[str] = None, name: Optional[str] = None, description: Optional[str] = '', tags: Optional[Dict[str, str]] = None, owner: Optional[str] = '')[source]
Bases: feast.data_source.DataSource
- property data_source
Returns the Athena data_source of this Athena source.
- property database
Returns the database of this Athena source.
- static from_proto(data_source: feast.core.DataSource_pb2.DataSource)[source]
Creates an AthenaSource from a protobuf representation of an AthenaSource.
- Parameters
data_source – A protobuf representation of an AthenaSource
- Returns
An AthenaSource object based on the data_source protobuf.
- get_table_column_names_and_types(config: feast.repo_config.RepoConfig) Iterable[Tuple[str, str]] [source]
Returns the column names and types of this Athena source as (name, type) pairs.
- Parameters
config – A RepoConfig describing the feature repo
- get_table_query_string(config: Optional[feast.repo_config.RepoConfig] = None) str [source]
Returns a string that can directly be used to reference this table in SQL.
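One plausible shape for such a string is sketched below. The helper table_query_string is hypothetical and is not the Feast implementation; the quoting style and the parenthesized-query fallback are assumptions made for illustration.

```python
from typing import Optional


def table_query_string(
    table: Optional[str],
    query: Optional[str],
    database: Optional[str],
    data_source: Optional[str],
) -> str:
    """Hypothetical helper, not the Feast implementation."""
    if table:
        # Assumed form: a fully qualified, double-quoted table name.
        return f'"{data_source}"."{database}"."{table}"'
    if query:
        # A query is wrapped in parentheses so it can be used as a subquery.
        return f"({query})"
    raise ValueError("either table or query must be set")


print(table_query_string("driver_stats", None, "example_db", "AwsDataCatalog"))
print(table_query_string(None, "SELECT * FROM driver_stats", None, None))
```

Either form can be dropped into a FROM clause, which is what makes the returned string directly usable in SQL.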
- property query
Returns the Athena query of this Athena source.
- static source_datatype_to_feast_value_type() Callable[[str], feast.value_type.ValueType] [source]
Returns the callable method that returns Feast type given the raw column type.
- property table
Returns the table of this Athena source.
- to_proto() feast.core.DataSource_pb2.DataSource [source]
Converts this AthenaSource object to its protobuf representation.
- Returns
A DataSourceProto object.
- validate(config: feast.repo_config.RepoConfig)[source]
Validates the underlying data source.
- Parameters
config – Configuration object used to configure a feature store.
- class feast.infra.offline_stores.contrib.athena_offline_store.athena_source.SavedDatasetAthenaStorage(table_ref: str, query: Optional[str] = None, database: Optional[str] = None, data_source: Optional[str] = None)[source]
Bases: feast.saved_dataset.SavedDatasetStorage
- athena_options: feast.infra.offline_stores.contrib.athena_offline_store.athena_source.AthenaOptions
- static from_proto(storage_proto: feast.core.SavedDataset_pb2.SavedDatasetStorage) feast.saved_dataset.SavedDatasetStorage [source]
- to_data_source() feast.data_source.DataSource [source]