feast.infra.offline_stores package

Submodules

feast.infra.offline_stores.bigquery module

class feast.infra.offline_stores.bigquery.BigQueryOfflineStore[source]

Bases: feast.infra.offline_stores.offline_store.OfflineStore

static get_historical_features(config: feast.repo_config.RepoConfig, feature_views: List[feast.feature_view.FeatureView], feature_refs: List[str], entity_df: Union[pandas.core.frame.DataFrame, str], registry: feast.registry.Registry, project: str, full_feature_names: bool = False) feast.infra.offline_stores.offline_store.RetrievalJob[source]
static pull_all_from_table_or_query(config: feast.repo_config.RepoConfig, data_source: feast.data_source.DataSource, join_key_columns: List[str], feature_name_columns: List[str], event_timestamp_column: str, start_date: datetime.datetime, end_date: datetime.datetime) feast.infra.offline_stores.offline_store.RetrievalJob[source]

Note that join_key_columns, feature_name_columns, and event_timestamp_column have already been mapped to column names of the source table; those source column names are the values passed into this function.

static pull_latest_from_table_or_query(config: feast.repo_config.RepoConfig, data_source: feast.data_source.DataSource, join_key_columns: List[str], feature_name_columns: List[str], event_timestamp_column: str, created_timestamp_column: Optional[str], start_date: datetime.datetime, end_date: datetime.datetime) feast.infra.offline_stores.offline_store.RetrievalJob[source]

Note that join_key_columns, feature_name_columns, event_timestamp_column, and created_timestamp_column have already been mapped to column names of the source table; those source column names are the values passed into this function.

class feast.infra.offline_stores.bigquery.BigQueryOfflineStoreConfig(*, type: typing_extensions.Literal[bigquery] = 'bigquery', dataset: pydantic.types.StrictStr = 'feast', project_id: pydantic.types.StrictStr = None, location: pydantic.types.StrictStr = None)[source]

Bases: feast.repo_config.FeastConfigBaseModel

Offline store config for GCP BigQuery

dataset: pydantic.types.StrictStr

(optional) BigQuery Dataset name for temporary tables

location: Optional[pydantic.types.StrictStr]

(optional) GCP location name used for the BigQuery offline store. Examples of location names include US, EU, us-central1, us-west4. If a location is not specified, the location defaults to the US multi-regional location. For more information on BigQuery data locations see: https://cloud.google.com/bigquery/docs/locations

project_id: Optional[pydantic.types.StrictStr]

(optional) GCP project name used for the BigQuery offline store

type: typing_extensions.Literal[bigquery]

Offline store type selector
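The defaults listed above can be summarised with a small stand-in. This is only a sketch: BigQueryOfflineStoreSettings and effective_location are hypothetical names used for illustration, not part of Feast, and the real class is a pydantic model. The field names and default values are taken from the signature above; whether the US fallback is applied by Feast or by BigQuery itself is not stated here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BigQueryOfflineStoreSettings:
    """Hypothetical stand-in mirroring the documented fields; not the Feast class."""

    type: str = "bigquery"
    dataset: str = "feast"
    project_id: Optional[str] = None
    location: Optional[str] = None

    def effective_location(self) -> str:
        # Per the field description, an unset location means the US multi-region.
        return self.location if self.location is not None else "US"


# All defaults: temporary tables go to the "feast" dataset, location resolves to US.
default_settings = BigQueryOfflineStoreSettings()

# "my-gcp-project" is a placeholder project name for this example.
eu_settings = BigQueryOfflineStoreSettings(project_id="my-gcp-project", location="EU")
```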

class feast.infra.offline_stores.bigquery.BigQueryRetrievalJob(query: Union[str, Callable[[], AbstractContextManager[str]]], client: google.cloud.bigquery.client.Client, config: feast.repo_config.RepoConfig, full_feature_names: bool, on_demand_feature_views: Optional[List[feast.on_demand_feature_view.OnDemandFeatureView]] = None, metadata: Optional[feast.infra.offline_stores.offline_store.RetrievalMetadata] = None)[source]

Bases: feast.infra.offline_stores.offline_store.RetrievalJob

property full_feature_names: bool
property metadata: Optional[feast.infra.offline_stores.offline_store.RetrievalMetadata]

Return metadata information about retrieval. Should be available even before materializing the dataset itself.

property on_demand_feature_views: Optional[List[feast.on_demand_feature_view.OnDemandFeatureView]]
persist(storage: feast.saved_dataset.SavedDatasetStorage)[source]

Run the retrieval and persist the results in the same offline store used for read.

to_bigquery(job_config: Optional[google.cloud.bigquery.job.query.QueryJobConfig] = None, timeout: int = 1800, retry_cadence: int = 10) Optional[str][source]

Triggers the execution of a historical feature retrieval query and exports the results to a BigQuery table. Runs for a maximum amount of time specified by the timeout parameter (defaulting to 30 minutes).

Parameters
  • job_config – An optional bigquery.QueryJobConfig to specify options like destination table, dry run, etc.

  • timeout – An optional number of seconds for setting the time limit of the QueryJob.

  • retry_cadence – An optional number of seconds for setting how often the job should be checked for completion.

Returns

Returns the destination table name or returns None if job_config.dry_run is True.

to_sql() str[source]

Returns the SQL query that will be executed in BigQuery to build the historical feature table.

feast.infra.offline_stores.bigquery.block_until_done(client: google.cloud.bigquery.client.Client, bq_job: Union[google.cloud.bigquery.job.query.QueryJob, google.cloud.bigquery.job.load.LoadJob], timeout: int = 1800, retry_cadence: float = 1)[source]

Waits for bq_job to finish running, up to a maximum amount of time specified by the timeout parameter (defaulting to 30 minutes).

Parameters
  • client – A bigquery.client.Client to monitor the bq_job.

  • bq_job – The bigquery.job.QueryJob or bigquery.job.LoadJob to wait on until it has finished running.

  • timeout – An optional number of seconds for setting the time limit of the job.

  • retry_cadence – An optional number of seconds for setting how often the job should be checked for completion.

Raises
  • BigQueryJobStillRunning exception if the function has blocked longer than the timeout (30 minutes by default).

  • BigQueryJobCancelled exception to signify that the job has been cancelled (i.e. from timeout or KeyboardInterrupt).
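How timeout and retry_cadence interact can be illustrated with a minimal polling loop. This is a sketch under stated assumptions, not the Feast implementation: wait_until_done, FakeJob, and JobStillRunning are hypothetical stand-ins, and cancellation of the underlying job is deliberately not modelled.

```python
import time


class JobStillRunning(Exception):
    """Hypothetical stand-in for BigQueryJobStillRunning."""


class FakeJob:
    """Hypothetical job that reports completion after a fixed number of polls."""

    def __init__(self, polls_until_done: int):
        self._remaining = polls_until_done

    def done(self) -> bool:
        self._remaining -= 1
        return self._remaining <= 0


def wait_until_done(job, timeout: float = 1800, retry_cadence: float = 1.0) -> int:
    """Poll job.done() every retry_cadence seconds, for at most timeout seconds.

    Returns the number of polls that were needed.
    """
    deadline = time.monotonic() + timeout
    polls = 0
    while True:
        polls += 1
        if job.done():
            return polls
        if time.monotonic() >= deadline:
            raise JobStillRunning(f"job still running after {timeout} seconds")
        time.sleep(retry_cadence)


# Short values keep the example fast; the documented defaults are 1800 and 1.
polls = wait_until_done(FakeJob(polls_until_done=3), timeout=5, retry_cadence=0.01)
```

A smaller retry_cadence detects completion sooner at the cost of more status checks; timeout bounds the total wait regardless of cadence.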

feast.infra.offline_stores.file module

class feast.infra.offline_stores.file.FileOfflineStore[source]

Bases: feast.infra.offline_stores.offline_store.OfflineStore

static get_historical_features(config: feast.repo_config.RepoConfig, feature_views: List[feast.feature_view.FeatureView], feature_refs: List[str], entity_df: Union[pandas.core.frame.DataFrame, str], registry: feast.registry.Registry, project: str, full_feature_names: bool = False) feast.infra.offline_stores.offline_store.RetrievalJob[source]
static pull_all_from_table_or_query(config: feast.repo_config.RepoConfig, data_source: feast.data_source.DataSource, join_key_columns: List[str], feature_name_columns: List[str], event_timestamp_column: str, start_date: datetime.datetime, end_date: datetime.datetime) feast.infra.offline_stores.offline_store.RetrievalJob[source]

Note that join_key_columns, feature_name_columns, and event_timestamp_column have already been mapped to column names of the source table; those source column names are the values passed into this function.

static pull_latest_from_table_or_query(config: feast.repo_config.RepoConfig, data_source: feast.data_source.DataSource, join_key_columns: List[str], feature_name_columns: List[str], event_timestamp_column: str, created_timestamp_column: Optional[str], start_date: datetime.datetime, end_date: datetime.datetime) feast.infra.offline_stores.offline_store.RetrievalJob[source]

Note that join_key_columns, feature_name_columns, event_timestamp_column, and created_timestamp_column have already been mapped to column names of the source table; those source column names are the values passed into this function.

class feast.infra.offline_stores.file.FileOfflineStoreConfig(*, type: typing_extensions.Literal[file] = 'file')[source]

Bases: feast.repo_config.FeastConfigBaseModel

Offline store config for local (file-based) store

type: typing_extensions.Literal[file]

Offline store type selector

class feast.infra.offline_stores.file.FileRetrievalJob(evaluation_function: Callable, full_feature_names: bool, on_demand_feature_views: Optional[List[feast.on_demand_feature_view.OnDemandFeatureView]] = None, metadata: Optional[feast.infra.offline_stores.offline_store.RetrievalMetadata] = None)[source]

Bases: feast.infra.offline_stores.offline_store.RetrievalJob

property full_feature_names: bool
property metadata: Optional[feast.infra.offline_stores.offline_store.RetrievalMetadata]

Return metadata information about retrieval. Should be available even before materializing the dataset itself.

property on_demand_feature_views: Optional[List[feast.on_demand_feature_view.OnDemandFeatureView]]
persist(storage: feast.saved_dataset.SavedDatasetStorage)[source]

Run the retrieval and persist the results in the same offline store used for read.

feast.infra.offline_stores.helpers module

feast.infra.offline_stores.offline_store module

class feast.infra.offline_stores.offline_store.OfflineStore[source]

Bases: abc.ABC

OfflineStore is an object used for all interaction between Feast and the service used for offline storage of features.

abstract static get_historical_features(config: feast.repo_config.RepoConfig, feature_views: List[feast.feature_view.FeatureView], feature_refs: List[str], entity_df: Union[pandas.core.frame.DataFrame, str], registry: feast.registry.Registry, project: str, full_feature_names: bool = False) feast.infra.offline_stores.offline_store.RetrievalJob[source]
abstract static pull_all_from_table_or_query(config: feast.repo_config.RepoConfig, data_source: feast.data_source.DataSource, join_key_columns: List[str], feature_name_columns: List[str], event_timestamp_column: str, start_date: datetime.datetime, end_date: datetime.datetime) feast.infra.offline_stores.offline_store.RetrievalJob[source]

Note that join_key_columns, feature_name_columns, and event_timestamp_column have already been mapped to column names of the source table; those source column names are the values passed into this function.

abstract static pull_latest_from_table_or_query(config: feast.repo_config.RepoConfig, data_source: feast.data_source.DataSource, join_key_columns: List[str], feature_name_columns: List[str], event_timestamp_column: str, created_timestamp_column: Optional[str], start_date: datetime.datetime, end_date: datetime.datetime) feast.infra.offline_stores.offline_store.RetrievalJob[source]

Note that join_key_columns, feature_name_columns, event_timestamp_column, and created_timestamp_column have already been mapped to column names of the source table; those source column names are the values passed into this function.
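To make the column-name arguments concrete, the sketch below picks the most recent row per join key from in-memory rows. It is an illustration only: latest_rows_per_key is a hypothetical helper, the inclusive date bounds and the use of created_timestamp_column as a tie-breaker are assumptions, and real offline stores return a RetrievalJob rather than a list. The point is that every column argument is already a source column name, so the function indexes rows with it directly.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional


def latest_rows_per_key(
    rows: List[Dict[str, Any]],
    join_key_columns: List[str],
    feature_name_columns: List[str],
    event_timestamp_column: str,
    created_timestamp_column: Optional[str],
    start_date: datetime,
    end_date: datetime,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: keep the newest row per join key within the date range."""
    latest: Dict[tuple, tuple] = {}
    for row in rows:
        event_ts = row[event_timestamp_column]
        # Assumption: both bounds are inclusive.
        if not (start_date <= event_ts <= end_date):
            continue
        key = tuple(row[column] for column in join_key_columns)
        # Assumption: the created timestamp breaks ties between equal event timestamps.
        if created_timestamp_column is not None:
            order = (event_ts, row[created_timestamp_column])
        else:
            order = (event_ts,)
        if key not in latest or order > latest[key][0]:
            latest[key] = (order, row)
    wanted = [*join_key_columns, *feature_name_columns, event_timestamp_column]
    return [{column: row[column] for column in wanted} for _, row in latest.values()]


# "driver", "rate", "ts", and "created" are made-up source column names.
rows = [
    {"driver": 1, "rate": 0.5, "ts": datetime(2022, 1, 1), "created": datetime(2022, 1, 1)},
    {"driver": 1, "rate": 0.7, "ts": datetime(2022, 1, 3), "created": datetime(2022, 1, 3)},
    {"driver": 2, "rate": 0.9, "ts": datetime(2022, 1, 2), "created": datetime(2022, 1, 2)},
    {"driver": 2, "rate": 0.1, "ts": datetime(2022, 2, 1), "created": datetime(2022, 2, 1)},
]
result = latest_rows_per_key(
    rows, ["driver"], ["rate"], "ts", "created",
    start_date=datetime(2022, 1, 1), end_date=datetime(2022, 1, 31),
)
```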

class feast.infra.offline_stores.offline_store.RetrievalJob[source]

Bases: abc.ABC

RetrievalJob is used to manage the execution of a historical feature retrieval.

abstract property full_feature_names: bool
abstract property metadata: Optional[feast.infra.offline_stores.offline_store.RetrievalMetadata]

Return metadata information about retrieval. Should be available even before materializing the dataset itself.

abstract property on_demand_feature_views: Optional[List[feast.on_demand_feature_view.OnDemandFeatureView]]
abstract persist(storage: feast.saved_dataset.SavedDatasetStorage)[source]

Run the retrieval and persist the results in the same offline store used for read.

to_arrow(validation_reference: Optional[ValidationReference] = None) pyarrow.lib.Table[source]

Return dataset as pyarrow Table synchronously.

Parameters
  • validation_reference – If provided, the resulting dataset will be validated against this reference profile.

to_df(validation_reference: Optional[ValidationReference] = None) pandas.core.frame.DataFrame[source]

Return dataset as Pandas DataFrame synchronously, including on demand transforms.

Parameters
  • validation_reference – If provided, the resulting dataset will be validated against this reference profile.

class feast.infra.offline_stores.offline_store.RetrievalMetadata(features: List[str], keys: List[str], min_event_timestamp: Optional[datetime.datetime] = None, max_event_timestamp: Optional[datetime.datetime] = None)[source]

Bases: object

features: List[str]
keys: List[str]
max_event_timestamp: Optional[datetime.datetime]
min_event_timestamp: Optional[datetime.datetime]

Module contents