feast.staging package
Submodules
feast.staging.entities module
- feast.staging.entities.create_bq_view_of_joined_features_and_entities(source: feast.data_source.BigQuerySource, entity_source: feast.data_source.BigQuerySource, entity_names: List[str]) → feast.data_source.BigQuerySource
Creates a BigQuery (BQ) view that joins the tables from source and entity_source, using a join key derived from entity_names. Returns a BigQuerySource that references the created view.
- feast.staging.entities.stage_entities_to_bq(entity_source: pandas.core.frame.DataFrame, project: str, dataset: str) → feast.data_source.BigQuerySource
Stores the given entity dataframe as a new table in BQ. The table name is generated from the current time, and the table expires after 1 day. Returns a BigQuerySource that references the created table.
- feast.staging.entities.stage_entities_to_fs(entity_source: pandas.core.frame.DataFrame, staging_location: str, config: feast.config.Config) → feast.data_source.FileSource
Dumps the given entity dataframe as a Parquet file and stages it to remote file storage (a subdirectory of staging_location).
- Returns
FileSource with remote destination path
feast.staging.storage_client module
- class feast.staging.storage_client.AbstractStagingClient
Bases: abc.ABC
Client used to stage files in order to upload or download datasets into a historical store.
- abstract download_file(uri: urllib.parse.ParseResult) → IO[bytes]
Downloads a file from an object store and returns a TemporaryFile object.
- abstract list_files(uri: urllib.parse.ParseResult) → List[str]
Lists all the files under a directory in an object store.
- abstract upload_fileobj(fileobj: IO[bytes], local_path: str, *, remote_uri: Optional[urllib.parse.ParseResult] = None, remote_path_prefix: Optional[str] = None, remote_path_suffix: Optional[str] = None) → urllib.parse.ParseResult
Uploads a file to an object store. Specify either the destination object URI or a destination prefix and suffix. In the latter case, the interface works as content-addressable storage: the remote path is computed from the SHA-256 of the uploaded content as $remote_path_prefix/$sha256$remote_path_suffix
- Parameters
fileobj (IO[bytes]) – file-like object containing the data to be uploaded. It needs to support the seek() operation in addition to read/write.
local_path (str) – a file name associated with fileobj. This param is only used for diagnostic messages. If fileobj is a local file, pass its filename here.
remote_uri (ParseResult or None) – destination object URI to upload to
remote_path_prefix (str or None) – destination path prefix to upload to when using content-addressable storage mode
remote_path_suffix (str or None) – destination path suffix to upload to when using content-addressable storage mode
- Returns
the URI to the uploaded file. It would be the same as remote_uri if remote_uri was passed in. Otherwise it will be the path computed from remote_path_prefix and remote_path_suffix.
- Return type
ParseResult
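The content-addressable mode described above can be illustrated with a minimal sketch. The helper name content_addressed_uri, the chunk size, and the example bucket are assumptions made for illustration; this is not the Feast implementation, only one way to derive $remote_path_prefix/$sha256$remote_path_suffix from a seekable file object.

```python
import hashlib
import io
from typing import IO
from urllib.parse import ParseResult, urlparse


def content_addressed_uri(
    fileobj: IO[bytes], remote_path_prefix: str, remote_path_suffix: str
) -> ParseResult:
    """Hypothetical helper: build <prefix>/<sha256><suffix> from the content."""
    fileobj.seek(0)
    digest = hashlib.sha256()
    # Hash in chunks so large files are not read into memory at once.
    for chunk in iter(lambda: fileobj.read(8192), b""):
        digest.update(chunk)
    # Rewind so the same object can still be uploaded afterwards.
    fileobj.seek(0)
    prefix = remote_path_prefix.rstrip("/")
    return urlparse(f"{prefix}/{digest.hexdigest()}{remote_path_suffix}")


# Example with an in-memory file and a made-up staging prefix.
data = io.BytesIO(b"hello")
uri = content_addressed_uri(data, "gs://bucket/staging", ".parquet")
print(uri.geturl())
```

Because the path depends only on the content, uploading identical data twice yields the same destination. This is also why fileobj must support seek(): the content is read once for hashing and again for the upload.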
- class feast.staging.storage_client.AzureBlobClient(account_name: str, account_access_key: str)
Bases: feast.staging.storage_client.AbstractStagingClient
Implementation of AbstractStagingClient for Azure Blob storage.
- download_file(uri: urllib.parse.ParseResult) → IO[bytes]
Downloads a file from Azure Blob storage and returns a TemporaryFile object.
- Parameters
uri (urllib.parse.ParseResult) – Parsed URI of the file, e.g. urlparse("wasbs://bucket@account_name.blob.core.windows.net/file.avro")
- Returns
TemporaryFile object
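As an illustration of the uri argument, the following sketch shows how the standard library splits the example WASB URI into its parts. Reading the user-info part as the container follows the usual wasbs://<container>@<account>.blob.core.windows.net/<path> convention; how AzureBlobClient itself consumes these fields is not stated here, so the variable names are assumptions.

```python
from urllib.parse import urlparse

uri = urlparse("wasbs://bucket@account_name.blob.core.windows.net/file.avro")

scheme = uri.scheme               # "wasbs"
container = uri.username          # "bucket" (the part before "@")
host = uri.hostname               # "account_name.blob.core.windows.net"
blob_path = uri.path.lstrip("/")  # "file.avro"

print(scheme, container, host, blob_path)
```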
- list_files(uri: urllib.parse.ParseResult) → List[str]
Lists all the files under a directory in Azure Blob storage if the path contains a wildcard (*) character.
- Parameters
uri (urllib.parse.ParseResult) – Parsed URI of this location
- Returns
A list containing the full path to the file(s) in the remote staging location.
- Return type
List[str]
- upload_fileobj(fileobj: IO[bytes], local_path: str, *, remote_uri: Optional[urllib.parse.ParseResult] = None, remote_path_prefix: Optional[str] = None, remote_path_suffix: Optional[str] = None) → urllib.parse.ParseResult
Uploads a file to an object store. Specify either the destination object URI or a destination prefix and suffix. In the latter case, the interface works as content-addressable storage: the remote path is computed from the SHA-256 of the uploaded content as $remote_path_prefix/$sha256$remote_path_suffix
- Parameters
fileobj (IO[bytes]) – file-like object containing the data to be uploaded. It needs to support the seek() operation in addition to read/write.
local_path (str) – a file name associated with fileobj. This param is only used for diagnostic messages. If fileobj is a local file, pass its filename here.
remote_uri (ParseResult or None) – destination object URI to upload to
remote_path_prefix (str or None) – destination path prefix to upload to when using content-addressable storage mode
remote_path_suffix (str or None) – destination path suffix to upload to when using content-addressable storage mode
- Returns
the URI to the uploaded file. It would be the same as remote_uri if remote_uri was passed in. Otherwise it will be the path computed from remote_path_prefix and remote_path_suffix.
- Return type
ParseResult
- class feast.staging.storage_client.GCSClient
Bases: feast.staging.storage_client.AbstractStagingClient
Implementation of AbstractStagingClient for Google Cloud Storage.
- download_file(uri: urllib.parse.ParseResult) → IO[bytes]
Downloads a file from Google Cloud Storage and returns a TemporaryFile object.
- Parameters
uri (urllib.parse.ParseResult) – Parsed URI of the file, e.g. urlparse("gs://bucket/file.avro")
- Returns
TemporaryFile object
- list_files(uri: urllib.parse.ParseResult) → List[str]
Lists all the files under a directory in Google Cloud Storage if the path contains a wildcard (*) character.
- Parameters
uri (urllib.parse.ParseResult) – Parsed URI of this location
- Returns
A list containing the full path to the file(s) in the remote staging location.
- Return type
List[str]
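The wildcard behaviour of list_files can be pictured with a small sketch that filters a list of object names against the URI path. The function match_objects, the use of fnmatch for pattern matching, and the handling of a path without a wildcard are all assumptions for illustration; the actual client queries the object store and may match differently.

```python
import fnmatch
from typing import List
from urllib.parse import ParseResult, urlparse


def match_objects(uri: ParseResult, object_names: List[str]) -> List[str]:
    """Hypothetical helper: expand a wildcard path into full object URIs."""
    if "*" not in uri.path:
        # Assumption: without a wildcard, the URI already names one object.
        return [uri.geturl()]
    pattern = uri.path.lstrip("/")
    return [
        f"{uri.scheme}://{uri.netloc}/{name}"
        for name in object_names
        if fnmatch.fnmatchcase(name, pattern)
    ]


# Object names as they might be returned by a bucket listing (made up).
names = ["data/part-0.avro", "data/part-1.avro", "other/x.avro"]
matched = match_objects(urlparse("gs://bucket/data/*.avro"), names)
print(matched)
```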
- upload_fileobj(fileobj: IO[bytes], local_path: str, *, remote_uri: Optional[urllib.parse.ParseResult] = None, remote_path_prefix: Optional[str] = None, remote_path_suffix: Optional[str] = None) → urllib.parse.ParseResult
Uploads a file to an object store. Specify either the destination object URI or a destination prefix and suffix. In the latter case, the interface works as content-addressable storage: the remote path is computed from the SHA-256 of the uploaded content as $remote_path_prefix/$sha256$remote_path_suffix
- Parameters
fileobj (IO[bytes]) – file-like object containing the data to be uploaded. It needs to support the seek() operation in addition to read/write.
local_path (str) – a file name associated with fileobj. This param is only used for diagnostic messages. If fileobj is a local file, pass its filename here.
remote_uri (ParseResult or None) – destination object URI to upload to
remote_path_prefix (str or None) – destination path prefix to upload to when using content-addressable storage mode
remote_path_suffix (str or None) – destination path suffix to upload to when using content-addressable storage mode
- Returns
the URI to the uploaded file. It would be the same as remote_uri if remote_uri was passed in. Otherwise it will be the path computed from remote_path_prefix and remote_path_suffix.
- Return type
ParseResult
- class feast.staging.storage_client.LocalFSClient
Bases: feast.staging.storage_client.AbstractStagingClient
Implementation of AbstractStagingClient for local files. Note: this is used for E2E tests.
- download_file(uri: urllib.parse.ParseResult) → IO[bytes]
Reads a local file from disk.
- Parameters
uri (urllib.parse.ParseResult) – Parsed URI of the file, e.g. urlparse("file:///folder/file.avro")
- Returns
TemporaryFile object
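A rough sketch of reading a file:// URI into a TemporaryFile follows. The function read_local_file and the demo file are hypothetical; the sketch only shows the general pattern of copying a local file into a temporary file object and is not taken from LocalFSClient.

```python
import shutil
import tempfile
from pathlib import Path
from typing import IO
from urllib.parse import ParseResult, unquote, urlparse


def read_local_file(uri: ParseResult) -> IO[bytes]:
    """Hypothetical helper: copy the file named by a file:// URI into a TemporaryFile."""
    tmp = tempfile.TemporaryFile()
    # unquote() undoes percent-encoding that a file URI may contain.
    with open(unquote(uri.path), "rb") as src:
        shutil.copyfileobj(src, tmp)
    tmp.seek(0)  # rewind so the caller reads from the start
    return tmp


# Demo: create a throwaway file, then read it back through its file:// URI.
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "file.avro"
    path.write_bytes(b"example")
    copy = read_local_file(urlparse(path.as_uri()))
    content = copy.read()
    copy.close()

print(content)
```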
- list_files(uri: urllib.parse.ParseResult) → List[str]
Lists all the files under a directory in an object store.
- upload_fileobj(fileobj: IO[bytes], local_path: str, *, remote_uri: Optional[urllib.parse.ParseResult] = None, remote_path_prefix: Optional[str] = None, remote_path_suffix: Optional[str] = None) → urllib.parse.ParseResult
Uploads a file to an object store. Specify either the destination object URI or a destination prefix and suffix. In the latter case, the interface works as content-addressable storage: the remote path is computed from the SHA-256 of the uploaded content as $remote_path_prefix/$sha256$remote_path_suffix
- Parameters
fileobj (IO[bytes]) – file-like object containing the data to be uploaded. It needs to support the seek() operation in addition to read/write.
local_path (str) – a file name associated with fileobj. This param is only used for diagnostic messages. If fileobj is a local file, pass its filename here.
remote_uri (ParseResult or None) – destination object URI to upload to
remote_path_prefix (str or None) – destination path prefix to upload to when using content-addressable storage mode
remote_path_suffix (str or None) – destination path suffix to upload to when using content-addressable storage mode
- Returns
the URI to the uploaded file. It would be the same as remote_uri if remote_uri was passed in. Otherwise it will be the path computed from remote_path_prefix and remote_path_suffix.
- Return type
ParseResult
- class feast.staging.storage_client.S3Client(endpoint_url: Optional[str] = None, url_scheme='s3')
Bases: feast.staging.storage_client.AbstractStagingClient
Implementation of AbstractStagingClient for AWS S3 storage.
- download_file(uri: urllib.parse.ParseResult) → IO[bytes]
Downloads a file from AWS S3 storage and returns a TemporaryFile object.
- Parameters
uri (urllib.parse.ParseResult) – Parsed URI of the file, e.g. urlparse("s3://bucket/file.avro")
- Returns
TemporaryFile object
- list_files(uri: urllib.parse.ParseResult) → List[str]
Lists all the files under a directory in S3 if the path contains a wildcard (*) character.
- Parameters
uri (urllib.parse.ParseResult) – Parsed URI of this location
- Returns
A list containing the full path to the file(s) in the remote staging location.
- Return type
List[str]
- upload_fileobj(fileobj: IO[bytes], local_path: str, *, remote_uri: Optional[urllib.parse.ParseResult] = None, remote_path_prefix: Optional[str] = None, remote_path_suffix: Optional[str] = None) → urllib.parse.ParseResult
Uploads a file to an object store. Specify either the destination object URI or a destination prefix and suffix. In the latter case, the interface works as content-addressable storage: the remote path is computed from the SHA-256 of the uploaded content as $remote_path_prefix/$sha256$remote_path_suffix
- Parameters
fileobj (IO[bytes]) – file-like object containing the data to be uploaded. It needs to support the seek() operation in addition to read/write.
local_path (str) – a file name associated with fileobj. This param is only used for diagnostic messages. If fileobj is a local file, pass its filename here.
remote_uri (ParseResult or None) – destination object URI to upload to
remote_path_prefix (str or None) – destination path prefix to upload to when using content-addressable storage mode
remote_path_suffix (str or None) – destination path suffix to upload to when using content-addressable storage mode
- Returns
the URI to the uploaded file. It would be the same as remote_uri if remote_uri was passed in. Otherwise it will be the path computed from remote_path_prefix and remote_path_suffix.
- Return type
ParseResult
- feast.staging.storage_client.get_staging_client(scheme, config: Optional[feast.config.Config] = None) → feast.staging.storage_client.AbstractStagingClient
Initializes a specific client object (GCSClient, S3Client, etc.) for the given scheme.
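Selecting a client by URI scheme can be pictured as a lookup table from scheme to factory. Everything in the sketch below is hypothetical: the stand-in classes, the pick_client function, the scheme strings, and the error raised for an unknown scheme are illustrative assumptions, not the behaviour of get_staging_client.

```python
from typing import Callable, Dict


class FakeGCSClient:
    """Stand-in for a GCS staging client (not the Feast class)."""


class FakeS3Client:
    """Stand-in for an S3 staging client (not the Feast class)."""


# Hypothetical registry mapping a URI scheme to a client factory.
_FACTORIES: Dict[str, Callable[[], object]] = {
    "gs": FakeGCSClient,
    "s3": FakeS3Client,
}


def pick_client(scheme: str) -> object:
    """Return a new client for the scheme, or fail loudly if it is unknown."""
    try:
        factory = _FACTORIES[scheme]
    except KeyError:
        raise ValueError(f"unsupported scheme: {scheme!r}") from None
    return factory()


client = pick_client("gs")
print(type(client).__name__)
```

A table-driven lookup keeps the scheme-to-client mapping in one place, so supporting another store means adding one entry rather than another branch.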