here.geopandas_adapter.geopandas_adapter module#
HERE Platform Python SDK, GeoPandas adapter access package
- class here.geopandas_adapter.geopandas_adapter.GeoPandasAdapter(partition_column: str = 'partition_id', timestamp_column: str = 'partition_timestamp', including_default_value_fields: bool = True, preserving_proto_field_name: bool = True)[source]#
Bases: Adapter
This adapter transforms data from and to pd.DataFrame and gpd.GeoDataFrame when geometry information such as longitude and latitude is involved.
An adapter controls the encoding and decoding process of platform data. It transforms data from and to adapter-specific data structures and supports reading, writing, encoding and decoding a variety of MIME content types.
For the list of MIME content types supported when reading and writing a layer with the read_* and write_* functions of the Layer and its subclasses, please see the documentation of GeoPandasDecoder and GeoPandasEncoder.
All the operations involving content pass through an adapter when the parameters encode or decode are True, their default value. These are parameters of the read_* and write_* functions. If a content type is not supported, or if reading or writing raw content is preferred, pass False to skip encoding or decoding and deal with raw bytes instead.
- property content_adapter: ContentAdapter#
The adapter specialized for content.
- from_feature_ids(feature_ids: Iterator[str], **kwargs) Series[source]#
Adapt a sequence of feature identifiers to a Series.
- Parameters:
feature_ids – sequence of feature identifiers
kwargs – additional parameters are passed unchanged to
pd.Series(). For additional information please see: https://pandas.pydata.org/docs/reference/api/pandas.Series.html
- Returns:
a Series with the feature identifiers
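A minimal plain-pandas sketch of the mapping described above (the example identifiers are hypothetical):

```python
import pandas as pd

# Sketch of from_feature_ids: a sequence of feature identifiers
# becomes a pd.Series holding those identifiers.
feature_ids = iter(["FID-1", "FID-2", "FID-3"])  # hypothetical ids
series = pd.Series(feature_ids)

print(series.tolist())  # ['FID-1', 'FID-2', 'FID-3']
```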
- from_geo_features(features: Iterator[Feature], **kwargs) GeoDataFrame[source]#
Adapt a sequence of geographic features to a GeoDataFrame.
- Parameters:
features – sequence of geographic features
kwargs – additional parameters are passed unchanged to
gpd.GeoDataFrame.from_features(). For additional information please see: https://geopandas.org/docs/reference/api/geopandas.GeoDataFrame.from_features.html
- Returns:
a new gpd.GeoDataFrame containing the features
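Under the hood this corresponds to geopandas' own from_features; a minimal sketch with a hypothetical GeoJSON-like feature (assuming geopandas is installed):

```python
import geopandas as gpd

# Sketch of the adaptation: GeoJSON-like feature mappings become
# rows of a GeoDataFrame with a geometry column.
features = [
    {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [13.405, 52.52]},
        "properties": {"name": "Berlin"},  # hypothetical property
    }
]
gdf = gpd.GeoDataFrame.from_features(features)

print(gdf["name"].tolist())    # ['Berlin']
print(gdf.geometry.iloc[0].x)  # 13.405
```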
- from_index_data(partitions_data: Iterator[Tuple[IndexPartition, bytes]], content_type: str, schema: Schema | None, **kwargs) DataFrame[source]#
Adapt index partition metadata and data to a pd.DataFrame.
- Parameters:
partitions_data – sequence of partition metadata and data from an index layer
content_type – the MIME content type of the layer
schema – optional Schema of the layer
kwargs – additional, content-type-specific parameters, see GeoPandasDecoder
- Returns:
partition data as pd.DataFrame or gpd.GeoDataFrame
- from_index_metadata(partitions: Iterator[IndexPartition], **kwargs) DataFrame[source]#
Adapt index partition metadata to a pd.DataFrame.
- Parameters:
partitions – sequence of partition metadata from an index layer
kwargs – unused
- Returns:
partition metadata as pd.DataFrame
- from_stream_data(partitions_data: Iterator[Tuple[StreamPartition, bytes]], content_type: str, schema: Schema | None, **kwargs) DataFrame[source]#
Adapt stream partition metadata and data to a pd.DataFrame.
- Parameters:
partitions_data – sequence of partition metadata and data from a stream layer
content_type – the MIME content type of the layer
schema – optional Schema of the layer
kwargs – additional, content-type-specific parameters, see GeoPandasDecoder
- Returns:
stream message data as pd.DataFrame or gpd.GeoDataFrame
- from_stream_metadata(partitions: Iterator[StreamPartition], **kwargs) DataFrame[source]#
Adapt stream partition metadata to a pd.DataFrame.
- Parameters:
partitions – sequence of partition metadata from a stream layer
kwargs – unused
- Returns:
partition metadata as pd.DataFrame
- from_versioned_data(partitions_data: Iterator[Tuple[VersionedPartition, bytes]], content_type: str, schema: Schema | None, **kwargs) DataFrame[source]#
Adapt versioned partition metadata and data to a pd.DataFrame.
- Parameters:
partitions_data – sequence of partition metadata and data from a versioned layer
content_type – the MIME content type of the layer
schema – optional Schema of the layer
kwargs – additional, content-type-specific parameters, see GeoPandasDecoder
- Returns:
partition data as pd.DataFrame or gpd.GeoDataFrame
- from_versioned_metadata(partitions: Iterator[VersionedPartition], **kwargs) DataFrame[source]#
Adapt versioned partition metadata to a pd.DataFrame.
- Parameters:
partitions – sequence of partition metadata from a versioned layer
kwargs – unused
- Returns:
partition metadata as pd.DataFrame
- from_volatile_data(partitions_data: Iterator[Tuple[VolatilePartition, bytes]], content_type: str, schema: Schema | None, **kwargs) DataFrame[source]#
Adapt volatile partition metadata and data to a pd.DataFrame.
- Parameters:
partitions_data – sequence of partition metadata and data from a volatile layer
content_type – the MIME content type of the layer
schema – optional Schema of the layer
kwargs – additional, content-type-specific parameters, see GeoPandasDecoder
- Returns:
partition data as pd.DataFrame or gpd.GeoDataFrame
- from_volatile_metadata(partitions: Iterator[VolatilePartition], **kwargs) DataFrame[source]#
Adapt volatile partition metadata to a pd.DataFrame.
- Parameters:
partitions – sequence of partition metadata from a volatile layer
kwargs – unused
- Returns:
partition metadata as pd.DataFrame
- to_feature_ids(data: Series, **kwargs) Iterator[str][source]#
Adapt data from a Series to a sequence of feature identifiers.
Values are converted to str; NA values are discarded.
- Parameters:
data – a Series containing feature identifiers
kwargs – unused
- Returns:
sequence of feature identifiers
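The conversion rule above can be sketched in plain pandas, independently of the SDK:

```python
import pandas as pd

# Sketch of to_feature_ids: values become str, NA values are discarded.
data = pd.Series(["FID-1", None, "FID-2"])  # hypothetical ids with one NA
feature_ids = list(data.dropna().astype(str))

print(feature_ids)  # ['FID-1', 'FID-2']
```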
- to_geo_features(data: GeoDataFrame, **kwargs) Iterator[Feature][source]#
Adapt data in a GeoDataFrame to a sequence of geographic features.
- Parameters:
data – the gpd.GeoDataFrame to adapt
kwargs – additional parameters are passed unchanged to gpd.GeoDataFrame.iterfeatures(). For additional information please see: https://geopandas.org/docs/reference/api/geopandas.GeoDataFrame.iterfeatures.html
- Returns:
sequence of geographic features from the GeoDataFrame
- to_index_single_data(data: DataFrame, content_type: str, schema: Schema | None, **kwargs) bytes[source]#
Adapt a DataFrame to be stored in an index layer.
- Parameters:
data – data in the form of DataFrame
content_type – the MIME content type of the layer
schema – optional Schema of the layer
kwargs – additional, content-type-specific parameters, see GeoPandasEncoder
- Returns:
data encoded for an index layer
- Raises:
ValueError – in case the content type is not supported by the adapter
- to_stream_data(layer: StreamLayer, data, content_type: str, schema: Schema | None, timestamp: int | None, **kwargs) Iterator[Tuple[str | int, bytes, int | None]][source]#
Adapt data from the target format to stream partition metadata and data.
- Parameters:
layer – the layer all the metadata and data belong to
data – adapter-specific, the data to adapt
content_type – the MIME content type of the layer
schema – optional Schema of the layer
timestamp – optional timestamp for all the messages, if none is specified in data, in milliseconds since the Unix epoch (1970-01-01T00:00:00 UTC)
kwargs – adapter-specific, please consult the documentation of the specific adapter for the parameters and types it supports
- Yield:
partition id, data and timestamp for the stream layer
- Raises:
ValueError – in case required columns are missing
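The exact layout is adapter-specific, but a DataFrame like the following sketches the shape to_stream_data consumes, using the default column names from the GeoPandasAdapter constructor ('partition_id', 'partition_timestamp'); the payload column is hypothetical:

```python
import pandas as pd

# Hypothetical input for to_stream_data: one row per message, with the
# partition id and optional timestamp columns named as configured in the
# adapter constructor (defaults: 'partition_id', 'partition_timestamp').
df = pd.DataFrame(
    {
        "partition_id": ["23618402", "23618403"],
        "partition_timestamp": [1_700_000_000_000, 1_700_000_001_000],  # ms since epoch
        "payload": ["msg-a", "msg-b"],  # hypothetical content column
    }
)

# If a required column such as the partition id were missing,
# to_stream_data would raise ValueError.
print(list(df.columns))
```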
- to_stream_metadata(layer: StreamLayer, partitions: DataFrame, **kwargs) Iterator[StreamPartition][source]#
Adapt what to publish from the target format to stream partition metadata.
- Parameters:
layer – the layer all the metadata and data belong to
partitions – the pd.DataFrame of partition metadata to append
kwargs – unused
- Yield:
the StreamPartition objects that are adapted
- to_versioned_data(layer: VersionedLayer, data: pd.DataFrame, content_type: str, schema: Schema | None, **kwargs) Iterator[Tuple[str | int, bytes]][source]#
Adapt data from a sequence of partition ids and data to versioned partition ids and data.
- Parameters:
layer – the layer all the metadata and data belong to
data – data as pd.DataFrame or gpd.GeoDataFrame
content_type – the MIME content type of the layer
schema – optional Schema of the layer
kwargs – additional, content-type-specific parameters, see GeoPandasEncoder
- Returns:
sequence of partition ids and data for the versioned layer
- to_versioned_metadata(layer: VersionedLayer, partitions_update: DataFrame | None, partitions_delete: Series | None, **kwargs) Tuple[Iterator[VersionedPartition], Iterator[str | int]][source]#
Adapt a pd.DataFrame of metadata and a pd.Series of keys to versioned partition metadata and partition ids to update and delete.
- Parameters:
layer – the layer all the metadata and data belong to
partitions_update – the pd.DataFrame of partition metadata to update, if any
partitions_delete – the pd.Series of partition ids to delete, if any
kwargs – unused
- Returns:
tuple of iterators, the first with the VersionedPartition objects that have to be updated, the second with the partition ids to delete
- to_volatile_data(layer: VolatileLayer, data: pd.DataFrame, content_type: str, schema: Schema | None, **kwargs) Iterator[Tuple[str | int, bytes]][source]#
Adapt data from a sequence of partition ids and data to volatile partition ids and data.
- Parameters:
layer – the layer all the metadata and data belong to
data – data as pd.DataFrame or gpd.GeoDataFrame
content_type – the MIME content type of the layer
schema – optional Schema of the layer
kwargs – additional, content-type-specific parameters, see GeoPandasEncoder
- Returns:
sequence of partition id and data for the volatile layer
- to_volatile_metadata(layer: VolatileLayer, partitions_update: DataFrame | None, partitions_delete: Series | None, **kwargs) Tuple[Iterator[VolatilePartition], Iterator[str | int]][source]#
Adapt a pd.DataFrame of metadata and a pd.Series of keys to volatile partition metadata and partition ids to update and delete.
- Parameters:
layer – the layer all the metadata and data belong to
partitions_update – the pd.DataFrame of partition metadata to update, if any
partitions_delete – the pd.Series of partition ids to delete, if any
kwargs – unused
- Returns:
tuple of iterators, the first with the VolatilePartition objects that have to be updated, the second with the partition ids to delete
- class here.geopandas_adapter.geopandas_adapter.GeoPandasContentAdapter(partition_column: str)[source]#
Bases: ContentAdapter
Specialization of the GeoPandasAdapter to map tabular-like content from content bindings to GeoDataFrame or DataFrame.
- from_objects(fields: type, data: Iterator[object], single_element: bool = False, index_partition: None | str | Callable[[object], Partition] = None, index_id: None | str | Callable[[object], Identifier] = None, index_ref: None | str | Callable[[object], Ref | Iterable[Ref]] = None) DataFrame | GeoDataFrame[source]#
Adapt content from a structured representation to a pandas DataFrame or geopandas GeoDataFrame.
It can optionally perform indexing of objects, based on their partition, identifier and set of references to other objects. Indexing is specified by naming the field of the object that contains the value to index, or by passing a function that calculates that value from the object.
- Parameters:
fields – the fields to extract, as specified by a dataclass. Field names are looked up among the attributes of each object via getattr. When missing, None or an equivalent is used. Each field has a type that describes its semantics: it is used to adapt the value to the most appropriate representation for the output format. TypeError is raised in case this is not possible.
data – the objects to adapt to the target format. Fields not mentioned in fields are discarded. Expected but missing fields and identifiers are considered None. Field values may be of any type compatible with the type declared for the field. Partition ids don’t have to be unique, but they have to be contiguous: all the objects with a given partition identifier must be returned in sequence. Object identifiers, when present, must be unique across the whole content.
single_element – the data contains exactly one element; the content adapter can use this information to optimize or return a specialized representation
index_partition – index the content by partition, using the field specified
index_id – index the content by object identifier, using the field specified
index_ref – index the content by references, using the field specified. Each object can contain zero, one or more references, and references can be shared among multiple objects.
- Returns:
objects in a dataframe, indexed as requested
- Raises:
ValueError – if the fields are not described by a dataclass
KeyError – in case partition id, object id or reference is needed but not present
TypeError – in case partition or object id is not of type int or string. Also raised in case field values are not of the type declared for the field, or if they can’t be converted to it.
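A plain-Python sketch of the extraction behavior described above; it mimics the documented getattr lookup and None fallback, and is not the SDK implementation:

```python
from dataclasses import dataclass, fields as dc_fields

import pandas as pd

# The dataclass declares which fields to extract; attributes missing
# on an object fall back to None, as documented for from_objects.
@dataclass
class Road:  # hypothetical field specification
    id: str
    speed_limit: int

class Obj:  # arbitrary source objects; extra attributes are discarded
    def __init__(self, **kw):
        self.__dict__.update(kw)

objects = [Obj(id="r1", speed_limit=50, ignored="x"), Obj(id="r2")]

rows = [
    {f.name: getattr(obj, f.name, None) for f in dc_fields(Road)}
    for obj in objects
]
df = pd.DataFrame(rows)

print(list(df.columns))  # ['id', 'speed_limit']
```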
- class here.geopandas_adapter.geopandas_adapter.GeoPandasDecoder(including_default_value_fields: bool = True, preserving_proto_field_name: bool = True)[source]#
Bases: Decoder
Implementation of a Decoder to work with pd.DataFrame and gpd.GeoDataFrame.
- decode_blob(data: bytes, content_type: str, schema: Schema | None = None, **kwargs)[source]#
Decode one single blob of data.
- Parameters:
data – the encoded data
content_type – the MIME content type to be decoded
schema – the schema, if the content type requires one
kwargs –
additional, content-type-specific parameters for the decoder:
For Protobuf (application/protobuf or application/x-protobuf):
- record_path: the name of a schema field that is decoded and transformed to DataFrame. It can reference nested fields by concatenating the field names with .. When referencing a single Protobuf sub-message, that message is decoded into one single dataframe row. When referencing repeated Protobuf messages, each repeated message is decoded in its own row, resulting in multiple rows per partition. Fields that are not Protobuf messages or repeated fields containing single values (ints, strings, …) are not supported because it is not possible to transform them to a dataframe. If not specified, the whole blob is decoded as a single message. Messages are decoded, normalized (see max_level) and passed to pd.DataFrame.from_records() together with the rest of kwargs: this turns each field of the normalized messages into a column of the resulting dataframe.
- record_prefix: if True, prefix the column names with the record_path. If a non-empty string, that string is used as prefix. . is used as separator.
- max_level: normalize each record of the decoded Protobuf message up to the specified maximum level in depth. 0 disables normalization.
- geometry_col: name of a column that contains geometries that is converted to a geopandas GeoSeries, resulting in a GeoDataFrame returned in place of a pandas DataFrame. For the supported formats, please see documentation of here.geopandas_adapter.geo_utils.to_geometry. Geometry field and sub-fields are excluded from normalization. If not specified, a pandas DataFrame is returned and geometry is not interpreted.
- geometry_crs: the CRS to set in the GeoDataFrame, when applicable.
- The rest of the parameters are passed unchanged to pd.DataFrame.from_records() for further customizations. For additional information please see: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.from_records.html
For Parquet (application/x-parquet):
- engine: an optional parameter for the type of engine used to parse the parquet data; allowed values are [auto, fastparquet, pyarrow]. If ‘auto’, the behavior is to try ‘pyarrow’, falling back to ‘fastparquet’ if ArrowNotImplementedError is raised.
- The rest of the parameters are passed unchanged to pd.read_parquet() for further customizations. For additional information please see: https://pandas.pydata.org/docs/reference/api/pandas.read_parquet.html
For CSV (text/csv):
- sep: delimiter or column separator to use.
- header: row number(s) to use as the column names, and the start of the data. Default behavior is to infer the column names: if no names are passed the behavior is identical to header=0 and column names are inferred from the first line of the file; if column names are passed explicitly then the behavior is identical to header=None. Explicitly pass header=0 to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns, e.g. [0, 1, 3]. Intervening rows that are not specified are skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.
- names: list of column names to use. If the file contains a header row, then you should explicitly pass header=0 to override the column names. Duplicates in this list are not allowed.
- index_col: column(s) to use as the row labels of the DataFrame, either given as string name or column index. If a sequence of int/str is given, a MultiIndex is used. Note: index_col=False can be used to force pandas to not use the first column as the index, e.g. when you have a malformed file with delimiters at the end of each line.
For JSON (application/json):
- orient: indication of expected JSON string format. The set of possible orients is:
  - ‘split’: dict like {index -> [index], columns -> [columns], data -> [values]}
  - ‘records’: list like [{column -> value}, … , {column -> value}]
  - ‘index’: dict like {index -> {column -> value}}
  - ‘columns’: dict like {column -> {index -> value}}
  - ‘values’: just the values array
  - ‘table’: dict like {‘schema’: {schema}, ‘data’: {data}}
- lines: set to True to read the file as a json object per line
- nrows: the number of lines from the line-delimited json file to read. This can only be passed if lines=True. If None, all the rows are returned.
For GeoJSON (application/geo+json or application/vnd.geo+json): no additional parameters available.
- Returns:
the decoded blob; its type corresponds to the type declared in the property supported_content_types for the content type
- Raises:
UnsupportedContentTypeDecodeException – in case the content type is not decodable
ValueError – if the schema is mandatory for the content type but missing
DecodeException – in case the blob can’t be properly decoded
SchemaException – in case the schema can’t be used to decode the content
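The JSON parameters above mirror pandas' own pd.read_json; a plain-pandas sketch of orient and lines on a hypothetical line-delimited blob:

```python
import io

import pandas as pd

# Sketch of JSON decoding: with lines=True, each line of the blob is
# one JSON object, decoded into one dataframe row.
blob = b'{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'
df = pd.read_json(io.BytesIO(blob), orient="records", lines=True)

print(df["name"].tolist())  # ['a', 'b']
```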
- property supported_content_types: Dict[str, type | Tuple[type, ...]]#
- Returns:
the dictionary of MIME content types supported when decoding single blobs with the decode_blob function of this decoder, each with the type of the decoded data.
- class here.geopandas_adapter.geopandas_adapter.GeoPandasEncoder[source]#
Bases: Encoder
Implementation of an Encoder to work with pd.DataFrame and gpd.GeoDataFrame.
- encode_blob(data, content_type: str, schema: Schema | None = None, **kwargs) bytes[source]#
Encode one single blob of data.
- Parameters:
data – the data to be encoded; its type corresponds to the type declared in the property supported_content_types for the content type
content_type – the MIME content type to be encoded
schema – the schema, if the content type requires one
kwargs –
additional, content-type-specific parameters for the encoder:
For Parquet (application/x-parquet):
- engine: an optional parameter for the type of engine used to write the parquet data; allowed values are [auto, fastparquet, pyarrow]. If ‘auto’, the behavior is to try ‘pyarrow’, falling back to ‘fastparquet’ if ArrowNotImplementedError is raised.
- The rest of the parameters are passed unchanged to pd.DataFrame.to_parquet() for further customizations. For additional information please see: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_parquet.html
For CSV (text/csv): for parameters and general info, please see: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html
For JSON (application/json):
- orient: indication of the JSON string format to produce. The set of possible orients is:
  - ‘split’: dict like {index -> [index], columns -> [columns], data -> [values]}
  - ‘records’: list like [{column -> value}, … , {column -> value}]
  - ‘index’: dict like {index -> {column -> value}}
  - ‘columns’: dict like {column -> {index -> value}}
  - ‘values’: just the values array
  - ‘table’: dict like {‘schema’: {schema}, ‘data’: {data}}
- lines: if orient is ‘records’, write out line-delimited json format.
For additional parameters and general info, please see: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_json.html
For GeoJSON (application/geo+json or application/vnd.geo+json): for parameters and general info, please see: https://geopandas.org/docs/reference/api/geopandas.GeoDataFrame.to_json.html#geopandas.GeoDataFrame.to_json
For Protobuf (application/protobuf or application/x-protobuf): for parameters and general info, please see: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_records.html
- Returns:
the encoded data
- Raises:
UnsupportedContentTypeEncodeException – in case the content type is not encodable
ValueError – if the schema is mandatory for the content type but missing
EncodeException – in case the blob can’t be properly encoded
SchemaException – in case the schema can’t be used to encode the content
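The JSON orients above mirror pd.DataFrame.to_json; a plain-pandas sketch of two of them:

```python
import pandas as pd

# Sketch of JSON encoding: orient controls the JSON layout.
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

records = df.to_json(orient="records")
print(records)  # [{"id":1,"name":"a"},{"id":2,"name":"b"}]

# With lines=True, each record is written on its own line.
lines = df.to_json(orient="records", lines=True)
```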
- property supported_content_types: Dict[str, type | Tuple[type, ...]]#
- Returns:
the dictionary of MIME content types supported when encoding single blobs with the encode_blob function of this encoder, each with the type of the encoded data.