Specify unique-row-id column in get_historical_features

**Is your feature request related to a problem? Please describe.**
Suppose my entity source data has columns X, A, B, C, and event_timestamp, where A, B, and C are the entity columns used to join feature data. If the combination [A, B, C, event_timestamp] is not unique, the join produces duplicate rows. One workaround is to preprocess the data so that only rows with a unique combination go into the join, but we may want to preserve every row, since X may already be unique and each row represents a distinct training example. There may also be columns Y, Z, etc. that aren't part of the Feast join but carry row-specific information, so it doesn't make sense to filter those rows out. In this example, X might be impression_id: we're not joining directly on impression_id but on the A, B, C columns, which might be tweet_id, user_id, etc.
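To make the duplication concrete, here is a minimal pure-Python sketch. It is a simplified stand-in for the join, not Feast's actual query: the sample rows, the `feature` value, and the helpers `derived_row_id` and `join_back` are all hypothetical and only illustrate why a non-unique row id fans out.

```python
# Hypothetical entity rows: X plays the role of impression_id,
# A/B/C are the entity columns used for the feature join.
entity_rows = [
    {"X": "imp-1", "A": 1, "B": 10, "C": 100, "event_timestamp": "2021-07-01T00:00:00"},
    {"X": "imp-2", "A": 1, "B": 10, "C": 100, "event_timestamp": "2021-07-01T00:00:00"},
    {"X": "imp-3", "A": 2, "B": 20, "C": 200, "event_timestamp": "2021-07-01T00:00:00"},
]


def derived_row_id(row):
    """Mimic the concatenated id: entity values followed by the timestamp."""
    return "".join(str(row[col]) for col in ("A", "B", "C", "event_timestamp"))


def join_back(rows, row_id):
    """Inner-join entity rows to per-row feature rows on the given row id."""
    # One (made-up) feature row is produced per input row, keyed by its row id.
    feature_rows = [{"row_id": row_id(r), "feature": 0.5} for r in rows]
    return [
        {**r, "feature": f["feature"]}
        for r in rows
        for f in feature_rows
        if f["row_id"] == row_id(r)
    ]


# imp-1 and imp-2 share the same derived id, so each matches both feature rows.
dup = join_back(entity_rows, derived_row_id)
# Using the already-unique X column as the row id keeps one output row per input row.
ok = join_back(entity_rows, lambda r: r["X"])

print(len(entity_rows), len(dup), len(ok))
```

With the derived id, the two rows that share [A, B, C, event_timestamp] each match twice, so the output has more rows than the input; keying on X keeps the row count unchanged.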

**Describe the solution you'd like**
I'd like to be able to optionally specify a unique-row-id column in get_historical_features; in the example above, X would be chosen as the unique-row-id column. I've tested replacing this part of the Feast join query

```
CONCAT(
    {% for entity in featureview.entities %}
        CAST({{entity}} AS STRING),
    {% endfor %}
    CAST({{entity_df_event_timestamp_col}} AS STRING)
) AS {{featureview.name}}__entity_row_unique_id,
```

with just 

`X as entity_row_unique_id`

and it fixes the issue. There also seem to be performance gains, possibly just from eliminating the work of building the concatenated string for each row. I think the change should be relatively easy to make, though it involves an API change, which always requires some consideration. get_historical_features might become something like


```
    def get_historical_features(
        entity_df: Union[pd.DataFrame, str],
        feature_refs: List[str],
        unique_id_col: str = "",  # new optional parameter
        full_feature_names: bool = False,
    ) -> RetrievalJob:
        ...
```

Here `unique_id_col: str = ""` is the new optional parameter.
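As a rough sketch of how the optional parameter could drive the query, the row-id expression might be chosen like this. The helper name `entity_row_unique_id_expr` is hypothetical and not part of Feast, and the alias is simplified to a plain `entity_row_unique_id` rather than the per-feature-view alias used in the template; it only illustrates falling back to the concatenation when no column is given.

```python
from typing import List


def entity_row_unique_id_expr(
    entities: List[str],
    event_timestamp_col: str,
    unique_id_col: str = "",
) -> str:
    """Return the SQL expression that identifies one entity row (hypothetical helper)."""
    if unique_id_col:
        # The caller vouches that this column is unique per row, so use it directly.
        return f"{unique_id_col} AS entity_row_unique_id"
    # Default behaviour: concatenate the entity columns and the event timestamp.
    parts = [f"CAST({col} AS STRING)" for col in entities + [event_timestamp_col]]
    return f"CONCAT({', '.join(parts)}) AS entity_row_unique_id"


print(entity_row_unique_id_expr(["A", "B", "C"], "event_timestamp"))
print(entity_row_unique_id_expr(["A", "B", "C"], "event_timestamp", unique_id_col="X"))
```

Keeping the empty-string default means existing callers would get the current concatenated id, so the change would be backward compatible under this assumption.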


**Describe alternatives you've considered**

N/A

**Additional context**



Specify unique-row-id column in get_historical_features #1736
