Add SparkSQLSource doc #1102

Merged
xiaoyongzhu merged 59 commits into feathr-ai:main from Yuqing-cat:#1101
May 31, 2023

@Yuqing-cat
Collaborator

Description

Resolves #1101

How was this PR tested?

Does this PR introduce any user-facing changes?

  • No. You can skip the rest of this section.
  • Yes. Make sure to clarify your proposed changes.

Signed-off-by: Yuqing Wei <weiyuqing021@outlook.com>
enya-yx and others added 24 commits May 22, 2023 20:50
- Update pre-built docker image from `feathrfeaturestore/feathr-registry:releases-v0.9.0` to `feathrfeaturestore/feathr-registry:releases-v1.0.0`
- Update workflow names to be more descriptive
- Restrict `pull_request_target` configuration to workflows that require secret access
- Isolate Gradle tests from E2E tests, and trigger them for Scala changes only
- Update the quickstart guide to make it easier for users to get started, validate feature definitions and develop new things
* Add null filter

* Add spark flag

* filter obs data nulls

* Remove feature data null handling

* Update test

* remove additional test

---------

Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
* working test

* Minor comment

* bump version

* documentation update

* update version

---------

Co-authored-by: rkashyap <rkashyap@linkedin.com>
Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
This PR addresses Spark materialize job failures on ARM machines, such as Mac M1, caused by amd64 versions of Python packages and Maven jars being pre-fetched during Docker image creation. To resolve this, the Sandbox Docker GitHub action is updated to support the arm64 platform.

- Update job name in `.github/workflows/publish-to-dockerhub.yml`
- Update `build-push-action` from v3 to v4
- Add `setup-qemu-action` and `setup-buildx-action`
- Add support for Linux/AMD64 and Linux/ARM64 platforms
…precated warnings (feathr-ai#1110)

- Upgrade action checkout version from `v2` to `v3`
* Add Fake Data Generator

* update

* Update data_generator.py
* Update README to reflect the latest thought

* update readme
Fix "value is not a valid dict"
when access sql-registry api /projects/{project}/datasources/{datasource}

Co-authored-by: brianxiao <brianxiao@tencent.com>
…athr-ai#1128)

* Fix skipping features when derived feature contains a swa feature

* Fix comments

* Update documentation

* update version

---------

Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
feathr-ai#1130)

* fix bug when SWA hdfs and local paths without data.avro.json extensions are included for evaluation

* try

* Fix tests

* revert test file

* Add tests

* Add private classifier to variable

* fix test

* fix test

---------

Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
* Revert "Allow alien value in MVEL-based derivations (feathr-ai#1120) and remove stdout statements"

This reverts commit 55290e7.

* updating rc version after last commit

---------

Co-authored-by: Anirudh Agarwal <aniagarw@aniagarw-mn1.linkedin.biz>
jaymo001 and others added 22 commits May 22, 2023 20:50
* Add another API for accessing doJoinObsAndFeatures which suppresses exceptions

* version bump

---------

Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
…eathr-ai#1156)

Co-authored-by: Anirudh Agarwal <aniagarwal@linkedin.biz>
Co-authored-by: Anirudh Agarwal <aniagarwal@linkedin.biz>
* Add default column for missing features

* Fix failing test

* Fix SWA sparksession issue

* address comments

* Add comment

* bump version

---------

Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
Co-authored-by: Jinghui Mo <jmo@linkedin.com>
… aggregation. (feathr-ai#1159)

The bucketed aggregation works by aggregating data at a lower-level time granularity, e.g. 5-minute buckets, and then using those lower-level bucket results to produce higher-level aggregation results such as 1 hour, 1 day, etc.

The supported levels are 5 minutes, 1 hour, 1 week, 1 month, 1 year.
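
The two-level idea described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not Feathr's actual implementation: the `bucket_sum` helper, the sample `events`, and the choice of sum as the aggregation are all assumptions made for the example.

```python
from collections import defaultdict

FIVE_MINUTES = 5 * 60   # bucket widths in seconds
ONE_HOUR = 60 * 60


def bucket_sum(points, width):
    """Sum (timestamp, value) pairs into buckets keyed by bucket start time.

    Hypothetical helper for illustration only; it is not a Feathr API.
    """
    buckets = defaultdict(float)
    for ts, value in points:
        buckets[ts - ts % width] += value
    return dict(buckets)


# Hypothetical events as (epoch seconds, value) pairs.
events = [(0, 1.0), (120, 2.0), (300, 4.0), (3599, 8.0), (3600, 16.0)]

# Lower level: aggregate raw events into 5-minute buckets.
five_min = bucket_sum(events, FIVE_MINUTES)

# Higher level: build 1-hour buckets from the 5-minute results,
# without touching the raw events again.
one_hour = bucket_sum(five_min.items(), ONE_HOUR)
```

Reusing the lower-level results in this way only works directly for aggregations that can be combined from partial results, such as sum, count, min, and max; an average would need to carry both a sum and a count per bucket.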
Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
* version bump

* add logs

---------

Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
Co-authored-by: Anirudh Agarwal <aniagarwal@linkedin.biz>
…1164)

Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
Co-authored-by: Anirudh Agarwal <aniagarwal@linkedin.biz>
…ai#1166)

Add feature value wrapper for 3rd-party feature value compatibility
* Seq join bug fix

* Address comments

* version bump

---------

Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
* Fix bug in SWA with missing feature data

* remove unwanted code

* Address feedback and version bump

---------

Co-authored-by: Rakesh Kashyap Hanasoge Padmanabha <rkashyap@rkashyap-mn3.linkedin.biz>
Co-authored-by: Anirudh Agarwal <aniagarwal@linkedin.biz>
@Yuqing-cat Yuqing-cat requested a review from donegjookim as a code owner May 23, 2023 03:50
@Yuqing-cat Yuqing-cat added the "safe to test" label (Tag to execute build pipeline for a PR from forked repo) May 23, 2023
@xiaoyongzhu xiaoyongzhu merged commit 480e194 into feathr-ai:main May 31, 2023

Development

Successfully merging this pull request may close these issues.

[DOC] SparkSQL docs needed