Skip to content

py: kafka connector#1807

Merged
abhizer merged 2 commits intomainfrom
py_kafka
Jun 4, 2024
Merged

py: kafka connector#1807
abhizer merged 2 commits intomainfrom
py_kafka

Conversation

@abhizer
Copy link
Copy Markdown
Contributor

@abhizer abhizer commented May 30, 2024

  • also rename Client to FelderaClient

Is this a user-visible change (yes/no): yes

In a follow up PR, add builders for Connectors.

@abhizer abhizer added the python-sdk Issues related to the feldera python sdk label May 30, 2024
@abhizer abhizer requested a review from ryzhyk May 30, 2024 10:53
@abhizer abhizer assigned ryzhyk and unassigned ryzhyk May 30, 2024
Copy link
Copy Markdown
Contributor

@snkas snkas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

"auto.offset.reset": "earliest",
}

kafka_format = JSONFormat().with_update_format(UpdateFormat.InsertDelete).with_array(False)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could also be a class with default values for its instantiation JSONFormat(update_format=UpdateFormat.Raw, array=False), and then the user supplies arguments to change any of the defaults. Either works though.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am split between having a default and requiring explicit configuration.
Having a default might mean that people just use the default and might have issues when the format doesn't match the data.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to not have defaults for this, given we don't have data on what the typical format might be.

@lalithsuresh
Copy link
Copy Markdown
Contributor

This is a user-facing change and needs a changelog.

:param table_name: The name of the table.
:param connector_name: The unique name for this connector.
:param config: The configuration for the kafka connector.
:param fmt: The format of the data in the kafka topic.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you provide a reference to the format descriptions?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Format takes an instance of JSONFormat or CSVFormat.

fmt = fmt.to_dict()

if fmt.get("config").get("update_format") is None:
raise ValueError("update_format not set in the format config try using method: .with_update_format()")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"; consider using"

"""
Tells Feldera to write the data to the specified kafka topic.

:param view_name: The name of the view whose output is sent to kafka topic.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is really the view changes that are sent

])
print("Topics ready")

# Produce of rows into the input topic
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no "of"

sql.connect_source_kafka(TABLE_NAME, "kafka_conn_in", source_config, kafka_format)
sql.connect_sink_kafka(VIEW_NAME, "kafka_conn_out", sink_config, kafka_format)

out = sql.listen(VIEW_NAME)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why so many empty lines?


InsertDelete = 1
"""
InsertDelete: When used, the data is inside a key either "insert" or "delete". This is used to represent changes.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have trouble parsing this phrase

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me too. I am not sure how to word this properly.

@abhizer abhizer added documentation Improvements or additions to documentation User-facing For PRs that lead to Feldera-user visible changes labels May 30, 2024
@abhizer abhizer requested a review from mihaibudiu May 30, 2024 18:02

InsertDelete = 1
"""
This is a way to represented changes in the data.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

represent

InsertDelete = 1
"""
This is a way to represented changes in the data.
When used, the actual JSON data is expected to be inside one of the keys "insert" or "delete".
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the example is clearer than this text, which can be removed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just explain that 'id' and 'name' below are the columns of the table where the insertion and deletion happen.


Raw = 2
"""
This format represents an individual row in a SQL table or view.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is really the same as an insertion, I presume

@abhizer abhizer mentioned this pull request May 31, 2024
16 tasks
To setup Kafka as the source use :meth:`.SQLContext.connect_source_kafka` and as the sink use
:meth:`.SQLContext.connect_sink_kafka`.

Both of these methods require a ``config`` which is a key-value pair.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Both of these methods require a ``config`` which is a key-value pair.
Both of these methods require a ``config`` which is a dictionary.

"auto.offset.reset": "earliest",
}

kafka_format = JSONFormat().with_update_format(UpdateFormat.InsertDelete).with_array(False)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
kafka_format = JSONFormat().with_update_format(UpdateFormat.InsertDelete).with_array(False)
format = JSONFormat().with_update_format(UpdateFormat.InsertDelete).with_array(False)

"auto.offset.reset": "earliest",
}

kafka_format = JSONFormat().with_update_format(UpdateFormat.InsertDelete).with_array(False)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention that in addition to Kafka config, the Kafka connector also requires data format config and point to data format docs.

"array": False,
}

def with_update_format(self, update_format: UpdateFormat) -> Self:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a comment

@ryzhyk
Copy link
Copy Markdown
Contributor

ryzhyk commented May 31, 2024

Please also add links to Python docs from connector documentation pages (maybe it has to be a separate PR).

abhizer added 2 commits June 4, 2024 00:47
* also rename `Client` to `FelderaClient`

Signed-off-by: Abhinav Gyawali <22275402+abhizer@users.noreply.github.com>

py: update the kafka test

Signed-off-by: Abhinav Gyawali <22275402+abhizer@users.noreply.github.com>

py: update documentation

* include an example for Kafka connector in the python docs
* refactor `Client` to `FelderaClient`
* add an entry in the changelog

Signed-off-by: Abhinav Gyawali <22275402+abhizer@users.noreply.github.com>

py: update documentation

Signed-off-by: Abhinav Gyawali <22275402+abhizer@users.noreply.github.com>
* py: HTTP GET input connector

Signed-off-by: Abhinav Gyawali <22275402+abhizer@users.noreply.github.com>

* py: refactor: common function to check format to create connector

Signed-off-by: Abhinav Gyawali <22275402+abhizer@users.noreply.github.com>

* py: update docs, rename `UpdateFormat` to `JSONUpdateFormat`

Signed-off-by: Abhinav Gyawali <22275402+abhizer@users.noreply.github.com>

---------

Signed-off-by: Abhinav Gyawali <22275402+abhizer@users.noreply.github.com>
@abhizer abhizer merged commit 4fee55f into main Jun 4, 2024
@abhizer abhizer deleted the py_kafka branch June 4, 2024 02:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation python-sdk Issues related to the feldera python sdk User-facing For PRs that lead to Feldera-user visible changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants