    Yield successive n-sized chunks from the given dataframe.
    """

    for i in range(0, len(df), chunk_size):
Interesting that iloc does not throw any errors when selecting a range beyond its size.
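The behavior noted here is consistent with how positional slicing works: `iloc` slices are clamped to the frame's length much like ordinary Python list slices, whereas a single out-of-range integer position does raise `IndexError`. A minimal sketch, assuming a helper named `chunk_dataframe` (a hypothetical name; the excerpt does not show the real function name):

```python
import pandas as pd


def chunk_dataframe(df, chunk_size=1000):
    """Yield successive chunk_size-row slices of df (hypothetical helper)."""
    for i in range(0, len(df), chunk_size):
        # The final slice may ask for rows past the end; iloc clamps the
        # range the same way a Python list slice does, so nothing is raised.
        yield df.iloc[i:i + chunk_size]


df = pd.DataFrame({"x": range(5)})
chunks = list(chunk_dataframe(df, chunk_size=2))
sizes = [len(c) for c in chunks]  # the last chunk is simply shorter

# Scalar positions are bounds-checked even though slices are not.
try:
    df.iloc[10]
    scalar_raised = False
except IndexError:
    scalar_raised = True
```

So the loop needs no special case for a final, partial chunk.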
snkas
left a comment
The chunking of input and the fixes are nice additions! Regarding serialization, I think it'd be useful to consider the client interface and how generic the push_to_pipeline function should be (or if it should be restructured / renamed / other tailored functions added). For instance, send_request should be kept as generic as possible, either requiring body to be bytes or having an optional serialization function passed that turns it into bytes.
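One way to read the suggestion above, as a rough sketch only: the request helper accepts either bytes or a value plus a caller-supplied serializer. The names `prepare_body` and `json_serializer` are hypothetical and are not taken from the Feldera client.

```python
import json
from typing import Any, Callable, Optional


def prepare_body(body: Any,
                 serializer: Optional[Callable[[Any], bytes]] = None) -> bytes:
    """Hypothetical helper: return the bytes that would go on the wire."""
    if serializer is not None:
        # The caller decides how the value becomes bytes (JSON, CSV, ...).
        return serializer(body)
    if isinstance(body, (bytes, bytearray)):
        # Already serialized: pass it through untouched.
        return bytes(body)
    raise TypeError("body must be bytes unless a serializer is supplied")


def json_serializer(obj: Any) -> bytes:
    """Example serializer: encode a Python value as UTF-8 JSON."""
    return json.dumps(obj).encode("utf-8")


json_payload = prepare_body([{"id": 1}], serializer=json_serializer)
csv_payload = prepare_body(b"id\n1\n")
```

With this shape the generic function never needs a flag describing whether the body is already serialized; the presence of a serializer says so.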
python/feldera/rest/_httprequests.py
Outdated
:param content_type: The value for `Content-Type` HTTP header. "application/json" by default.
:param params: The query parameters part of this request.
:param stream: True if the response is expected to be a HTTP stream.
:param dont_serialize: True if the body is already serialized.
The negative seems unnecessary given the default value. Why not make it `serialize: bool = True`?
    array: bool = False,
    force: bool = False,
    update_format: str = "raw",
    dont_serialize: bool = False,
Based on its signature, this function supports both JSON and CSV as the data format, but the fields seem tailored towards JSON?
Yeah. We use JSON most of the time; maybe they should be two different functions.
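A rough sketch of what splitting by format could look like. The function names and the `format` key are assumptions made for illustration; only `array`, `force`, and `update_format` appear in the signature under review, and nothing here is confirmed against the Feldera REST API.

```python
from typing import Dict


def json_ingress_params(array: bool = False, force: bool = False,
                        update_format: str = "raw") -> Dict[str, str]:
    """Hypothetical: query parameters for pushing JSON data."""
    return {
        "format": "json",  # assumed key name
        "array": str(array).lower(),
        "update_format": update_format,
        "force": str(force).lower(),
    }


def csv_ingress_params(force: bool = False) -> Dict[str, str]:
    """Hypothetical: query parameters for pushing CSV data.

    The JSON-only options (array, update_format) are not exposed here.
    """
    return {"format": "csv", "force": str(force).lower()}


json_params = json_ingress_params(array=True)
csv_params = csv_ingress_params()
```

Each function then exposes only the options that make sense for its format.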
ryzhyk
left a comment
Looks good. We can work on the Feldera-compatible timestamp encoding in another PR.
@abhizer, does Pandas support Date, Time, and Decimal types? If so, we will also need to make sure we encode those correctly.

I don't think Pandas has separate Date and Time types. Even a date-only value seems to be stored as DateTime, and Decimals seem to be serialized as Double.
Fixes: #1840

Also does the following things:
* chunk dataframes into smaller groups of 1000 rows per request while ingesting data
* avoids adding empty dataframes to output buffer
* ignores the index while concatenating output dataframes

Signed-off-by: Abhinav Gyawali <22275402+abhizer@users.noreply.github.com>
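The two output-side changes in this commit message can be illustrated with plain pandas calls. This is a sketch under assumptions, not the PR's code: the `received` list stands in for whatever frames the client actually buffers.

```python
import pandas as pd

# Stand-in for frames received from the pipeline; one of them is empty.
received = [
    pd.DataFrame({"x": [1, 2]}),
    pd.DataFrame({"x": []}),
    pd.DataFrame({"x": [3]}),
]

# Skip empty frames so they never reach the output buffer.
buffer = [df for df in received if not df.empty]

# ignore_index=True discards each frame's own 0..n-1 index and
# renumbers the combined rows consecutively.
combined = pd.concat(buffer, ignore_index=True)
index_values = list(combined.index)
```

Without `ignore_index=True`, the combined frame would keep the per-frame labels (0, 1, 0), which makes label-based lookups ambiguous.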
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
Introduces a new JSON dialect that matches how Pandas encodes timestamp types as millis since epoch.

Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
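For readers unfamiliar with the encoding, "millis since epoch" means an integer count of milliseconds since 1970-01-01T00:00:00 UTC. The sketch below uses only the standard library to show the round trip; the helper names are hypothetical, and treating naive datetimes as UTC is an assumption of this example, not a statement about the new dialect.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(ts: datetime) -> int:
    """Encode a datetime as whole milliseconds since the Unix epoch."""
    if ts.tzinfo is None:
        # Assumption for this sketch: naive values are interpreted as UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (ts - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(ms: int) -> datetime:
    """Decode milliseconds since the Unix epoch into an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


one_second = to_epoch_millis(datetime(1970, 1, 1, 0, 0, 1))
original = datetime(2024, 3, 1, 12, 30, 15, 250000, tzinfo=timezone.utc)
round_tripped = from_epoch_millis(to_epoch_millis(original))
```

Any precision finer than a millisecond is truncated by this encoding, so values with microsecond detail do not round-trip exactly.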
Signed-off-by: Abhinav Gyawali <22275402+abhizer@users.noreply.github.com>
Is this a user-visible change (yes/no): no