-1 votes
0 answers
48 views

I'm using a two-level LeftSemiJoinEngine in DolphinDB to join streaming tables, but the final output is not correctly sorted by the specified column. Scenario Description: I need to process ...
xinyu zhang
0 votes
0 answers
71 views

I created a time series aggregator using createTimeSeriesAggregator, but the chronological order of the output results is problematic. Scenario Description: The data in the source table MD_PTB is ...
haru • 11
0 votes
2 answers
113 views

I'm building a single global Topology object in a non-keyed ProcessFunction with parallelism = 1. I keep it as a local mutable object and update it for every input event using topology.apply(GnmiEvent)...
Kvn • 1
-1 votes
1 answer
47 views

I have a Flink job with multiple downstream operators I want to route tuples to based on a condition. Side outputs are advertised for this use case in the Flink documentation. However, when sending ...
keezar • 111
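A minimal sketch of the side-output pattern this question is about, assuming an existing DataStream<String> named input and a hypothetical routing condition. A common pitfall is creating the OutputTag without the anonymous-subclass braces, which loses the type information; another is that side outputs can only be emitted from process-function contexts, not from plain map/filter operators:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

// The {} makes this an anonymous subclass so Flink can infer the element type.
final OutputTag<String> errorTag = new OutputTag<String>("errors") {};

SingleOutputStreamOperator<String> mainStream = input.process(
    new ProcessFunction<String, String>() {
        @Override
        public void processElement(String value, Context ctx, Collector<String> out) {
            if (value.startsWith("ERR")) {       // hypothetical routing condition
                ctx.output(errorTag, value);     // route to the side output
            } else {
                out.collect(value);              // route to the main output
            }
        }
    });

DataStream<String> errors = mainStream.getSideOutput(errorTag);
```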
0 votes
0 answers
65 views

We use Azure Event Hubs (Kafka API) with Apache Flink consumers, and a shared Cassandra DB as the sink. There are 7 Event Hubs (one per application) → each has its own Flink consumer writing to the ...
Sadhanala Akhil Kumar
0 votes
1 answer
62 views

I'm working on a system that processes events from Kafka, and I'm running into a design problem related to scaling aggregations. The events contain fields like this (purchased SKUs in an order): { ...
Aleksey Usatov
0 votes
1 answer
42 views

If I have a Kafka input topic with multiple partitions and then in Kafka Streams I use kStream.map to change the key of each record and write that to an output topic, I will face the problem that ...
selbstereg
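A hedged sketch of the usual shape of the answer: map/selectKey only flags the stream as re-keyed, and an explicit repartition() (Kafka Streams 2.6+) runs the records through an internal topic so they are physically partitioned by the new key. Writing out with to() already partitions by the new key; repartition() matters when stateful operators follow. The topic names and key extraction below are illustrative:

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> source = builder.stream("input-topic");

source.selectKey((key, value) -> value.split(":")[0])  // hypothetical new key
      .repartition()  // materializes the new key distribution via an internal topic
      .to("output-topic");
```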
0 votes
0 answers
54 views

I want to aggregate the values of my DataStream in tumbling windows of 10 seconds. Unfortunately, the Bytewax documentation is very limited, and I can't find any other source where an average of ...
LeXXan • 27
1 vote
1 answer
81 views

I want to consume a stream from Kafka using Bytewax to perform aggregations. Unfortunately I'm not able to connect to Kafka and the connection is always refused. I assume something with the port setup ...
LeXXan • 27
0 votes
1 answer
134 views

Problem Overview: I am working on a Flink application that allows users to design dataflows dynamically. The core engine is built around stages, where a DataStream is passed sequentially through these ...
Mohamed Sallam
0 votes
1 answer
37 views

The Flink documentation says "The only relevant information that is set on the result elements is the element timestamp [...] which is [set to] end timestamp - 1 [...]" In case I have to ...
keezar • 111
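For reference, the "end timestamp - 1" the docs describe is exactly what TimeWindow.maxTimestamp() returns, so a window function can emit or inspect it explicitly. A small sketch with hypothetical key and sum types:

```java
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class StampedSum extends ProcessWindowFunction<Long, String, String, TimeWindow> {
    @Override
    public void process(String key, Context ctx, Iterable<Long> values, Collector<String> out) {
        long sum = 0;
        for (long v : values) sum += v;
        // maxTimestamp() == window end - 1: the timestamp Flink puts on result elements
        out.collect(key + "@" + ctx.window().maxTimestamp() + " sum=" + sum);
    }
}
```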
0 votes
1 answer
51 views

How can I use the Flink State Processor API to process the state of a windowed CoGroup (or Join) function? The documentation does not give such an example. Is there a way to use the Flink State ...
keezar • 111
1 vote
0 answers
42 views

I have data like this: One Kafka Message: [{"source": "858256_6052+571", "numericValue": null, "created": 1725969039288, "textValue": "mytestData...
Max Muster
0 votes
1 answer
358 views

I have the code in A. After we call builtTopology = builder.build, the call to new org.apache.kafka.streams.TopologyTestDriver(builtTopology, properties) gives me the error in B. I've combed through ...
Nutritioustim
1 vote
1 answer
119 views

We are using JDBC sinks (Apache Flink) and are hitting the database's maximum session count, especially when we increase parallelism. Our tests showed that if we increase our default ...
user26565994
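Since every parallel JDBC sink subtask opens its own connection, one common mitigation is to cap the sink's parallelism independently of the rest of the job. A sketch using the flink-connector-jdbc API, with an assumed DataStream of Tuple2<Long, String> and placeholder table/connection details:

```java
import org.apache.flink.connector.jdbc.JdbcConnectionOptions;
import org.apache.flink.connector.jdbc.JdbcExecutionOptions;
import org.apache.flink.connector.jdbc.JdbcSink;

stream.addSink(JdbcSink.sink(
        "INSERT INTO events (id, payload) VALUES (?, ?)",   // hypothetical table
        (ps, row) -> { ps.setLong(1, row.f0); ps.setString(2, row.f1); },
        JdbcExecutionOptions.builder().withBatchSize(500).build(),  // batch writes per connection
        new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
            .withUrl("jdbc:postgresql://db:5432/app")        // placeholder URL
            .withDriverName("org.postgresql.Driver")
            .build()))
    .setParallelism(4);  // at most 4 sessions from this sink, regardless of job parallelism
```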
0 votes
1 answer
49 views

I have a DataStream keyed by an event property that is then passed to a GlobalWindow, triggered when a specific event comes in. The issue is that when the window is triggered to process the events, it only ...
car_dev
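For reference, a bare-bones custom trigger of the kind this setup needs; the Event type and isMarker() are hypothetical. Whether the window seems to "lose" events often comes down to FIRE (window contents kept) versus FIRE_AND_PURGE (contents cleared), and to the fact that each key gets its own global window:

```java
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;

// Fires and purges the window when a marker event arrives; otherwise accumulates.
// Usage: stream.keyBy(...).window(GlobalWindows.create()).trigger(new MarkerTrigger()).process(...)
public class MarkerTrigger extends Trigger<Event, GlobalWindow> {
    @Override
    public TriggerResult onElement(Event e, long ts, GlobalWindow w, TriggerContext ctx) {
        return e.isMarker() ? TriggerResult.FIRE_AND_PURGE : TriggerResult.CONTINUE;
    }
    @Override
    public TriggerResult onProcessingTime(long time, GlobalWindow w, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }
    @Override
    public TriggerResult onEventTime(long time, GlobalWindow w, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }
    @Override
    public void clear(GlobalWindow w, TriggerContext ctx) { }
}
```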
0 votes
1 answer
142 views

I'm looking for the best practice to correlate requests and their corresponding responses in an Apache Flink stream processing job. The key attributes of the problem are: Conditions: Each request and ...
Cauchy H
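A common shape for this in Flink is a KeyedCoProcessFunction over the two streams keyed by the correlation id, parking whichever side arrives first in state with a timeout timer. A sketch with hypothetical Request/Response/Matched types and a 60-second event-time timeout (assumes timestamps are assigned upstream):

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;

public class Correlator extends KeyedCoProcessFunction<String, Request, Response, Matched> {
    private transient ValueState<Request> pendingReq;
    private transient ValueState<Response> pendingResp;

    @Override
    public void open(Configuration conf) {
        pendingReq = getRuntimeContext().getState(
            new ValueStateDescriptor<>("pending-req", Request.class));
        pendingResp = getRuntimeContext().getState(
            new ValueStateDescriptor<>("pending-resp", Response.class));
    }

    @Override
    public void processElement1(Request req, Context ctx, Collector<Matched> out) throws Exception {
        Response resp = pendingResp.value();
        if (resp != null) {                     // partner already arrived: match
            out.collect(new Matched(req, resp));
            pendingResp.clear();
        } else {                                // park and start the timeout clock
            pendingReq.update(req);
            ctx.timerService().registerEventTimeTimer(ctx.timestamp() + 60_000);
        }
    }

    @Override
    public void processElement2(Response resp, Context ctx, Collector<Matched> out) throws Exception {
        Request req = pendingReq.value();
        if (req != null) {
            out.collect(new Matched(req, resp));
            pendingReq.clear();
        } else {
            pendingResp.update(resp);
            ctx.timerService().registerEventTimeTimer(ctx.timestamp() + 60_000);
        }
    }

    @Override
    public void onTimer(long ts, OnTimerContext ctx, Collector<Matched> out) throws Exception {
        pendingReq.clear();    // evict anything still unmatched after the timeout
        pendingResp.clear();
    }
}
```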
0 votes
2 answers
105 views

In my apache beam streaming pipeline, I have an unbounded pub/sub source which I use with session windows. There is some bounded configuration data which I need to pass into some of the DoFns of the ...
Thomas W. • 560
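The standard Beam answer for bounded config data is a side input. A sketch assuming the config already exists as a PCollection<KV<String, String>> named configPairs and the main input is a PCollection<String> named events; note the side input's windowing (e.g. the global window) must be mappable onto the session-windowed main input:

```java
import java.util.Map;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;

// Materialize the bounded config as a map-shaped side input view.
PCollectionView<Map<String, String>> configView = configPairs.apply(View.asMap());

PCollection<String> result = events.apply(
    ParDo.of(new DoFn<String, String>() {
        @ProcessElement
        public void processElement(ProcessContext c) {
            Map<String, String> config = c.sideInput(configView);  // read per element
            c.output(c.element() + "|" + config.getOrDefault("mode", "default"));
        }
    }).withSideInputs(configView));
```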
2 votes
1 answer
77 views

I have a Kafka topic in which I produce an entry every 2-3 seconds. Then I have a PyFlink job that formats the entries and sends them to a DB. Here's my Flink env setup: env = StreamExecutionEnvironment....
GeorgeSedhom
0 votes
1 answer
151 views

I want to merge two (or more) streams using Flink. Both streams are themselves ordered, and I want the merged result to be ordered as well. As an example, [1,2,4,5,7,8, ...] and [2,3,6,7, ...] should produce ...
akurmustafa
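union() alone interleaves arbitrarily, so order has to be re-established downstream. One hedged sketch: treat each value as its own event timestamp on each (already sorted) input, union, then buffer in keyed state and release elements in watermark order; a single dummy key keeps everything in one task:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Buffers elements per timestamp; a timer releases them once the watermark
// (the minimum across both unioned inputs) guarantees nothing earlier can arrive.
public class Sorter extends KeyedProcessFunction<Integer, Integer, Integer> {
    private transient MapState<Long, List<Integer>> buffer;

    @Override
    public void open(Configuration conf) {
        buffer = getRuntimeContext().getMapState(
            new MapStateDescriptor<>("buf", Types.LONG, Types.LIST(Types.INT)));
    }

    @Override
    public void processElement(Integer value, Context ctx, Collector<Integer> out) throws Exception {
        long ts = ctx.timestamp();
        List<Integer> vals = buffer.get(ts);
        if (vals == null) vals = new ArrayList<>();
        vals.add(value);
        buffer.put(ts, vals);
        ctx.timerService().registerEventTimeTimer(ts);
    }

    @Override
    public void onTimer(long ts, OnTimerContext ctx, Collector<Integer> out) throws Exception {
        for (Integer v : buffer.get(ts)) out.collect(v);
        buffer.remove(ts);
    }
}

// Usage: assign timestamps/watermarks per input *before* union, so the combined
// watermark is the minimum of both inputs.
WatermarkStrategy<Integer> ws = WatermarkStrategy
    .<Integer>forMonotonousTimestamps()
    .withTimestampAssigner((v, ts) -> v);

DataStream<Integer> merged = left.assignTimestampsAndWatermarks(ws)
    .union(right.assignTimestampsAndWatermarks(ws))
    .keyBy(v -> 0)
    .process(new Sorter());
```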
2 votes
0 answers
384 views

I am trying to learn Apache Beam and am creating a sample project to learn stream processing. For now, I want to read from a Kafka topic "word" and print the data to the console. I have ...
Swapnil1456
1 vote
1 answer
1k views

We have a requirement to process a data stream in a Databricks notebook job and load it to a Delta table. I noticed that there is a new "Continuous" trigger available for Databricks jobs and we ...
azuresnowflake1
0 votes
2 answers
3k views

I am experimenting with Apache Flink for a personal project and I have been struggling to make the resulting stream output to StdOut and to send it to a Kafka topic orders-output. My goal is to ...
Annis99
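A sketch of wiring one stream to both outputs with the KafkaSink API (Flink 1.14+); the broker address and the resultStream/env variables are assumed. A frequent surprise is that print() writes to the TaskManager's stdout (its log files), not to the client console:

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;

KafkaSink<String> sink = KafkaSink.<String>builder()
    .setBootstrapServers("localhost:9092")   // assumption: local broker
    .setRecordSerializer(KafkaRecordSerializationSchema.builder()
        .setTopic("orders-output")
        .setValueSerializationSchema(new SimpleStringSchema())
        .build())
    .setDeliveryGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
    .build();

resultStream.print();      // lands in TaskManager stdout
resultStream.sinkTo(sink); // same stream, second sink
env.execute("orders-job");
```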
0 votes
1 answer
49 views

I am running a query against two topics and calculating the results. In the main class: tableEnv.createTemporaryView("tbl1", stream1); tableEnv.createTemporaryView("tbl2", stream2);...
newbie5050
2 votes
2 answers
1k views

I have an app that receives a stream of XML events from Kafka. These events have to be deserialized/parsed and otherwise converted, before being handed in-order to some business logic. (This logic ...
Jannick • 57
0 votes
1 answer
79 views

Let's say this is my sample stream: SingleOutputStreamOperator<Tuple2<String, SampleClass>> sampleStream = previousStream .keyBy(value -> ...
FelixNavidad
2 votes
1 answer
2k views

I am trying to find a suitable Python library for stream processing on Kafka topics, along the lines of Kafka Streams. Specifically, I am looking for libraries that support the following operations: KStream-...
Eagle1992
0 votes
1 answer
1k views

I am currently trying to understand how Kafka Streams achieves parallelism. My main concern boils down to three questions: Can multiple sub-topologies read from the same partition? How can you ...
donare • 1
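For context: the unit of parallelism in Kafka Streams is the task, one per sub-topology per input partition, so within a sub-topology a given partition is consumed by exactly one task. Scaling is then a configuration matter, spreading tasks over stream threads and application instances that share an application.id. A minimal sketch (the topology is assumed to be built elsewhere):

```java
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");           // shared consumer group
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);              // threads per instance

// Tasks are distributed over all threads of all instances with this application.id.
KafkaStreams streams = new KafkaStreams(topology, props);
streams.start();
```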
0 votes
0 answers
181 views

I need to stream images from a scanner to a PDF document. Ultimately I want to OCR the images and save the text to the PDF as well, but I'm more concerned with getting the streaming working first. ...
tlum • 933
1 vote
0 answers
216 views

I've been reading about pipelined region scheduling in Flink and am a bit confused about what it means. My understanding of it is that a Streaming job is always pipelined whereas a Batch job can ...
sunny • 39
1 vote
0 answers
142 views

I've recently read up on common Big Data architectures (Lambda and Kappa) and I'm trying to put them into practice in the context of an IoT application. As of right now, events are produced, ingested into ...
QUE • 11
1 vote
0 answers
248 views

We are seeing the error below when we migrate to Apache Storm 2.2.0 from 1.2.3. We do not see any permission issues on the file which it is unable to find. Also, the file exists, but just the /data ...
Deepti • 138
0 votes
1 answer
207 views

I am trying to configure logback-based log masking for Apache Storm topologies. When I try to replace the logback.xml file inside the Apache Storm log4j2- directory and update the worker.xml and cluster.xml files, ...
Rajan Kasodariya
4 votes
1 answer
878 views

I have a Kafka topic with millions of sale events. I have a consumer which on every message will insert the data into 4 tables: 1 for the raw sales, 1 for the sales sum by date by product category (...
friartuck • 3,171
0 votes
2 answers
79 views

A Bolt's code is triggered when data arrives (an input tuple). How can we program code inside a Bolt to run even in the case of missing input data? I mean, if no tuple arrives, how can we force an ...
ellav • 1
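Storm's built-in answer to "run code when no tuple arrives" is the tick tuple: a system-generated tuple delivered to the bolt on a fixed schedule. A minimal sketch:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.storm.Config;
import org.apache.storm.Constants;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Tuple;

public class PeriodicBolt extends BaseBasicBolt {
    @Override
    public Map<String, Object> getComponentConfiguration() {
        Map<String, Object> conf = new HashMap<>();
        conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 10); // a tick every 10 s
        return conf;
    }

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        boolean isTick = Constants.SYSTEM_COMPONENT_ID.equals(tuple.getSourceComponent())
                && Constants.SYSTEM_TICK_STREAM_ID.equals(tuple.getSourceStreamId());
        if (isTick) {
            // periodic work goes here (e.g. flush buffers, emit defaults)
        } else {
            // normal per-tuple processing
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) { }
}
```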
0 votes
1 answer
336 views

I receive containers of sensor data as input for ASA. Containers look like this: { "data": [ { "sensor_id": 55, "timestamp": 1663075725000, "value...
mananana • 413
2 votes
2 answers
3k views

I am considering using Flink or Apache Beam (with the flink runner) for different stream processing applications. I am trying to compare the two options and make the better choice. Here are the ...
Guillaume Delmas-Frenette
2 votes
1 answer
665 views

I'm calculating a simple mean on a dataset with values for May 2022, using different window sizes. Using 1 hour windows there are no problems, while using 1 week and 1 month windows, records are not ...
sixpain • 354
2 votes
3 answers
869 views

I'm calculating a simple mean on some records, using different window sizes. Using 1 hour and 1 week windows there are no problems, and the results are computed correctly. var keyed = src ....
sixpain • 354
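Likely relevant to both of these window questions: Flink's tumbling windows are aligned to the epoch, and 1970-01-01 was a Thursday, so 1-week windows start on Thursdays unless an offset shifts them; calendar months have no fixed length, so there is no built-in month tumbling window (a custom WindowAssigner or the Table/SQL API is needed). A sketch, reusing the question's keyed stream:

```java
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

// 7-day windows shifted by +4 days start on Mondays 00:00 UTC instead of Thursdays.
var weekly = keyed.window(TumblingEventTimeWindows.of(Time.days(7), Time.days(4)));
```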
1 vote
1 answer
72 views

I am writing a consumer that consumes user activity data (activityid, userid, timestamp, cta, duration) from Google Pub/Sub, and I want to create a sink for this such that I can train my ML model in ...
amor.fati95
2 votes
1 answer
522 views

I'd like to join data coming in from two Kafka topics ("left" and "right"). Matching records are to be joined using an ID, but if a "left" or a "right" record ...
Beryllium • 13k
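This maps directly onto a windowed outer join, e.g. in Kafka Streams (with 3.x semantics, unmatched records are emitted with a null partner only once the window has closed). The topic names and the 30-second window below are illustrative:

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.StreamJoined;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> left = builder.stream("left");
KStream<String, String> right = builder.stream("right");

// Matching ids within the window join; unmatched records surface with a null partner.
left.outerJoin(right,
        (l, r) -> (l == null ? "-" : l) + "|" + (r == null ? "-" : r),
        JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofSeconds(30)),
        StreamJoined.with(Serdes.String(), Serdes.String(), Serdes.String()))
    .to("joined");
```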
3 votes
3 answers
1k views

Redpanda seems easy to work with, but how would one process streams in real-time? We have a few thousand IoT devices that send us data every second. We would like to get the running average of the ...
NorwegianClassic
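Since Redpanda speaks the Kafka protocol, any Kafka-compatible stream processor works against it unchanged. A hedged Kafka Streams sketch of a per-device average over 1-minute tumbling windows; the topic name is assumed, the key is taken to be the device id, and the "count:sum" string accumulator is a stand-in for a proper POJO with a custom serde:

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, Double> readings = builder.stream("iot-readings"); // assumed topic

// Accumulate "count:sum" per device per window so the built-in String serde suffices.
KTable<Windowed<String>, String> agg = readings
    .groupByKey(Grouped.with(Serdes.String(), Serdes.Double()))
    .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
    .aggregate(
        () -> "0:0.0",
        (deviceId, value, acc) -> {
            String[] p = acc.split(":");
            return (Long.parseLong(p[0]) + 1) + ":" + (Double.parseDouble(p[1]) + value);
        },
        Materialized.with(Serdes.String(), Serdes.String()));

// Downstream: divide sum by count to get the average per device and window.
agg.toStream().mapValues(acc -> {
    String[] p = acc.split(":");
    return Double.parseDouble(p[1]) / Long.parseLong(p[0]);
});
```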
1 vote
0 answers
774 views

Redis has changed a lot in recent years, and it's difficult to keep up with the latest features. We have a few thousand IoT devices that all send MQTT messages every second. We want different ...
NorwegianClassic
3 votes
4 answers
279 views

We have a few thousand IoT devices that send us their temperature every second. The input source can be MQTT or JSON (or a queue if needed). Our goal is to near continuously process data for each of ...
NorwegianClassic
2 votes
1 answer
372 views

Context - Application: We have an Apache Flink application which processes events. The application uses event time characteristics. The application shards (keyBy) events based on the sessionId field. The ...
Peter Csala • 23.7k
1 vote
1 answer
384 views

I am trying to read data from a Kafka topic, do some processing, and dump the data into Elasticsearch. But I could not find an example in Python that uses Elasticsearch as a sink. Can anyone help me with a ...
Madhuri Desai
4 votes
3 answers
5k views

I'm new to PyFlink. I'm trying to write a Python program to read data from a Kafka topic and print the data to stdout. I followed the link Flink Python Datastream API Kafka Producer Sink Serialization. But I ...
Madhuri Desai
0 votes
2 answers
66 views

I cannot find this in the Hazelcast Jet 5.0 (or 4.x) documentation, so I hope someone can answer this here - can a reliable topic be used as an idempotent sink, for example to de-duplicate events ...
siddhadev • 16.6k
0 votes
1 answer
518 views

I'm working on a project that will determine whether or not I score an internship. The project focuses on stream processing and is due in 2 weeks. It's pretty simple, just deriving some statistics ...
geisha-and-guis
0 votes
1 answer
439 views

I have a relatively basic use case. My data lives in a few hundred Kafka partitions and I need to pass the events through a map operator before I send them to a custom HTTP sink. For performance reasons ...
jlunavtgrad • 1,015
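For a high-throughput HTTP sink after the map, Flink's async I/O is the usual tool: it keeps many requests in flight per subtask instead of blocking on each call. A sketch with a placeholder endpoint and an assumed DataStream<String> named mapped (error handling elided):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Collections;
import java.util.concurrent.TimeUnit;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

public class HttpAsync extends RichAsyncFunction<String, String> {
    private transient HttpClient client;

    @Override
    public void open(Configuration conf) {
        client = HttpClient.newHttpClient();  // non-blocking JDK 11+ client
    }

    @Override
    public void asyncInvoke(String value, ResultFuture<String> result) {
        HttpRequest req = HttpRequest.newBuilder(URI.create("http://sink.example/ingest")) // placeholder
            .POST(HttpRequest.BodyPublishers.ofString(value))
            .build();
        client.sendAsync(req, HttpResponse.BodyHandlers.ofString())
              .thenAccept(resp -> result.complete(Collections.singleton(resp.body())));
    }
}

// Up to 100 concurrent requests per subtask, 5 s timeout per request.
DataStream<String> acked = AsyncDataStream.unorderedWait(
    mapped, new HttpAsync(), 5, TimeUnit.SECONDS, 100);
```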
