-1 votes
0 answers
48 views

I'm using a two-level LeftSemiJoinEngine in DolphinDB to join streaming tables, but the final output is not correctly sorted by the specified column. Scenario Description: I need to process ...
xinyu zhang
0 votes
0 answers
71 views

I created a time series aggregator using createTimeSeriesAggregator, but the chronological order of the output results is problematic. Scenario Description: The data in the source table MD_PTB is ...
haru • 11
0 votes
2 answers
113 views

I'm building a single global Topology object in a non-keyed ProcessFunction with parallelism = 1. I keep it as a local mutable object and update it for every input event using topology.apply(GnmiEvent)...
Kvn • 1
-1 votes
1 answer
47 views

I have a Flink job with multiple downstream operators I want to route tuples to based on a condition. Side outputs are advertised for this use case in the Flink documentation. However, when sending ...
keezar • 111
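A minimal sketch of the side-output pattern this question is about, assuming an existing DataStream<String> named input and a hypothetical routing condition. A common pitfall is creating the OutputTag without the anonymous-subclass braces, which loses the type information; another is that side outputs can only be emitted from process-function contexts, not from plain map/filter operators:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

// The {} makes this an anonymous subclass so Flink can infer the element type.
final OutputTag<String> errorTag = new OutputTag<String>("errors") {};

SingleOutputStreamOperator<String> mainStream = input.process(
    new ProcessFunction<String, String>() {
        @Override
        public void processElement(String value, Context ctx, Collector<String> out) {
            if (value.startsWith("ERR")) {       // hypothetical routing condition
                ctx.output(errorTag, value);     // route to the side output
            } else {
                out.collect(value);              // route to the main output
            }
        }
    });

DataStream<String> errors = mainStream.getSideOutput(errorTag);
```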
0 votes
0 answers
65 views

We use Azure Event Hubs (Kafka API) with Apache Flink consumers, and a shared Cassandra DB as the sink. There are 7 Event Hubs (one per application) → each has its own Flink consumer writing to the ...
Sadhanala Akhil Kumar
0 votes
1 answer
62 views

I'm working on a system that processes events from Kafka, and I'm running into a design problem related to scaling aggregations. The events contain fields like this (purchased SKUs in an order): { ...
Aleksey Usatov
0 votes
1 answer
42 views

If I have a Kafka input topic with multiple partitions and then in Kafka Streams I use kStream.map to change the key of each record and write that to an output topic, I will face the problem that ...
selbstereg
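A hedged sketch of the usual shape of the answer: map/selectKey only flags the stream as re-keyed, and an explicit repartition() (Kafka Streams 2.6+) runs the records through an internal topic so they are physically partitioned by the new key. Writing out with to() already partitions by the new key; repartition() matters when stateful operators follow. The topic names and key extraction below are illustrative:

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> source = builder.stream("input-topic");

source.selectKey((key, value) -> value.split(":")[0])  // hypothetical new key
      .repartition()  // materializes the new key distribution via an internal topic
      .to("output-topic");
```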
0 votes
0 answers
54 views

I want to aggregate the values of my DataStream in tumbling windows of 10 seconds. Unfortunately, the Bytewax documentation is very limited, and I can't find any other source where an average of ...
LeXXan • 27
1 vote
1 answer
81 views

I want to consume a stream from Kafka using Bytewax to perform aggregations. Unfortunately I'm not able to connect to Kafka and the connection is always refused. I assume something with the port setup ...
LeXXan • 27
0 votes
1 answer
134 views

Problem Overview: I am working on a Flink application that allows users to design dataflows dynamically. The core engine is built around stages, where a DataStream is passed sequentially through these ...
Mohamed Sallam
0 votes
1 answer
37 views

The Flink documentation says "The only relevant information that is set on the result elements is the element timestamp [...] which is [set to] end timestamp - 1 [...]" In case I have to ...
keezar • 111
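For reference, the "end timestamp - 1" the docs describe is exactly what TimeWindow.maxTimestamp() returns, so a window function can emit or inspect it explicitly. A small sketch with hypothetical key and sum types:

```java
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class StampedSum extends ProcessWindowFunction<Long, String, String, TimeWindow> {
    @Override
    public void process(String key, Context ctx, Iterable<Long> values, Collector<String> out) {
        long sum = 0;
        for (long v : values) sum += v;
        // maxTimestamp() == window end - 1: the timestamp Flink puts on result elements
        out.collect(key + "@" + ctx.window().maxTimestamp() + " sum=" + sum);
    }
}
```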
0 votes
1 answer
51 views

How can I use the Flink State Processor API to process the state of a windowed CoGroup (or Join) function? The documentation does not give such an example. Is there a way to use the Flink State ...
keezar • 111
1 vote
0 answers
42 views

I have data like this: One Kafka Message: [{"source": "858256_6052+571", "numericValue": null, "created": 1725969039288, "textValue": "mytestData...
Max Muster
0 votes
1 answer
358 views

I have the code in A. After we call builtTopology = builder.build, the call to new org.apache.kafka.streams.TopologyTestDriver(builtTopology, properties) gives me the error in B. I've combed through ...
Nutritioustim
1 vote
1 answer
119 views

We are using JDBC sinks (Apache Flink) and are hitting the database's maximum session count, especially when we increase parallelism. Our tests showed that if we increase our default ...
user26565994
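Since every parallel JDBC sink subtask opens its own connection, one common mitigation is to cap the sink's parallelism independently of the rest of the job. A sketch using the flink-connector-jdbc API, with an assumed DataStream of Tuple2<Long, String> and placeholder table/connection details:

```java
import org.apache.flink.connector.jdbc.JdbcConnectionOptions;
import org.apache.flink.connector.jdbc.JdbcExecutionOptions;
import org.apache.flink.connector.jdbc.JdbcSink;

stream.addSink(JdbcSink.sink(
        "INSERT INTO events (id, payload) VALUES (?, ?)",   // hypothetical table
        (ps, row) -> { ps.setLong(1, row.f0); ps.setString(2, row.f1); },
        JdbcExecutionOptions.builder().withBatchSize(500).build(),  // batch writes per connection
        new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
            .withUrl("jdbc:postgresql://db:5432/app")        // placeholder URL
            .withDriverName("org.postgresql.Driver")
            .build()))
    .setParallelism(4);  // at most 4 sessions from this sink, regardless of job parallelism
```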
0 votes
1 answer
49 views

I have a DataStream keyed by an event property that is then passed to a GlobalWindow, triggered when a specific event comes in. The issue is that when the window is triggered to process the events, it only ...
car_dev
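For reference, a bare-bones custom trigger of the kind this setup needs; the Event type and isMarker() are hypothetical. Whether the window seems to "lose" events often comes down to FIRE (window contents kept) versus FIRE_AND_PURGE (contents cleared), and to the fact that each key gets its own global window:

```java
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;

// Fires and purges the window when a marker event arrives; otherwise accumulates.
// Usage: stream.keyBy(...).window(GlobalWindows.create()).trigger(new MarkerTrigger()).process(...)
public class MarkerTrigger extends Trigger<Event, GlobalWindow> {
    @Override
    public TriggerResult onElement(Event e, long ts, GlobalWindow w, TriggerContext ctx) {
        return e.isMarker() ? TriggerResult.FIRE_AND_PURGE : TriggerResult.CONTINUE;
    }
    @Override
    public TriggerResult onProcessingTime(long time, GlobalWindow w, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }
    @Override
    public TriggerResult onEventTime(long time, GlobalWindow w, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }
    @Override
    public void clear(GlobalWindow w, TriggerContext ctx) { }
}
```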
0 votes
1 answer
142 views

I'm looking for the best practice to correlate requests and their corresponding responses in an Apache Flink stream processing job. The key attributes of the problem are: Conditions: Each request and ...
Cauchy H
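A common shape for this in Flink is a KeyedCoProcessFunction over the two streams keyed by the correlation id, parking whichever side arrives first in state with a timeout timer. A sketch with hypothetical Request/Response/Matched types and a 60-second event-time timeout (assumes timestamps are assigned upstream):

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;

public class Correlator extends KeyedCoProcessFunction<String, Request, Response, Matched> {
    private transient ValueState<Request> pendingReq;
    private transient ValueState<Response> pendingResp;

    @Override
    public void open(Configuration conf) {
        pendingReq = getRuntimeContext().getState(
            new ValueStateDescriptor<>("pending-req", Request.class));
        pendingResp = getRuntimeContext().getState(
            new ValueStateDescriptor<>("pending-resp", Response.class));
    }

    @Override
    public void processElement1(Request req, Context ctx, Collector<Matched> out) throws Exception {
        Response resp = pendingResp.value();
        if (resp != null) {                     // partner already arrived: match
            out.collect(new Matched(req, resp));
            pendingResp.clear();
        } else {                                // park and start the timeout clock
            pendingReq.update(req);
            ctx.timerService().registerEventTimeTimer(ctx.timestamp() + 60_000);
        }
    }

    @Override
    public void processElement2(Response resp, Context ctx, Collector<Matched> out) throws Exception {
        Request req = pendingReq.value();
        if (req != null) {
            out.collect(new Matched(req, resp));
            pendingReq.clear();
        } else {
            pendingResp.update(resp);
            ctx.timerService().registerEventTimeTimer(ctx.timestamp() + 60_000);
        }
    }

    @Override
    public void onTimer(long ts, OnTimerContext ctx, Collector<Matched> out) throws Exception {
        pendingReq.clear();    // evict anything still unmatched after the timeout
        pendingResp.clear();
    }
}
```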
0 votes
2 answers
105 views

In my apache beam streaming pipeline, I have an unbounded pub/sub source which I use with session windows. There is some bounded configuration data which I need to pass into some of the DoFns of the ...
Thomas W. • 560
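The standard Beam answer for bounded config data is a side input. A sketch assuming the config already exists as a PCollection<KV<String, String>> named configPairs and the main input is a PCollection<String> named events; note the side input's windowing (e.g. the global window) must be mappable onto the session-windowed main input:

```java
import java.util.Map;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;

// Materialize the bounded config as a map-shaped side input view.
PCollectionView<Map<String, String>> configView = configPairs.apply(View.asMap());

PCollection<String> result = events.apply(
    ParDo.of(new DoFn<String, String>() {
        @ProcessElement
        public void processElement(ProcessContext c) {
            Map<String, String> config = c.sideInput(configView);  // read per element
            c.output(c.element() + "|" + config.getOrDefault("mode", "default"));
        }
    }).withSideInputs(configView));
```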
2 votes
1 answer
77 views

I have a Kafka topic in which I produce an entry every 2-3 seconds. Then I have a PyFlink job that formats the entries and sends them to a DB. Here's my Flink env setup: env = StreamExecutionEnvironment....
GeorgeSedhom
0 votes
1 answer
151 views

I want to merge two (or more) streams using Flink. Both streams are themselves ordered, and I want the merged result to be ordered as well. As an example, [1,2,4,5,7,8, ...] and [2,3,6,7, ...] should produce ...
akurmustafa
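union() alone interleaves arbitrarily, so order has to be re-established downstream. One hedged sketch: treat each value as its own event timestamp on each (already sorted) input, union, then buffer in keyed state and release elements in watermark order; a single dummy key keeps everything in one task:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Buffers elements per timestamp; a timer releases them once the watermark
// (the minimum across both unioned inputs) guarantees nothing earlier can arrive.
public class Sorter extends KeyedProcessFunction<Integer, Integer, Integer> {
    private transient MapState<Long, List<Integer>> buffer;

    @Override
    public void open(Configuration conf) {
        buffer = getRuntimeContext().getMapState(
            new MapStateDescriptor<>("buf", Types.LONG, Types.LIST(Types.INT)));
    }

    @Override
    public void processElement(Integer value, Context ctx, Collector<Integer> out) throws Exception {
        long ts = ctx.timestamp();
        List<Integer> vals = buffer.get(ts);
        if (vals == null) vals = new ArrayList<>();
        vals.add(value);
        buffer.put(ts, vals);
        ctx.timerService().registerEventTimeTimer(ts);
    }

    @Override
    public void onTimer(long ts, OnTimerContext ctx, Collector<Integer> out) throws Exception {
        for (Integer v : buffer.get(ts)) out.collect(v);
        buffer.remove(ts);
    }
}

// Usage: assign timestamps/watermarks per input *before* union, so the combined
// watermark is the minimum of both inputs.
WatermarkStrategy<Integer> ws = WatermarkStrategy
    .<Integer>forMonotonousTimestamps()
    .withTimestampAssigner((v, ts) -> v);

DataStream<Integer> merged = left.assignTimestampsAndWatermarks(ws)
    .union(right.assignTimestampsAndWatermarks(ws))
    .keyBy(v -> 0)
    .process(new Sorter());
```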
2 votes
0 answers
384 views

I am trying to learn Apache Beam and am creating a sample project to learn stream processing. For now, I want to read from a Kafka topic "word" and print the data to the console. I have ...
Swapnil1456
1 vote
1 answer
1k views

We have a requirement to process a data stream in a Databricks notebook job and load it to a Delta table. I noticed that there is a new "Continuous" trigger available for Databricks jobs and we ...
azuresnowflake1
0 votes
2 answers
3k views

I am experimenting with Apache Flink for a personal project and I have been struggling to make the resulting stream output to StdOut and to send it to a Kafka topic orders-output. My goal is to ...
Annis99
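A sketch of wiring one stream to both outputs with the KafkaSink API (Flink 1.14+); the broker address and the resultStream/env variables are assumed. A frequent surprise is that print() writes to the TaskManager's stdout (its log files), not to the client console:

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;

KafkaSink<String> sink = KafkaSink.<String>builder()
    .setBootstrapServers("localhost:9092")   // assumption: local broker
    .setRecordSerializer(KafkaRecordSerializationSchema.builder()
        .setTopic("orders-output")
        .setValueSerializationSchema(new SimpleStringSchema())
        .build())
    .setDeliveryGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
    .build();

resultStream.print();      // lands in TaskManager stdout
resultStream.sinkTo(sink); // same stream, second sink
env.execute("orders-job");
```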
0 votes
1 answer
49 views

I am running a query against two topics and calculating the results. In the main class: tableEnv.createTemporaryView("tbl1", stream1); tableEnv.createTemporaryView("tbl2", stream2);...
newbie5050
2 votes
2 answers
1k views

I have an app that receives a stream of XML events from Kafka. These events have to be deserialized/parsed and otherwise converted, before being handed in-order to some business logic. (This logic ...
Jannick • 57
0 votes
1 answer
79 views

Let's say this is my sample stream: SingleOutputStreamOperator<Tuple2<String, SampleClass>> sampleStream = previousStream .keyBy(value -> ...
FelixNavidad
2 votes
1 answer
2k views

I am trying to find a suitable Python library for stream processing on Kafka topics, along the lines of Kafka Streams. Specifically, I am looking for libraries that support the following operations: KStream-...
Eagle1992
0 votes
1 answer
1k views

I am currently trying to understand how Kafka Streams achieves parallelism. My main concern boils down to three questions: Can multiple sub-topologies read from the same partition? How can you ...
donare • 1
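For context: the unit of parallelism in Kafka Streams is the task, one per sub-topology per input partition, so within a sub-topology a given partition is consumed by exactly one task. Scaling is then a configuration matter, spreading tasks over stream threads and application instances that share an application.id. A minimal sketch (the topology is assumed to be built elsewhere):

```java
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");           // shared consumer group
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);              // threads per instance

// Tasks are distributed over all threads of all instances with this application.id.
KafkaStreams streams = new KafkaStreams(topology, props);
streams.start();
```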
0 votes
0 answers
181 views

I need to stream images from a scanner to a PDF document. Ultimately I want to OCR the images and save the text to the PDF as well, but I'm more concerned with getting the streaming working first. ...
tlum • 933
1 vote
0 answers
216 views

I've been reading about pipelined region scheduling in Flink and am a bit confused about what it means. My understanding of it is that a Streaming job is always pipelined whereas a Batch job can ...
sunny • 39
1 vote
0 answers
142 views

I've recently read up on common Big Data architectures (Lambda and Kappa) and I'm trying to put them into practice in the context of an IoT application. As of right now, events are produced, ingested into ...
QUE • 11
1 vote
0 answers
248 views

We are seeing the error below when we migrate to Apache Storm 2.2.0 from 1.2.3. We do not see any permission issues on the file which it is unable to find. Also, the file exists, but just the /data ...
Deepti • 138
0 votes
1 answer
207 views

I am trying to configure logback-based log masking for Apache Storm topologies. When I try to replace the logback.xml file inside the Apache Storm log4j2- directory and update the worker.xml and cluster.xml files, ...
Rajan Kasodariya
4 votes
1 answer
878 views

I have a Kafka topic with millions of sale events. I have a consumer which on every message will insert the data into 4 tables: 1 for the raw sales, 1 for the sales sum by date by product category (...
friartuck • 3,171
0 votes
2 answers
79 views

A Bolt's code is triggered when data arrives (an input tuple). How can we program code inside a Bolt to run even in the case of missing input data? I mean, if no tuple arrives, how can we force an ...
ellav • 1
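Storm's built-in answer to "run code when no tuple arrives" is the tick tuple: a system-generated tuple delivered to the bolt on a fixed schedule. A minimal sketch:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.storm.Config;
import org.apache.storm.Constants;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Tuple;

public class PeriodicBolt extends BaseBasicBolt {
    @Override
    public Map<String, Object> getComponentConfiguration() {
        Map<String, Object> conf = new HashMap<>();
        conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 10); // a tick every 10 s
        return conf;
    }

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        boolean isTick = Constants.SYSTEM_COMPONENT_ID.equals(tuple.getSourceComponent())
                && Constants.SYSTEM_TICK_STREAM_ID.equals(tuple.getSourceStreamId());
        if (isTick) {
            // periodic work goes here (e.g. flush buffers, emit defaults)
        } else {
            // normal per-tuple processing
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) { }
}
```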
0 votes
1 answer
336 views

I receive containers of sensor data as input for ASA. Containers look like this: { "data": [ { "sensor_id": 55, "timestamp": 1663075725000, "value...
mananana • 413
2 votes
2 answers
3k views

I am considering using Flink or Apache Beam (with the flink runner) for different stream processing applications. I am trying to compare the two options and make the better choice. Here are the ...
Guillaume Delmas-Frenette
2 votes
1 answer
665 views

I'm calculating a simple mean on a dataset with values for May 2022, using different window sizes. Using 1 hour windows there are no problems, while using 1 week and 1 month windows, records are not ...
sixpain • 354
2 votes
3 answers
869 views

I'm calculating a simple mean on some records, using different window sizes. Using 1 hour and 1 week windows there are no problems, and the results are computed correctly. var keyed = src ....
sixpain • 354
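Likely relevant to both of these window questions: Flink's tumbling windows are aligned to the epoch, and 1970-01-01 was a Thursday, so 1-week windows start on Thursdays unless an offset shifts them; calendar months have no fixed length, so there is no built-in month tumbling window (a custom WindowAssigner or the Table/SQL API is needed). A sketch, reusing the question's keyed stream:

```java
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

// 7-day windows shifted by +4 days start on Mondays 00:00 UTC instead of Thursdays.
var weekly = keyed.window(TumblingEventTimeWindows.of(Time.days(7), Time.days(4)));
```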
1 vote
1 answer
72 views

I am writing a consumer that consumes user activity data (activityid, userid, timestamp, cta, duration) from Google Pub/Sub, and I want to create a sink for this such that I can train my ML model in ...
amor.fati95
2 votes
1 answer
522 views

I'd like to join data coming in from two Kafka topics ("left" and "right"). Matching records are to be joined using an ID, but if a "left" or a "right" record ...
Beryllium • 13k
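This maps directly onto a windowed outer join, e.g. in Kafka Streams (with 3.x semantics, unmatched records are emitted with a null partner only once the window has closed). The topic names and the 30-second window below are illustrative:

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.StreamJoined;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> left = builder.stream("left");
KStream<String, String> right = builder.stream("right");

// Matching ids within the window join; unmatched records surface with a null partner.
left.outerJoin(right,
        (l, r) -> (l == null ? "-" : l) + "|" + (r == null ? "-" : r),
        JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofSeconds(30)),
        StreamJoined.with(Serdes.String(), Serdes.String(), Serdes.String()))
    .to("joined");
```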
3 votes
3 answers
1k views

Redpanda seems easy to work with, but how would one process streams in real-time? We have a few thousand IoT devices that send us data every second. We would like to get the running average of the ...
NorwegianClassic
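Since Redpanda speaks the Kafka protocol, any Kafka-compatible stream processor works against it unchanged. A hedged Kafka Streams sketch of a per-device average over 1-minute tumbling windows; the topic name is assumed, the key is taken to be the device id, and the "count:sum" string accumulator is a stand-in for a proper POJO with a custom serde:

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, Double> readings = builder.stream("iot-readings"); // assumed topic

// Accumulate "count:sum" per device per window so the built-in String serde suffices.
KTable<Windowed<String>, String> agg = readings
    .groupByKey(Grouped.with(Serdes.String(), Serdes.Double()))
    .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
    .aggregate(
        () -> "0:0.0",
        (deviceId, value, acc) -> {
            String[] p = acc.split(":");
            return (Long.parseLong(p[0]) + 1) + ":" + (Double.parseDouble(p[1]) + value);
        },
        Materialized.with(Serdes.String(), Serdes.String()));

// Downstream: divide sum by count to get the average per device and window.
agg.toStream().mapValues(acc -> {
    String[] p = acc.split(":");
    return Double.parseDouble(p[1]) / Long.parseLong(p[0]);
});
```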
1 vote
0 answers
774 views

Redis has changed a lot in recent years, and it's difficult to keep up with the latest features. We have a few thousand IoT devices that all send MQTT messages every second. We want different ...
NorwegianClassic
3 votes
4 answers
279 views

We have a few thousand IoT devices that send us their temperature every second. The input source can be MQTT or JSON (or a queue if needed). Our goal is to near continuously process data for each of ...
NorwegianClassic
2 votes
1 answer
372 views

Context - Application: We have an Apache Flink application which processes events. The application uses event time characteristics. The application shards (keyBy) events based on the sessionId field. The ...
Peter Csala • 23.7k
1 vote
1 answer
384 views

I am trying to read data from a Kafka topic, do some processing, and dump the data into Elasticsearch. But I could not find an example in Python that uses Elasticsearch as a sink. Can anyone help me with a ...
Madhuri Desai
4 votes
3 answers
5k views

I'm new to PyFlink. I'm trying to write a Python program to read data from a Kafka topic and print the data to stdout. I followed the link Flink Python Datastream API Kafka Producer Sink Serialization. But I ...
Madhuri Desai
0 votes
2 answers
66 views

I cannot find this in the Hazelcast Jet 5.0 (or 4.x) documentation, so I hope someone can answer this here - can a reliable topic be used as an idempotent sink, for example to de-duplicate events ...
siddhadev • 16.6k
0 votes
1 answer
518 views

I'm working on a project that will determine whether or not I score an internship. The project focuses on stream processing and is due in 2 weeks. It's pretty simple, just deriving some statistics ...
geisha-and-guis
0 votes
1 answer
439 views

I have a relatively basic use case. My data lives in a few hundred Kafka partitions and I need to pass the events through a map operator before I send them to a custom HTTP sink. For performance reasons ...
jlunavtgrad • 1,015
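For a high-throughput HTTP sink after the map, Flink's async I/O is the usual tool: it keeps many requests in flight per subtask instead of blocking on each call. A sketch with a placeholder endpoint and an assumed DataStream<String> named mapped (error handling elided):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Collections;
import java.util.concurrent.TimeUnit;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

public class HttpAsync extends RichAsyncFunction<String, String> {
    private transient HttpClient client;

    @Override
    public void open(Configuration conf) {
        client = HttpClient.newHttpClient();  // non-blocking JDK 11+ client
    }

    @Override
    public void asyncInvoke(String value, ResultFuture<String> result) {
        HttpRequest req = HttpRequest.newBuilder(URI.create("http://sink.example/ingest")) // placeholder
            .POST(HttpRequest.BodyPublishers.ofString(value))
            .build();
        client.sendAsync(req, HttpResponse.BodyHandlers.ofString())
              .thenAccept(resp -> result.complete(Collections.singleton(resp.body())));
    }
}

// Up to 100 concurrent requests per subtask, 5 s timeout per request.
DataStream<String> acked = AsyncDataStream.unorderedWait(
    mapped, new HttpAsync(), 5, TimeUnit.SECONDS, 100);
```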
