-1 votes
0 answers
34 views

After coming across these KIPs: https://cwiki.apache.org/confluence/display/KAFKA/KIP-939%3A+Support+Participation+in+2PC. https://cwiki.apache.org/confluence/display/KAFKA/KIP-890%3A+Transactions+...
Chris Nicholas's user avatar
-2 votes
0 answers
40 views

Setup: We have a Flink (v1.18.1) job deployed over 5 task managers. State is stored in RocksDB (using local SSD drives) and incremental checkpoints enabled every minute. Serde: PojoType State format: ...
Dan's user avatar
  • 231
1 vote
0 answers
63 views

I encountered a Kafka sink exception when starting from a savepoint. The message is below: java.lang.IllegalStateException: Received element after endOfInput: Record @ (undef) : org.apache.flink.table.data....
user32053573's user avatar
0 votes
0 answers
57 views

I am trying to run a very simple Flink (Java) job that: (1) creates an Iceberg JDBC catalog backed by PostgreSQL, and (2) sets the Iceberg warehouse to the Hadoop FileSystem. The job is built successfully with ...
Tai Lu's user avatar
  • 43
Best practices
0 votes
2 replies
35 views

I'm maintaining a Flink application and I'm confused about which classes need to be POJOs (serializable) for Flink to achieve state compatibility between different versions of the app. What I ...
Marco's user avatar
  • 54
1 vote
1 answer
50 views

The Flink docs mention channels and gates. I am having difficulties inferring what a channel and what a gate is and how they differ. Are these merely logical abstractions or is there also a one-to-one ...
keezar's user avatar
  • 111
1 vote
2 answers
42 views

Flink allows defining requirements for CPU cores using fine-grained resource management. I am wondering whether this CPU request is strictly enforced or best effort. Example: A task manager has 4 CPU ...
keezar's user avatar
  • 111
-3 votes
1 answer
159 views

Issue: Flink application throws Thread 'jobmanager-io-thread-25' produced an uncaught exception. java.lang.OutOfMemoryError: Direct buffer memory and terminates after running for 2-3 days. No matter ...
Strange's user avatar
  • 1,514
Advice
0 votes
0 replies
87 views

I’m running a Flink DataStream job that reads events from a Kafka topic and writes them into an Apache Iceberg table using the REST catalog (Lakekeeper). Authentication to the REST catalog is ...
Andrey's user avatar
  • 47
0 votes
2 answers
113 views

I'm building a single global Topology object in a non-keyed ProcessFunction with parallelism = 1. I keep it as a local mutable object and update it for every input event using topology.apply(GnmiEvent)...
Kvn's user avatar
  • 1
-1 votes
1 answer
47 views

I have a Flink job with multiple downstream operators I want to route tuples to based on a condition. Side outputs are advertised for this use case in the Flink documentation. However, when sending ...
keezar's user avatar
  • 111
0 votes
1 answer
56 views

I'm upgrading a PyFlink job to 2.0 and want to write to a Kafka compacted topic using the new KafkaSink. The stream produces (key, value) tuples (key is a string, value is a JSON payload). I configure ...
Sudhakar's user avatar
Advice
0 votes
1 reply
54 views

We are using Flink's AsyncIO function with Futures to make external gRPC calls. Currently, we have set the async capacity to 1, and we are using a blocking stub to make those calls. For each event, we ...
Sidharth Ramalingam's user avatar
0 votes
1 answer
70 views

Flink Version:1.17.1 There is a KeyedBroadcastProcessFunction in my project like this: public class MyOperator extends KeyedBroadcastProcessFunction<..> { private MapState<String, String&...
jinqiangshou's user avatar
0 votes
1 answer
49 views

Question Async operation & Future callback was added as the State API was upgraded to v2. Will it be thread-safe to call the Timer service & Collector from that callback? Example final var ...
thekey.kim's user avatar
0 votes
1 answer
66 views

I'm using a ValueState with TTL and I want to understand the difference (if any) in the checkpointed state size/memory between two scenarios: First scenario I create/obtain the ValueState but never ...
Marco's user avatar
  • 54
0 votes
0 answers
74 views

We recently refactored all of our Flink jobs to use a single vertex with no splitting. Since the change, the Flink autoscaler is having issues calculating when to scale up or down. We aren't seeing any ...
James Parker's user avatar
1 vote
2 answers
63 views

In my Flink app, I found this log: Field EnrichmentInfo#groupIds will be processed as GenericType. Please read the Flink documentation on "Data Types & Serialization" for details of the ...
Marco's user avatar
  • 54
0 votes
1 answer
51 views

I have an Apache Flink app that is using a KafkaSink with a setTopicSelector: KafkaSink<T>> sink = KafkaSink.<T>>builder() .setBootstrapServers(sink_brokers) ...
raphaelauv's user avatar
  • 1,039
0 votes
0 answers
57 views

I am currently running a small application that periodically polls data from my DB and then puts it in a Kafka topic. While running the application code independently, when I comment out my Kafka sink, ...
Parth Vyas's user avatar
0 votes
0 answers
69 views

I am compiling a Java project that uses Maven to manage the project. The Java version is 17.0.16. I included the following dependency inside the dependencies section (i.e. ...) <dependency> &...
Ricardo De la Rosa's user avatar
0 votes
0 answers
59 views

I'm using Flink 1.20.2. My Flink job reads a table from a Parquet file in 3, which had data for many tenants. It gets the list of distinct tenant IDs, then inserts those into new Parquet files, 1 ...
LuckyGambler's user avatar
1 vote
0 answers
57 views

I am trying to use AvroRowSerializationSchema with PyFlink 2.1.0, but I keep getting a Py4JError saying that the class does not exist in the JVM, even though I have the right JARs. Environment: ...
Edgar Meva's user avatar
0 votes
1 answer
33 views

We have a Flink job (batch mode) that runs on AWS KDA (ver: 1.20.0), where its logical operators look like: FileSource -> map() -> AssignTimestamps() -> filter() -> keyBy -> ...
JackatWaterloo's user avatar
0 votes
0 answers
28 views

I have an application that: (1) streams data from Kafka, (2) inserts the data received into the Flink Table API, and (3) performs a join on the tables and emits an event. StreamExecutionEnvironment and StreamTableEnvironment are used ...
Kamakshi's user avatar
-4 votes
1 answer
96 views

We recently experimented with Flink, in particular BATCH execution mode, to set up an ETL job processing a bounded data set. It works quite well but I'd like to get some clarifications about my ...
JackatWaterloo's user avatar
0 votes
1 answer
39 views

I submit a Flink job to Hadoop-Yarn, and use Flink application mode. Everything is normal on the client side, but the app master starts failing on the NodeManager, with the following logs. ...
menghe's user avatar
  • 11
3 votes
1 answer
92 views

We want to track all visits by country. So our click tracker will send a payload containing the country to its corresponding topic (1 country maps to 1 topic), where one visit, no matter the page, so long ...
caballeros's user avatar
0 votes
0 answers
67 views

I'm using Flink CDC + Apache Hudi in Flink to sync data from MySQL to AWS S3. My Flink job looks like: parallelism = 1 env = StreamExecutionEnvironment.get_execution_environment(config) ...
Rinze's user avatar
  • 834
0 votes
1 answer
52 views

I have a Table API pipeline that does a 1-minute Tumbling Count aggregation over a set of 15 columns. FlameGraph shows that most of the CPU (~40%) goes into serializing each row, despite using ...
Tomás Cerdá's user avatar
0 votes
2 answers
76 views

I am building a Flink application roughly modeled after Flink's demo fraud detection application, where events come into my system out-of-order, are keyed by some criteria, and then are stored in a ...
Andrew Rueckert's user avatar
0 votes
1 answer
52 views

I have 2 tables, both of which use the Kafka connector to read data. I join these sources and write the data to another Kafka topic. We checkpoint every 10 minutes, so when the job restarts, we use execution.savepoint.path ...
hitesh's user avatar
  • 389
0 votes
0 answers
36 views

The documentation displays a way to create a RemoteExecutionEnvironment in java: public static void main(String[] args) throws Exception { ExecutionEnvironment env = ExecutionEnvironment ....
Calicoder's user avatar
  • 1,462
0 votes
0 answers
90 views

I have a Flink ETL job that reads from ~13 Kafka topics and writes data into HDFS using a FileSink with compaction enabled. Right now, we have around 40 different output paths (buckets), and roughly ...
Hello's user avatar
  • 1
0 votes
0 answers
65 views

We use Azure Event Hubs (Kafka API) with Apache Flink consumers, and a shared Cassandra DB as the sink. There are 7 Event Hubs (one per application) → each has its own Flink consumer writing to the ...
Sadhanala Akhil Kumar's user avatar
0 votes
1 answer
132 views

When I use the KafkaRecordSerializationSchema in Apache Flink with settings for schema registry serialization, the registryConfigs settings are not taken into account: settings like auto.register....
raphaelauv's user avatar
  • 1,039
0 votes
1 answer
71 views

I have the following config set for my job 'table.exec.sink.upsert-materialize': 'NONE', 'table.exec.mini-batch.enabled': true, 'table.exec.mini-batch.allow-latency'...
hitesh's user avatar
  • 389
0 votes
0 answers
76 views

Apache Flink 2.1 does not support MongoDB Python connectors, so I wrote sample Python code using SinkFunction. from pyflink.datastream import StreamExecutionEnvironment from pyflink....
Joseph Hwang's user avatar
  • 1,433
0 votes
0 answers
24 views

When a KafkaSource connected to a broadcast stream is configured with idleness, the downstream watermark is abnormal. My question is how to make the watermark behave normally so it can be used in a window. Here's a case. KafkaSource ...
pre5T's user avatar
  • 1
0 votes
0 answers
31 views

I have a topic in kafka with the following messages: {"time":"2025-07-31T17:25:31.483425Z","EventID":4624,"ComputerName":"workstation"} {"time&...
Adrian Cincu's user avatar
2 votes
1 answer
362 views

So, I'm getting started on researching Apache Flink and Temporal to understand if they can be integrated into my current stack. So far, what I understand is that both Flink and Temporal are used to replay ...
caballeros's user avatar
2 votes
1 answer
145 views

I am using Flink with Debezium to consume CDC changes from Oracle DB tables via LogMiner. For some tables, everything works fine. For example, the following table works without issues: CREATE TABLE ...
Parth Vyas's user avatar
0 votes
0 answers
71 views

I'm encountering a Flink job failure and would appreciate any input on what might be misconfigured: 2025‑07‑28 17:30:52 org.apache.flink.runtime.JobException: Recovery is suppressed by ...
iman soltani's user avatar
2 votes
2 answers
130 views

This question is specific to AWS Managed Flink (1.19). I know how to control logging in most Java apps, but everything I know is failing in this case. I have a Java application running in Amazon ...
dlipofsky's user avatar
  • 421
0 votes
1 answer
48 views

I would like to understand better the functioning of Buffer Debloating in Apache Flink. Assume that: I have a Flink job structured like a pipeline (A -> B -> C -> D -> E) aligned ...
Marco's user avatar
  • 54
0 votes
0 answers
33 views

I’m working with Apache Flink 1.16 on an ETL job that reads data from Kafka and writes the output to HDFS in Parquet format. I’m using a FileSink with BulkFormat (ParquetAvroWriters), a ...
Ronmeir's user avatar
  • 23
0 votes
1 answer
65 views

I was using Flink in batch mode to read data from one source and then directly write the data to the file system in Parquet format. The code was like: hudi_source_ddl = f""" ...
Rinze's user avatar
  • 834
0 votes
0 answers
90 views

Context: Flink deployment in Application cluster mode. Use case: Flink job to read data from Kafka, transform it, and produce to Kafka. Deployment: on a k8s cluster using the k8s Flink operator. To start with ...
Jaiprasad's user avatar
  • 339
0 votes
0 answers
52 views

Flink CDC Oracle Task Fails with DebeziumException: The db history topic is missing I have set up Flink and used it to synchronize Oracle data to Doris. Now I’m testing multiple Oracle ...
jq l's user avatar
  • 1
0 votes
1 answer
33 views

I'm using Flink SQL to join data from multiple Kafka topics. Sometimes the resulting join is non-trivial to debug, and I want to log the state access involved in the join — specifically, I’d like to ...
hitesh's user avatar
  • 389
