Newest 'beam-sql' Questions

1 vote

1 answer

60 views

How to define nullable fields for SqlTransform

I'm using Beam SqlTransform in Python, trying to define/pass nullable fields. This code works just fine: with beam.Pipeline(options=options) as p: # ... # Use beam.Row to create a schema-aware ...

Yair Maron

1,978

asked Jun 2, 2025 at 11:04

0 votes

1 answer

129 views

Apache Beam SqlTransform does not process data distributed. It doesn't use multiple workers. How to deal with Dataflow pipeline "straggler detected"

Dataflow pipeline straggler: I run a pipeline with a SqlTransform component. The SqlTransform computes some windowing aggregates like rolling average. The pipeline is set to use up to three workers but ...

crbl

397

asked Feb 28, 2024 at 16:10

0 votes

1 answer

243 views

How do I use Apache Beam to trigger an aggregation based on a new incoming event?

Problem: I'm building a mobile game app with real-time scoring. Each time a player performs an action, it sends a message to Pub/Sub with the following keys: {"event_ts","event_id",...

Sid

1

asked Mar 15, 2022 at 23:35

1 vote

0 answers

193 views

Python Beam SqlTransform unknown coder exception when getting data from PTransform

There's this PTransform that is mapping data to a beam.Row: class MapToBeamRow(beam.PTransform): def expand(self, pcoll: PCollection[Any]) -> PCollection[beam.Row]: return ( ...

Ben Konz

53

asked Mar 2, 2022 at 15:50

0 votes

2 answers

394 views

Exception while writing multipart empty csv file from Apache Beam into netApp Storage Grid

Problem statement: We are consuming multiple CSV files into PCollections -> apply Beam SQL to transform the data -> write the resulting PCollection. This works fine if we have some data in ...

Jaysukh Kalasariya

73

asked Feb 3, 2022 at 10:17

1 vote

2 answers

2k views

How to convert PCollection<TableRow> to PCollection<KV<String, String>> in JAVA

I'm trying to convert a tablerow containing multiple values to a KV. I can achieve this in a DoFn but that adds more complexity to the code that I want to write further and makes my job harder. (...

Shriyut Jha

79

asked Oct 29, 2021 at 17:39

0 votes

1 answer

491 views

Dataflow / Beam Accumulator coder

I am developing a Dataflow pipeline that uses the SqlTransform Library and also the beam aggregation function defined in org.apache.beam.sdk.extensions.sql.impl.transform.agg.CountIf . Here a slide of ...

Pato Navarro

340

asked Oct 26, 2021 at 9:47

0 votes

1 answer

525 views

How to output nested Row from Beam SQL (SqlTransform)?

I want to have Row with nested Row from output of Beam SQL (SqlTransform), but failing. Questions: What is the proper way to output Row with nested Row from SqlTransform? (Row type is described in ...

Shinichi TAMURA

150

asked Sep 18, 2021 at 6:23

0 votes

2 answers

3k views

TypeError: expected bytes, str found [while running 'Writing to DB/ParDo(_WriteToRelationalDBFn) while writing to db from using beam-nuggets

@mohaseeb I am trying the example below to write data from Pub/Sub to PostgreSQL. I get the following error while writing Pub/Sub data into PostgreSQL. "/usr/local/lib/python3.7/site-packages/sqlalchemy/...

Ramesh3076

3

asked Jan 23, 2021 at 17:47

0 votes

1 answer

509 views

How to cast int to boolean when doing SQL transform in Apache Beam

I'm trying to do a SQL transform with Apache Beam using Calcite SQL syntax. I'm doing an int to boolean cast. My sql looks like this: ,CASE WHEN cast(IsService as BOOLEAN) THEN CASE WHEN IsEligible ...

artofdoe

197

asked Jan 8, 2021 at 1:52

0 votes

2 answers

169 views

How to specify BeamSQL UDF for Numeric Types

I'm trying to add a user-defined function (UDF) to a SqlTransform in a Beam pipeline, and the SQL parser doesn't seem to understand the function's type. The error I get is: No match found for ...

Mark P Neyer

1,009

asked Nov 16, 2020 at 22:49

0 votes

1 answer

476 views

How to integrate Beam SQL windowing query with KafkaIO?

First, we have a kafka input source in JSON format: {"event_time": "2020-08-23 18:36:10", "word": "apple", "cnt": 1} {"event_time": "...

wumrwds

1

asked Aug 24, 2020 at 17:40

1 vote

1 answer

406 views

Apache Beam SQL error in Python - ValueError: Unsupported type: Any

I wrote an example based on the following code https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/sql_taxi.py I am getting an error message /usr/local/lib/python3.6/dist-...

Jean Bouez

41

asked Aug 16, 2020 at 9:59

0 votes

0 answers

232 views

Beam SQL CURRENT_TIMESTAMP

My Unix Spark server timezone is CDT, but when I run Beam SQL CURRENT_TIMESTAMP as below, it always comes out as UTC. I tried locally as well, but it always displays UTC. I want this to be CDT ...

Syed Mohammed Mehdi

313

asked Aug 12, 2020 at 12:09

1 vote

0 answers

102 views

BEAM SQL and RECORD column type

I am trying to select records from a data file into a PCollection using Beam SQL. My data file has the below Avro schema: "name":"str-field", "type":[ "null", "...

Vinod

123

asked Jun 23, 2020 at 0:28

0 votes

2 answers

683 views

How to select a set of fields from input data as an array of repeated fields in beam SQL

Problem statement: I have an input PCollection with the following fields: { firstname_1, lastname_1, dob, firstname_2, lastname_2, firstname_3, lastname_3, } then I execute ...

Spiriter_rider

83

asked May 29, 2020 at 14:37

0 votes

1 answer

962 views

row_number in Apache Beam SQL

I'm trying to generate row_number using Apache Beam SQL with below code: PCollection<Row> rwrtg = PCollectionTuple.of(new TupleTag<>("trrtg"), rrtg) .apply(...

Syed Mohammed Mehdi

313

asked Apr 19, 2020 at 12:24

1 vote

1 answer

2k views

What's the difference between Dataflow SQL and Beam SQL (ZetaSQL or Calcite SQL)?

While browsing I just came across Dataflow SQL. Is it any different from beamSQL?

Krishnakumar Konar

393

asked Feb 17, 2020 at 12:01

0 votes

1 answer

358 views

How can I increase the thread stack size on Apache Beam pipeline workers with Google Cloud Dataflow?

I'm getting a StackOverflowError on my Beam workers due to running out of thread stack, and because it's deep within the running of a SqlTransform it's not straightforward to reduce the number of ...

wrp

99

asked Dec 4, 2019 at 18:19

0 votes

2 answers

944 views

Errors trying to start ZetaSQL planner

I'm trying to run a Beam pipeline with SQL transforms, parsed with ZetaSQL. I begin with setting options with options.setPlannerName("org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner"); ...

wrp

99

asked Dec 3, 2019 at 21:40

0 votes

2 answers

398 views

Apache beam get kafka data execute SQL error:Cannot call getSchema when there is no schema

I will input data of multiple tables to kafka, and beam will execute SQL after getting the data, but now there are the following errors: Exception in thread "main" java.lang.IllegalStateException: ...

smarctor

3

asked Nov 22, 2019 at 10:17

2 votes

1 answer

1k views

Query Avro Schema using Beam SQL

I'm trying to read avro files with Apache Beam and use Beam SQL to transform the data. I'm still new in Beam and Java. Here's my simple code: public class BeamSQLReadAvro { @SuppressWarnings("...

Yusata

309

asked Oct 28, 2019 at 11:43

0 votes

1 answer

1k views

ZetaSQL Sample Using Apache beam

I am facing issues while using ZetaSQL in the Apache Beam framework (2.17.0-SNAPSHOT). After going through the Apache Beam documentation, I am not able to find any sample for ZetaSQL. I tried to add ...

BackBenChers

304

asked Oct 15, 2019 at 8:21

0 votes

1 answer

418 views

How can we use row_number() in apache beam sql

I tried that but getting following error. eg: SELECT RELEASE_ORDER_KEY,ORDER_LINE_KEY,ORDER_HEADER_KEY,ROW_NUMBER() OVER (PARTITION BY ORDER_LINE_KEY ORDER BY RELEASE_ORDER_KEY) row_num FROM ...

Manish Bajpai

11

asked Sep 17, 2019 at 4:22

2 votes

1 answer

83 views

Beam SQL Not Firing

I am building a simple prototype wherein I am reading data from Pubsub and using BeamSQL, code snippet as below val eventStream: SCollection[String] = sc.pubsubSubscription[String]("projects/...

Jayadeep Jayaraman

2,845

asked Aug 22, 2019 at 18:38

0 votes

1 answer

1k views

Delete Big query table using Apache Beam java

Is it possible to delete a BigQuery table using Apache Beam in Java? p.apply("Delete Table name", BigQueryIO.readTableRows().fromQuery("DELETE FROM Table_name where condition"));

Pirama

11

asked Aug 22, 2019 at 14:35

1 vote

1 answer

536 views

What is the alternative for side inputs in apache beam

I am trying to join multiple Kafka streams & lookups using Apache Beam. I'm using side inputs for handling lookup tables and everything worked in the direct runner. But, when I try to run it in ...

Gowtham

87

asked Aug 1, 2019 at 12:07

2 votes

2 answers

567 views

Apache beam: SQL aggregation outputs no results for Unbounded/Bounded join

I am working on an Apache Beam pipeline to run a SQL aggregation function. Reference: https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/...

Akshata

1,025

asked Jul 19, 2019 at 18:55

0 votes

1 answer

2k views

Apache calcite: cast integer to datetime

I am using Beam SQL and trying to cast integer to datetime field. Schema resultSchema = Schema.builder() .addInt64Field("detectedCount") .addStringField("sensor") ....

Akshata

1,025

asked Jul 19, 2019 at 18:34

0 votes

1 answer

2k views

How to remove duplicates in sliding window - Apache Beam

I have implemented a data pipeline with multiple unbounded sources & side inputs, join data with sliding window (30s & every 10s) and emit the transformed output into a Kafka Topic. The issue ...

Gowtham

87

asked Jul 17, 2019 at 4:43

1 vote

1 answer

2k views

How to refresh/reload side input on every window

I am using Apache beam to join multiple streams along with some lookups. I have 2 scenarios, If, the lookup size is huge, I wanted the side input to reload/refresh for every record processing (i.e. I ...

Gowtham

87

asked Jul 15, 2019 at 4:29

2 votes

2 answers

2k views

How to fix "Joining unbounded PCollections is currently only supported for non-global windows with triggers" in Apache Beam

I'm trying to join 2 unbounded sources using the Apache Beam Java SDK. While joining, I'm getting the error message below. Exception in thread "main" java.lang.UnsupportedOperationException: Joining ...

Gowtham

87

asked Jul 8, 2019 at 3:50

0 votes

3 answers

2k views

Apache Beam SQLTransform: Cannot call getSchema when there is no schema

I am trying to apply SQLTransform on a PCollection<Object>. Here, the CustomSource transform generates a POJO at runtime. Hence, the type of the Object on which the SQLTransform runs is not known at ...

Akshata

1,025

asked Jul 3, 2019 at 22:22

0 votes

1 answer

300 views

Beam SQL - SqlValidatorException: Object 'PCOLLECTION' not found

I am doing some experiments with Beam SQL. I get a PCollection<Row> from the transform SampleSource and pass its output to a SqlTransform. String sql1 = "select c1, c2, c3 from PCOLLECTION ...

Akshata

1,025

asked Jul 1, 2019 at 21:43

0 votes

2 answers

769 views

What is the equivalent Data type for Numeric in apache.beam.sdk.schemas.Schema.FieldType

Trying to write the data into a BigQuery table using BeamSQL. To write the data, we need the schema of that data. Used org.apache.beam.sdk.schemas to define the schema of the data collection. We have Numeric ...

lourdu rajan

379

asked Jun 19, 2019 at 6:43

1 vote

1 answer

670 views

Build Nested structure using BeamSQL

In BigQuery we have "ARRAY_AGG" function which helps to convert the normal collection to Nested collection. Is there a similar way to build same kind of nested structure collection using BeamSQL?. ...

lourdu rajan

379

asked May 30, 2019 at 12:53

1 vote

1 answer

671 views

BeamSQL Group By query problem with Float value

Tried to get the unique values from a BigQuery table using BeamSQL in Google Dataflow, implementing the condition with a GROUP BY clause in BeamSQL (sample query below). One of the columns has float ...

lourdu rajan

379

asked May 29, 2019 at 13:30

1 vote

1 answer

273 views

How to add google cloud pubsub as a source in Beam SQL shell?

I am trying out BeamSQL in shell and want to test how unbounded sources work in terms of usability and performance. Reading the documentation over here, I created an external table as follows- CREATE ...

Abhishek

717

asked May 16, 2019 at 9:43

0 votes

1 answer

712 views

Unnest the nested PCollection using BeamSQL

Trying to use BeamSQL to unnest a nested PCollection. Let's assume a PCollection that has the Employees and their details, where the details are in a nested collection. So if we use the BeamSQL like ...

lourdu rajan

379

asked May 7, 2019 at 17:50

0 votes

1 answer

140 views

Can't call `ApproximateDistinct.ApproximateDistinctFn` from ApacheBeam sql

Trying to use aggregate function ApproximateDistinct.ApproximateDistinctFn from apache beam sql, this failed. my SQL: SELECT ApproximateDistinct(user_id) as distinct_count, profile, ...

Brachi

747

asked Apr 8, 2019 at 10:18

0 votes

1 answer

343 views

RexCall cannot be cast to RexInputRef exception in Apache Beam SQL

I'm trying to do a simple join using Beam SQL but I'm getting an exception while compilation: Exception in thread "main" java.lang.ClassCastException: org.apache.beam.repackaged....

rish0097

1,104

asked Feb 12, 2019 at 14:01

0 votes

1 answer

2k views

Apache beam SqlTransforms schema issue

I'm trying to perform ETL which involves loading files from HDFS, apply transforms and write them to Hive. While using SqlTransforms for performing transformations by following this doc, I'm ...

Bluecrow

609

asked Oct 25, 2018 at 6:05

0 votes

2 answers

5k views

How does Calcite deal with data conversion?

I am trying to convert a date that's stored as a string to a date, e.g. YYYYMMDD (string) to YYYY-MM-DD (date) As far as I know there is no conversion function that checks input format and output ...

Agni

11

asked Oct 17, 2018 at 19:07

0 votes

1 answer

1k views

Beam SQL won't work when using aggregation in statement: "Cannot plan execution"

I have a basic Beam pipeline that reads from GCS, does a Beam SQL transform and writes the results to BigQuery. When I don't do any aggregation in my SQL statement it works fine: .. PCollection<...

Graham Polley

14.8k

asked Sep 13, 2018 at 12:01

0 votes

1 answer

1k views

Beam SQL / Apache Beam is Slower when Running Multiple Joins

When joining 2 tables using Beam SQL, it works properly and provides the expected performance, but as the number of joined tables increases, the performance becomes worse. Below is my snippet which ...

BackBenChers

304

asked Aug 14, 2018 at 4:46

0 votes

1 answer

96 views

Is there a work around for 'LIKE' in BeamSQL?

We have an Apache Beam 2.4.0 pipeline that runs BeamSql queries. In BeamSql the SQL statement 'LIKE' throws an exception 'LIKE is not implemented yet'. Is there a work around for 'LIKE' in BeamSql? We ...

Anna Kasikova

37

asked May 23, 2018 at 2:44
