1 vote
1 answer
60 views

I'm using Beam SqlTransform in Python and trying to define/pass nullable fields. This code works just fine: with beam.Pipeline(options=options) as p: # ... # Use beam.Row to create a schema-aware ...
Yair Maron
  • 1,978
0 votes
1 answer
129 views

Dataflow pipeline straggler I run a pipeline with a SqlTransform component. The SqlTransform computes some windowed aggregates, such as a rolling average. The pipeline is set to use up to three workers but ...
crbl
  • 397
0 votes
1 answer
243 views

Problem: I'm building a mobile game app with real-time scoring. Each time a player performs an action, it sends a message to Pub/Sub with the following keys: {"event_ts","event_id",...
Sid
  • 1
1 vote
0 answers
193 views

There's this PTransform that maps data to a beam.Row: class MapToBeamRow(beam.PTransform): def expand(self, pcoll: PCollection[Any]) -> PCollection[beam.Row]: return ( ...
Ben Konz
0 votes
2 answers
394 views

Problem Statement We consume multiple CSV files into PCollections -> apply Beam SQL to transform the data -> write the resulting PCollection. This works absolutely fine if we have some data in ...
Jaysukh Kalasariya
1 vote
2 answers
2k views

I'm trying to convert a TableRow containing multiple values to a KV. I can achieve this in a DoFn, but that adds complexity to the code I want to write later and makes my job harder. (...
Shriyut Jha
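The question is in Java, but the shape of the row-to-KV mapping is easy to sketch in plain Python; the `key_field` name and sample row below are hypothetical, since the question's actual schema is not shown in the excerpt.

```python
def row_to_kv(row, key_field="id"):
    # key_field is a hypothetical key column; the question's real
    # schema is not shown in the excerpt.
    value = {k: v for k, v in row.items() if k != key_field}
    return (row[key_field], value)

row_to_kv({"id": 7, "word": "apple", "cnt": 1})
# -> (7, {"word": "apple", "cnt": 1})
```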
0 votes
1 answer
491 views

I am developing a Dataflow pipeline that uses the SqlTransform library and also the Beam aggregation function defined in org.apache.beam.sdk.extensions.sql.impl.transform.agg.CountIf. Here is a slice of ...
Pato Navarro
0 votes
1 answer
525 views

I want to output a Row with a nested Row from Beam SQL (SqlTransform), but I am failing. Questions: What is the proper way to output a Row with a nested Row from SqlTransform? (The Row type is described in ...
Shinichi TAMURA
0 votes
2 answers
3k views

@mohaseeb I am trying the example below to write data from Pub/Sub to PostgreSQL. I am getting the error below while writing Pub/Sub data into PostgreSQL. "/usr/local/lib/python3.7/site-packages/sqlalchemy/...
Ramesh3076
0 votes
1 answer
509 views

I'm trying to do a SQL transform with Apache Beam using Calcite SQL syntax. I'm casting an int to a boolean. My SQL looks like this: ,CASE WHEN cast(IsService as BOOLEAN) THEN CASE WHEN IsEligible ...
artofdoe
  • 197
0 votes
2 answers
169 views

I'm trying to add a User Defined Function (UDF) to a SqlTransform in a Beam pipeline, and the SQL parser doesn't seem to understand the function's type. The error I get is: No match found for ...
Mark P Neyer
  • 1,009
0 votes
1 answer
476 views

First, we have a kafka input source in JSON format: {"event_time": "2020-08-23 18:36:10", "word": "apple", "cnt": 1} {"event_time": "...
wumrwds
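For reference, parsing those Kafka JSON lines and accumulating per-word counts (the aggregation a SQL GROUP BY over this stream would perform) can be sketched in plain Python. The first record is from the question; the second is hypothetical, added to make the aggregation visible.

```python
import json
from collections import Counter

lines = [
    '{"event_time": "2020-08-23 18:36:10", "word": "apple", "cnt": 1}',
    '{"event_time": "2020-08-23 18:36:12", "word": "apple", "cnt": 2}',  # hypothetical
]
totals = Counter()
for line in lines:
    event = json.loads(line)      # each Kafka value is one JSON object
    totals[event["word"]] += event["cnt"]
print(totals)  # Counter({'apple': 3})
```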
1 vote
1 answer
406 views

I wrote an example based on the following code https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/sql_taxi.py I am getting an error message /usr/local/lib/python3.6/dist-...
Jean Bouez
0 votes
0 answers
232 views

My Unix Spark server timezone is CDT, but when I run Beam SQL CURRENT_TIMESTAMP as below, it always comes back as UTC. I tried locally as well, but it always displays UTC. I want this to be CDT ...
Syed Mohammed Mehdi
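If CURRENT_TIMESTAMP does always come back as UTC, as the question reports, a common workaround is to convert the value downstream rather than inside the query. A minimal Python sketch using the stdlib `zoneinfo` module (the timestamp value is illustrative):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Take a UTC timestamp (as Beam SQL returned it) and render it in
# US Central time; in late August this is CDT, i.e. UTC-5.
utc_ts = datetime(2020, 8, 23, 18, 36, 10, tzinfo=timezone.utc)
central = utc_ts.astimezone(ZoneInfo("America/Chicago"))
print(central.isoformat())  # 2020-08-23T13:36:10-05:00
```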
1 vote
0 answers
102 views

I am trying to select records from a data file into a PCollection using Beam SQL. My data file has the below AVRO schema: "name":"str-field", "type":[ "null", "...
Vinod
  • 123
0 votes
2 answers
683 views

Problem Statement: I have an input PCollection with following fields: { firstname_1, lastname_1, dob, firstname_2, lastname_2, firstname_3, lastname_3, } then I execute ...
Spiriter_rider
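Outside of SQL, the reshaping this record layout seems to call for (collapsing the numbered firstname/lastname pairs into one list) can be sketched in plain Python. The field names come from the excerpt; the sample values are hypothetical.

```python
def unpivot_names(record):
    # Collapse firstname_N / lastname_N pairs into a list of name rows,
    # skipping pairs where both halves are empty.
    names = []
    for i in (1, 2, 3):
        first = record.get(f"firstname_{i}")
        last = record.get(f"lastname_{i}")
        if first or last:
            names.append({"firstname": first, "lastname": last})
    return {"dob": record["dob"], "names": names}

unpivot_names({"firstname_1": "Ada", "lastname_1": "Lovelace",
               "dob": "1815-12-10",
               "firstname_2": None, "lastname_2": None,
               "firstname_3": None, "lastname_3": None})
# -> {"dob": "1815-12-10", "names": [{"firstname": "Ada", "lastname": "Lovelace"}]}
```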
0 votes
1 answer
962 views

I'm trying to generate row_number using Apache Beam SQL with below code: PCollection<Row> rwrtg = PCollectionTuple.of(new TupleTag<>("trrtg"), rrtg) .apply(...
Syed Mohammed Mehdi
1 vote
1 answer
2k views

While browsing I just came across Dataflow SQL. Is it any different from Beam SQL?
Krishnakumar Konar
0 votes
1 answer
358 views

I'm getting a StackOverflowError on my Beam workers due to running out of thread stack, and because it happens deep within the running of a SqlTransform it's not straightforward to reduce the number of ...
wrp
  • 99
0 votes
2 answers
944 views

I'm trying to run a Beam pipeline with SQL transforms, parsed with ZetaSQL. I begin with setting options with options.setPlannerName("org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner"); ...
wrp
  • 99
0 votes
2 answers
398 views

I input data from multiple tables into Kafka, and Beam executes SQL after receiving the data, but now I get the following error: Exception in thread "main" java.lang.IllegalStateException: ...
smarctor
2 votes
1 answer
1k views

I'm trying to read avro files with Apache Beam and use Beam SQL to transform the data. I'm still new in Beam and Java. Here's my simple code: public class BeamSQLReadAvro { @SuppressWarnings("...
Yusata
  • 309
0 votes
1 answer
1k views

I am facing issues while using ZetaSQL in the Apache Beam framework (2.17.0-SNAPSHOT). After going through the Apache Beam documentation, I am not able to find any sample for ZetaSQL. I tried to add ...
BackBenChers
0 votes
1 answer
418 views

I tried that but am getting the following error. E.g.: SELECT RELEASE_ORDER_KEY,ORDER_LINE_KEY,ORDER_HEADER_KEY,ROW_NUMBER() OVER (PARTITION BY ORDER_LINE_KEY ORDER BY RELEASE_ORDER_KEY) row_num FROM ...
Manish Bajpai
2 votes
1 answer
83 views

I am building a simple prototype wherein I am reading data from Pubsub and using BeamSQL, code snippet as below val eventStream: SCollection[String] = sc.pubsubSubscription[String]("projects/...
Jayadeep Jayaraman
0 votes
1 answer
1k views

Is it possible to delete a table available in bigQuery using Apache beam using Java? p.apply("Delete Table name", BigQueryIO.readTableRows().fromQuery("DELETE FROM Table_name where condition"));
Pirama
  • 11
1 vote
1 answer
536 views

I am trying to join multiple Kafka streams & lookups using Apache Beam. I'm using side inputs for handling lookup tables, and everything worked out in the direct runner. But when I try to run it in ...
Gowtham
  • 87
2 votes
2 answers
567 views

I am working on an Apache Beam pipeline to run a SQL aggregation function. Reference: https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/...
Akshata
  • 1,025
0 votes
1 answer
2k views

I am using Beam SQL and trying to cast an integer to a datetime field. Schema resultSchema = Schema.builder() .addInt64Field("detectedCount") .addStringField("sensor") ....
Akshata
  • 1,025
0 votes
1 answer
2k views

I have implemented a data pipeline with multiple unbounded sources & side inputs, join data with sliding window (30s & every 10s) and emit the transformed output into a Kafka Topic. The issue ...
Gowtham
  • 87
1 vote
1 answer
2k views

I am using Apache Beam to join multiple streams along with some lookups. I have 2 scenarios. If the lookup size is huge, I want the side input to reload/refresh for every record processed (i.e. I ...
Gowtham
  • 87
2 votes
2 answers
2k views

I'm trying to join 2 unbounded sources using the Apache Beam Java SDK. While joining I'm getting the below error message. Exception in thread "main" java.lang.UnsupportedOperationException: Joining ...
Gowtham
  • 87
0 votes
3 answers
2k views

I am trying to apply SQLTransform on a PCollection&lt;Object&gt;. Here, the CustomSource transform generates a POJO at runtime. Hence, the type of the Object on which the SQLTransform runs is not known at ...
Akshata
  • 1,025
0 votes
1 answer
300 views

I am doing some experiments with Beam SQL. I get a PCollection<Row> from the transform SampleSource and pass its output to a SqlTransform. String sql1 = "select c1, c2, c3 from PCOLLECTION ...
Akshata
  • 1,025
0 votes
2 answers
769 views

Trying to write data into a BigQuery table using BeamSQL. To write the data we need its schema. We used org.apache.beam.sdk.schemas to define the schema of the data collection. We have Numeric ...
lourdu rajan
1 vote
1 answer
670 views

In BigQuery we have the "ARRAY_AGG" function, which helps convert a flat collection to a nested collection. Is there a similar way to build the same kind of nested structure using BeamSQL? ...
lourdu rajan
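For reference, the nesting ARRAY_AGG produces amounts to a grouped accumulation, which can be sketched in plain Python (the customer/item fields and values are hypothetical):

```python
from collections import defaultdict

rows = [
    {"customer": "a", "item": "x"},
    {"customer": "a", "item": "y"},
    {"customer": "b", "item": "z"},
]
nested = defaultdict(list)  # customer -> ARRAY_AGG(item)
for r in rows:
    nested[r["customer"]].append(r["item"])
print(dict(nested))  # {'a': ['x', 'y'], 'b': ['z']}
```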
1 vote
1 answer
671 views

Tried to get unique values from a BigQuery table using BeamSQL in Google Dataflow. Implemented the condition in BeamSQL using a Group By clause (sample query below). One of the columns has float ...
lourdu rajan
1 vote
1 answer
273 views

I am trying out BeamSQL in shell and want to test how unbounded sources work in terms of usability and performance. Reading the documentation over here, I created an external table as follows- CREATE ...
Abhishek
  • 717
0 votes
1 answer
712 views

Trying to use BeamSQL to unnest the nested type of a PCollection. Let's assume a PCollection which has Employees and their details, where the details are in a nested collection. So if we use BeamSQL like ...
lourdu rajan
0 votes
1 answer
140 views

Trying to use the aggregate function ApproximateDistinct.ApproximateDistinctFn from Apache Beam SQL, and it failed. My SQL: SELECT ApproximateDistinct(user_id) as distinct_count, profile, ...
Brachi
  • 747
0 votes
1 answer
343 views

I'm trying to do a simple join using Beam SQL but I'm getting an exception while compilation: Exception in thread "main" java.lang.ClassCastException: org.apache.beam.repackaged....
rish0097
  • 1,104
0 votes
1 answer
2k views

I'm trying to perform ETL which involves loading files from HDFS, applying transforms and writing them to Hive. While using SqlTransforms to perform transformations by following this doc, I'm ...
Bluecrow
  • 609
0 votes
2 answers
5k views

I am trying to convert a date that's stored as a string to a date, e.g. YYYYMMDD (string) to YYYY-MM-DD (date) As far as I know there is no conversion function that checks input format and output ...
Agni
  • 11
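Outside SQL, the conversion the question asks about is a one-liner; a plain-Python sketch of YYYYMMDD (string) to YYYY-MM-DD:

```python
from datetime import datetime

def to_iso_date(yyyymmdd):
    """Re-format a compact YYYYMMDD string as YYYY-MM-DD, validating it on the way."""
    return datetime.strptime(yyyymmdd, "%Y%m%d").strftime("%Y-%m-%d")

print(to_iso_date("20200823"))  # 2020-08-23
```

In a Beam pipeline this kind of reformatting is often easier to do in a plain mapping step before or after the SqlTransform than inside the SQL itself.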
0 votes
1 answer
1k views

I have a basic Beam pipeline that reads from GCS, does a Beam SQL transform and writes the results to BigQuery. When I don't do any aggregation in my SQL statement it works fine: .. PCollection<...
Graham Polley
0 votes
1 answer
1k views

Joins on 2 tables using Beam SQL work properly and provide the expected performance, but as the number of tables I join increases, performance becomes worse. Below is my snippet which ...
BackBenChers
0 votes
1 answer
96 views

We have an Apache Beam 2.4.0 pipeline that runs BeamSql queries. In BeamSql, the SQL 'LIKE' statement throws the exception 'LIKE is not implemented yet'. Is there a workaround for 'LIKE' in BeamSql? We ...
Anna Kasikova
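One workaround while 'LIKE' is unimplemented is to do the match outside the SQL, e.g. in a preceding or following mapping step. A plain-Python sketch of LIKE's semantics via regex (the Java equivalent would be a similar regex inside a DoFn):

```python
import re

def sql_like(value, pattern):
    """Emulate SQL LIKE: % matches any run of characters, _ matches exactly one."""
    # Escape regex metacharacters first; on Python 3.7+ re.escape leaves
    # % and _ untouched, so translate them into regex afterwards.
    regex = re.escape(pattern).replace("%", ".*").replace("_", ".")
    return re.fullmatch(regex, value) is not None

print(sql_like("apache beam", "apache%"))  # True
print(sql_like("beam", "b_am"))            # True
```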