46 questions
1 vote, 1 answer, 60 views
How to define nullable fields for SqlTransform
I'm using Beam SqlTransform in Python and trying to define/pass nullable fields.
This code works just fine:
with beam.Pipeline(options=options) as p:
    # ...
    # Use beam.Row to create a schema-aware ...
0 votes, 1 answer, 129 views
Apache Beam SqlTransform does not process data in a distributed way: it doesn't use multiple workers. How to deal with "straggler detected" in a Dataflow pipeline
Dataflow pipeline straggler
I run a pipeline with a SqlTransform component. The SqlTransform computes some windowed aggregates, such as a rolling average. The pipeline is set to use up to three workers but ...
0 votes, 1 answer, 243 views
How do I use Apache Beam to trigger an aggregation based on a new incoming event?
Problem: I'm building a mobile game app with real-time scoring. Each time a player performs an action, it sends a message to Pub/Sub with the following keys:
{"event_ts","event_id",...
1 vote, 0 answers, 193 views
Python Beam SqlTransform unknown coder exception when getting data from PTransform
There's a PTransform that maps data to a beam.Row:
class MapToBeamRow(beam.PTransform):
    def expand(self, pcoll: PCollection[Any]) -> PCollection[beam.Row]:
        return (
            ...
0 votes, 2 answers, 394 views
Exception while writing an empty multipart CSV file from Apache Beam to NetApp StorageGRID
Problem Statement
We read multiple CSV files into PCollections -> apply Beam SQL to transform the data -> write the resulting PCollection.
This works fine if we have some data in ...
1 vote, 2 answers, 2k views
How to convert PCollection<TableRow> to PCollection<KV<String, String>> in Java
I'm trying to convert a TableRow containing multiple values to a KV. I can do this in a DoFn, but that adds complexity to the code I still need to write and makes my job harder.
(...
0 votes, 1 answer, 491 views
Dataflow / Beam Accumulator coder
I am developing a Dataflow pipeline that uses the SqlTransform library and also the Beam aggregation function defined in org.apache.beam.sdk.extensions.sql.impl.transform.agg.CountIf.
Here is a slice of ...
0 votes, 1 answer, 525 views
How to output nested Row from Beam SQL (SqlTransform)?
I want the output of Beam SQL (SqlTransform) to be a Row with a nested Row, but it fails.
Questions:
What is the proper way to output a Row with a nested Row from SqlTransform? (The Row type is described in ...
0 votes, 2 answers, 3k views
TypeError: expected bytes, str found [while running 'Writing to DB/ParDo(_WriteToRelationalDBFn)'] while writing to a DB using beam-nuggets
@mohaseeb
I am trying the example below to write data from Pub/Sub to PostgreSQL. I get the following error while writing the Pub/Sub data into PostgreSQL: "/usr/local/lib/python3.7/site-packages/sqlalchemy/...
0 votes, 1 answer, 509 views
How to cast int to boolean when doing a SQL transform in Apache Beam
I'm trying to do a SQL transform with Apache Beam using Calcite SQL syntax. I'm doing an int-to-boolean cast. My SQL looks like this:
,CASE WHEN cast(IsService as BOOLEAN) THEN CASE WHEN IsEligible ...
0 votes, 2 answers, 169 views
How to specify BeamSQL UDF for Numeric Types
I'm trying to add a User Defined Function (UDF) to a SqlTransform in a Beam pipeline, and the SQL parser doesn't seem to understand the function's type. The error I get is:
No match found for ...
0 votes, 1 answer, 476 views
How to integrate Beam SQL windowing query with KafkaIO?
First, we have a Kafka input source in JSON format:
{"event_time": "2020-08-23 18:36:10", "word": "apple", "cnt": 1}
{"event_time": "...
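For readers new to windowed aggregation, here is a plain-Python sketch (not the Beam or Kafka API) of what a tumbling-window count over records like these computes. It assumes a hypothetical 1-minute window and treats `event_time` as UTC; the second record is made up for illustration. In Beam SQL the grouping would typically be expressed with `TUMBLE(...)` in the `GROUP BY`, presumably after `event_time` has been turned into a timestamp field.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 60  # hypothetical window size (1 minute)


def window_start(event_time: str) -> datetime:
    """Return the start of the tumbling window that contains event_time."""
    # Assumption: event_time is "YYYY-MM-DD HH:MM:SS" in UTC.
    ts = datetime.strptime(event_time, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    epoch = int(ts.timestamp())
    # Tumbling windows are aligned to the epoch, so floor to a multiple of the size.
    return datetime.fromtimestamp(epoch - epoch % WINDOW_SECONDS, tz=timezone.utc)


def count_per_window(lines):
    """Sum `cnt` per (window start, word) over JSON lines."""
    totals = defaultdict(int)
    for line in lines:
        record = json.loads(line)
        totals[(window_start(record["event_time"]), record["word"])] += record["cnt"]
    return dict(totals)


records = [
    '{"event_time": "2020-08-23 18:36:10", "word": "apple", "cnt": 1}',
    '{"event_time": "2020-08-23 18:36:50", "word": "apple", "cnt": 1}',  # hypothetical
]
print(count_per_window(records))
```

Both records fall into the window starting at 18:36:00, so they are summed into a single count of 2 for "apple".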
1 vote, 1 answer, 406 views
Apache Beam SQL error in Python - ValueError: Unsupported type: Any
I wrote an example based on the following code:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/sql_taxi.py
I am getting this error message:
/usr/local/lib/python3.6/dist-...
0 votes, 0 answers, 232 views
Beam SQL CURRENT_TIMESTAMP
My Unix Spark server's time zone is CDT, but when I run Beam SQL CURRENT_TIMESTAMP as below, it always comes back as UTC. I also tried locally, but it always shows UTC. I want this to be CDT ...
1 vote, 0 answers, 102 views
Beam SQL and RECORD column type
I am trying to select records from a data file into a PCollection using Beam SQL.
My data file has the following Avro schema:
"name":"str-field",
"type":[
  "null",
  ...
0 votes, 2 answers, 683 views
How to select a set of fields from input data as an array of repeated fields in beam SQL
Problem Statement:
I have an input PCollection with the following fields:
{
firstname_1,
lastname_1,
dob,
firstname_2,
lastname_2,
firstname_3,
lastname_3,
}
Then I execute ...
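The target shape can be sketched in plain Python; this is not Beam SQL, only an illustration of the reshaping the question is after. The field names come from the question, while the output field name `people`, the rule of dropping slots where both names are missing, and the sample values are assumptions.

```python
def to_repeated(record: dict, slots: int = 3) -> dict:
    """Fold firstname_i/lastname_i pairs into a list of {firstname, lastname} records."""
    people = []
    for i in range(1, slots + 1):
        first = record.get(f"firstname_{i}")
        last = record.get(f"lastname_{i}")
        if first is None and last is None:
            continue  # assumption: empty slots are dropped
        people.append({"firstname": first, "lastname": last})
    # "people" is a hypothetical name for the repeated field.
    return {"dob": record.get("dob"), "people": people}


row = {
    "firstname_1": "Jane", "lastname_1": "Doe", "dob": "1990-01-01",
    "firstname_2": "John", "lastname_2": "Roe",
    "firstname_3": None, "lastname_3": None,
}
print(to_repeated(row))
```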
0 votes, 1 answer, 962 views
row_number in Apache Beam SQL
I'm trying to generate row_number using Apache Beam SQL with the code below:
PCollection<Row> rwrtg =
    PCollectionTuple.of(new TupleTag<>("trrtg"), rrtg)
        .apply(...
1 vote, 1 answer, 2k views
What's the difference between Dataflow SQL and Beam SQL (ZetaSQL or Calcite SQL)?
While browsing, I came across Dataflow SQL. Is it any different from Beam SQL?
0 votes, 1 answer, 358 views
How can I increase the thread stack size on Apache Beam pipeline workers with Google Cloud Dataflow?
I'm getting a StackOverflowError on my Beam workers because they run out of thread stack, and because it happens deep within the execution of a SqlTransform, it's not straightforward to reduce the number of ...
0 votes, 2 answers, 944 views
Errors trying to start ZetaSQL planner
I'm trying to run a Beam pipeline with SQL transforms, parsed with ZetaSQL. I begin by setting the options with
options.setPlannerName("org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner");
...
0 votes, 2 answers, 398 views
Apache Beam: executing SQL on Kafka data fails with error: Cannot call getSchema when there is no schema
I write data from multiple tables to Kafka, and Beam executes SQL after reading the data, but I now get the following errors:
Exception in thread "main"
java.lang.IllegalStateException: ...
2 votes, 1 answer, 1k views
Query Avro Schema using Beam SQL
I'm trying to read Avro files with Apache Beam and use Beam SQL to transform the data.
I'm still new to Beam and Java. Here's my simple code:
public class BeamSQLReadAvro {
    @SuppressWarnings("...
0 votes, 1 answer, 1k views
ZetaSQL Sample Using Apache beam
I am facing issues while using ZetaSQL in the Apache Beam framework (2.17.0-SNAPSHOT). After going through the Apache Beam documentation, I am not able to find any sample for ZetaSQL.
I tried to add ...
0 votes, 1 answer, 418 views
How can we use row_number() in Apache Beam SQL?
I tried it, but I'm getting the following error.
e.g.:
SELECT RELEASE_ORDER_KEY,ORDER_LINE_KEY,ORDER_HEADER_KEY,ROW_NUMBER() OVER (PARTITION BY ORDER_LINE_KEY ORDER BY RELEASE_ORDER_KEY) row_num FROM ...
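For reference, this is what `ROW_NUMBER() OVER (PARTITION BY ORDER_LINE_KEY ORDER BY RELEASE_ORDER_KEY)` computes, sketched in plain Python over in-memory rows. It is an illustration of the window function's semantics, not a Beam workaround, and the sample rows are made up. Sorting within each partition means all rows of a partition have to be seen together before they can be numbered.

```python
from itertools import groupby
from operator import itemgetter


def row_number(rows, partition_by, order_by, alias="row_num"):
    """Number rows 1..n within each partition, in order_by order."""
    # Sort by (partition, order) so each partition is contiguous and ordered.
    ordered = sorted(rows, key=itemgetter(partition_by, order_by))
    numbered = []
    for _, partition in groupby(ordered, key=itemgetter(partition_by)):
        for n, row in enumerate(partition, start=1):
            numbered.append({**row, alias: n})
    return numbered


rows = [
    {"ORDER_LINE_KEY": "L1", "RELEASE_ORDER_KEY": 20},
    {"ORDER_LINE_KEY": "L2", "RELEASE_ORDER_KEY": 5},
    {"ORDER_LINE_KEY": "L1", "RELEASE_ORDER_KEY": 10},
]
for r in row_number(rows, "ORDER_LINE_KEY", "RELEASE_ORDER_KEY"):
    print(r)
```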
2 votes, 1 answer, 83 views
Beam SQL Not Firing
I am building a simple prototype in which I read data from Pub/Sub and use BeamSQL; code snippet below:
val eventStream: SCollection[String] = sc.pubsubSubscription[String]("projects/...
0 votes, 1 answer, 1k views
Delete a BigQuery table using Apache Beam Java
Is it possible to delete a table in BigQuery using Apache Beam with Java?
p.apply("Delete Table name", BigQueryIO.readTableRows().fromQuery("DELETE FROM Table_name where condition"));
1 vote, 1 answer, 536 views
What is the alternative to side inputs in Apache Beam?
I am trying to join multiple Kafka streams and lookups using Apache Beam. I'm using side inputs to handle lookup tables, and everything worked in the direct runner. But when I try to run it in ...
2 votes, 2 answers, 567 views
Apache Beam: SQL aggregation outputs no results for Unbounded/Bounded join
I am working on an Apache Beam pipeline to run a SQL aggregation function. Reference: https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/...
0 votes, 1 answer, 2k views
Apache Calcite: cast integer to datetime
I am using Beam SQL and trying to cast integer to datetime field.
Schema resultSchema =
    Schema.builder()
        .addInt64Field("detectedCount")
        .addStringField("sensor")
        ....
0 votes, 1 answer, 2k views
How to remove duplicates in sliding window - Apache Beam
I have implemented a data pipeline with multiple unbounded sources and side inputs that joins the data using a sliding window (30s long, every 10s) and emits the transformed output to a Kafka topic. The issue ...
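One likely source of the duplicates is window overlap: with 30s windows starting every 10s, each element belongs to 30 / 10 = 3 windows, so anything computed per window (including a join) is emitted once per window. The plain-Python sketch below mimics sliding-window assignment, assuming window starts are aligned to multiples of the period plus an optional offset (which is how Beam's `SlidingWindows` aligns them, as far as I know). It is illustrative, not Beam code, and times are plain integer seconds.

```python
def sliding_windows(ts: int, size: int = 30, period: int = 10, offset: int = 0):
    """Return every [start, end) sliding window that contains timestamp ts."""
    # Latest window start at or before ts, aligned to multiples of period (+ offset).
    start = ts - (ts - offset) % period
    windows = []
    # Walk back one period at a time while the window still covers ts.
    while start > ts - size:
        windows.append((start, start + size))
        start -= period
    return windows


print(sliding_windows(25))  # [(20, 50), (10, 40), (0, 30)]
```

If overlapping output is not actually needed, fixed (tumbling) windows assign each element to exactly one window.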
1 vote, 1 answer, 2k views
How to refresh/reload side input on every window
I am using Apache Beam to join multiple streams along with some lookups. I have 2 scenarios. If the lookup size is huge, I want the side input to reload/refresh for every record processed (i.e. I ...
2 votes, 2 answers, 2k views
How to fix "Joining unbounded PCollections is currently only supported for non-global windows with triggers" in Apache Beam
I'm trying to join 2 unbounded sources using the Apache Beam Java SDK. While joining, I'm getting the error message below:
Exception in thread "main" java.lang.UnsupportedOperationException: Joining ...
0 votes, 3 answers, 2k views
Apache Beam SQLTransform: Cannot call getSchema when there is no schema
I am trying to apply SQLTransform on a PCollection<Object>. Here, the CustomSource transform generates a POJO at runtime. Hence, the type of the Object on which the SQLTransform runs is not known at ...
0 votes, 1 answer, 300 views
Beam SQL - SqlValidatorException: Object 'PCOLLECTION' not found
I am doing some experiments with Beam SQL.
I get a PCollection<Row> from the transform SampleSource and pass its output to a SqlTransform.
String sql1 = "select c1, c2, c3 from PCOLLECTION ...
0 votes, 2 answers, 769 views
What is the equivalent data type for Numeric in apache.beam.sdk.schemas.Schema.FieldType?
I'm trying to write data into a BigQuery table using BeamSQL. To write the data, we need its schema, so we used org.apache.beam.sdk.schemas to define the schema of the collection. We have Numeric ...
1 vote, 1 answer, 670 views
Build Nested structure using BeamSQL
In BigQuery, the "ARRAY_AGG" function converts a flat collection into a nested collection. Is there a similar way to build the same kind of nested collection using BeamSQL? ...
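For comparison, the effect of `ARRAY_AGG` (collecting one column into an array per group) can be sketched in plain Python. This only illustrates the nested shape being asked for and is not Beam SQL; the column names, sample rows, and the `_list` suffix are hypothetical.

```python
from collections import defaultdict


def array_agg(rows, group_by, column):
    """Group rows by `group_by` and collect `column` values into a list per group."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[group_by]].append(row[column])
    # One output record per group; dicts keep first-seen group order.
    return [{group_by: key, f"{column}_list": values} for key, values in grouped.items()]


rows = [
    {"dept": "sales", "employee": "amy"},
    {"dept": "ops", "employee": "bo"},
    {"dept": "sales", "employee": "cy"},
]
print(array_agg(rows, "dept", "employee"))
```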
1 vote, 1 answer, 671 views
BeamSQL Group By query problem with Float value
I tried to get the unique values from a BigQuery table using BeamSQL in Google Dataflow. I implemented the condition with a GROUP BY clause in BeamSQL (sample query below). One of the columns has float ...
1 vote, 1 answer, 273 views
How to add Google Cloud Pub/Sub as a source in the Beam SQL shell?
I am trying out BeamSQL in the shell and want to test how unbounded sources work in terms of usability and performance. Following the documentation here, I created an external table as follows:
CREATE ...
0 votes, 1 answer, 712 views
Unnest the nested PCollection using BeamSQL
I'm trying to use BeamSQL to unnest a nested PCollection. Let's assume a PCollection that holds employees and their details, where the details are a nested collection. So if we use BeamSQL like ...
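Conceptually, unnesting turns one employee row with a nested `details` collection into one flat row per detail. A plain-Python sketch of that result (not Beam SQL; the field names and sample data are hypothetical):

```python
def unnest(rows, nested_field="details"):
    """Emit one flat row per element of row[nested_field]."""
    flat = []
    for row in rows:
        # Keep the parent's other fields...
        parent = {k: v for k, v in row.items() if k != nested_field}
        # ...and merge in each nested record (nested keys win on a name clash).
        for child in row.get(nested_field) or []:
            flat.append({**parent, **child})
    return flat


employees = [
    {"id": 1, "name": "Ann", "details": [{"city": "Oslo"}, {"city": "Rome"}]},
    {"id": 2, "name": "Bob", "details": []},
]
print(unnest(employees))
```

Bob disappears because he has no details, which matches how a `CROSS JOIN UNNEST(...)` behaves in standard SQL.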
0 votes, 1 answer, 140 views
Can't call `ApproximateDistinct.ApproximateDistinctFn` from Apache Beam SQL
I tried to use the aggregate function ApproximateDistinct.ApproximateDistinctFn from Apache Beam SQL, but it failed.
My SQL:
SELECT
ApproximateDistinct(user_id) as distinct_count,
profile,
...
0 votes, 1 answer, 343 views
RexCall cannot be cast to RexInputRef exception in Apache Beam SQL
I'm trying to do a simple join using Beam SQL, but I'm getting an exception during compilation:
Exception in thread "main" java.lang.ClassCastException: org.apache.beam.repackaged....
0 votes, 1 answer, 2k views
Apache Beam SqlTransforms schema issue
I'm trying to perform ETL that involves loading files from HDFS, applying transforms, and writing them to Hive. While using SqlTransforms to perform the transformations, following this doc, I'm ...
0 votes, 2 answers, 5k views
How does Calcite deal with data conversion?
I am trying to convert a date that's stored as a string to a date, e.g.
YYYYMMDD (string) to YYYY-MM-DD (date)
As far as I know, there is no conversion function that checks the input format and output ...
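The conversion itself is mechanical: slice the string into year, month, and day and rejoin them with hyphens. The plain-Python sketch below shows both a pure string rebuild and a validating parse. In SQL the analogous approach is usually `SUBSTRING` plus `||` concatenation followed by `CAST(... AS DATE)`; whether Beam's Calcite dialect accepts that exact cast in a given version is an assumption to verify.

```python
from datetime import date, datetime


def yyyymmdd_to_iso(value: str) -> str:
    """Rebuild 'YYYYMMDD' as 'YYYY-MM-DD' by slicing (no calendar check)."""
    if len(value) != 8 or not value.isdigit():
        raise ValueError(f"expected YYYYMMDD, got {value!r}")
    return f"{value[0:4]}-{value[4:6]}-{value[6:8]}"


def yyyymmdd_to_date(value: str) -> date:
    """Parse 'YYYYMMDD' into a date; strptime also rejects impossible dates."""
    return datetime.strptime(value, "%Y%m%d").date()


print(yyyymmdd_to_iso("20200823"))
print(yyyymmdd_to_date("20200823"))
```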
0 votes, 1 answer, 1k views
Beam SQL won't work when using aggregation in statement: "Cannot plan execution"
I have a basic Beam pipeline that reads from GCS, does a Beam SQL transform and writes the results to BigQuery.
When I don't do any aggregation in my SQL statement it works fine:
...
PCollection<...
0 votes, 1 answer, 1k views
Beam SQL / Apache Beam is slower when running multiple joins
When I join 2 tables using Beam SQL, it works properly and gives the expected performance, but as the number of joined tables increases, the performance becomes much worse.
Below is my snippet which ...
0 votes, 1 answer, 96 views
Is there a workaround for 'LIKE' in BeamSQL?
We have an Apache Beam 2.4.0 pipeline that runs BeamSql queries. In BeamSql, the SQL operator 'LIKE' throws the exception 'LIKE is not implemented yet'. Is there a workaround for 'LIKE' in BeamSql? We ...