Newest 'presto' Questions

1 vote

4 answers

146 views

Get previous latest if required latest not available

I have some data like this: Id timestamp 100 2025-01-27 10:00:00 100 2025-01-26 10:00:00 100 2025-01-25 10:00:00 100 2024-04-20 10:00:00 100 2024-03-25 10:00:00 100 2023-05-05 10:00:00 100 2022-08-01 ...

nut

61

asked Jul 4 at 15:10

2 votes

1 answer

88 views

"INVALID_FUNCTION_ARGUMENT: Cannot unnest type: varchar" in Athena JSON

I have a requirement to print the following output: matchId rainyDate sunnyDate 1 2024-09-04 2024-09-04 1 2024-09-11 2024-09-12 1 2024-09-18 2024-09-19 2 2024-10-04 2024-10-04 2 2024-10-11 3 There is ...

Dhruva Sen Gupta

45

asked Jun 14 at 15:35

2 votes

1 answer

96 views

Safe hash of arbitrary map in Trino SQL / Athena

Using Trino SQL (actually the AWS Athena implementation of Trino), I want to compute safe hashes of arbitrary MAP columns. By "arbitrary" I mean a MAP that may have other MAPs as values for certain ...

fweber

395

asked May 27 at 13:53

0 votes

0 answers

179 views

How to optimize a Starburst/Presto query that joins on two different keys without reading the same table twice?

I'm using Starburst (Presto) to query a large trip dataset (trip_data) from S3. I have a list of users (user_id) coming from a Kafka table (kafka_table), and I want to join this with trip data from ...

Rishabh Bansal

402

asked Apr 11 at 10:05

0 votes

0 answers

42 views

How to parse a Presto/Trino CREATE TABLE query with Apache Calcite?

Consider the code: import org.apache.calcite.config.Lex; import org.apache.calcite.sql.SqlNode; import org.apache.calcite.sql.parser.SqlParser; import org.apache.calcite.sql.parser.ddl....

Cherry

34k

asked Mar 13 at 11:55

0 votes

1 answer

127 views

Referencing a CTE in a where clause

I'm trying to use a CTE at the top of my query to be referenced in a WHERE clause in a separate CTE further down. For example: with ref as ( null email, '("720884","70540")' ...

tl1310

13

asked Mar 7 at 8:28

0 votes

1 answer

85 views

Can't Query AWS Athena Presto Table Because of Dash Character in Column name

I have a file in S3 with the following contents: {"foo-bar": {"name":"Mercury","distanceFromSun":0.39,"orbitalPeriod":0.24,"dayLength":58.65}...

Samer A.

65

asked Mar 3 at 17:01

1 vote

2 answers

116 views

Amazon Athena - SQL Query to Return all rows for ID where one row meets a condition and does not meet a condition

I am trying to write a query to return ALL rows for an ID where a condition is met and a condition is not met for each ID on the Order table. The conditions I want are to return all rows where the ID'...

CWZY

37

asked Feb 27 at 16:52

2 votes

1 answer

112 views

How to find overlapping records and select the latest record?

I am trying to get one record from a list of overlapping records in a Presto (Trino) DB. There are columns like loc_id, prod_id, line_id, start_dt, end_dt, update_dt. The records are like this ( source ...

Parag

23

asked Jan 31 at 6:03

1 vote

1 answer

107 views

In Presto sql, how to grab the maximum timestamp in between two events within a table

I have a table df like below, my goal is to find the largest 'focus' timestamp after each visit for each [user:session] pair. (Timestamp as BIGINT data type and in ascending order) event user_id ...

Crubal Chenxi Li

313

asked Jan 23 at 18:43

0 votes

2 answers

54 views

Optimization of attribution based on timestamps

I am struggling to optimize my query where I need to combine orders data and events data using timestamps, so that I attribute a certain type of event to the order in such a way that the closest event (...

Norah Jones

467

asked Jan 20 at 13:48

0 votes

2 answers

92 views

Remove duplicate record using Unnest | Aws Athena

I am facing an issue while filtering data with an array. I have columns userid, event_name, attributes, ti. The attributes column has values like this: {"bool_sample":true,"array_int":[10,20,25,38]...

Nishant Dixit

5,512

asked Jan 18 at 14:51

1 vote

0 answers

38 views

Error due to S3 Partitions having different Datatype in Hudi Presto Table

We have data written to S3 in Hudi format with dt partition. Recently, we started receiving very large numbers for some columns stored as long datatype. These numbers exceeded the maximum limit of the ...

Abhishek Gupta

11

asked Jan 9 at 2:40

0 votes

2 answers

282 views

Extract keys / values from a nested JSON object stored as string in Presto?

I am getting the below payload as a string and need to extract all the values for the key "id". I am writing it like this: JSON_EXTRACT_SCALAR(api_response, '$.0.id') as extract_id But it is extracting ...

SDD

11

asked Jan 7 at 0:08

1 vote

1 answer

55 views

How to validate data is Hexadecimal in Presto

How can I use Presto SQL to check whether the value of a column 'cola' is hexadecimal? My goal is to resolve the 'Not a valid base-16 number' error when the data is messy. I've tried the below, but it's not working. ...

Crubal Chenxi Li

313

asked Jan 4 at 4:31

0 votes

1 answer

90 views

Presto worker nodes' memory consumption is gradually increasing over a period of time and eventually it's killing the service

Presto version: 0.278.1. Presto worker nodes' memory consumption is gradually increasing over a period of time and eventually it's killing the service. Hi, we are using Presto with the below configs: ...

pavan kalyan

107

asked Jan 2 at 13:47

2 votes

1 answer

132 views

Extracting JSON data without knowing the key

I'm getting a headache trying to figure out how to read a JSON with the following format in Athena { "id": "1", "key1": { "dynamic_key_here": [ {&...

Victor Leprince

21

asked Dec 11, 2024 at 10:30

1 vote

2 answers

127 views

Most direct way to check if array contains any non-null values

Currently doing this: cardinality(filter(my_array, x -> x is not null)) != 0 Is there a more direct way?

John Roberts

6,004

asked Dec 5, 2024 at 14:01

1 vote

1 answer

36 views

How to create an array by replacing values from another array?

I am looking for a way to create an array from another array based on the values of the second array with value mapping. E.g. Table A has columns id, some_array and I have some value mapping in mind, ...

Dyson

165

asked Dec 3, 2024 at 23:27

0 votes

1 answer

145 views

Error in Athena when trying to query records between dates that are stored as string

I have an issue with Athena when trying to query records between dates that are stored as string. My csv dataset, spread among several files across directories, has a quote_date column with 10/8/2024 ...

Stefano

723

asked Nov 20, 2024 at 20:18

0 votes

1 answer

55 views

Given a SQL query, how to know the data contribution from each table

I have Iceberg tables on AWS. I am trying to use Athena or Presto to query them. My question is: how to know the data contribution from each table to the result. E.g. how to know how many rows in the ...

hehe123456

171

asked Nov 19, 2024 at 1:20

0 votes

1 answer

217 views

How to properly configure a timestamp format column with an Athena table

Here is a rip-off table configuration I am working on in Athena. The data is in a bucket as JSON gzip files. The column is a timestamp with the format yyyyMMddTHH:mm:ss CREATE external TABLE json_tab( ...

Nir

2,677

asked Nov 13, 2024 at 10:16

2 votes

1 answer

55 views

How can I query attributes with struct array data type?

I have this value with data type array<struct<id:string,name:string,values:array<string>>> [ {id=gid://test/1234, name=Size, values=[L, M, S, XS]}, {id=gid://test/12345, name=...

lexmadness

31

asked Nov 12, 2024 at 10:24

0 votes

1 answer

91 views

How to search for a value that matches a condition

I'm looking for the best X% of data in a data set, where "best" is defined as having a minimum sum of values. I can do this by running a bunch of tests to cast around for the result I want: ...

Whatabrain

270

asked Nov 10, 2024 at 18:43

1 vote

2 answers

73 views

Set True to duplicate records that occur within 1 hour window

I’d like to set is_duplicate to TRUE for records that occur within a 1-hour window of an earlier record. The rule is that each record should check against the most recent prior record where ...

JYJ

23

asked Nov 6, 2024 at 18:54

1 vote

2 answers

124 views

How to duplicate same row X number of times based on the value of a column?

I have a dataset that looks like this: Link_number Houband Time Mean_speed Sample_Number Link 1 8 8:00 52 2 Link 1 8 8:30 55 5 Link 2 9 9:00 20 3 Link 2 9 9:30 40 4 I need to duplicate each row X number ...

ritarita

55

asked Oct 31, 2024 at 4:12

0 votes

2 answers

85 views

SQL join: unnesting rows with same column names (but different data) without resorting to aliases

Let's say I have three tables: t1: client_id transmission_id timestamp column_A 1 AAA1 2024-10-16 10:31:27 Banana 1 AAA2 2024-10-16 11:31:27 Citrus 2 BBB1 2024-10-16 09:12:14 Apple t2: client_id ...

Shotey

3

asked Oct 16, 2024 at 14:16

0 votes

1 answer

38 views

Joining 2 tables but only show values from 2nd table once in 1st table

I've been racking my brain trying to solve this problem. I'm using Presto SQL. I have 2 tables: trx (transactions per day): | date | usage | deposit_id | | -----------| -------- |------------|...

Sam Widjaja

1

asked Oct 16, 2024 at 8:29

0 votes

1 answer

65 views

Rolling sums of last 3 days and 1 days in single query

I have a table with data for last 6 days: I need 3 new columns for device: (1) the sum of October 3rd to 5th, regardless of the day (2) a rolling sum for the dates October 3rd to 5th using their ...

codenoodles

143

asked Oct 15, 2024 at 3:34

-2 votes

2 answers

130 views

Multiple Join Operations on the same columns with the same criteria. Is there a way to make this more efficient? [closed]

In Presto/Hive SQL, I have multiple tables, say T1, T2, T3, T4, X1, X2, X3, X4 Assume that the table size is T1 > T2 > T3 > T4 X1 > X3 > X4 > X4 We have SELECT T1.a, T2.b, T3....

user98235

926

asked Oct 13, 2024 at 9:59

1 vote

1 answer

76 views

How to calculate an array of differences from an original array?

I have an array of integers and I want to calculate the differences between adjacent elements and return an array of differences. SELECT ARRAY[3, 2, 5, 1, 2] AS my_arr -- original array -- my_arr ...

Emman

4,313

asked Oct 10, 2024 at 20:08

0 votes

1 answer

148 views

`lag()` with `over` and `range between` returns a value even if the previous record is out of range

I want to get the previous value using lag(), over a partition defined with RANGE BETWEEN. I followed an example from the documentation: WITH orders (custkey, orderdate, totalprice) AS ( ...

Emman

4,313

asked Oct 10, 2024 at 12:41

-1 votes

4 answers

101 views

How to find records which are present in multiple groups

Below is my scenario: Column A Column B Group-A 1 Group-A 2 Group-A 1 Group-A 1 Group-B 3 Group-B 1 Group-B 5 Group-B 3 I need to flag value 1 from Column-B as it is present in multiple groups (Group-A ...

user22435160

21

asked Oct 9, 2024 at 15:51

0 votes

1 answer

52 views

What is the equivalent of Excel's SUBSTITUTE function in Presto SQL

What is the equivalent of Excel's SUBSTITUTE function in Presto SQL? I have code here that I need to convert to Presto SQL. Below is the code and sample data. Excel code: =CONCAT(SUBSTITUTE(B2,LEFT(B2,...

Jon

159

asked Oct 3, 2024 at 1:31

0 votes

1 answer

65 views

Not able to extract nested array subfield in JSON from Athena table

We have an Athena table in which there is a column that contains JSON values. The datatype of the main column(which contains JSON values) in Athena is a string datatype. DDL of Athena table is like ...

Beginner

91

asked Sep 25, 2024 at 15:16

2 votes

0 answers

380 views

Trino file base authorization and problem with create role

I installed Trino 455 on the Ubuntu VM, I use LDAP for authentication and file system access control for authorization. I can set rules for users in rules.json file but I can not set rules per role ...

Nestor

91

asked Sep 24, 2024 at 16:07

2 votes

1 answer

42 views

Sum ordered rows OVER with distinct duplication

This is the data I have: Customer Sales A 3 B 10 C 4 D 2 E 4 This is where I want to get: # Top Customers Total Sales 1 10 2 14 3 18 4 21 5 23 I am trying to do this with ROW_NUMBER and OVER (a window ...

gust

965

asked Sep 18, 2024 at 23:15

1 vote

2 answers

467 views

How to lag using dynamic offset in SQL AWS Athena?

I have a table with a column of numbers and then other columns with other data. I have created column "floor10" that has floor of those numbers as a multiple of 10 (e.g. 8 -> 0, 17 -> ...

combperm

25

asked Sep 17, 2024 at 12:48

0 votes

1 answer

67 views

Generate all n-tuples of array values in Presto SQL

I have the following dataset in Presto. WITH A (name, distinct_values) AS ( VALUES ('color', ARRAY['red', 'yellow']), ('shape', ARRAY['triangle', 'square', 'circle']), ('size', ARRAY['...

crf

1,890

asked Sep 16, 2024 at 22:29

0 votes

1 answer

208 views

PrestoSQL/Trino - How to query all cases without explicit where clauses in subquery

I'm trying to write a query in Athena where I get a list of unique_usage_ids in which the unique_usage_start_date is at least 1 month before the contract_sign_date. I know how to write a query to find ...

louieriksson

25

asked Sep 12, 2024 at 18:06

1 vote

1 answer

208 views

Preserve column name case in Parquet produced by UNLOAD

By default in Athena (and probably more generally in Presto/Trino), SELECT * lowercases column names. I've found a workaround by explicitly specifying the column names in the proper case: SELECT SomeColumn, ...

Pragmateek

13.6k

asked Sep 12, 2024 at 13:12

0 votes

3 answers

88 views

How to validate that each value of an 'IN' list is present in the SQL query result?

I have a set of tableA Name City Paulo Rome Rudy Singapore Ming Singapore Takeshi Tokyo Judy Jakarta Yuki Tokyo Steve Singapore I want to make sure that person from Berlin, Singapore, and Tokyo, all 3 ...

YazerieL

15

asked Aug 29, 2024 at 10:40

1 vote

1 answer

71 views

Issue with using contains array function in SQL presto

I'm trying to get the following Presto SQL code to work so that I can perform business day calculations. It works if I write "d -> day_of_week(d) not in (6,7)" but I also need to filter out ...

Wesley Young

85

asked Aug 22, 2024 at 19:23

0 votes

2 answers

290 views

Set value of one column in the select result

I am trying to replace the value of one of the columns in the SELECT result to 0. Normally I can do: SELECT 0 AS column_to_set, other_col_1, other_col_2, ... FROM tbl However, there ...

Dyson

165

asked Aug 21, 2024 at 18:50

0 votes

1 answer

383 views

Aggregating to json format - Trino / Hive

I use Trino and Hive in Jupyter Notebook. I want to aggregate a table in the following way: q = f""" CREATE TABLE {aggregated_table} AS WITH aggregated_data AS ( SELECT id, ...

qwerty

907

asked Aug 20, 2024 at 4:36

0 votes

1 answer

244 views

How to use CASE with CONTAINS and Array Aggregation

I am trying to write a query using a CASE statement that is based on whether a value is in an array, but I run into the catch-22 of not including the CASE statement in the GROUP BY and getting "...

BigPanda

3

asked Aug 19, 2024 at 21:55

-1 votes

1 answer

133 views

Convert number date/time to human readable date/time in Athena

I am using Hudi MoR for storing the data in an AWS S3 lake and using Athena for querying. My data comes through Kafka streaming. I have a date column and a time column in the source DB with the below example ...

JanakaRao

149

asked Aug 19, 2024 at 12:07

0 votes

0 answers

27 views

Summing transactions over a 2-day period

I am trying to find a way to sum a transaction amount over a 30-day review period if the account had 2 or more distinct transactions that totaled $100 or more in a two-day period. Edited to add these ...

steven fuchs

29

asked Aug 18, 2024 at 23:45

1 vote

1 answer

147 views

In SQL Athena, how can I get a CSV's creation date and time?

Based on the creation date and time of each CSV file, I want to build a table showing how fresh the data is. In SQL Athena, how can I get a CSV's creation date and time?

Kadio

223

asked Aug 18, 2024 at 10:33

1 vote

1 answer

166 views

Time Cost of UNNEST operation in query?

This is for PrestoSQL. Assuming col1, col2, col3 are of the same cardinality, and assuming the Table has N rows: SELECT c1 from Table, UNNEST(col1) AS t(c1) SELECT c1, c2 from Table, UNNEST(col1, col2) ...

user98235

926

asked Aug 12, 2024 at 6:14
