3,242 questions
1 vote, 4 answers, 146 views
Get previous latest if required latest not available
I have some data like this:
| Id | timestamp |
| --- | --- |
| 100 | 2025-01-27 10:00:00 |
| 100 | 2025-01-26 10:00:00 |
| 100 | 2025-01-25 10:00:00 |
| 100 | 2024-04-20 10:00:00 |
| 100 | 2024-03-25 10:00:00 |
| 100 | 2023-05-05 10:00:00 |
| 100 | 2022-08-01 ... |
2 votes, 1 answer, 88 views
"INVALID_FUNCTION_ARGUMENT: Cannot unnest type: varchar" in Athena JSON
I have a requirement to print the following output:
| matchId | rainyDate | sunnyDate |
| --- | --- | --- |
| 1 | 2024-09-04 | 2024-09-04 |
| 1 | 2024-09-11 | 2024-09-12 |
| 1 | 2024-09-18 | 2024-09-19 |
| 2 | 2024-10-04 | 2024-10-04 |
| 2 | 2024-10-11 | |
| 3 | ... | ... |
There is ...
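For reference, the error in this title is what Trino-based engines such as Athena report when `UNNEST` receives a `varchar` instead of an array or map, which usually means a JSON array is stored as a string. A minimal sketch of one common fix, assuming a hypothetical string column `raw` holding a JSON array of strings: parse it with `json_parse` and cast it to an array before unnesting.

```sql
-- Hypothetical sample: a JSON array stored as a plain string
WITH t (id, raw) AS (
  VALUES (1, '["2024-09-04", "2024-09-11"]')
)
SELECT t.id, d.item
FROM t
-- json_parse turns the string into JSON; the CAST gives UNNEST a real array
CROSS JOIN UNNEST(CAST(json_parse(t.raw) AS array(varchar))) AS d(item);
```

This returns one row per array element (two rows here).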
2 votes, 1 answer, 96 views
Safe hash of arbitrary map in Trino SQL / Athena
Using Trino SQL (actually the AWS Athena implementation of Trino), I want to compute safe hashes of arbitrary MAP columns. By "arbitrary" I mean MAPs that may have other MAPs as values for certain ...
0 votes, 0 answers, 179 views
How to optimize a Starburst/Presto query that joins on two different keys without reading the same table twice?
I'm using Starburst (Presto) to query a large trip dataset (trip_data) from S3. I have a list of users (user_id) coming from a Kafka table (kafka_table), and I want to join this with trip data from ...
0 votes, 0 answers, 42 views
How to parse a Presto/Trino CREATE TABLE query with Apache Calcite?
Consider the code:
import org.apache.calcite.config.Lex;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.parser.SqlParser;
import org.apache.calcite.sql.parser.ddl....
0 votes, 1 answer, 127 views
Referencing a CTE in a where clause
I'm trying to use a CTE at the top of my query and reference it in the WHERE clause of a separate CTE further down. For example:
with ref as (
null email,
'("720884","70540")' ...
0 votes, 1 answer, 85 views
Can't Query AWS Athena Presto Table Because of Dash Character in Column name
I have a file in S3 with the following contents:
{"foo-bar": {"name":"Mercury","distanceFromSun":0.39,"orbitalPeriod":0.24,"dayLength":58.65}...
1 vote, 2 answers, 116 views
Amazon Athena - SQL Query to Return all rows for ID where one row meets a condition and does not meet a condition
I am trying to write a query to return ALL rows for an ID where a condition is met and a condition is not met for each ID on the Order table.
The conditions I want are to return all rows where the ID'...
2 votes, 1 answer, 112 views
How to find overlapping records and select the latest record?
I am trying to get one record from a list of overlapping records in a Presto (Trino) DB. There are columns like loc_id, prod_id, line_id, start_dt, end_dt, update_dt.
The records are like this
( source ...
1 vote, 1 answer, 107 views
In Presto SQL, how to grab the maximum timestamp between two events within a table
I have a table df like below, my goal is to find the largest 'focus' timestamp after each visit for each [user:session] pair. (Timestamp as BIGINT data type and in ascending order)
| event | user_id | ... |
0 votes, 2 answers, 54 views
Optimization of attribution based on timestamps
I am struggling to optimize my query, where I need to combine orders data and events data using timestamps so that a certain type of event is attributed to each order in such a way that the closest event (...
0 votes, 2 answers, 92 views
Remove duplicate records using UNNEST | AWS Athena
I am facing an issue while filtering data with an array.
I have columns userid, event_name, attributes, ti
The attributes column has values like this:
{"bool_sample":true,"array_int":[10,20,25,38]...
1 vote, 0 answers, 38 views
Error due to S3 Partitions having different Datatype in Hudi Presto Table
We have data written to S3 in Hudi format with dt partition. Recently, we started receiving very large numbers for some columns stored as long datatype. These numbers exceeded the maximum limit of the ...
0 votes, 2 answers, 282 views
Extract keys / values from a nested JSON object stored as string in Presto?
I am getting the payload below as a string and need to extract all the values for the key "id".
I am writing it like this:
JSON_EXTRACT_SCALAR(api_response, '$.0.id') as extract_id
But it is extracting ...
1 vote, 1 answer, 55 views
How to validate data is Hexadecimal in Presto
How can I use Presto SQL to check whether the value of column 'cola' is hexadecimal? My goal is to resolve the 'Not a valid base-16 number' error when the data is messy.
I've tried the following, but it's not working:
...
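One way to guard against this, sketched with hypothetical sample values and assuming the conversion uses `from_base(..., 16)`: flag hex strings with `regexp_like`, and wrap the conversion in `try` so that invalid values become NULL instead of failing the query. `try` also covers strings that look like hex but are too long to fit in a `bigint`, which the regex alone would let through.

```sql
WITH t (cola) AS (
  VALUES '1A3f', 'zz12', '', NULL   -- hypothetical messy sample values
)
SELECT
  cola,
  regexp_like(cola, '^[0-9A-Fa-f]+$') AS is_hex,     -- true only for '1A3f'; NULL stays NULL
  try(from_base(cola, 16))            AS hex_value   -- NULL instead of an error
FROM t;
```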
0 votes, 1 answer, 90 views
Presto worker nodes' memory consumption is gradually increasing over a period of time & eventually it's killing the service
Presto version: 0.278.1
Hi, we are using Presto with the below configs:
...
2 votes, 1 answer, 132 views
Extracting JSON data without knowing the key
I'm getting a headache trying to figure out how to read a JSON with the following format in Athena
{
"id": "1",
"key1": {
"dynamic_key_here": [
{...
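When the key name is not known in advance, one option is to cast the enclosing JSON object to `map(varchar, json)` and unnest it, which yields each key next to its value. A sketch, assuming a payload shaped like the excerpt above; the contents of the inner array are hypothetical placeholders because the excerpt is cut off.

```sql
-- Hypothetical payload; the inner array contents are placeholders
WITH t (payload) AS (
  VALUES '{"id": "1", "key1": {"dynamic_key_here": [{"a": 1}]}}'
)
SELECT
  json_extract_scalar(t.payload, '$.id') AS id,
  u.dyn_key,     -- the key name, whatever it is
  u.dyn_value    -- its value, still as JSON, for further json_extract calls
FROM t
CROSS JOIN UNNEST(
  CAST(json_extract(t.payload, '$.key1') AS map(varchar, json))
) AS u(dyn_key, dyn_value);
```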
1 vote, 2 answers, 127 views
Most direct way to check if array contains any non-null values
Currently doing this:
cardinality(filter(my_array, x -> x is not null)) != 0
Is there a more direct way?
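A more direct form, assuming an engine that provides `any_match` (Trino does), is `any_match(my_array, x -> x IS NOT NULL)`. Like the original expression, it returns false for an empty array and NULL when the array itself is NULL.

```sql
SELECT
  -- true: at least one element is non-null
  any_match(ARRAY[NULL, 1, NULL], x -> x IS NOT NULL) AS has_value,
  -- false: every element is null
  any_match(CAST(ARRAY[NULL, NULL] AS array(integer)), x -> x IS NOT NULL) AS all_null_input;
```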
1 vote, 1 answer, 36 views
How to create an array by replacing values from another array?
I am looking for a way to create an array from another array based on the values of the second array with value mapping.
E.g.
Table A has columns id, some_array
and I have some value mapping in mind, ...
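Since the mapping itself is cut off in the excerpt, here is a sketch with a hypothetical integer-to-label mapping: `transform` rewrites every element, and `element_at` looks each one up in a `MAP` literal, returning NULL for keys that are not in the map. The `coalesce` fallback (keeping the original value as text) is an assumption; drop it if unmapped values should become NULL.

```sql
-- Hypothetical data and mapping (1 -> 'one', 2 -> 'two'); substitute your own
WITH a (id, some_array) AS (
  VALUES (1, ARRAY[1, 2, 3])
)
SELECT
  id,
  transform(
    some_array,
    -- look each element up in the mapping; fall back to the original value as text
    x -> coalesce(element_at(MAP(ARRAY[1, 2], ARRAY['one', 'two']), x), CAST(x AS varchar))
  ) AS mapped_array  -- ['one', 'two', '3']
FROM a;
```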
0 votes, 1 answer, 145 views
Error in Athena when trying to query records between dates that are stored as string
I have an issue with Athena when trying to query records between dates that are stored as string.
My csv dataset, spread among several files across directories, has a quote_date column with 10/8/2024 ...
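A common approach is to parse the string inside the query and compare real dates. A sketch, assuming month-first values (10/8/2024 meaning October 8; use `'d/M/yyyy'` if the data is day-first) and a hypothetical table name `quotes`. The Joda-style letters `M` and `d` accept one- or two-digit values, and `try` turns rows that fail to parse (for example, a stray CSV header) into NULL so they drop out of the filter.

```sql
-- quotes is a hypothetical table name; quote_date holds strings like '10/8/2024'
SELECT *
FROM quotes
WHERE CAST(try(parse_datetime(quote_date, 'M/d/yyyy')) AS date)  -- month-first assumed
      BETWEEN DATE '2024-10-01' AND DATE '2024-10-31';
```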
0 votes, 1 answer, 55 views
Given a SQL query, how can I know the data contribution from each table?
I have Iceberg tables on AWS.
I am trying to use Athena or Presto to query them.
My question is: how to know the data contribution from each table to the result.
e.g. How to know how many rows in the ...
0 votes, 1 answer, 217 views
How to properly configure a timestamp-format column in an Athena table
Here is a simplified copy of the table configuration I am working on in Athena. The data is in a bucket as gzipped JSON files.
The column is a timestamp with the format yyyyMMddTHH:mm:ss
CREATE external TABLE json_tab(
...
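One workaround, assuming the column is declared as `string` in the DDL instead of `timestamp`, is to parse it at query time with a Joda-style pattern in which the literal `T` is quoted. Inside a SQL string literal each single quote is doubled, which is why the pattern reads `'yyyyMMdd''T''HH:mm:ss'`. `event_ts` is a hypothetical column name; `json_tab` is the table from the excerpt.

```sql
-- event_ts is a hypothetical string column in json_tab holding e.g. '20240115T08:30:00'
SELECT
  event_ts,
  -- ''T'' becomes a quoted literal T once the SQL string is unescaped
  parse_datetime(event_ts, 'yyyyMMdd''T''HH:mm:ss') AS event_time
FROM json_tab;
```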
2 votes, 1 answer, 55 views
How can I query attributes with a struct array data type?
I have this value with data type array<struct<id:string,name:string,values:array<string>>>
[
{id=gid://test/1234, name=Size, values=[L, M, S, XS]},
{id=gid://test/12345, name=...
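For an `array<struct<...>>` column, `CROSS JOIN UNNEST` expands each struct into its own row with one column per field, which can then be filtered like any other column. A sketch using the first element from the excerpt as sample data; the CTE and the column name `options` are hypothetical. The field is aliased `vals` in `UNNEST` because `values` is a reserved word and would otherwise need double quotes.

```sql
WITH t (options) AS (
  VALUES (
    ARRAY[
      CAST(ROW('gid://test/1234', 'Size', ARRAY['L', 'M', 'S', 'XS'])
           AS ROW(id varchar, name varchar, "values" array(varchar)))
    ]
  )
)
SELECT o.id, o.name, o.vals
FROM t
-- each struct in the array becomes one row, one column per struct field
CROSS JOIN UNNEST(t.options) AS o(id, name, vals)
WHERE o.name = 'Size';
```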
0 votes, 1 answer, 91 views
How to search for a value that matches a condition
I'm looking for the best X% of data in a data set, where "best" is defined as having a minimum sum of values. I can do this by running a bunch of tests to cast around for the result I want:
...
1 vote, 2 answers, 73 views
Set True to duplicate records that occur within 1 hour window
I’d like to set is_duplicate to TRUE for records that occur within a 1-hour window of an earlier record.
The rule is that each record should check against the most recent prior record where ...
1 vote, 2 answers, 124 views
How to duplicate the same row X number of times based on the value of a column?
I have a dataset that looks like this:
| Link_number | Houband | Time | Mean_speed | Sample_Number |
| --- | --- | --- | --- | --- |
| Link 1 | 8 | 8:00 | 52 | 2 |
| Link 1 | 8 | 8:30 | 55 | 5 |
| Link 2 | 9 | 9:00 | 20 | 3 |
| Link 2 | 9 | 9:30 | 40 | 4 |
I need to duplicate each row X number ...
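A common pattern is to build an array of length `Sample_Number` with `sequence` and unnest it, so each source row fans out into that many rows. A sketch using the sample rows above; `copy_no` is a hypothetical name for the copy counter, and the `WHERE` filter is there because `sequence(1, 0)` counts down and would still return two elements.

```sql
WITH t (link_number, houband, "time", mean_speed, sample_number) AS (
  VALUES
    ('Link 1', 8, '8:00', 52, 2),
    ('Link 1', 8, '8:30', 55, 5),
    ('Link 2', 9, '9:00', 20, 3),
    ('Link 2', 9, '9:30', 40, 4)
)
SELECT t.link_number, t.houband, t."time", t.mean_speed, t.sample_number, r.copy_no
FROM t
-- sequence(1, n) is [1, 2, ..., n]; UNNEST turns it into n rows per input row
CROSS JOIN UNNEST(sequence(1, t.sample_number)) AS r(copy_no)
WHERE t.sample_number > 0   -- guard: sequence(1, 0) would count down to [1, 0]
ORDER BY t.link_number, t."time", r.copy_no;
```

For this sample the result has 2 + 5 + 3 + 4 = 14 rows.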
0 votes, 2 answers, 85 views
SQL join: unnesting rows with same column names (but different data) without resorting to aliases
Let's say I have three tables:
t1:
| client_id | transmission_id | timestamp | column_A |
| --- | --- | --- | --- |
| 1 | AAA1 | 2024-10-16 10:31:27 | Banana |
| 1 | AAA2 | 2024-10-16 11:31:27 | Citrus |
| 2 | BBB1 | 2024-10-16 09:12:14 | Apple |
t2:
client_id
...
0 votes, 1 answer, 38 views
Joining 2 tables but only show values from 2nd table once in 1st table
I've been racking my brain trying to solve this problem. I'm using Presto SQL.
I have 2 tables:
trx (transactions per day):
| date | usage | deposit_id |
| -----------| -------- |------------|...
0 votes, 1 answer, 65 views
Rolling sums of last 3 days and 1 days in single query
I have a table with data for the last 6 days:
I need 3 new columns for device: (1) the sum of October 3rd to 5th, regardless of the day (2) a rolling sum for the dates October 3rd to 5th using their ...
-2 votes, 2 answers, 130 views
Multiple Join Operations on the same columns with the same criteria. Is there a way to make this more efficient? [closed]
In Presto/Hive SQL, I have multiple tables, say T1, T2, T3, T4, X1, X2, X3, X4
Assume that the table size is
T1 > T2 > T3 > T4
X1 > X2 > X3 > X4
We have
SELECT
T1.a, T2.b, T3....
1 vote, 1 answer, 76 views
How to calculate an array of differences from an original array?
I have an array of integers and I want to calculate the differences between adjacent elements and return an array of differences.
SELECT ARRAY[3, 2, 5, 1, 2] AS my_arr -- original array
-- my_arr ...
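One way to do this without unnesting, sketched for the sample array: pair each element with its predecessor by zipping the array without its first element against the array without its last element, then subtract. For `[3, 2, 5, 1, 2]` this yields `[-1, 3, -4, 1]`, and a single-element array yields an empty array. The sketch assumes the array has at least one element, since `slice` does not accept a negative length.

```sql
WITH t AS (
  SELECT ARRAY[3, 2, 5, 1, 2] AS my_arr  -- original array
)
SELECT
  zip_with(
    slice(my_arr, 2, cardinality(my_arr) - 1),  -- elements 2..n
    slice(my_arr, 1, cardinality(my_arr) - 1),  -- elements 1..n-1
    (cur, prev) -> cur - prev                   -- difference to the previous element
  ) AS diffs                                    -- [-1, 3, -4, 1]
FROM t;
```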
0 votes, 1 answer, 148 views
`lag()` with `over` and `range between` returns a value if even the previous record is out of range
I want to get the previous value using lag(), over a partition defined with RANGE BETWEEN. I followed an example from the documentation:
WITH orders (custkey, orderdate, totalprice)
AS
(
...
-1 votes, 4 answers, 101 views
How to find records which are present in multiple groups
Below is my scenario:
| Column A | Column B |
| --- | --- |
| Group-A | 1 |
| Group-A | 2 |
| Group-A | 1 |
| Group-A | 1 |
| Group-B | 3 |
| Group-B | 1 |
| Group-B | 5 |
| Group-B | 3 |
I need to flag value 1 in Column B, as it is present in multiple groups (Group-A ...
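A sketch of one approach, using the sample data above and hypothetical column names `column_a` and `column_b`: a Column B value appears in more than one group exactly when the smallest and largest group names seen for it differ. Comparing `min` and `max` windows sidesteps `count(DISTINCT ...) OVER (...)`, which Trino has historically rejected in window functions.

```sql
WITH t (column_a, column_b) AS (
  VALUES ('Group-A', 1), ('Group-A', 2), ('Group-A', 1), ('Group-A', 1),
         ('Group-B', 3), ('Group-B', 1), ('Group-B', 5), ('Group-B', 3)
)
SELECT
  column_a,
  column_b,
  -- different min and max group for the same value => the value spans several groups
  min(column_a) OVER (PARTITION BY column_b)
    <> max(column_a) OVER (PARTITION BY column_b) AS in_multiple_groups
FROM t
ORDER BY column_a, column_b;
```

Only the four rows with `column_b = 1` are flagged for this sample.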
0 votes, 1 answer, 52 views
What is the equivalent of Excel's SUBSTITUTE function in Presto SQL?
What is the equivalent of Excel's SUBSTITUTE function in Presto SQL? I have code here that I need to convert to Presto SQL. Below are the code and sample data.
Excel code:
=CONCAT(SUBSTITUTE(B2,LEFT(B2,...
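The Excel formula is cut off, so only the building blocks can be sketched. As a rough mapping, Excel's `SUBSTITUTE(text, old, new)` corresponds to Presto/Trino `replace(string, search, replace)`, which likewise replaces every occurrence; `LEFT(text, n)` corresponds to `substr(string, 1, n)`; and `CONCAT` corresponds to `concat` (or `||`). Excel's optional fourth argument (replace only the nth occurrence) has no direct counterpart in `replace` and is not covered here.

```sql
SELECT
  replace('ABC-123-ABC', 'ABC', 'X') AS substitute_all,  -- 'X-123-X'
  substr('ABC-123', 1, 3)            AS left_3,          -- 'ABC'
  concat('X', '-', 'Y')              AS joined;          -- 'X-Y'
```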
0 votes, 1 answer, 65 views
Not able to extract nested array subfield in JSON from Athena table
We have an Athena table in which there is a column that contains JSON values.
The datatype of the main column (which contains JSON values) in Athena is string.
DDL of Athena table is like ...
2 votes, 0 answers, 380 views
Trino file base authorization and problem with create role
I installed Trino 455 on an Ubuntu VM. I use LDAP for authentication and file-based access control for authorization. I can set rules for users in the rules.json file, but I cannot set rules per role ...
2 votes, 1 answer, 42 views
Sum ordered rows OVER with distinct duplication
This is the data I have:
| Customer | Sales |
| --- | --- |
| A | 3 |
| B | 10 |
| C | 4 |
| D | 2 |
| E | 4 |
This is what I want to get:
| # Top Customers | Total Sales |
| --- | --- |
| 1 | 10 |
| 2 | 14 |
| 3 | 18 |
| 4 | 21 |
| 5 | 23 |
I am trying to do this with ROW_NUMBER and OVER (a window ...
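A likely catch is the tie between C and E (both 4): with `ORDER BY` and no explicit frame, a window `sum` uses a `RANGE` frame, so tied rows are added together and both show 18. A sketch with an explicit `ROWS` frame and a tie-breaking column, which reproduces the desired 10, 14, 18, 21, 23:

```sql
WITH t (customer, sales) AS (
  VALUES ('A', 3), ('B', 10), ('C', 4), ('D', 2), ('E', 4)
)
SELECT
  row_number() OVER (ORDER BY sales DESC, customer) AS top_customers,
  -- ROWS frame: each row adds only itself, even when sales values tie
  sum(sales) OVER (ORDER BY sales DESC, customer
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS total_sales
FROM t
ORDER BY top_customers;
```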
1 vote, 2 answers, 467 views
How to lag using dynamic offset in SQL AWS Athena?
I have a table with a column of numbers and then other columns with other data. I have created column "floor10" that has floor of those numbers as a multiple of 10 (e.g. 8 -> 0, 17 -> ...
0 votes, 1 answer, 67 views
Generate all n-tuples of array values in Presto SQL
I have the following dataset in Presto.
WITH A (name, distinct_values) AS (
VALUES
('color', ARRAY['red', 'yellow']),
('shape', ARRAY['triangle', 'square', 'circle']),
('size', ARRAY['...
0 votes, 1 answer, 208 views
PrestoSQL/Trino - How to query all cases without explicit WHERE clauses in a subquery
I'm trying to write a query in Athena where I get a list of unique_usage_ids in which the unique_usage_start_date is at least 1 month before the contract_sign_date
I know how to write a query to find ...
1 vote, 1 answer, 208 views
Preserve columns names case in Parquet produced by UNLOAD
By default in Athena (probably more generally Presto/Trino), SELECT * lowercases column names.
I've found a workaround by explicitly specifying the column names in the proper case SELECT SomeColumn, ...
0 votes, 3 answers, 88 views
How to validate that each value in an IN clause is present in the SQL query result?
I have a table, tableA:
| Name | City |
| --- | --- |
| Paulo | Rome |
| Rudy | Singapore |
| Ming | Singapore |
| Takeshi | Tokyo |
| Judy | Jakarta |
| Yuki | Tokyo |
| Steve | Singapore |
I want to make sure that people from Berlin, Singapore, and Tokyo, all 3 ...
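One way to check this, sketched with the sample table: put the required cities in their own list and return those that have no matching row. With the data above only Berlin comes back; an empty result would mean all three cities are present. `NOT EXISTS` is used instead of `NOT IN` so that a NULL city in `tableA` cannot suppress the result.

```sql
WITH tableA (name, city) AS (
  VALUES ('Paulo', 'Rome'), ('Rudy', 'Singapore'), ('Ming', 'Singapore'),
         ('Takeshi', 'Tokyo'), ('Judy', 'Jakarta'), ('Yuki', 'Tokyo'),
         ('Steve', 'Singapore')
),
required (city) AS (
  VALUES 'Berlin', 'Singapore', 'Tokyo'   -- the values from the IN list
)
SELECT r.city AS missing_city             -- only 'Berlin' for this data
FROM required r
WHERE NOT EXISTS (SELECT 1 FROM tableA a WHERE a.city = r.city);
```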
1 vote, 1 answer, 71 views
Issue with using contains array function in SQL presto
I'm trying to get the following Presto SQL code to work so that I can perform business day calculations. It works if I write "d -> day_of_week(d) not in (6,7)" but I also need to filter out ...
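The excerpt is cut off before saying what else must be excluded; a typical case is a list of holiday dates, which is assumed here (the range and holidays are hypothetical). `contains` can sit in the same lambda as the weekday test. `day_of_week` returns ISO numbers, so 6 and 7 are Saturday and Sunday, and `sequence` over two dates steps one day at a time by default.

```sql
-- Hypothetical range and holiday list
WITH params (start_date, end_date, holidays) AS (
  VALUES (DATE '2024-12-23', DATE '2024-12-31',
          ARRAY[DATE '2024-12-25', DATE '2024-12-26'])
)
SELECT cardinality(
         filter(
           sequence(start_date, end_date),      -- one element per calendar day
           d -> day_of_week(d) NOT IN (6, 7)    -- drop Saturday (6) and Sunday (7)
                AND NOT contains(holidays, d)   -- drop listed holidays
         )
       ) AS business_days                       -- 5 for this range
FROM params;
```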
0 votes, 2 answers, 290 views
Set value of one column in the select result
I am trying to replace the value of one of the columns in the SELECT result to 0. Normally I can do:
SELECT
0 AS column_to_set,
other_col_1,
other_col_2,
...
FROM tbl
However, there ...
0 votes, 1 answer, 383 views
Aggregating to json format - Trino / Hive
I use Trino and Hive in Jupyter Notebook. I want to aggregate a table in the following way:
q = f"""
CREATE TABLE {aggregated_table} AS
WITH aggregated_data AS (
SELECT
id,
...
0 votes, 1 answer, 244 views
How to use CASE with CONTAINS and Array Aggregation
I am trying to write a query using a CASE statement that is based off whether a value is in array, but I run into the catch 22 of not including the case statement in the GROUP BY and getting "...
-1 votes, 1 answer, 133 views
Convert number date/time to human readable date/time in Athena
I am using Hudi MoR for storing the data in an AWS S3 data lake and Athena for querying. My data comes through Kafka streaming.
I have a date column and time column in source db with below example ...
0 votes, 0 answers, 27 views
Summing transactions over a 2-day period
I am trying to find a way to sum a transaction amount over a 30-day review period if the account had 2 or more distinct transactions that totaled $100 or more in a two-day period.
Edited to add these ...
1 vote, 1 answer, 147 views
In SQL Athena, how can I get a CSV's creation date and time?
Based on the creation date and time of each CSV file, I want to build a table showing how fresh the data is. In SQL Athena, how can I get a CSV's creation date and time?
1 vote, 1 answer, 166 views
Time Cost of UNNEST operation in query?
This is for PrestoSQL
Assuming col1, col2, col3 are of same cardinality, and assuming the Table has N rows
SELECT c1 from Table, UNNEST(col1) AS t(c1)
SELECT c1, c2 from Table, UNNEST(col1, col2) ...