1 vote · 4 answers · 146 views

I have some data like this (Id, timestamp): 100 2025-01-27 10:00:00, 100 2025-01-26 10:00:00, 100 2025-01-25 10:00:00, 100 2024-04-20 10:00:00, 100 2024-03-25 10:00:00, 100 2023-05-05 10:00:00, 100 2022-08-01 ...
nut (reputation 61)

2 votes · 1 answer · 88 views

I have a requirement to print the following output: matchId rainyDate sunnyDate 1 2024-09-04 2024-09-04 1 2024-09-11 2024-09-12 1 2024-09-18 2024-09-19 2 2024-10-04 2024-10-04 2 2024-10-11 3 There is ...
Dhruva Sen Gupta

2 votes · 1 answer · 96 views

Using Trino SQL (actually the AWS Athena implementation of Trino), I want to compute safe hashes of arbitrary MAP columns. By "arbitrary" I mean MAPs that may have other MAPs as values for certain ...
fweber (reputation 395)

0 votes · 0 answers · 179 views

I'm using Starburst (Presto) to query a large trip dataset (trip_data) from S3. I have a list of users (user_id) coming from a Kafka table (kafka_table), and I want to join this with trip data from ...
Rishabh Bansal

0 votes · 0 answers · 42 views

Consider the code: import org.apache.calcite.config.Lex; import org.apache.calcite.sql.SqlNode; import org.apache.calcite.sql.parser.SqlParser; import org.apache.calcite.sql.parser.ddl....
Cherry (reputation 34k)

0 votes · 1 answer · 127 views

I'm trying to use a CTE at the top of my query that is referenced in the WHERE clause of a separate CTE further down. For example: with ref as ( null email, '("720884","70540")' ...
tl1310 (reputation 13)

0 votes · 1 answer · 85 views

I have a file in S3 with the following contents: {"foo-bar": {"name":"Mercury","distanceFromSun":0.39,"orbitalPeriod":0.24,"dayLength":58.65}...
Samer A.

1 vote · 2 answers · 116 views

I am trying to write a query to return ALL rows for an ID where a condition is met and a condition is not met for each ID on the Order table. The conditions I want are to return all rows where the ID'...
CWZY (reputation 37)

2 votes · 1 answer · 112 views

I am trying to get one record from a list of overlapping records in a Presto (Trino) DB. There are columns like loc_id, prod_id, line_id, start_dt, end_dt, update_dt. The records are like this ( source ...
Parag (reputation 23)

1 vote · 1 answer · 107 views

I have a table df like the one below. My goal is to find the largest 'focus' timestamp after each visit for each [user:session] pair. (Timestamp is a BIGINT, in ascending order.) event user_id ...
Crubal Chenxi Li

0 votes · 2 answers · 54 views

I am struggling to optimize my query, where I need to combine orders data and events data using timestamps so that I attribute a certain type of event to the order in such a way that the closest event (...
Norah Jones

0 votes · 2 answers · 92 views

I am facing an issue while filtering data with an array. I have columns userid, event_name, attributes, ti. The attributes column has values like this: {"bool_sample":true,"array_int":[10,20,25,38]...
Nishant Dixit

1 vote · 0 answers · 38 views

We have data written to S3 in Hudi format with dt partition. Recently, we started receiving very large numbers for some columns stored as long datatype. These numbers exceeded the maximum limit of the ...
Abhishek Gupta

0 votes · 2 answers · 282 views

I am getting the payload below as a string and need to extract all the values for the key "id". I am writing it like this: JSON_EXTRACT_SCALAR(api_response, '$.0.id') as extract_id, but it is extracting ...
SDD (reputation 11)
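
A possible approach, assuming the payload is a top-level JSON array of objects that each carry an "id" key (the actual payload is cut off above), is to cast the parsed JSON to ARRAY(JSON) and extract "id" from every element. The table name my_table is hypothetical.

```sql
-- Sketch: assumes api_response holds a JSON array such as '[{"id": "a"}, {"id": "b"}]'
-- my_table is a hypothetical table name
SELECT
  transform(
    CAST(json_parse(api_response) AS ARRAY(JSON)),  -- one JSON value per array element
    x -> json_extract_scalar(x, '$.id')             -- pull "id" out of each element
  ) AS extract_ids
FROM my_table;
```

For the example value in the comment, extract_ids would be ARRAY['a', 'b'].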

1 vote · 1 answer · 55 views

How can I use Presto SQL to check whether the value of a column 'cola' is hexadecimal? My goal is to resolve the 'Not a valid base-16 number' error when the data is messy. I've tried the following, but it is not working. ...
Crubal Chenxi Li
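
One way to avoid the error, assuming 'cola' is a VARCHAR column in a hypothetical table my_table, is to validate the string with a regular expression, or to wrap the conversion in try() so malformed values become NULL instead of failing the query:

```sql
-- my_table is a hypothetical table; cola is assumed to be VARCHAR
SELECT
  cola,
  regexp_like(cola, '^[0-9A-Fa-f]+$') AS is_hex,         -- true only for non-empty strings of hex digits
  try(from_base(cola, 16))            AS cola_as_bigint  -- NULL instead of an error when conversion fails
FROM my_table;
```

from_base returns BIGINT, so a string of valid hex digits that is too large for a 64-bit integer can still fail to convert. The try() wrapper turns that case into NULL as well.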

0 votes · 1 answer · 90 views

Presto version: 0.278.1. Presto worker node memory consumption gradually increases over time and eventually kills the service. Hi, we are using Presto with the configs below: ...
pavan kalyan

2 votes · 1 answer · 132 views

I'm getting a headache trying to figure out how to read JSON with the following format in Athena: { "id": "1", "key1": { "dynamic_key_here": [ {...
Victor Leprince
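
Here is a sketch for the dynamic-key case. It assumes each value under key1 is an array of objects, because the rest of the sample is cut off. The idea is to cast key1 to MAP(VARCHAR, JSON), unnest the map into key/value rows, and then unnest each value as an array. The sample row and the names raw, doc, k, v, and item are hypothetical.

```sql
WITH raw (doc) AS (
  -- hypothetical sample row; replace with the real table and column
  VALUES ('{"id": "1", "key1": {"dynamic_key_here": [{"x": 1}, {"x": 2}]}}')
)
SELECT
  json_extract_scalar(doc, '$.id') AS id,
  m.k                              AS dynamic_key,
  e.item                           AS item
FROM raw
-- one row per key under key1, whatever that key is called
CROSS JOIN UNNEST(CAST(json_extract(doc, '$.key1') AS MAP(VARCHAR, JSON))) AS m(k, v)
-- one row per element of that key's array
CROSS JOIN UNNEST(CAST(m.v AS ARRAY(JSON))) AS e(item);
```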

1 vote · 2 answers · 127 views

Currently doing this: cardinality(filter(my_array, x -> x is not null)) != 0. Is there a more direct way?
John Roberts (reputation 6,004)
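
Trino and recent Presto releases provide any_match, which expresses this check directly. The predicate x IS NOT NULL never evaluates to NULL, so the two columns below should agree, including false for an empty array. The table name my_table is hypothetical.

```sql
-- my_table is a hypothetical table with an array column my_array
SELECT
  cardinality(filter(my_array, x -> x IS NOT NULL)) != 0 AS original_check,
  any_match(my_array, x -> x IS NOT NULL)               AS direct_check  -- true if any element is non-null
FROM my_table;
```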

1 vote · 1 answer · 36 views

I am looking for a way to create an array from another array based on the values of the second array with value mapping. E.g. Table A has columns id, some_array and I have some value mapping in mind, ...
Dyson (reputation 165)
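
A sketch using transform with a lookup map. The sample row and the mapping values are hypothetical. element_at returns NULL for a key that is not in the map, so coalesce keeps the original element in that case.

```sql
WITH table_a (id, some_array) AS (
  VALUES (1, ARRAY['a', 'b', 'z'])  -- hypothetical sample row
)
SELECT
  id,
  transform(
    some_array,
    x -> coalesce(
           element_at(MAP(ARRAY['a', 'b'], ARRAY['apple', 'banana']), x),  -- hypothetical mapping
           x                                                              -- unmapped values pass through
         )
  ) AS mapped_array
FROM table_a;
```

For the sample row, mapped_array is ['apple', 'banana', 'z'].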

0 votes · 1 answer · 145 views

I have an issue with Athena when trying to query records between dates that are stored as strings. My CSV dataset, spread among several files across directories, has a quote_date column with 10/8/2024 ...
Stefano (reputation 723)
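
The strings can be converted with date_parse before comparing. This sketch assumes quote_date is month/day/year; the excerpt does not say whether 10/8/2024 means October 8 or August 10. The table name quotes and the date range are hypothetical.

```sql
-- quotes is a hypothetical table; quote_date is a VARCHAR such as '10/8/2024' (assumed %m/%d/%Y)
SELECT *
FROM quotes
WHERE CAST(date_parse(quote_date, '%m/%d/%Y') AS DATE)  -- date_parse returns a TIMESTAMP
      BETWEEN DATE '2024-10-01' AND DATE '2024-10-31';
```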

0 votes · 1 answer · 55 views

I have Iceberg tables on AWS. I am trying to use Athena or Presto to query them. My question is: how can I know how much data each table contributes to the result? E.g., how can I know how many rows in the ...
hehe123456

0 votes · 1 answer · 217 views

Here is a rip-off table configuration I am working on in Athena. The data is in a bucket as gzipped JSON files. The column is a timestamp with the format yyyyMMddTHH:mm:ss. CREATE external TABLE json_tab( ...
Nir (reputation 2,677)
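
If the column is declared as a string instead, one hedged option is to parse it at query time with parse_datetime, which takes Joda-Time patterns. The literal T must be quoted inside the pattern, and single quotes are doubled inside a SQL string literal. The column name ts_str is hypothetical.

```sql
-- ts_str is a hypothetical VARCHAR column holding values such as '20240105T13:45:00'
SELECT
  ts_str,
  parse_datetime(ts_str, 'yyyyMMdd''T''HH:mm:ss') AS ts  -- TIMESTAMP WITH TIME ZONE
FROM json_tab;
```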

2 votes · 1 answer · 55 views

I have this value with data type array<struct<id:string,name:string,values:array<string>>> [ {id=gid://test/1234, name=Size, values=[L, M, S, XS]}, {id=gid://test/12345, name=...
lexmadness

0 votes · 1 answer · 91 views

I'm looking for the best X% of data in a data set, where "best" is defined as having a minimum sum of values. I can do this by running a bunch of tests to cast around for the result I want: ...
Whatabrain

1 vote · 2 answers · 73 views

I’d like to set is_duplicate to TRUE for records that occur within a 1-hour window of an earlier record. The rule is that each record should check against the most recent prior record where ...
JYJ (reputation 23)

1 vote · 2 answers · 124 views

I have a dataset that looks like this (Link_number, Houband, Time, Mean_speed, Sample_Number): Link 1, 8, 8:00, 52, 2; Link 1, 8, 8:30, 55, 5; Link 2, 9, 9:00, 20, 3; Link 2, 9, 9:30, 40, 4. I need to duplicate each row X number ...
ritarita
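
This sketch assumes X is the row's Sample_Number, because the sentence is cut off. Each row is repeated by unnesting a generated sequence. In the sample, Time is renamed time_slot to avoid keyword quoting.

```sql
-- Sketch: repeats each row sample_number times (assumed to be the "X" in the question)
WITH links (link_number, houband, time_slot, mean_speed, sample_number) AS (
  VALUES
    ('Link 1', 8, '8:00', 52, 2),
    ('Link 1', 8, '8:30', 55, 5),
    ('Link 2', 9, '9:00', 20, 3),
    ('Link 2', 9, '9:30', 40, 4)
)
SELECT l.link_number, l.houband, l.time_slot, l.mean_speed, l.sample_number, t.copy_no
FROM links AS l
-- sequence(1, n) yields [1, 2, ..., n]; unnesting it produces n output rows per input row
CROSS JOIN UNNEST(sequence(1, l.sample_number)) AS t(copy_no)
ORDER BY l.link_number, l.time_slot, t.copy_no;
```

With the four sample rows this produces 2 + 5 + 3 + 4 = 14 rows.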

0 votes · 2 answers · 85 views

Let's say I have three tables: t1: client_id transmission_id timestamp column_A 1 AAA1 2024-10-16 10:31:27 Banana 1 AAA2 2024-10-16 11:31:27 Citrus 2 BBB1 2024-10-16 09:12:14 Apple t2: client_id ...
Shotey (reputation 3)

0 votes · 1 answer · 38 views

I've been racking my brain trying to solve this problem. I'm using Presto SQL. I have 2 tables: trx (transactions per day): | date | usage | deposit_id | | -----------| -------- |------------|...
Sam Widjaja

0 votes · 1 answer · 65 views

I have a table with data for the last 6 days. I need 3 new columns for each device: (1) the sum of October 3rd to 5th, regardless of the day; (2) a rolling sum for the dates October 3rd to 5th using their ...
codenoodles

-2 votes · 2 answers · 130 views

In Presto/Hive SQL, I have multiple tables, say T1, T2, T3, T4, X1, X2, X3, X4. Assume that the table sizes are T1 > T2 > T3 > T4 and X1 > X3 > X4 > X4. We have SELECT T1.a, T2.b, T3....
user98235 (reputation 926)

1 vote · 1 answer · 76 views

I have an array of integers and I want to calculate the differences between adjacent elements and return an array of differences. SELECT ARRAY[3, 2, 5, 1, 2] AS my_arr -- original array -- my_arr ...
Emman (reputation 4,313)
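
One way to do this, assuming the array has at least two elements, is to generate the indexes 2..n with sequence and subtract each element's predecessor. Presto/Trino arrays are 1-based.

```sql
SELECT
  my_arr,
  transform(
    sequence(2, cardinality(my_arr)),  -- indexes 2..n
    i -> my_arr[i] - my_arr[i - 1]     -- element minus its predecessor
  ) AS diffs                           -- [-1, 3, -4, 1] for the array below
FROM (SELECT ARRAY[3, 2, 5, 1, 2] AS my_arr) AS t;
```

For arrays with fewer than two elements, the stop value of sequence is below its start and the call fails. If such rows can occur, guard them, for example with a CASE on cardinality(my_arr).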

0 votes · 1 answer · 148 views

I want to get the previous value using lag(), over a partition defined with RANGE BETWEEN. I followed an example from the documentation: WITH orders (custkey, orderdate, totalprice) AS ( ...
Emman (reputation 4,313)

-1 votes · 4 answers · 101 views

Below is my scenario (Column A, Column B): Group-A 1, Group-A 2, Group-A 1, Group-A 1, Group-B 3, Group-B 1, Group-B 5, Group-B 3. I need to flag value 1 from Column B, as it is present in multiple groups (Group-A ...
user22435160
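
This sketch uses the sample rows from the question with hypothetical column names column_a and column_b. It finds the Column B values that occur in more than one distinct group and then flags every row that carries such a value. A grouped subquery is used because, as far as I know, DISTINCT inside a window aggregate (COUNT(DISTINCT ...) OVER (...)) is not supported in Presto/Trino.

```sql
WITH scenario (column_a, column_b) AS (
  VALUES ('Group-A', 1), ('Group-A', 2), ('Group-A', 1), ('Group-A', 1),
         ('Group-B', 3), ('Group-B', 1), ('Group-B', 5), ('Group-B', 3)
),
shared_values AS (
  -- Column B values that occur in more than one distinct group
  SELECT column_b
  FROM scenario
  GROUP BY column_b
  HAVING count(DISTINCT column_a) > 1
)
SELECT
  s.column_a,
  s.column_b,
  s.column_b IN (SELECT column_b FROM shared_values) AS is_flagged
FROM scenario AS s;
```

Only value 1 appears in both Group-A and Group-B, so is_flagged is true exactly for the rows where column_b = 1.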

0 votes · 1 answer · 52 views

What is the equivalent of Excel's SUBSTITUTE function in Presto SQL? I have code here that I need to convert to Presto SQL. Below is the code and sample data. Excel code: =CONCAT(SUBSTITUTE(B2,LEFT(B2,...
Jon (reputation 159)
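
The closest built-in is replace(string, search, replace), which replaces every occurrence. That matches SUBSTITUTE without its optional instance_num argument. The original formula is cut off, so this only shows the basic function mapping. The sample strings are made up.

```sql
SELECT
  replace('2024-10-16', '-', '/')          AS substituted,  -- '2024/10/16', like SUBSTITUTE(text, "-", "/")
  substr('abcdef', 1, 2)                   AS left_2,       -- 'ab', like LEFT(text, 2)
  concat(replace('ab-cd', '-', ''), '_x')  AS combined;     -- 'abcd_x', like CONCAT(SUBSTITUTE(...), "_x")
```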

0 votes · 1 answer · 65 views

We have an Athena table in which there is a column that contains JSON values. The data type of the main column (which contains JSON values) in Athena is string. The DDL of the Athena table is like ...
Beginner

2 votes · 0 answers · 380 views

I installed Trino 455 on an Ubuntu VM. I use LDAP for authentication and file-based access control for authorization. I can set rules for users in the rules.json file, but I cannot set rules per role ...
Nestor (reputation 91)

2 votes · 1 answer · 42 views

This is the data I have (Customer, Sales): A 3, B 10, C 4, D 2, E 4. This is where I want to get (# Top Customers, Total Sales): 1 10, 2 14, 3 18, 4 21, 5 23. I am trying to do this with ROW_NUMBER and OVER (a window ...
gust (reputation 965)
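
This sketch uses the question's data. A running sum over sales in descending order with an explicit ROWS frame gives the desired output. The customer name is added to ORDER BY as a tiebreaker so that C and E, which both have 4, are always numbered in the same order.

```sql
WITH customer_sales (customer, sales) AS (
  VALUES ('A', 3), ('B', 10), ('C', 4), ('D', 2), ('E', 4)
)
SELECT
  row_number() OVER (ORDER BY sales DESC, customer) AS top_customers,
  -- ROWS (not the default RANGE) so tied sales values are still added one row at a time
  sum(sales) OVER (
    ORDER BY sales DESC, customer
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS total_sales
FROM customer_sales
ORDER BY top_customers;
```

This returns (1, 10), (2, 14), (3, 18), (4, 21), (5, 23). With the default RANGE frame and ORDER BY sales DESC alone, C and E would be peers and both rows would show 18.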

1 vote · 2 answers · 467 views

I have a table with a column of numbers and then other columns with other data. I have created column "floor10" that has floor of those numbers as a multiple of 10 (e.g. 8 -> 0, 17 -> ...
combperm

0 votes · 1 answer · 67 views

I have the following dataset in Presto. WITH A (name, distinct_values) AS ( VALUES ('color', ARRAY['red', 'yellow']), ('shape', ARRAY['triangle', 'square', 'circle']), ('size', ARRAY['...
crf (reputation 1,890)

0 votes · 1 answer · 208 views

I'm trying to write a query in Athena where I get a list of unique_usage_ids in which the unique_usage_start_date is at least 1 month before the contract_sign_date. I know how to write a query to find ...
louieriksson
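
This sketch assumes both columns are DATE (or TIMESTAMP) values in a hypothetical table usage_contracts with a column unique_usage_id. date_add shifts the sign date back by one month:

```sql
-- usage_contracts and unique_usage_id are hypothetical names
SELECT DISTINCT unique_usage_id
FROM usage_contracts
WHERE unique_usage_start_date <= date_add('month', -1, contract_sign_date);
```

If the columns are stored as strings, they need to be converted first, for example with CAST(... AS DATE) for ISO yyyy-mm-dd values.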

1 vote · 1 answer · 208 views

By default, in Athena (and probably more generally in Presto/Trino), SELECT * lowercases column names. I've found a workaround by explicitly specifying the column names in the proper case: SELECT SomeColumn, ...
Pragmateek (reputation 13.6k)

0 votes · 3 answers · 88 views

I have a table, tableA (Name, City): Paulo Rome, Rudy Singapore, Ming Singapore, Takeshi Tokyo, Judy Jakarta, Yuki Tokyo, Steve Singapore. I want to make sure that people from Berlin, Singapore, and Tokyo, all 3 ...
YazerieL

1 vote · 1 answer · 71 views

I'm trying to get the following Presto SQL code to work so that I can perform business-day calculations. It works if I write "d -> day_of_week(d) not in (6,7)", but I also need to filter out ...
Wesley Young
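
For the part that already works, here is a self-contained sketch that counts business days between two dates by filtering a generated date sequence. day_of_week returns 1 for Monday through 7 for Sunday. Any further exclusions would be added to the same lambda with AND. The start and end dates are made up.

```sql
SELECT cardinality(
         filter(
           sequence(DATE '2024-10-01', DATE '2024-10-31'),  -- one element per day, inclusive
           d -> day_of_week(d) NOT IN (6, 7)                -- drop Saturdays and Sundays
         )
       ) AS business_days;  -- 23 for October 2024
```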

0 votes · 2 answers · 290 views

I am trying to replace the value of one of the columns in the SELECT result with 0. Normally I can do: SELECT 0 AS column_to_set, other_col_1, other_col_2, ... FROM tbl. However, there ...
Dyson (reputation 165)

0 votes · 1 answer · 383 views

I use Trino and Hive in Jupyter Notebook. I want to aggregate a table in the following way: q = f""" CREATE TABLE {aggregated_table} AS WITH aggregated_data AS ( SELECT id, ...
qwerty (reputation 907)

0 votes · 1 answer · 244 views

I am trying to write a query using a CASE statement based on whether a value is in an array, but I run into the catch-22 of not including the case statement in the GROUP BY and getting "...
BigPanda

-1 votes · 1 answer · 133 views

I am using Hudi MoR for storing the data in an AWS S3 data lake and using Athena for querying. My data comes through Kafka streaming. I have a date column and a time column in the source DB with the following example ...
JanakaRao (reputation 149)

0 votes · 0 answers · 27 views

I am trying to find a way to sum a transaction amount over a 30-day review period if the account had 2 or more distinct transactions that totaled $100 or more in a two-day period. Edited to add these ...
steven fuchs

1 vote · 1 answer · 147 views

Based on the creation date and time of each CSV file, I want to build a table showing how fresh the data is. In Athena SQL, how can I get a CSV's creation date and time?
Kadio (reputation 223)

1 vote · 1 answer · 166 views

This is for PrestoSQL. Assuming col1, col2, col3 have the same cardinality, and assuming the table has N rows: SELECT c1 from Table, UNNEST(col1) AS t(c1) SELECT c1, c2 from Table, UNNEST(col1, col2) ...
user98235 (reputation 926)
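
This illustrates the difference the question is getting at. When several arrays are passed to one UNNEST, Presto/Trino unnests them side by side: element i of each array lands on the same output row, and shorter arrays are padded with NULL. The sample data is hypothetical.

```sql
WITH tbl (col1, col2) AS (
  VALUES (ARRAY[1, 2, 3], ARRAY['a', 'b', 'c'])  -- hypothetical single row
)
-- zipped: one output row per array position -> (1, 'a'), (2, 'b'), (3, 'c')
SELECT c1, c2
FROM tbl
CROSS JOIN UNNEST(col1, col2) AS u(c1, c2);
```

So with N rows whose arrays all have length k, the zipped form returns N·k rows. Writing CROSS JOIN UNNEST(col1) AS u1(c1) CROSS JOIN UNNEST(col2) AS u2(c2) instead would cross the elements and return N·k² rows (9 for the sample row).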
