Newest 'amazon-redshift-spectrum' Questions

0 votes

0 answers

132 views

Grant SELECT on AWS Redshift external table not supported — how to implement principle of least privilege?

I'm working with Amazon Redshift Spectrum and trying to follow the principle of least privilege by granting only the necessary access to users or roles. I attempted the following command: GRANT SELECT ...

Jeff A

649

asked Jun 9 at 14:53

0 votes

2 answers

151 views

I need to filter records with nested column in Redshift

I have data stored in S3 in Parquet format, with multiple columns containing arrays of structs, and also arrays inside arrays of structs. Here I am querying one category column, which holds an array of structs. Sample ...

Nishant Dixit

5,512

asked Apr 5 at 13:53

0 votes

1 answer

85 views

Extracting JSON value in redshift

I am trying to extract a JSON value from a column in Redshift. The column value is like: [{'IDIndex': '0001', 'History': 4, 'Name': '08-SA-21-C1', 'ActiveFlag': 1, 'Category': 3, 'TotalCount': 0, '...

nodev_101

109

asked Feb 19 at 6:35

0 votes

1 answer

215 views

AWS Redshift spectrum not able to return data for external table where data type timestamp

I'm trying to query data through Redshift Spectrum using an external schema from the Glue catalog but encountering an issue with a column that has a timestamp data type. When I run the query SELECT * ...

Jeff A

649

asked Dec 16, 2024 at 16:53

0 votes

1 answer

344 views

I need to access values from a MAP datatype in Redshift Spectrum

I have very large (1 billion + records) files in S3, that I am querying via Amazon Redshift using Spectrum. I have a datatype in Redshift as follows: map<string,struct<string_value:string,...

Nick Edwards

1

asked Sep 13, 2024 at 22:32

0 votes

1 answer

49 views

The data changed while importing JSON into Redshift using Super

I inserted the JSON: {key1:xxx, key2:xxx, ..., key3:1.0000000123456789, ..., keyn:xxx} into a Super type column in Redshift, and it resulted in the following on Redshift: {key1:xxx, key2:xxx, ..., ...

ezail

23

asked Aug 29, 2024 at 3:46

-1 votes

1 answer

39 views

Trailing 12 months SUM for each customer

I have a table Sales with the following columns:
Emp ID  Activity Date  Sales
1234    2024-01-01     254.22
1234    2024-05-08     227.10
5678    2023-02-01     254.22
5678    2024-05-01     227.10
I need to find the total ...

Vertika Sharma

1

asked Aug 6, 2024 at 0:20

0 votes

1 answer

175 views

How to create 1000s of tables in redshift serverless automatically using boto3?

I have an S3 bucket which contains 1000s of folders, which are basically table names, and those contain Parquet files. I'm trying to create tables with that schema in Redshift. I'm using redshift-data ...

Poreddy Siva Sukumar Reddy US

15

asked Jul 25, 2024 at 9:03

0 votes

1 answer

187 views

Extract element in STRUCT data type Redshift Spectrum

I have a spectrum table with the following schema: TABLE spectrum.table ( realmcode struct < @code: string >, typeid struct < @extension: string, @root: string >, ...

Edoardo De Gaspari

1

asked Jul 2, 2024 at 8:59

2 votes

1 answer

86 views

Redshift spectrum table query stuck on discover attribute {column name}

I have an external table in Redshift spectrum. It works fine when I add it in a view. But when I try to query it directly, it gets stuck on Discover attribute {column name}. The query takes ages but ...

abd

81

asked Jun 27, 2024 at 12:13

0 votes

0 answers

280 views

Parse JSON array into rows and columns in AWS Redshift accessing data through Redshift Spectrum

I had a previous post asking about parsing an array (JSON data) in AWS Athena into rows and columns, which was answered (AWS Athena Parse array of JSON objects to rows), but a new twist has been added. We ...

Jeff A

649

asked Jun 5, 2024 at 17:22

0 votes

0 answers

40 views

Parsing Redshift materialized view issue

Input stream. { "awsRegion": "us-west-2", "eventID": "101", "eventName": "TEST", "userIdentity": null, "...

Anand

1

asked May 1, 2024 at 18:47

0 votes

1 answer

1k views

Cannot query nested array within Redshift Struct

I have a table in AWS Redshift Spectrum that contains a column called "data". Each cell in "Data" contains an array of JSON objects. A single data cell may look like this (this is ...

Benjamin Bingham

87

asked Apr 26, 2024 at 13:14

0 votes

1 answer

370 views

Create MATERIALIZED VIEW in redshift using kinesis stream

I am using the query below to create a materialized view in Redshift: `CREATE MATERIALIZED VIEW test_sch."new_vw" AUTO REFRESH YES AS SELECT approximate_arrival_timestamp, JSON_PARSE(kinesis_data) ...

Anand

1

asked Apr 10, 2024 at 17:24

0 votes

2 answers

1k views

fetch key and value from a super field in redshift

I have a super field that holds JSON formatted data - { "awsRegion": "us-west-2", "dynamodb": { "ApproximateCreationDateTime": 1712584702997808, "Keys...

Biswa Patra

61

asked Apr 9, 2024 at 19:27

0 votes

1 answer

269 views

How to get the path of s3 file while loading into the physical table in redshift?

I am getting the path of the s3 files from external table using : create or replace view raw_view as select *,"$path" as sourcefilename from raw_external_table WITH NO SCHEMA BINDING; Now I ...

Rikesh Kayastha

13

asked Apr 3, 2024 at 5:08

0 votes

1 answer

308 views

Loading around 50gb of parquet data to Redshift taking indefinite time to load

I am loading around 50 GB of Parquet data into a DataFrame using a Glue ETL job and then trying to load it into a Redshift table, which is taking more than 6-7 hours and still not completing. datasink=glueContext....

RickyS

23

asked Mar 27, 2024 at 1:22

0 votes

1 answer

276 views

redshift spectrum type conversion from String to Varchar

When I scan the data from S3 using a Glue crawler I get this schema: {id: integer, value: String} This is because Spark writes data back in String type and not varchar type. Although there is a ...

Neelanjoy B

26

asked Mar 25, 2024 at 5:25

0 votes

0 answers

52 views

How to process cedilla delimiter in redshift copy,unload commands inside a stored procedure(Copy delimiter single byte character)?

My query is like this: unload('select * from table') to 's3://path' credentials '*******' header parallel off delimiter as '\307' The delimiter is the cedilla Ç, and I have to unload it like this: '\307'. ...

Rikky Bhai

1,018

asked Mar 20, 2024 at 11:53

0 votes

0 answers

141 views

Redshift: Join super arrays of different rows

Given a table with this schema:
id  name  values
1   a     [1,2,3]
1   b     [4,5,6]
1   c     [x,x,y]
Can I query it to receive this:
id  a  b  c
1   1  4  x
1   2  5  x
1   3  6  y
And then be able to filter, e.g. WHERE c = 'x' or ...

jacksbox

929

asked Mar 5, 2024 at 16:23

0 votes

0 answers

151 views

Why is Redshift query plan showing join between tables not joined in the query

I'm facing a performance problem with a fairly complex query in Redshift and, as a newcomer to Redshift, I don't understand it. The query is made of several joins (INNER and LEFT) and doesn't return data in hours. ...

Teuh1975

11

asked Feb 5, 2024 at 20:45

0 votes

1 answer

284 views

Redshift Materialized View: error when creating with glue catalog

I'm trying to create a materialized view via Spectrum from an external table in the Glue Data Catalog: CREATE MATERIALIZED VIEW "dev"."public"."table_name" AS SELECT DISTINCT * ...

jacksbox

929

asked Jan 31, 2024 at 11:03

0 votes

1 answer

482 views

How to read binary type column in parquet file by AWS Redshift Spectrum?

I have a Parquet file generated by ClickHouse. If I use pyarrow to show its schema: import pyarrow.parquet as pq data = pq.read_table('test.pqt') print(data.schema) It shows the schema like this: ...

Rinze

834

asked Dec 26, 2023 at 7:38

0 votes

3 answers

2k views

function json_extract_path_text(super, "unknown") does not exist - Redshift

Alright. I have a table that has SUPER type fields. These fields hold values like below: id mycol --------------------------------- 1 [{"Title":"first"},{"Title":"...

Rick

1,619

asked Nov 29, 2023 at 15:49

-1 votes

1 answer

3k views

Convert super field to string/varchar in redshift

I have a super field that holds JSON formatted data - [{"Title":"First Last"}] I want to extract the JSON value string First Last and to do so, I tried converting this field to ...

Rick

1,619

asked Nov 28, 2023 at 3:33

0 votes

1 answer

133 views

Convert days into hours in Amazon redshift

I want to convert a column with mixed formats like "1 day 07:00:00" and "2 days" into hours. Here's a query that should work in Amazon Redshift: SELECT CASE WHEN ...

Keshav Sahu

1

asked Nov 7, 2023 at 15:11

1 vote

0 answers

144 views

Redshift Spectrum returns null WITHOUT error

I'm querying Redshift Spectrum and certain fields are showing up null without any explanation. I've checked SVL_S3LOG, SVL_SPECTRUM_SCAN_ERROR, and SYS_EXTERNAL_QUERY_ERROR, and they are all empty. In the ...

RSHAP

2,446

asked Oct 26, 2023 at 17:34

0 votes

1 answer

944 views

How to ignore some columns when copy Parquet file into AWS Redshift?

I want to copy some parquet files into AWS Redshift, but the Redshift table schema has fewer columns compared to the parquet files, because those columns contain sensitive information. Therefore, I ...

Rinze

834

asked Oct 26, 2023 at 10:10

3 votes

1 answer

782 views

How do I get the details about a Spectrum Scan Error on an external table on Redshift Serverless?

According to the list of available monitoring views at the bottom of Monitoring queries and workloads with Amazon Redshift Serverless, sys_external_query_error is not available in Redshift Serverless. ...

Kris Bixler

206

asked Sep 7, 2023 at 19:38

1 vote

0 answers

150 views

Skip corrupted rows in SQL while inserting

I have 2 tables, let's say MAIN (Redshift) and TEMP (Spectrum), and a simple query that inserts all the data from TEMP into MAIN. But sometimes it may fail and raise an error like this: error: Invalid ...

Vahagn

11

asked Jul 28, 2023 at 7:07

0 votes

1 answer

1k views

How do I pull the first item in an array that's inside a struct column in a Redshift nested table

I am working on some GitHub data and would like to pull the first commit email in the data below. All of the below data is stored as a struct column. "struct<action:string,...

Bob

1

asked Jun 24, 2023 at 14:23

0 votes

2 answers

257 views

Redshift Spectrum Procedure cannot be called inside a select SQL

I am currently migrating a PostgreSQL procedure to work in Redshift Spectrum Serverless. I was able to have a working procedure that works as intended in Redshift. However, it's originally used inside ...

viralshah009

81

asked Jun 6, 2023 at 18:40

0 votes

1 answer

63 views

Ingest s3 file to redshift maintaining line order

I have files in s3 I need to read into redshift, but I need to maintain the line number from the file somehow. I tried inserting from a spectrum table into a table with an identity column but it ...

user433342

1,118

asked May 19, 2023 at 21:16

0 votes

1 answer

574 views

Redshift Spectrum query fails with Parsed manifest is not a valid JSON object

I have an S3 bucket with 5 prefixes / "sub-folders", each containing a set of CSV files that were exported from a legacy database. The CSV files have been crawled and created a Glue database ...

RobD

1,704

asked May 17, 2023 at 22:08

1 vote

2 answers

185 views

How to extract the values assigned to x from this string?

I am trying to create a regex that I can use to extract out the values assigned to variable x in the following string: (req.idf=6ca9a AND (req.ster=201 OR req.ster=st_home) AND (req.ste=hi OR req....

Moh

21

asked May 1, 2023 at 21:38

0 votes

1 answer

1k views

Query failed due to LIMIT error in redshift merge query

I was just trying to run a MERGE statement in Redshift, but I keep getting "ERROR: syntax error at or near "limit" Position: 229". But nowhere in the code have I used this ...

nodev_101

109

asked Apr 25, 2023 at 9:06

1 vote

2 answers

1k views

Redshift Spectrum over 40x slower than Athena for simple queries

I have an S3 data lake that I can query with Athena. The same data lake is hooked up to Amazon Redshift as well. However, when I run queries in Redshift I get insanely longer query times compared to ...

Killerpixler

4,080

asked Apr 11, 2023 at 16:03

0 votes

1 answer

608 views

How to workaround limitation for Iceberg Tables not accessible from Redshift Spectrum?

I have different Iceberg tables built and updated using Python scripts on Glue. I now need to access them via Redshift Spectrum. From documentation (and some personal tests) it seems not possible ...

Randomize

9,163

asked Apr 11, 2023 at 9:37

0 votes

1 answer

927 views

Unnest two columns with Redshift Spectrum

I have this table called results, with two nested arrays in columns students and grades:
class  students      grades
C1     [S1, S2, S3]  [C, A, B]
C2     [S3, S4]      [A, B]
I'd like to unnest it to the following: ...

Youssef

5

asked Mar 24, 2023 at 17:40

3 votes

1 answer

676 views

AWS Redshift Spectrum when accessing files in S3 Glacier deep archive

We have set up an AWS Redshift external table accessing S3 using Spectrum. Due to the huge amount of data, we decided to change the S3 storage class for files older than 30 days to storage class S3 Glacier Deep ...

Edgars T.

1,149

asked Mar 24, 2023 at 12:46

0 votes

1 answer

2k views

Load into table 'users' failed. Check 'stl_load_errors' system table for details, Redshift

try: cur.execute(""" copy users from 's3://mybucketxxx/SampleCSVFile_119kb_Copy.csv' credentials 'aws_iam_role=arn:aws:iam::78942xxx:role/redshift-s3-access' delimiter ',' region 'ap-...

Arun Kumar

1

asked Mar 19, 2023 at 13:03

0 votes

2 answers

619 views

Can partitioning data in Apache Hudi optimize AWS Spectrum query?

I'm using AWS Redshift Spectrum to query a Hudi table. As we know, filtering data by partition column when querying data in Spectrum could reduce the size of the data scanned by Spectrum and speed up ...

Rinze

834

asked Feb 15, 2023 at 2:33

0 votes

1 answer

2k views

How to alter external schema IAM Role in Redshift?

Based on: https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_SCHEMA.html I have my schema declared in the following way: create external schema spectrum_schema from data catalog database ...

Vzzarr

5,882

asked Feb 9, 2023 at 16:44

0 votes

1 answer

339 views

How to skip files with specific extension on Redshift external tables?

I have a partitioned location on S3 with data I want to read via Redshift External Table, which I create with the SQL statement CREATE EXTERNAL TABLE.... Only thing is that I have some metadata files ...

Vzzarr

5,882

asked Feb 7, 2023 at 15:21

0 votes

1 answer

257 views

S3 Folder Containing Redshift Spectrum Table Deleted Randomly

I have an external table in Redshift. When I use UNLOAD to fill this table, sometimes the S3 folder that contains the data gets deleted randomly (or I couldn’t figure out the reason). Here's the ...

ένας

115

asked Jan 26, 2023 at 16:25

0 votes

1 answer

109 views

Is there a way to get list of files scanned for a RedShift query referencing tables in Spectrum?

We have a query which is performing an aggregation, like: SELECT t.date, COUNT(*) AS rec_count FROM our_schema.log_data t WHERE t.date BETWEEN '2011-01-01' AND '2012-01-01' GROUP BY t.date; I know we ...

bpeikes

3,739

asked Jan 12, 2023 at 15:12

0 votes

2 answers

947 views

How to run transactional SQL on Redshift using boto3

I'm trying to use boto3 redshift-data client to execute transactional SQL for external table (Redshift spectrum) with following statement, ALTER TABLE schema.table ADD IF NOT EXISTS PARTITION(key=...

PolarStorm

31

asked Jan 11, 2023 at 19:35

0 votes

0 answers

459 views

Concatenating Entire Row to Get Checksum of the Data Returns Error

I'm trying to concatenate an entire row of data to get their checksum output in AWS Redshift external tables to insert them in another external table. Here's a sample of my code (I have much more ...

ένας

115

asked Dec 29, 2022 at 6:49

1 vote

0 answers

874 views

Redshift Spectrum to Delta Lake integration using manifest files (an issue in the partitioned table when updating a partition column)

I am working with Delta Table and Redshift Spectrum and I notice strange behaviour. I follow this article to set up a Redshift Spectrum to Delta Lake integration using manifest files and query Delta ...

Antonio La Macchia

11

asked Dec 20, 2022 at 21:02

0 votes

1 answer

295 views

How does AWS charge the Redshift Spectrum cluster?

AWS doc on the pricing of AWS Redshift Spectrum says that we pay for only TB scanned. However, I still need to create a Redshift cluster and specify instance type as well as how many nodes in the ...

user159566

81

asked Dec 13, 2022 at 18:45
