307 questions
0
votes
0
answers
132
views
Grant SELECT on AWS Redshift external table not supported — how to implement principle of least privilege?
I'm working with Amazon Redshift Spectrum and trying to follow the principle of least privilege by granting only the necessary access to users or roles.
I attempted the following command:
GRANT SELECT ...
0
votes
2
answers
151
views
I need to filter records with nested column in Redshift
I have data stored in S3 in Parquet format, with multiple columns that are arrays of structs,
and also arrays nested inside arrays of structs.
Here I am querying one category column, which is an array of structs.
Sample ...
0
votes
1
answer
85
views
Extracting JSON value in redshift
I am trying to extract a JSON value from a column in Redshift.
The column value is like:
[{'IDIndex': '0001', 'History': 4, 'Name': '08-SA-21-C1', 'ActiveFlag': 1, 'Category': 3, 'TotalCount': 0, '...
0
votes
1
answer
215
views
AWS Redshift spectrum not able to return data for external table where data type timestamp
I'm trying to query data through Redshift Spectrum using an external schema from the Glue catalog but encountering an issue with a column that has a timestamp data type. When I run the query SELECT * ...
0
votes
1
answer
344
views
I need to access values from a MAP datatype in Redshift Spectrum
I have very large (1 billion + records) files in S3, that I am querying via Amazon Redshift using Spectrum.
I have a datatype in Redshift as follows:
map<string,struct<string_value:string,...
0
votes
1
answer
49
views
The data changed while importing JSON into Redshift using Super
I inserted the JSON:
{key1:xxx, key2:xxx, ..., key3:1.0000000123456789, ..., keyn:xxx}
into a Super type column in Redshift, and it resulted in the following on Redshift:
{key1:xxx, key2:xxx, ..., ...
-1
votes
1
answer
39
views
Trailing 12 months SUM for each customer
I have a table Sales with the following columns -
Emp ID  Activity Date  Sales
1234    2024-01-01     254.22
1234    2024-05-08     227.10
5678    2023-02-01     254.22
5678    2024-05-01     227.10
I need to find the total ...
0
votes
1
answer
175
views
How to create 1000s of tables in redshift serverless automatically using boto3?
I have an S3 bucket which contains 1000s of folders which are basically table_names and those contains parquet files.
I'm trying to create tables with that schema in redshift.
I'm using redshift-data ...
0
votes
1
answer
187
views
Extract element in STRUCT data type Redshift Spectrum
I have a spectrum table with the following schema:
TABLE spectrum.table (
realmcode struct < @code: string >,
typeid struct < @extension: string,
@root: string >,
...
2
votes
1
answer
86
views
Redshift spectrum table query stuck on discover attribute {column name}
I have an external table in Redshift spectrum.
It works fine when I add it in a view.
But when I try to query it directly, it gets stuck on Discover attribute {column name}.
The query takes ages but ...
0
votes
0
answers
280
views
Parse JSON array into rows and columns in AWS Redshift accessing data through Redshift Spectrum
I had a previous post asking about parsing an array (JSON data) in AWS Athena into rows and columns, which was answered (AWS Athena Parse array of JSON objects to rows), but a new twist has been added.
We ...
0
votes
0
answers
40
views
Parsing Redshift materialized view issue
Input stream.
{
"awsRegion": "us-west-2",
"eventID": "101",
"eventName": "TEST",
"userIdentity": null,
"...
0
votes
1
answer
1k
views
Cannot query nested array within Redshift Struct
I have a table in AWS Redshift Spectrum that contains a column called "data".
Each cell in "Data" contains an array of JSON objects. A single data cell may look like this (this is ...
0
votes
1
answer
370
views
Create MATERIALIZED VIEW in redshift using kinesis stream
I am using the query below to create a materialized view in Redshift:
`CREATE MATERIALIZED VIEW test_sch."new_vw" AUTO REFRESH YES AS
SELECT approximate_arrival_timestamp,
JSON_PARSE(kinesis_data) ...
0
votes
2
answers
1k
views
fetch key and value from a super field in redshift
I have a super field that holds JSON formatted data -
{
"awsRegion": "us-west-2",
"dynamodb": {
"ApproximateCreationDateTime": 1712584702997808,
"Keys...
0
votes
1
answer
269
views
How to get the path of s3 file while loading into the physical table in redshift?
I am getting the path of the s3 files from external table using :
create or replace view raw_view as select *,"$path" as sourcefilename from raw_external_table WITH NO SCHEMA BINDING;
Now I ...
0
votes
1
answer
308
views
Loading around 50gb of parquet data to Redshift taking indefinite time to load
I am loading around 50 GB of Parquet data into a DataFrame using a Glue ETL job and then trying to load it into a Redshift table, which takes more than 6-7 hours and still does not complete.
datasink=glueContext....
0
votes
1
answer
276
views
redshift spectrum type conversion from String to Varchar
When I scan the data from S3 using a Glue crawler I get this schema:
{id: integer, value: String}
This is because Spark writes data back in String type and not varchar type. Although there is a ...
0
votes
0
answers
52
views
How to process a cedilla delimiter in Redshift COPY and UNLOAD commands inside a stored procedure (COPY delimiter must be a single-byte character)?
My query is like this:
unload('select * from table') to 's3://path'
credentials '*******'
header
parallel off
delimiter as '\307'
The delimiter is the cedilla (Ç), and I have to unload it like this: '\307'. ...
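For readers wondering why the title mentions a single-byte character, here is a minimal Python sketch of the byte arithmetic behind the octal escape '\307'. It only shows how that code point is encoded in Latin-1 and UTF-8; it makes no claim about how Redshift itself interprets the escape, and the variable names are illustrative.

```python
# Octal 307 is decimal 199 (hex C7), the code point of "Ç".
delimiter = chr(0o307)

# In Latin-1 the character occupies a single byte...
latin1_bytes = delimiter.encode("latin-1")

# ...while in UTF-8 the same character needs two bytes.
utf8_bytes = delimiter.encode("utf-8")

print(delimiter, latin1_bytes, utf8_bytes)
```

Under these encodings the same delimiter is one byte or two, which is one plausible source of "single byte" complaints when a file is treated as UTF-8.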
0
votes
0
answers
141
views
Redshift: Join super arrays of different rows
Given a table with this schema:
id  name  values
1   a     [1,2,3]
1   b     [4,5,6]
1   c     [x,x,y]
Can I query it to receive this:
id  a  b  c
1   1  4  x
1   2  5  x
1   3  6  y
And then be able to filter, e.g.
WHERE c = 'x'
or
...
0
votes
0
answers
151
views
Why is Redshift query plan showing join between tables not joined in the query
I'm facing a query performance issue with a fairly complex query in Redshift, and as a newcomer to Redshift I don't understand it.
The query is made of several joins (INNER and LEFT) and doesn't return data even after hours.
...
0
votes
1
answer
284
views
Redshift Materialized View: error when creating with glue catalog
I am trying to create a materialized view via Spectrum from an external table in the Glue Data Catalog:
CREATE MATERIALIZED VIEW "dev"."public"."table_name" AS
SELECT DISTINCT *
...
0
votes
1
answer
482
views
How to read binary type column in parquet file by AWS Redshift Spectrum?
I have a Parquet file generated by ClickHouse. If I use pyarrow to show its schema:
import pyarrow.parquet as pq
data = pq.read_table('test.pqt')
print(data.schema)
It shows the schema was like this:
...
0
votes
3
answers
2k
views
function json_extract_path_text(super, "unknown") does not exist - Redshift
Alright. I have a table that has SUPER type fields. These fields hold values like below:
id mycol
---------------------------------
1 [{"Title":"first"},{"Title":"...
-1
votes
1
answer
3k
views
Convert super field to string/varchar in redshift
I have a super field that holds JSON formatted data - [{"Title":"First Last"}] I want to extract the JSON value string First Last and to do so, I tried converting this field to ...
0
votes
1
answer
133
views
Convert days into hours in Amazon redshift
I want to convert a column with mixed formats like "1 day 07:00:00" and "2 days" into hours.
Here's a query that should work in Amazon Redshift:
SELECT
CASE
WHEN ...
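As a point of comparison outside SQL, the conversion described in this excerpt can be sketched in Python. This is a minimal illustration, not the query from the post: the function name `interval_to_hours` and the accepted formats (an optional "N day(s)" part followed by an optional "HH:MM:SS" part) are assumptions based only on the two sample values quoted above.

```python
import re

# Optional "<n> day" / "<n> days" part, then an optional "HH:MM:SS" part.
_INTERVAL = re.compile(
    r"^\s*(?:(?P<days>\d+)\s+days?)?\s*"
    r"(?:(?P<h>\d+):(?P<m>\d+):(?P<s>\d+))?\s*$"
)


def interval_to_hours(text: str) -> float:
    """Convert strings like '1 day 07:00:00' or '2 days' to hours."""
    match = _INTERVAL.match(text)
    if match is None or not text.strip():
        raise ValueError(f"unrecognised interval: {text!r}")
    days = int(match.group("days") or 0)
    hours = int(match.group("h") or 0)
    minutes = int(match.group("m") or 0)
    seconds = int(match.group("s") or 0)
    return days * 24 + hours + minutes / 60 + seconds / 3600


print(interval_to_hours("1 day 07:00:00"))
print(interval_to_hours("2 days"))
```

The same decomposition (days times 24, plus the time-of-day part) is what a CASE expression would need to express for each format.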
1
vote
0
answers
144
views
Redshift Spectrum returns null WITHOUT error
I'm querying redshift spectrum and certain fields are showing up null without any explanation. I've checked
SVL_S3LOG
SVL_SPECTRUM_SCAN_ERROR
SYS_EXTERNAL_QUERY_ERROR
And they are all empty.
In the ...
0
votes
1
answer
944
views
How to ignore some columns when copy Parquet file into AWS Redshift?
I want to copy some parquet files into AWS Redshift, but the Redshift table schema has fewer columns compared to the parquet files, because those columns contain sensitive information. Therefore, I ...
3
votes
1
answer
782
views
How do I get the details about a Spectrum Scan Error on an external table on Redshift Serverless?
According to the list of available monitoring views at the bottom of Monitoring queries and workloads with Amazon Redshift Serverless, sys_external_query_error is not available in Redshift Serverless. ...
1
vote
0
answers
150
views
Skip corrupted rows in SQL while inserting
I have 2 tables, let's say MAIN (Redshift) and TEMP (Spectrum), and a simple query that inserts all the data from TEMP into MAIN. But sometimes it may fail and raise an error like this: error: Invalid ...
0
votes
1
answer
1k
views
How do I pull the first item in an array that's inside a struct column in a Redshift nested table
I am working on some github data and would like to pull the first commit email ([email protected]) in the data below. All of the below data is stored as a struct column.
"struct<action:string,...
0
votes
2
answers
257
views
Redshift Spectrum Procedure cannot be called inside a select SQL
I am currently migrating a PostgreSQL procedure to work in Redshift Spectrum Serverless. I was able to have a working procedure that works as intended in Redshift. However, it's originally used inside ...
0
votes
1
answer
63
views
Ingest s3 file to redshift maintaining line order
I have files in s3 I need to read into redshift, but I need to maintain the line number from the file somehow. I tried inserting from a spectrum table into a table with an identity column but it ...
0
votes
1
answer
574
views
Redshift Spectrum query fails with Parsed manifest is not a valid JSON object
I have an S3 bucket with 5 prefixes / "sub-folders", each containing a set of CSV files that were exported from a legacy database.
The CSV files have been crawled and created a Glue database ...
1
vote
2
answers
185
views
How to extract the values assigned to x from this string?
I am trying to create a regex that I can use to extract out the values assigned to variable x in the following string:
(req.idf=6ca9a AND (req.ster=201 OR req.ster=st_home) AND (req.ste=hi OR req....
0
votes
1
answer
1k
views
Query failed due to LIMIT error in redshift merge query
I was just trying to run a MERGE statement in Redshift, but I keep getting "ERROR: syntax error at or near "limit" Position: 229". Nowhere in the code have I used this ...
1
vote
2
answers
1k
views
Redshift Spectrum over 40x slower than Athena for simple queries
I have an S3 data lake that I can query with Athena. The same data lake is hooked up to Amazon Redshift as well. However, when I run queries in Redshift I get insanely longer query times compared to ...
0
votes
1
answer
608
views
How to work around the limitation of Iceberg tables not being accessible from Redshift Spectrum?
I have different Iceberg tables built and updated using Python scripts on Glue. I now need to access them via Redshift Spectrum. From the documentation (and some personal tests) it seems not possible ...
0
votes
1
answer
927
views
Unnest two columns with Redshift Spectrum
I have this table called results, with two nested arrays in columns students and grades:
class  students      grades
C1     [S1, S2, S3]  [C, A, B]
C2     [S3, S4]      [A, B]
I'd like to unnest it to the following:
...
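The desired output is cut off in this excerpt, so the following Python sketch rests on an assumption: that the two arrays should be paired by position (first student with first grade, and so on). The function name `unnest_pairs` and the in-memory `rows` list are hypothetical, and this illustrates the pairing logic only, not Redshift Spectrum syntax.

```python
# Sample rows copied from the table above: (class, students, grades).
rows = [
    ("C1", ["S1", "S2", "S3"], ["C", "A", "B"]),
    ("C2", ["S3", "S4"], ["A", "B"]),
]


def unnest_pairs(table):
    """Yield one (class, student, grade) tuple per array position."""
    for class_name, students, grades in table:
        if len(students) != len(grades):
            raise ValueError(f"array lengths differ for class {class_name}")
        # zip pairs the i-th student with the i-th grade.
        for student, grade in zip(students, grades):
            yield class_name, student, grade


for record in unnest_pairs(rows):
    print(record)
```

Pairing by index rather than taking a cross product is the design choice here; a naive unnest of both arrays would instead produce every student and grade combination per class.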
3
votes
1
answer
676
views
AWS Redshift Spectrum when accessing files in S3 Glacier deep archive
We have set up an AWS Redshift external table accessing S3 using Spectrum. Due to the huge amount of data, we decided to change the S3 storage class for files older than 30 days to storage class S3 Glacier Deep ...
0
votes
1
answer
2k
views
Load into table 'users' failed. Check 'stl_load_errors' system table for details, Redshift
try:
cur.execute("""
copy users
from 's3://mybucketxxx/SampleCSVFile_119kb_Copy.csv'
credentials 'aws_iam_role=arn:aws:iam::78942xxx:role/redshift-s3-access'
delimiter ','
region 'ap-...
0
votes
2
answers
619
views
Can partitioning data in Apache Hudi optimize AWS Spectrum query?
I'm using AWS Redshift Spectrum to query a Hudi table. As we know, filtering data by partition column when querying data in Spectrum could reduce the size of the data scanned by Spectrum and speed up ...
0
votes
1
answer
2k
views
How to alter external schema IAM Role in Redshift?
Based on: https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_SCHEMA.html
I have my schema declared in the following way:
create external schema spectrum_schema
from data catalog
database ...
0
votes
1
answer
339
views
How to skip files with specific extension on Redshift external tables?
I have a partitioned location on S3 with data I want to read via Redshift External Table, which I create with the SQL statement CREATE EXTERNAL TABLE....
Only thing is that I have some metadata files ...
0
votes
1
answer
257
views
S3 Folder Containing Redshift Spectrum Table Deleted Randomly
I have an external table in Redshift. When I use UNLOAD to fill this table, sometimes the S3 folder that contains the data gets deleted randomly (or I couldn’t figure out the reason).
Here's the ...
0
votes
1
answer
109
views
Is there a way to get list of files scanned for a RedShift query referencing tables in Spectrum?
We have a query which is performing an aggregation, like:
SELECT t.date, COUNT(*) AS rec_count
FROM our_schema.log_data t
WHERE t.date BETWEEN '2011-01-01' AND '2012-01-01'
GROUP BY t.date;
I know we ...
0
votes
2
answers
947
views
How to run transactional SQL on Redshift using boto3
I'm trying to use the boto3 redshift-data client to execute transactional SQL for an external table (Redshift Spectrum) with the following statement:
ALTER TABLE schema.table ADD IF NOT EXISTS
PARTITION(key=...
0
votes
0
answers
459
views
Concatenating Entire Row to Get Checksum of the Data Returns Error
I'm trying to concatenate an entire row of data to get their checksum output in AWS Redshift external tables to insert them in another external table.
Here's a sample of my code (I have much more ...
1
vote
0
answers
874
views
Redshift Spectrum to Delta Lake integration using manifest files (an issue in the partitioned table when updating a partition column)
I am working with Delta Table and Redshift Spectrum and I notice strange behaviour.
I follow this article to set up a Redshift Spectrum to Delta Lake integration using manifest files and query Delta ...
0
votes
1
answer
295
views
How does AWS charge the Redshift Spectrum cluster?
AWS doc on the pricing of AWS Redshift Spectrum says that we pay for only TB scanned. However, I still need to create a Redshift cluster and specify instance type as well as how many nodes in the ...