338 questions
0
votes
1
answer
115
views
Partitioned Table - Query filtering on partition field
I have a large table which I want to move to a partitioned model. I created the partitioned table, same fields as the original and partitioning by a particular timestamp field (by range). I then ...
0
votes
0
answers
62
views
Does new S3 bucket quota change AWS data partitioning best practice for multi-tenant systems
I am trying to find updated information regarding AWS best practices when it comes to multi-tenant data partitioning in S3.
From what I know and what I studied for when I did my AWS Solutions ...
1
vote
1
answer
135
views
Partition pruning in BigQuery with incremental model
I have a BigQuery table where a PubSub subscription inserts new web events every second.
This table is partitioned by:
column: derived_tstamp
type: timestamp
granularity: daily
To create a specific ...
2
votes
1
answer
357
views
Postgres partitioning with a primary key
I have a big database that represents a graph with a ton of data in it that is constantly growing. The database looks something like:
CREATE TABLE node (
    id BIGSERIAL PRIMARY KEY,
    created_at ...
0
votes
1
answer
442
views
Is changing date partitioning granularity a breaking change?
In BigQuery, suppose I create a table and partition it by a date column "mydate" with a "DAY" granularity.
Using dbt, this can be done using:
partition_by = {
"...
-1
votes
2
answers
217
views
What should be the partition key and sort key of my DynamoDB table?
I am about to create a DynamoDB table with the columns below, and each row will have unique data:
user id    profile Id    attribute1
1001       9001          x
1002       9002          x
The table will have 1M records, which means unique ...
0
votes
2
answers
116
views
Perform determinations within a data partition
I have a dataset as below from which I would like to draw some inferences.
Id  Nbr  Dt          Status  Cont1Sta1  DateLagInDays  Recurrence
1   2    2023-10-1   1
1   2    2023-11-2   0
1   2    2023-12-13  0
1   3    2023-10-1   0
1   3    2023-...
0
votes
1
answer
258
views
Postgres Data Partition in Rails 7.0.8
We have a situation in our database where we have to partition the data of entire tables based on a tenant id clause
Using
create_table "billing_schedule_lines_old", id: :...
1
vote
1
answer
116
views
Is it possible in PostgreSQL to restrict changes for files whose data is not actually changed?
Problem: We have a table “test”, which consists of sections “test_202309”, “test_202310”, “test_202311”. The sections store data for September 2023, October 2023 and November 2023.
I am using the command “...
1
vote
2
answers
2k
views
What is hybrid-columnar storage?
Snowflake stores data using a hybrid-columnar storage method. I understand what columnar storage is and its benefits, but what does the hybrid mean? Is this simply referring to Snowflake accessing ...
0
votes
1
answer
390
views
How does Azure dedicated pool partition switching work efficiently when data is sharded over 60 distributions?
A table contains xyz columns, with 3 years of data.
Index = clustered column index
Hash distribution column = product.
Partition column = date.
As the new year's data arrives ...
0
votes
0
answers
23
views
Top or Sample N of subgroups in Teradata in a large data set ("No Spool Space" error)
I've tried several routes to getting the 10 records from each subset of a large dataset and the best I can do is querying each subgroup explicitly in the query.
My first attempt from the (Teradata ...
0
votes
1
answer
386
views
Splitting data into training, test and validation sets depending on variable dependent for machine learning
I am trying to split my data into training, test and validation groups within my data. I have 2 groups: control and TP and within these groups I have a secondary variable called Bio with numbers in ...
0
votes
1
answer
183
views
Creating a partitioned version of a BigQuery table scheduled for daily updates
I am faced with the following situation: among the BigQuery datasets which I am handling there is a rather large table - let us call it lt - that undergoes daily updates (more specifically, this table ...
0
votes
1
answer
2k
views
PySpark: querying Hudi partitioned table
I'm following the Apache Hudi documentation to write and read a Hudi table. Here's the code I'm using to create and save a PySpark DataFrame into Azure DataLake Gen2:
tableName = "my_hudi_table...
0
votes
1
answer
209
views
Can we name an automatic list partition in Oracle as per our choice
Can we name an automatic list partition as per the user's naming criteria?
I have a table A for which I have created a partition which is automatic. Now when a new row with partitioning key will be ...
-1
votes
1
answer
542
views
Create partitioned table using sub-query
I want to create an unlogged table based on the result of another query, something like
Create table table_1
as
select * from table_2
where <<conditions>>
partition by LIST(col);
Obviously ...
-1
votes
1
answer
67
views
Python semisort list of objects by attribute
I've got a list of objects:
class packet():
    def __init__(self, id, data):
        self.id, self.data = id, data
my_list = [packet(1,"blah"),packet(2,"blah"),packet(1,"...
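One reading of "semisort" here is: place objects that share an id next to each other without imposing a full ordering on the groups. A minimal Python sketch under that assumption follows; the `semisort` helper is a hypothetical name, and the sample data is made up because the original list is truncated.

```python
from collections import defaultdict


class packet:
    def __init__(self, id, data):
        self.id, self.data = id, data


def semisort(items, key):
    """Group items with equal keys together, keeping first-seen group order."""
    groups = defaultdict(list)  # dicts preserve insertion order (Python 3.7+)
    for item in items:
        groups[key(item)].append(item)
    # Flatten the groups back into a single list.
    return [item for group in groups.values() for item in group]


# Illustrative data only; not the list from the question.
my_list = [packet(1, "a"), packet(2, "b"), packet(1, "c"), packet(3, "d"), packet(2, "e")]
result = semisort(my_list, key=lambda p: p.id)
ids = [p.id for p in result]
```

This runs in a single pass and keeps the relative order of items inside each group. If the groups themselves must be ordered, a regular stable sort on the same key would be the simpler choice.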
1
vote
1
answer
79
views
How can I create an index based on values from another column in SQL?
For example if this is my table -
SeqNo Gap
20 Start
21 End
29 Start
30 End
42 Start
43 End
49 Start
50 Start
51 Start
52 Start
53 Start
54 Start
55 End
220 Start
221 Start
222 ...
1
vote
0
answers
359
views
Dask map_partitions returns a dask.Series instead of dask.DataFrame
I am having some issues understanding why I am getting a dask.Series back instead of a dask.DataFrame when using Dask's map_partitions()
ddf is one of several large data sets split-loaded as a dask....
0
votes
0
answers
214
views
Postgresql hash partition scaling
I have 10 partitions of a hash partitioned table and I want to increase this number.
What is the best way to do it? Do I have to remove the 10 tables and recreate all of the new ones?
I want 20 ...
1
vote
1
answer
689
views
How to combine sharding and consistent hashing within a distributed system?
Sharding, or say partitioning, is a technique widely used in distributed systems which logically splits data into partitions. Each node is assigned a set of partitions and hence the read/write ...
0
votes
1
answer
116
views
Notepad to Excel column conversion - How to parse one mixed string into 2 different columns in Excel
I have data in Notepad with more than 1000 entries, which I need to convert into Excel with particular breaks based on length. Can someone help?
011000015FRB-BOS FEDERAL RESERVE BANK OF BOSTON ...
0
votes
0
answers
122
views
Microsoft SQL Server Table Partitions on Distinct character values
I am creating a partition table using some very specific character values.
If new rows aren't inserted that match any of the values in the partition function, I want it to go in the default partition ....
0
votes
1
answer
1k
views
How do I write a Dagster asset that depends on an earlier partition of itself?
I was using depends_on_past with Airflow. I'm now using Dagster, with software-defined assets, and I was told that the way to get similar functionality is with build_asset_reconciliation_sensor and a ...
3
votes
1
answer
2k
views
Estimating How Long It Takes To Partition A Large Table
I'm trying to figure out how long it will take to partition a large table. I'm about 2 weeks into partitioning this table and don't have a good feeling for how much longer it will take. Is there any ...
0
votes
1
answer
493
views
Spark write speed performance test while loading data from Teradata to S3 in parquet format
I have a requirement wherein I need to migrate tables from Teradata to DELL ECS S3, with the data being written in parquet format. I have been given a Spark cluster with a single worker node of 1GB size ...
0
votes
1
answer
153
views
How to select from a partitioned table except one version in BigQuery?
My dataset has partitioned tables, and I select everything from them:
Select Customer id
from Company.database.Customer_*
various from (2022-01-01 till today)
But there is an erroneous version on 2022-06-08 ...
0
votes
1
answer
53
views
How can I assign value to one of the columns based on the increasing value of date for those values?
So I have a table which looks like this:
StudentId  dateEnrolled
23         03/01
23         05/01
23         07/05
23         08/11
23         03/01
I need to select these records such that I get the following records:
StudentId
...
0
votes
1
answer
70
views
Partitioning an ordered dataset into N partitions with ~equal sum in spark (maintaining the order while assigning buckets)
I have a dataset as below having two columns - Id, UserCount.
The dataset is sorted on Id column in ascending order.
Id  UserCount
1   1000
2   800
3   300
4   400
5   500
I want to partition this dataset into n ...
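The idea in the title (contiguous buckets with roughly equal sums, preserving order) can be sketched with a running total. This is a plain-Python illustration rather than Spark code, and it is only one possible heuristic: each row goes to the bucket that contains the midpoint of its cumulative range. The `assign_buckets` name and the choice of n = 3 are assumptions, since the excerpt is truncated.

```python
def assign_buckets(rows, n):
    """rows: list of (id, count), already sorted by id.

    Returns a list of (id, bucket) where bucket numbers never decrease,
    so the original order is preserved across buckets.
    """
    total = sum(count for _, count in rows)
    if total <= 0:
        raise ValueError("total count must be positive")
    running = 0
    out = []
    for row_id, count in rows:
        midpoint = running + count / 2  # centre of this row's cumulative range
        bucket = min(n - 1, int(midpoint * n / total))
        out.append((row_id, bucket))
        running += count
    return out


# Sample rows taken from the table in the excerpt; n = 3 is an assumption.
rows = [(1, 1000), (2, 800), (3, 300), (4, 400), (5, 500)]
buckets = assign_buckets(rows, 3)
```

With these rows the bucket sums come out as 1000, 1100 and 900. In Spark the running total would typically come from a windowed cumulative sum ordered by Id, with the same bucket formula applied per row.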
1
vote
1
answer
879
views
How to find out the type of partitioning in a table in google bigquery using python apis
def partition(dataset1, dataset2):
    try:
        client.get_dataset(dataset2)
        print("Dataset {} already exists".format(dataset2))
    except NotFound:
        print("Dataset ...
0
votes
2
answers
509
views
How to get the next value in a separate partition in a SQL query?
I have a single database table affiliations in the following format:
author_id  article_id  institution    publication_date
1          1           institution_1  2010-01-01
1          1           institution_2  2010-01-01
1          2           institution_2  2012-...
-1
votes
1
answer
110
views
How to group by user_id and time using SQL in BigQuery
I have a table that contains user_id, time (six-hour intervals), and average margin. I want to group by user_id and time (time in ascending order).
The table looks like this as shown below:
user_id
...
5
votes
1
answer
8k
views
How to read filtered partitioned parquet files efficiently using pandas's read_parquet?
Let's say my data is stored in object storage, say S3, with date-time partitioning like this:
s3://my-bucket/year=2021/month=01/day=03/SOME-HASH-VAL1.parquet
...
s3://my-bucket/year=2022/month=12/day=31/SOME-...
1
vote
1
answer
306
views
Shard a collection in MongoDB Atlas
Is it possible to shard a collection in MongoDB Atlas? I tried to shard a collection, but when I went to enable sharding on my database it gave this error.
MongoServerError: (Unauthorized) not ...
3
votes
1
answer
1k
views
How to make Spring Boot JPA support (Partition) in its Query
I am working on a transaction table in MySQL, and according to some requirements I have to ALTER table (Transaction) and apply a partition on it (Year) wise, and (Sub-Partition) month-wise, and it ...
0
votes
0
answers
121
views
Increasing spark workers and cassandra nodes takes more time
So, I have a small cluster with 3 Spark workers (2 executors each) and on the same nodes I have also installed Cassandra in order to achieve data locality. In order to evaluate the speed and times (from ...
0
votes
1
answer
189
views
BigQuery: iterating groups within a window of 28 days before a start_date column using _TABLE_SUFFIX
I got a table like this:
group_id  start_date  end_date
19335     20220613    20220714
19527     20220620    20220719
19339     20220614    20220720
19436     20220616    20220715
20095     20220711    20220809
I am trying to retrieve ...
1
vote
1
answer
2k
views
AWS Athena: Partition projection using date-hour with mixed ranges
I am trying to create an Athena table using partition projection. I am delivering records to S3 using Kinesis Firehose, grouped using a dynamic partitioning key. For example, the records look like ...
0
votes
1
answer
290
views
Weka Unsupervised resample filter for data partition
I want to divide my dataset into a training set (70%) and a test set (30%). I used the unsupervised resample filter for this. The steps I followed for the partition are as follows:
Select unsupervised ...
0
votes
0
answers
85
views
Separation of data traffic when data can come in any partition in event hub in Azure
I am using Event Hub with 32 partitions, and all pods are deployed in AKS. We have only 1 dev environment. I would like to know whether there is any provision to direct our traffic to us. This ...
0
votes
1
answer
233
views
Trigger scheduled query
I have a partitioned table (BigQuery) and records are streamed for each date multiple times over a period of a few days, e.g. records for 02.06.2022 are streamed on 03.06, 04.06, 05.06, etc.
Is there a ...
2
votes
2
answers
1k
views
PSQL determine the min value of date depending on another column
The input table looks like this:
ID   pid   method       date
111  A123  credit_card  12-03-2015
111  A128  ACH          11-28-2015
Now for the ID = 111, I need to select the MIN(date) and see what the method of payment for ...
0
votes
1
answer
278
views
Move Range Interval partition data from one table to history table in other database
We have a primary table that is Range partitioned by date with a 1-month interval. It's also a list sub-partitioned with 4 distinct values. So essentially it is one month partition having 4 sub-...
1
vote
0
answers
45
views
Add a generated column with an aggregate over a partition and sort
I am trying to add a calculated column that computes a rolling average of a sorted partition. I can make it work as a query but cannot seem to get the result to become a calculated field.
ALTER TABLE ...
-2
votes
1
answer
86
views
Python JSON data parsing
I'm trying to get the average of "active" for each place under a specific area. So say the output would be ("Andaman and Nicobar Islands": 10, "Andhra Pradesh": 12) ...
0
votes
2
answers
1k
views
Can I leverage BigQuery (BQ) partition via a join?
I am a Tableau designer, and we are building some views that get filtered by category a lot. Because of this, we tried to create a category_id that would serve as partition. The problem seems to be ...
1
vote
1
answer
1k
views
How to generate a single file per partition - Snowflake COPY into location
I've managed to unload my data into partitions, but each one of them is also being split into multiple files. Is there a way to force Snowflake to generate a single file per partition?
It also ...
0
votes
1
answer
176
views
How to know how many query jobs are submitted
I am working on a pipeline that takes data and does some partitioning on it. I am trying to load some data into a BigQuery table on GCP, but I got Too many partitions produced by query, allowed 4000, query ...
0
votes
3
answers
1k
views
Add partition in existing table Greenplum
I am trying to add monthly partitions to a table for a year or so. But the issue is I cannot add them in a single query. While creating the table in the past, I have added the partition for each month ...