Newest 'data-partitioning' Questions

0 votes

1 answer

115 views

Partitioned Table - Query filtering on partition field

I have a large table which I want to move to a partitioned model. I created the partitioned table, same fields as the original and partitioning by a particular timestamp field (by range). I then ...

skywalker

141

asked Apr 18 at 7:51

0 votes

0 answers

62 views

Does new S3 bucket quota change AWS data partitioning best practice for multi-tenant systems

I am trying to find updated information regarding AWS best practices when it comes to multi-tenant data partitioning in S3. From what I know and what I studied for when I did my AWS Solutions ...

DeskCat

31

asked Feb 5 at 10:24

1 vote

1 answer

135 views

Partition pruning in BigQuery with incremental model

I have a BigQuery table where a PubSub subscription inserts new web events every second. This table is partitioned by column: derived_tstamp, type: timestamp, granularity: daily. To create a specific ...

Vega

3,020

asked Nov 4, 2024 at 15:50

2 votes

1 answer

357 views

Postgres partitioning with a primary key

I have a big database that represents a graph with a ton of data in it that is constantly growing. The database looks something like: CREATE TABLE node ( id BIGSERIAL PRIMARY KEY, created_at ...

rcv

6,348

asked Sep 30, 2024 at 19:34

0 votes

1 answer

442 views

Is changing date partitioning granularity a breaking change?

In BigQuery, suppose I create a table and partition it by a date column "mydate" with a "DAY" granularity. Using dbt, this can be done using: partition_by = { "...

Yas

63

asked Jul 3, 2024 at 6:14

-1 votes

2 answers

217 views

What should be my partition key and sort key of DynamoDB table?

I am about to create a DynamoDB table which has the columns below, and each row will have unique data: user id profile Id attribute1 1001 9001 x 1002 9002 x. The table will have 1M records which means unique ...

Shivkumar Deshmukh

1,148

asked Apr 5, 2024 at 20:53

0 votes

2 answers

116 views

Perform determinations within a data partition

I have a dataset as below from which I would like to draw some inferences. Id Nbr Dt Status Cont1Sta1 DateLagInDays Recurrence 1 2 2023-10-1 1 1 2 2023-11-2 0 1 2 2023-12-13 0 1 3 2023-10-1 0 1 3 2023-...

ramadongre

101

asked Apr 4, 2024 at 3:13

0 votes

1 answer

258 views

Postgres Data Partition in Rails 7.0.8

We have a situation in the database where we have to make one table schema of entire tables as data partitioned based on a tenant id clause. Using create_table "billing_schedule_lines_old", id: :...

Kunal Vashist

2,491

asked Feb 29, 2024 at 21:08

1 vote

1 answer

116 views

Is it possible in PostgreSQL to restrict changes for files whose data is not actually changed?

Problem: We have a table “test”, consisting of sections “test_202309”, “test_202310”, “test_202311”. The sections store data for September 2023, October 2023 and November 2023. I am using the command “...

Anastasia

23

asked Nov 10, 2023 at 12:49

1 vote

2 answers

2k views

What is hybrid-columnar storage?

Snowflake stores data using a hybrid-columnar storage method. I understand what columnar storage is and its benefits, but what does the hybrid mean? Is this simply referring to Snowflake accessing ...

Kymane Llewellyn

51

asked Oct 19, 2023 at 18:57

0 votes

1 answer

390 views

How does Azure dedicated pool partition switching work efficiently when data is sharded over 60 distributions?

A table contains xyz columns, with 3 years of data. Index = clustered column index Hash distribution column = product. Partition column = date. As the new year data arrive ...

Tony

25

asked Sep 12, 2023 at 12:20

0 votes

0 answers

23 views

Top or Sample N of subgroups in Teradata in a large data set ("No Spool Space" error)

I've tried several routes to getting the 10 records from each subset of a large dataset and the best I can do is querying each subgroup explicitly in the query. My first attempt from the (Teradata ...

n8.

1,742

asked Aug 23, 2023 at 18:58

0 votes

1 answer

386 views

Splitting data into training, test and validation sets depending on variable dependent for machine learning

I am trying to split my data into training, test and validation groups within my data. I have 2 groups: control and TP and within these groups I have a secondary variable called Bio with numbers in ...

Ruth Walker

1

asked Aug 17, 2023 at 14:27

0 votes

1 answer

183 views

Creating a partitioned version of a BigQuery table scheduled for daily updates

I am faced with the following situation: among the BigQuery datasets which I am handling there is a rather large table - let us call it lt - that undergoes daily updates (more specifically, this table ...

ΑΘΩ

131

asked Aug 2, 2023 at 10:05

0 votes

1 answer

2k views

PySpark: querying Hudi partitioned table

I'm following the Apache Hudi documentation to write and read a Hudi table. Here's the code I'm using to create and save a PySpark DataFrame into Azure DataLake Gen2: tableName = "my_hudi_table" ...

jakeis

83

asked Jul 26, 2023 at 11:24

0 votes

1 answer

209 views

Can we name an automatic list partition in Oracle as per our choice

Can we name an automatic list partition as per the user's naming criteria? I have a table A for which I have created a partition which is automatic. Now when a new row with partitioning key will be ...

user22027949

11

asked Jun 6, 2023 at 6:24

-1 votes

1 answer

542 views

Create partitioned table using sub-query

I want to create an unlogged table based on the result of another query, something like Create table table_1 as select * from table_2 where <<conditions>> partition by LIST(col); Obviously ...

Aayush Bhatnagar

19

asked Apr 26, 2023 at 10:12

-1 votes

1 answer

67 views

Python semisort list of objects by attribute

I've got a list of an object: class packet(): def __init__(self, id, data): self.id, self.data = id, data my_list = [packet(1,"blah"),packet(2,"blah"),packet(1,"...

Jakob Lovern

1,351

asked Apr 21, 2023 at 23:41

1 vote

1 answer

79 views

How can I create an index based on values from another column in SQL?

For example if this is my table - SeqNo Gap 20 Start 21 End 29 Start 30 End 42 Start 43 End 49 Start 50 Start 51 Start 52 Start 53 Start 54 Start 55 End 220 Start 221 Start 222 ...

Sabreen Sageer

11

asked Apr 21, 2023 at 1:53

1 vote

0 answers

359 views

Dask map_partitions returns a dask.Series instead of dask.DataFrame

I am having some issues understanding why I am getting a dask.Series back instead of a dask.DataFrame when using Dask's map_partitions(). ddf is one of several large data sets split-loaded as a dask....

dtarakiuw

11

asked Apr 20, 2023 at 17:40

0 votes

0 answers

214 views

Postgresql hash partition scaling

I have 10 partitions of a hash partitioned table and I want to increase this number. What is the best way to do it? Do I have to remove the 10 tables and recreate all of the new ones? I want 20 ...

Inconvenient9

51

asked Apr 18, 2023 at 14:11

1 vote

1 answer

689 views

How to combine sharding and consistent hashing within a distributed system?

Sharding, or say partitioning, is a technique widely used in distributed systems which logically splits data into partitions. Each node is assigned a set of partitions and hence the read/write ...

lzx071021

21

asked Mar 8, 2023 at 16:05

0 votes

1 answer

116 views

Notepad to Excel column conversion - How to parse one mixed string into 2 different columns in Excel

I have data in Notepad with 1000+ entries, which needs to be converted into Excel with particular breaks based on length. Can someone help? 011000015FRB-BOS FEDERAL RESERVE BANK OF BOSTON ...

Kashyap Shah

1

asked Feb 8, 2023 at 14:02

0 votes

0 answers

122 views

Microsoft SQL Server Table Partitions on Distinct character values

I am creating a partition table using some very specific character values. If new rows aren't inserted that match any of the values in the partition function, I want it to go in the default partition ....

user210084

1

asked Jan 30, 2023 at 21:15

0 votes

1 answer

1k views

How do I write a Dagster asset that depends on an earlier partition of itself?

I was using depends_on_past with Airflow. I'm now using Dagster, with software-defined assets, and I was told that the way to get similar functionality is with build_asset_reconciliation_sensor and a ...

Sandy Ryza

305

asked Jan 27, 2023 at 18:14

3 votes

1 answer

2k views

Estimating How Long It Takes To Partition A Large Table

I'm trying to figure out how long it will take to partition a large table. I'm about 2 weeks into partitioning this table and don't have a good feeling for how much longer it will take. Is there any ...

rootScott

33

asked Jan 17, 2023 at 16:47

0 votes

1 answer

493 views

Spark write speed performance test while loading data from Teradata to S3 in parquet format

I have a requirement wherein I need to migrate tables from Teradata to DELL ECS S3, with the data being written in parquet format. I have been given a Spark cluster with a single worker node of 1GB size ...

the_data_novice

57

asked Jan 6, 2023 at 12:52

0 votes

1 answer

153 views

How to select from partitioned table except a version in BigQuery?

My dataset has a partitioned table and I select everything from it: Select Customer id from Company.database.Customer_* various from (2022-01-01 till today) But I have an erroneous version on 2022-06-08 ...

Jeany

21

asked Dec 16, 2022 at 21:32

0 votes

1 answer

53 views

How can I assign value to one of the columns based on the increasing value of date for those values?

So I have a table which looks like this: StudentId dateEnrolled 23 03/01 23 05/01 23 07/05 23 08/11 23 03/01 I need to select these records such that I get the following records: StudentId ...

Lia Lia

17

asked Dec 12, 2022 at 18:17

0 votes

1 answer

70 views

Partitioning an ordered dataset into N partitions with ~equal sum in spark (maintaining the order while assigning buckets)

I have a dataset as below having two columns - Id, UserCount. The dataset is sorted on Id column in ascending order. Id UserCount 1 1000 2 800 3 300 4 400 5 500 I want to partition this dataset into n ...

Abhishek Gupta

11

asked Dec 2, 2022 at 10:40

1 vote

1 answer

879 views

How to find out the type of partitioning in a table in google bigquery using python apis

def partition(dataset1, dataset2): try: client.get_dataset(dataset2) print("Dataset {} already exists".format(dataset2)) except NotFound: print("Dataset ...

Max Daniel

27

asked Oct 28, 2022 at 5:49

0 votes

2 answers

509 views

How to get the next value in a separate partition in a SQL query?

I have a single database table affiliations in the following format: author_id article_id institution publication_date 1 1 institution_1 2010-01-01 1 1 institution_2 2010-01-01 1 2 institution_2 2012-...

d.hatch75

60

asked Oct 21, 2022 at 18:56

-1 votes

1 answer

110 views

How to group by user_id and time using SQL in BigQuery

I have a table that contains user_id, time (six hours interval), and average margin. I wanted to group by user_id and time (time in ascending order). The table looks like this as shown below: user_id ...

Data Beginner

61

asked Oct 18, 2022 at 11:56

5 votes

1 answer

8k views

How to read filtered partitioned parquet files efficiently using pandas's read_parquet?

Let's say my data is stored in object storage, say S3, with a date-time partition like this: s3://my-bucket/year=2021/month=01/day=03/SOME-HASH-VAL1.parquet ... s3://my-bucket/year=2022/month=12/day=31/SOME-...

user3595632

5,780

asked Aug 31, 2022 at 6:38

1 vote

1 answer

306 views

Shard a collection in MongoDB Atlas

Is it possible to shard a collection in MongoDB Atlas? I tried to shard a collection, but when I went to enable sharding on my database it gave this error. MongoServerError: (Unauthorized) not ...

LakshanAmal

13

asked Aug 30, 2022 at 6:50

3 votes

1 answer

1k views

How to make Spring Boot JPA support (Partition) in its Query

I am working on a transaction table in MySQL, and according to some requirements I have to ALTER table (Transaction) and apply a partition on it (Year) wise, and (Sub-Partition) month-wise, and it ...

Maneesh Kumar

31

asked Aug 16, 2022 at 5:22

0 votes

0 answers

121 views

Increasing spark workers and cassandra nodes takes more time

So, I have a small cluster with 3 Spark workers (2 executors each) and on the same nodes I have also installed Cassandra in order to achieve data locality. In order to evaluate the speed and times (from ...

ktzan

520

asked Jul 24, 2022 at 11:12

0 votes

1 answer

189 views

BigQuery: iterating groups within a window of 28 days before a start_date column using _TABLE_SUFFIX

I got a table like this: group_id start_date end_date 19335 20220613 20220714 19527 20220620 20220719 19339 20220614 20220720 19436 20220616 20220715 20095 20220711 20220809 I am trying to retrieve ...

Javiss

815

asked Jul 14, 2022 at 9:55

1 vote

1 answer

2k views

AWS Athena: Partition projection using date-hour with mixed ranges

I am trying to create an Athena table using partition projection. I am delivering records to S3 using Kinesis Firehose, grouped using a dynamic partitioning key. For example, the records look like ...

ash_m

51

asked Jul 7, 2022 at 14:42

0 votes

1 answer

290 views

Weka Unsupervised resample filter for data partition

I want to divide my dataset into a training set(70%) and a test set(30%). I used unsupervised resample filter in this regard. The steps I followed for the partition are as follows Select unsupervised ...

Encipher

3,488

asked Jun 22, 2022 at 3:29

0 votes

0 answers

85 views

Separation of data traffic when data can come in any partition in event hub in Azure

I am using Event Hub with 32 partitions and all pods are deployed in AKS. We have only 1 dev environment. I would like to know whether there is any provision by which we can direct our traffic to us. This ...

Ayushi Gupta

303

asked Jun 20, 2022 at 17:46

0 votes

1 answer

233 views

Trigger scheduled query

I have a partitioned table (BigQuery) and records are streamed for each date multiple times during a few days period, e.g. records for 02.06.2022 are streamed on 03.06, 04.06, 05.06, etc. Is there a ...

T.B.

43

asked Jun 17, 2022 at 12:05

2 votes

2 answers

1k views

PSQL determine the min value of date depending on another column

The input table looks like this: ID pid method date 111 A123 credit_card 12-03-2015 111 A128 ACH 11-28-2015 Now for the ID = 111, I need to select the MIN(date) and see what the method of payment for ...

moikoi

35

asked Jun 12, 2022 at 17:53

0 votes

1 answer

278 views

Move Range Interval partition data from one table to history table in other database

We have a primary table that is Range partitioned by date with a 1-month interval. It's also a list sub-partitioned with 4 distinct values. So essentially it is one month partition having 4 sub-...

AJORA

37

asked May 11, 2022 at 1:30

1 vote

0 answers

45 views

Add generated column with aggregate over a partition and sort

I am trying to add a calculated column that computes a rolling average of a sorted partition. I can make it work as a query but cannot seem to get the result to become a calculated field. ALTER TABLE ...

Nuljon

11

asked May 6, 2022 at 2:43

-2 votes

1 answer

86 views

Python JSON data parsing

I'm trying to get the average of "active" for each place under a specific area. So say the output would be ("Andaman and Nicobar Islands": 10, "Andhra Pradesh": 12) ...

nataku

5

asked Apr 21, 2022 at 19:25

0 votes

2 answers

1k views

Can I leverage BigQuery (BQ) partition via a join?

I am a Tableau designer, and we are building some views that get filtered by category a lot. Because of this, we tried to create a category_id that would serve as partition. The problem seems to be ...

DrPib81

63

asked Mar 31, 2022 at 20:53

1 vote

1 answer

1k views

How to generate a single file per partition - Snowflake COPY into location

I've managed to unload my data into partitions, but each one of them is also being split into multiple files. Is there a way to force Snowflake to generate a single file per partition? It also ...

Andres

13

asked Mar 21, 2022 at 16:47

0 votes

1 answer

176 views

How to know how many query jobs are submitted

I am working on a pipeline that takes data and does some partitioning on it. I am trying to load some data into a BigQuery table on GCP, but I got Too many partitions produced by query, allowed 4000, query ...

Mee

1,699

asked Mar 9, 2022 at 11:47

0 votes

3 answers

1k views

Add partition in existing table Greenplum

I am trying to add monthly partitions on a table for a year or so. But the issue is I cannot add them in a single query. While creating the table in the past, I have added the partition for each month ...

waqar shahbaz

34

asked Feb 14, 2022 at 14:58
