0 votes
1 answer
115 views

I have a large table which I want to move to a partitioned model. I created the partitioned table with the same fields as the original, partitioning by range on a particular timestamp field. I then ...
skywalker's user avatar
  • 141
0 votes
0 answers
62 views

I am trying to find updated information regarding AWS best practices when it comes to multi-tenant data partitioning in S3. From what I know and what I studied for when I did my AWS Solutions ...
DeskCat's user avatar
  • 31
1 vote
1 answer
135 views

I have a BigQuery table where a PubSub subscription inserts new web events every second. This table is partitioned by: column: derived_tstamp, type: timestamp, granularity: daily. To create a specific ...
Vega's user avatar
  • 3,020
2 votes
1 answer
357 views

I have a big database that represents a graph with a ton of data in it that is constantly growing. The database looks something like: CREATE TABLE node ( id BIGSERIAL PRIMARY KEY, created_at ...
rcv's user avatar
  • 6,348
0 votes
1 answer
442 views

In BigQuery, suppose I create a table and partition it by a date column "mydate" with "DAY" granularity. Using dbt, this can be done using: partition_by = { "...
Yas's user avatar
  • 63
-1 votes
2 answers
217 views

I am about to create a DynamoDB table which has the columns below, and each row will have unique data: (user id, profile Id, attribute1) = (1001, 9001, x), (1002, 9002, x). The table will have 1M records, which means unique ...
Shivkumar Deshmukh's user avatar
0 votes
2 answers
116 views

I have a dataset as below from which I would like to draw some inferences. Id Nbr Dt Status Cont1Sta1 DateLagInDays Recurrence 1 2 2023-10-1 1 1 2 2023-11-2 0 1 2 2023-12-13 0 1 3 2023-10-1 0 1 3 2023-...
ramadongre's user avatar
0 votes
1 answer
258 views

We have a situation in the database where we have to make one table schema of entire tables as data partitioned based on a tenant id clause. Using create_table "billing_schedule_lines_old", id: :...
Kunal Vashist's user avatar
1 vote
1 answer
116 views

Problem: We have a table “test” consisting of sections “test_202309”, “test_202310”, “test_202311”. The sections store data for September 2023, October 2023 and November 2023. I am using the command “...
Anastasia's user avatar
1 vote
2 answers
2k views

Snowflake stores data using a hybrid-columnar storage method. I understand what columnar storage is and its benefits, but what does the hybrid mean? Is this simply referring to Snowflake accessing ...
Kymane Llewellyn's user avatar
0 votes
1 answer
390 views

A table contains xyz columns, with 3 years of data. Index = clustered column index. Hash distribution column = product. Partition column = date. As the new year's data arrives ...
Tony's user avatar
  • 25
0 votes
0 answers
23 views

I've tried several routes to getting the 10 records from each subset of a large dataset and the best I can do is querying each subgroup explicitly in the query. My first attempt from the (Teradata ...
n8.'s user avatar
  • 1,742
0 votes
1 answer
386 views

I am trying to split my data into training, test and validation groups within my data. I have 2 groups: control and TP and within these groups I have a secondary variable called Bio with numbers in ...
Ruth Walker's user avatar
0 votes
1 answer
183 views

I am faced with the following situation: among the BigQuery datasets which I am handling there is a rather large table - let us call it lt - that undergoes daily updates (more specifically, this table ...
ΑΘΩ's user avatar
  • 131
0 votes
1 answer
2k views

I'm following the Apache Hudi documentation to write and read a Hudi table. Here's the code I'm using to create and save a PySpark DataFrame into Azure DataLake Gen2: tableName = "my_hudi_table" ...
jakeis's user avatar
  • 83
0 votes
1 answer
209 views

Can we name an automatic list partition according to the user's naming criteria? I have a table A for which I have created a partition which is automatic. Now when a new row with partitioning key will be ...
user22027949's user avatar
-1 votes
1 answer
542 views

I want to create an unlogged table based on the result of another query, something like: Create table table_1 as select * from table_2 where <<conditions>> partition by LIST(col); Obviously ...
Aayush Bhatnagar's user avatar
-1 votes
1 answer
67 views

I've got a list of an object: class packet(): def __init__(self, id, data): self.id, self.data = id, data my_list = [packet(1,"blah"),packet(2,"blah"),packet(1,"...
Jakob Lovern's user avatar
  • 1,351
1 vote
1 answer
79 views

For example if this is my table - SeqNo Gap 20 Start 21 End 29 Start 30 End 42 Start 43 End 49 Start 50 Start 51 Start 52 Start 53 Start 54 Start 55 End 220 Start 221 Start 222 ...
Sabreen Sageer's user avatar
1 vote
0 answers
359 views

I am having some issues understanding why I am getting a dask.Series back instead of a dask.DataFrame when using Dask's map_partitions(). ddf is one of several large data sets split-loaded as a dask....
dtarakiuw's user avatar
0 votes
0 answers
214 views

I have 10 partitions of a hash-partitioned table and I want to increase this number. What is the best way to do it? Do I have to remove the 10 tables and recreate all of the new ones? I want 20 ...
Inconvenient9's user avatar
1 vote
1 answer
689 views

Sharding, or say partitioning, is a technique widely used in distributed systems which logically splits data into partitions. Each node is assigned a set of partitions and hence the read/write ...
lzx071021's user avatar
0 votes
1 answer
116 views

I have data in Notepad with more than 1000 entries, which I need to convert to Excel with particular breaks based on length. Can someone help? 011000015FRB-BOS FEDERAL RESERVE BANK OF BOSTON ...
Kashyap Shah's user avatar
0 votes
0 answers
122 views

I am creating a partitioned table using some very specific character values. If newly inserted rows don't match any of the values in the partition function, I want them to go in the default partition ...
user210084's user avatar
0 votes
1 answer
1k views

I was using depends_on_past with Airflow. I'm now using Dagster, with software-defined assets, and I was told that the way to get similar functionality is with build_asset_reconciliation_sensor and a ...
Sandy Ryza's user avatar
3 votes
1 answer
2k views

I'm trying to figure out how long it will take to partition a large table. I'm about 2 weeks into partitioning this table and don't have a good feeling for how much longer it will take. Is there any ...
rootScott's user avatar
0 votes
1 answer
493 views

I have a requirement wherein I need to migrate tables from Teradata to DELL ECS S3, with the data being written in parquet format. I have been given a Spark cluster with single worker node of 1GB size ...
the_data_novice's user avatar
0 votes
1 answer
153 views

My dataset has a partitioned table and I select from all of its partitions: Select Customer id from Company.database.Customer_* (ranging from 2022-01-01 until today). But there is an erroneous version on 2022-06-08 ...
Jeany's user avatar
  • 21
0 votes
1 answer
53 views

So I have a table which looks like this: StudentId dateEnrolled 23 03/01 23 05/01 23 07/05 23 08/11 23 03/01 I need to select these records such that I get the following records: StudentId ...
Lia Lia's user avatar
  • 17
0 votes
1 answer
70 views

I have a dataset as below having two columns - Id, UserCount. The dataset is sorted on Id column in ascending order. Id UserCount 1 1000 2 800 3 300 4 400 5 500 I want to partition this dataset into n ...
Abhishek Gupta's user avatar
1 vote
1 answer
879 views

def partition(dataset1, dataset2): try: client.get_dataset(dataset2) print("Dataset {} already exists".format(dataset2)) except NotFound: print("Dataset ...
Max Daniel's user avatar
0 votes
2 answers
509 views

I have a single database table affiliations in the following format: author_id article_id institution publication_date 1 1 institution_1 2010-01-01 1 1 institution_2 2010-01-01 1 2 institution_2 2012-...
d.hatch75's user avatar
-1 votes
1 answer
110 views

I have a table that contains user_id, time (six hours interval), and average margin. I wanted to group by user_id and time (time in ascending order). The table looks like this as shown below: user_id ...
Data Beginner's user avatar
5 votes
1 answer
8k views

Let's say my data is stored in object storage, say S3, with date-time partitions like this: s3://my-bucket/year=2021/month=01/day=03/SOME-HASH-VAL1.parquet ... s3://my-bucket/year=2022/month=12/day=31/SOME-...
user3595632's user avatar
  • 5,780
1 vote
1 answer
306 views

Is it possible to shard a collection in MongoDB Atlas? I tried to shard a collection, but when I went to enable sharding on my database it gave this error: MongoServerError: (Unauthorized) not ...
LakshanAmal's user avatar
3 votes
1 answer
1k views

I am working on a transaction table in MySQL, and according to some requirements I have to ALTER the table (Transaction) and partition it year-wise, with month-wise sub-partitions, and it ...
Maneesh Kumar's user avatar
0 votes
0 answers
121 views

So, I have a small cluster with 3 Spark workers (2 executors each), and on the same nodes I have also installed Cassandra in order to achieve data locality. In order to evaluate the speed and times (from ...
ktzan's user avatar
  • 520
0 votes
1 answer
189 views

I got a table like this: group_id start_date end_date 19335 20220613 20220714 19527 20220620 20220719 19339 20220614 20220720 19436 20220616 20220715 20095 20220711 20220809 I am trying to retrieve ...
Javiss's user avatar
  • 815
1 vote
1 answer
2k views

I am trying to create an Athena table using partition projection. I am delivering records to S3 using Kinesis Firehose, grouped using a dynamic partitioning key. For example, the records look like ...
ash_m's user avatar
  • 51
0 votes
1 answer
290 views

I want to divide my dataset into a training set (70%) and a test set (30%). I used the unsupervised resample filter for this. The steps I followed for the partition are as follows: Select unsupervised ...
Encipher's user avatar
  • 3,488
0 votes
0 answers
85 views

I am using Event Hub with 32 partitions and all pods are deployed in AKS. We have only 1 dev environment. I would like to know whether there is any provision that lets us direct our traffic to us. This ...
Ayushi Gupta's user avatar
0 votes
1 answer
233 views

I have a partitioned table (BigQuery) and records are streamed for each date multiple times during a period of a few days, e.g. records for 02.06.2022 are streamed on 03.06, 04.06, 05.06, etc. Is there a ...
T.B.'s user avatar
  • 43
2 votes
2 answers
1k views

The input table looks like this: ID pid method date 111 A123 credit_card 12-03-2015 111 A128 ACH 11-28-2015 Now for the ID = 111, I need to select the MIN(date) and see what the method of payment for ...
moikoi's user avatar
  • 35
0 votes
1 answer
278 views

We have a primary table that is Range partitioned by date with a 1-month interval. It's also a list sub-partitioned with 4 distinct values. So essentially it is one month partition having 4 sub-...
AJORA's user avatar
  • 37
1 vote
0 answers
45 views

I am trying to add a calculated column that computes a rolling average of a sorted partition. I can make it work as a query but cannot seem to get the result to become a calculated field. ALTER TABLE ...
Nuljon's user avatar
  • 11
-2 votes
1 answer
86 views

I'm trying to get the average of "active" for each place under a specific area. So, say, the output would be ("Andaman and Nicobar Islands": 10, "Andhra Pradesh": 12) ...
nataku's user avatar
  • 5
0 votes
2 answers
1k views

I am a Tableau designer, and we are building some views that get filtered by category a lot. Because of this, we tried to create a category_id that would serve as partition. The problem seems to be ...
DrPib81's user avatar
  • 63
1 vote
1 answer
1k views

I've managed to unload my data into partitions, but each one of them is also being split into multiple files. Is there a way to force Snowflake to generate a single file per partition? It also ...
Andres's user avatar
  • 13
0 votes
1 answer
176 views

I am working on a pipeline that takes data and does some partitioning on it. I am trying to load some data into a BigQuery table on GCP, but I got: Too many partitions produced by query, allowed 4000, query ...
Mee's user avatar
  • 1,699
0 votes
3 answers
1k views

I am trying to add monthly partitions on a table for a year or so. But the issue is that I cannot add them in a single query. While creating the table in the past, I added the partition for each month ...
waqar shahbaz's user avatar
