Best practices
1 vote
3 replies
88 views

I'm using a table of approximately 1 TB in a MySQL database. This table is also partitioned by month. We store the last two months of data in this table and regularly truncate the data from the ...
user1592429
Best practices
0 votes
3 replies
71 views

I'm new to nginx and proxy servers. We have a problem with a googlesource 429 error caused by too many requests, and because of bandwidth it takes us a long time to fetch from googlesource. We reviewed to make AOSP ...
SangHyuk Kwon
0 votes
1 answer
85 views

I have a large pandas dataframe df of something like a million rows and 100 columns, and I have to create a second dataframe df_n, same size as the first one. Several rows and columns of df_n will be ...
MBlrd • 165
1 vote
1 answer
104 views

I have a CSV that has 162 rows but around 3 million columns. I want to read it with pandas. I have enough RAM available, but pd.read_csv(file.csv, header=None, dtype=str) takes forever. The cells ...
emilp • 25
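For very wide CSVs like the one above, a minimal sketch of two options that often help (the pyarrow engine and the manual-split fallback are assumptions, not part of the original question; the pyarrow engine needs pandas >= 1.4 and the pyarrow package, and may not support every read_csv option):

import pandas as pd

# Option 1: let the pyarrow parser handle the wide file; it is usually much
# faster than the default C engine on files with millions of columns.
df = pd.read_csv("file.csv", header=None, dtype=str, engine="pyarrow")

# Option 2: split the raw lines manually (assumes no quoted commas), then
# build the frame once, avoiding per-column parsing overhead.
with open("file.csv") as fh:
    rows = [line.rstrip("\n").split(",") for line in fh]
df = pd.DataFrame(rows, dtype=str)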
0 votes
1 answer
133 views

I am trying to use mat-autocomplete with large data. For the large-data aspect I'm using a cdk-virtual-scroll-viewport with mat-autocomplete. Everything works except the arrow-key navigation. <mat-...
Fabian • 3
2 votes
0 answers
74 views

I'm working on a PHP web application that generates reports from a MySQL table with over 1 million rows. What I'm trying to do: Fetch a large dataset and process it to generate a downloadable report (...
Rita Williams
6 votes
3 answers
289 views

I have two Python lists (list1 and list2), each containing 51 complex numbers. At each index i, I can choose either list1[i] or list2[i]. I want to select one element per index (a total of 51 elements) ...
Manish Tr
1 vote
1 answer
51 views

The results area finds the top 4 largest costs in column A within the date range: =IFERROR(LARGE(IF(Sheet1!$D$5:$D$4935>=$A$2,IF(Sheet1!$D$5:$D$20<=$B$2,Sheet1!$E$5:$E$20)),1),0) and then ...
mjac • 221
0 votes
1 answer
82 views

Following this guide, I managed to convert my IBS matrix into a phylo object. Now, everything works just fine for small test cases. However, I'm now trying to scale to the entire dataset of 300 ...
Matteo • 537
5 votes
1 answer
259 views

I'm working with a very large dataset containing CWD (Cumulative Water Deficit) and EVI (Enhanced Vegetation Index) measurements across different landcover types. The current code uses LOESS ...
Shunrei • 339
0 votes
1 answer
55 views

I am trying to remove the "Don't know/refuse" responses for the headache and breasttenderness variables, but all the values for breasttenderness_num and headache_num are missing. Below is the code ...
John Mathews
-3 votes
2 answers
82 views

I want to know how many groups of data this df has: df <- data.frame( stringsAsFactors = FALSE, V1 = c("A","-","-","-","B"...
noriega • 448
0 votes
0 answers
59 views

I’m working on a system where I calculate the similarity between user vectors and product vectors using cosine similarity in Python with NumPy. The code below performs the necessary operations, but I ...
SanchoH
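For the cosine-similarity question above, a minimal vectorized sketch (the array names, shapes, and random data are assumptions; the idea is to avoid Python-level loops entirely):

import numpy as np

# Assumed shapes: users is (n_users, d), products is (n_products, d).
users = np.random.rand(10_000, 64)
products = np.random.rand(2_000, 64)

# Normalize each row to unit length, then one matrix product gives the full
# cosine-similarity matrix in a single BLAS call.
u = users / np.linalg.norm(users, axis=1, keepdims=True)
p = products / np.linalg.norm(products, axis=1, keepdims=True)
similarity = u @ p.T   # shape (n_users, n_products)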
1 vote
0 answers
84 views

I'm currently trying to analyze data from the National Inpatient Sample (NIS). When combining multiple years' worth of data, my files are just over 8 GB after processing/selecting relevant columns. I ...
Eli • 337
1 vote
1 answer
74 views

I have some very long time-series data (millions of data points) and generate interactive Plotly HTML plots based on this data. I am using Scattergl from Plotly's graph_objects. When I attempt to ...
rhz • 1,172
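A sketch of one common workaround for multi-million-point Scattergl plots: decimate the series before plotting so the exported HTML stays a manageable size (the downsampling step and the sample data are assumptions, not part of the original question):

import numpy as np
import plotly.graph_objects as go

x = np.arange(5_000_000)
y = np.sin(x / 1_000) + np.random.randn(x.size) * 0.1

# Keep every k-th point; a real application might use min/max binning per
# window instead, to preserve peaks.
step = max(1, x.size // 100_000)
fig = go.Figure(go.Scattergl(x=x[::step], y=y[::step], mode="lines"))
fig.write_html("timeseries.html")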
1 vote
1 answer
256 views

I'm working with a large dataset (around 1 million records) represented as a list of dictionaries in Python. Each dictionary has multiple fields, and I need to filter the data based on several ...
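Since the actual filter conditions are cut off above, here is only a generic sketch with hypothetical field names (status, amount); a single list comprehension is usually the fastest pure-Python option for a million dicts:

# Hypothetical records and conditions; adjust the field names to the real data.
records = [{"status": "active", "amount": i % 500} for i in range(1_000_000)]

# One comprehension avoids repeated function-call overhead per record.
filtered = [r for r in records if r["status"] == "active" and r["amount"] > 100]

# For repeated querying, converting once to a pandas DataFrame and filtering
# with vectorized boolean masks is typically faster still.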
0 votes
1 answer
316 views

I have converted a large dataset into a two-way frequency table and want to present it in a heatmap graph, with the colours representing the frequency. I've managed to make a heatmap, but it only ...
Asha Marshall
3 votes
4 answers
109 views

I work with a really large dataset, where it is very difficult to look at all columns individually. At this time I only want to count the frequency of the information provided. Let's say I have a ...
USER12345 • 145
1 vote
1 answer
59 views

I am using code to filter a smaller dataset out of a larger dataset. I am selecting children under the age of 24 months and another variable (b9) which says whether the child is living with the mother or not. ...
Ultrainstinct
-1 votes
1 answer
60 views

I have a large dataset (millions of entries) that needs to be sorted. What are the best practices or most efficient methods for sorting such a dataset in Python? Specifically: Is Python's built-in ...
Bibek Bayan
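A short sketch of the usual answer to the sorting question above (the key fields are hypothetical): Python's built-in sorted()/list.sort() is Timsort, O(n log n), and handles millions of in-memory records fine; only data that does not fit in RAM needs an external (chunked merge) sort:

from operator import itemgetter

import numpy as np

# In-memory: sort a list of dicts by one field; itemgetter is faster than a lambda.
rows = [{"id": i, "value": (i * 37) % 1000} for i in range(5_000_000)]
rows.sort(key=itemgetter("value"))

# Purely numeric keys sort faster as NumPy arrays.
values = np.fromiter((r["value"] for r in rows), dtype=np.int64)
order = np.argsort(values, kind="stable")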
0 votes
0 answers
38 views

I have a process where I must (de)tokenize payment account information. Currently, when sending a file to a client with payment information (among other things), the actual payment account numbers are ...
George • 2,212
1 vote
0 answers
30 views

I'm trying to create 3D graphs of large data sets (~10 million data points), but for some reason the graph won't render in Plotly. To generate some sample data, I use: import numpy as np COUNT = ...
import huh
1 vote
1 answer
58 views

I need a particular type of substitution; in fact, I want to replace some blanks (" " characters) in a data frame with a random choice from the same column, given a certain condition (for ...
Iacopo • 11
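A minimal pandas sketch of one way to do the replacement described above (the column name and the "blank" condition are hypothetical, since the real condition is cut off):

import numpy as np
import pandas as pd

df = pd.DataFrame({"col": ["a", " ", "b", " ", "c", "a"]})

# Candidate values are the non-blank entries of the same column.
candidates = df.loc[df["col"] != " ", "col"].to_numpy()

# Replace each blank with an independent random draw from those candidates.
blanks = df["col"] == " "
df.loc[blanks, "col"] = np.random.choice(candidates, size=blanks.sum())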
0 votes
1 answer
99 views

I'm trying to make a pixel-art animator where you can make pixel art but also animate it. The problem is that I want the canvas to take up most of my screen, with just a little bit of space below ...
user21434395
1 vote
0 answers
590 views

I am using Hugging Face to load some pretrained models to do some testing on some data. My code looks like this: import os os.environ['CUDA_VISIBLE_DEVICES'] = '0' # Tried to mitigate out of memory ...
NinaNuska
1 vote
2 answers
128 views

I have a large Pandas DataFrame with multiple columns, including Category, SubCategory, Value, and Date. I need to filter this DataFrame based on multiple conditions and then aggregate the filtered ...
Harold • 47
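For the filtering/aggregation question above, a minimal sketch using the column names mentioned (the specific conditions and the sum aggregation are assumptions):

import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "A", "B", "B"],
    "SubCategory": ["x", "y", "x", "y"],
    "Value": [10.0, 20.0, 30.0, 40.0],
    "Date": pd.to_datetime(["2024-01-01", "2024-02-01", "2024-01-15", "2024-03-01"]),
})

# Combine conditions into one boolean mask, then aggregate only the filtered rows.
mask = (df["Category"] == "A") & (df["Date"] >= "2024-01-15")
result = (
    df.loc[mask]
      .groupby(["Category", "SubCategory"], as_index=False)["Value"]
      .sum()
)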
2 votes
0 answers
60 views

I need to handle extremely large matrices (larger than 150k x 150k) and use these matrices for matrix operations (mainly matrix multiplication and computing the matrix inverse). This process ...
Zheng YANG
2 votes
4 answers
148 views

I apologize if this is formatted incorrectly or if I am missing any information that would be helpful. I am attempting to run a for loop with a nested if statement for a couple of large datasets. The ...
TCB at EU • 131
0 votes
0 answers
84 views

I'm encountering a memory issue when converting my processed data to a NumPy array. I have 57 GB of RAM, but the RAM saturates quickly and the kernel restarts at np.array(processed_X). Here is my code: ...
houfis • 1
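One frequent cause of the crash described above is that np.array() on a huge nested Python list materializes the data twice and infers float64. A sketch of preallocating the target array instead (the shape, dtype, and the per-row generator are assumptions):

import numpy as np

n_samples, n_features = 1_000_000, 128   # assumed dimensions

# Allocate once with an explicit, small dtype and fill it row by row,
# instead of letting np.array() build a second full copy from the list.
X = np.empty((n_samples, n_features), dtype=np.float32)
for i in range(n_samples):
    X[i] = np.random.rand(n_features)    # stand-in for one processed sample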
1 vote
0 answers
122 views

I'm training a huge self-supervised model. When I tried to train on the complete dataset, it threw CUDA OOM errors; to fix that I decreased the batch size and added gradient accumulation along with eval ...
Shreyas S
3 votes
1 answer
787 views

I'm starting to play with {fmt} and wrote a little program to see how it processes large containers. It would seem that fmt::print() (which ultimately sends output to stdout) internally first ...
Matthew Busche
-1 votes
1 answer
50 views

=IF(I74<=50000,"150",IF(I74<=100000,"200",IF(I74<=150000,"250",IF(I74<=200000,"300",IF(I74<=250000,"350",IF(I74<=300000,"400"...
user24664789
2 votes
3 answers
188 views

I am trying to reduce execution time with Python's multiprocessing library (Pool.starmap) on code that executes the same task in parallel, on the same Pandas DataFrame, but with different call ...
Lyreck • 21
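A minimal Pool.starmap sketch for the question above (the worker function and its arguments are hypothetical). One detail worth knowing: the DataFrame is pickled and copied to every worker, so it is often cheaper to pass each task only the slice or columns it needs:

from multiprocessing import Pool

import pandas as pd

def summarize(df_chunk, factor):
    # Hypothetical task: scale one column and return its sum.
    return (df_chunk["value"] * factor).sum()

if __name__ == "__main__":
    df = pd.DataFrame({"value": range(1_000_000)})
    chunks = [df.iloc[i::4] for i in range(4)]   # four disjoint row slices
    args = [(chunk, 2.0) for chunk in chunks]
    with Pool(processes=4) as pool:
        results = pool.starmap(summarize, args)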
0 votes
1 answer
200 views

I'm trying to convert an extremely large (over 250 GB) JSON file into a CSV; the JSON file looks like this: { "BuildingSiteList":[ { "ID": "00001" (34 more ...
user24506501
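Given the structure shown above ("BuildingSiteList" as a top-level array), a streaming sketch with ijson that never loads the whole 250 GB file; the CSV columns are taken from the first object, which assumes every object has the same fields:

import csv
import ijson

with open("input.json", "rb") as src, open("output.csv", "w", newline="") as dst:
    writer = None
    # "BuildingSiteList.item" yields one building-site object at a time.
    for site in ijson.items(src, "BuildingSiteList.item"):
        if writer is None:
            writer = csv.DictWriter(dst, fieldnames=list(site.keys()))
            writer.writeheader()
        writer.writerow(site)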
0 votes
0 answers
323 views

I have two beer datasets; one is ~3 million entries, and the other is 175 thousand entries. Doing a fuzzy match on these two will take way too long. I've run a few tests on the same 1000 random sample ...
nick kalra
-3 votes
1 answer
64 views

I have a large dataset of historical power prices (151mm+). There are 18,065 individual nodes where prices settle, each with hourly observations (8760/yr). Data schema: Node ID (int64), Datetime (...
kblackburn
2 votes
1 answer
121 views

I have a 10 GB CSV file, data/history_{date_to_be_searched}.csv. It has more than 27,000 zip codes. I have to filter the CSV file by zip code, and then upload each filtered file to ...
zircon • 930
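A chunked-reading sketch for splitting the 10 GB CSV above by zip code without loading it all at once (the zip_code column name, the chunk size, and the output directory are assumptions; the upload step is omitted since its destination is cut off):

import os

import pandas as pd

os.makedirs("out", exist_ok=True)
date_to_be_searched = "2024-01-01"   # placeholder for the real date

written = set()
for chunk in pd.read_csv(f"data/history_{date_to_be_searched}.csv", chunksize=500_000):
    for zip_code, part in chunk.groupby("zip_code"):
        path = f"out/{zip_code}.csv"
        # Write the header only the first time a given zip code is seen.
        part.to_csv(path, mode="a", index=False, header=zip_code not in written)
        written.add(zip_code)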
1 vote
2 answers
215 views

I am trying to speed up repeatedly rarefying a data frame and the subsequent addition of generated matrices. Some background information: The data set I want to repeatedly rarefy is very large (about ...
0 votes
1 answer
124 views

I am trying to populate and store a NumPy array with ~1 trillion entries, with data to be retrieved later. The array has ~7 indices, each running over ~50 values, i.e. it is a rank-7 tensor in 50 dimensions or ...
Geoffrey • 109
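A quick feasibility note on the question above: 50^7 is roughly 7.8 x 10^11 entries, which is about 3 TB even at float32, so the array has to live on disk and be filled slice by slice. A sketch with np.memmap (the shape and dtype are assumptions; HDF5 via h5py would work similarly):

import numpy as np

shape = (50,) * 7                      # rank-7 tensor, ~7.8e11 entries
# float32 is ~4 bytes per entry, so this backing file is roughly 3 TB
# (created sparse on most filesystems until pages are actually written).
tensor = np.memmap("tensor.dat", dtype=np.float32, mode="w+", shape=shape)

# Populate one slice at a time; only the touched pages occupy RAM.
tensor[0, 1, 2] = np.random.rand(50, 50, 50, 50).astype(np.float32)
tensor.flush()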
0 votes
2 answers
423 views

I am trying to replace text in a large text file (5 GB). I found the script below. It outputs to a new file. powershell -Command "(gc myFile.txt) -replace 'foo', 'bar' | Out-File -encoding ASCII ...
Peter Sun • 1,953
0 votes
1 answer
100 views

This question is perhaps in an uncanny valley between CrossValidated and StackOverflow, as I'm trying to understand the methodology of functions in an R package, in the context of executing them ...
purpleblade98
0 votes
0 answers
93 views

I have run into a problem with how to quickly and efficiently read and split a list of very large transaction data files by a column called SecurityID; inside each transaction data file, there can be ...
ML33M • 415
0 votes
1 answer
207 views

I'm trying to stream through a large JSON file using ijson in Python. This is my first time trying this. My code is really simple right now: with open('file.json', 'rb') as f: j = ijson.items(f, 'item'...
Tim Vowden
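Continuing the snippet above, a minimal sketch of how the ijson iterator is typically consumed (the per-item processing here is just a counter, as a placeholder): ijson.items() returns a lazy generator, so it must be iterated while the file is still open, rather than materialized into a list.

import ijson

count = 0
with open("file.json", "rb") as f:
    # One parsed object is held in memory at a time.
    for item in ijson.items(f, "item"):
        count += 1
print(count)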
-1 votes
1 answer
225 views

I'm working on a Laravel application that requires dynamic loading of select options in the UI, potentially dealing with large datasets. The goal is to implement autocomplete functionality where ...
Rashid Ali Mughal
-1 votes
2 answers
102 views

I have two data frames: one has a Start Date and an End Date, and the second has just a date. Basically, one frame has group data and the other has child data. So I want to join all the dates that come ...
Pijush • 31
2 votes
1 answer
3k views

I have a very large Arrow dataset (181 GB, 30M rows) from the Hugging Face framework I've been using. I want to randomly sample 100 rows with replacement (20 times), but after looking around, I cannot ...
youtube • 504
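A sketch of one way to do the sampling above with the Hugging Face datasets library (the dataset path is a placeholder): Dataset.select() takes a sequence of indices and only builds an index mapping rather than copying the 181 GB of data, which should make repeated small samples cheap.

import numpy as np
from datasets import load_from_disk

ds = load_from_disk("path/to/arrow_dataset")   # placeholder path

rng = np.random.default_rng(0)
samples = []
for _ in range(20):
    idx = rng.integers(0, len(ds), size=100).tolist()   # draw with replacement
    samples.append(ds.select(idx))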
1 vote
1 answer
102 views

I tried to speed up my functions and make them more memory efficient for a variable elimination algorithm on a Bayesian network, but it still crashes once the dataframe gets too big. I have created a ...
user23405367
0 votes
0 answers
458 views

I am using Node.js and Puppeteer to generate a lot of PDF files with high-resolution images on them. After that, I store all the generated PDF files in an array. Then I merge them one by one using pdf-merger-...
Marat Tazhiev
0 votes
1 answer
344 views

How do I create a column of new unique IDs, replacing the old unique IDs, in a large dataset of around 26,000 observations? I have a dataset with 26,000 observations and need to create a unique ID ...
Sha D. • 9
0 votes
1 answer
498 views

I have two PHP scripts; the parent script has a loop of 1,000,000 iterations, and in each iteration it calls a child script using shell_exec(). The child script performs 10,000 insertions into a MySQL table ...
Paarth Pratim Mout
