0 votes
0 answers
84 views

My current data processing flow looks like this: load CSV, pivot data, filter the original data based on some results from the previous step, repeat several times. I have this working on several CSV files in ...
asked by BBloggsbott
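A minimal pandas sketch of that load/pivot/filter loop; the column names ("id", "category", "value") and the threshold are hypothetical placeholders, not from the question:

```python
# Hypothetical sketch of the load -> pivot -> filter cycle over many CSVs.
import glob
import pandas as pd

for path in glob.glob("data/*.csv"):
    df = pd.read_csv(path)                                   # load CSV
    pivot = df.pivot_table(index="id", columns="category",
                           values="value", aggfunc="sum")    # pivot data
    keep = pivot[pivot.sum(axis=1) > 100].index              # results from the pivot step
    df = df[df["id"].isin(keep)]                             # filter the original data
```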
0 votes
1 answer
39 views

I have data in an array-style string format that I need to parse and insert into a DolphinDB table, with each nested array element becoming a separate record. Here's an example of the data: '[["...
asked by Stella.W
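A sketch of the parsing step, assuming the string is valid Python/JSON-like literal syntax; the sample string is a hypothetical stand-in for the truncated example, and the actual insert into a DolphinDB table (e.g. via its Python API) is a separate step not shown here:

```python
# Flatten an array-style string into one record per nested element.
import ast

raw = '[["a", 1], ["b", 2]]'   # hypothetical stand-in for the real data
records = [tuple(item) for item in ast.literal_eval(raw)]
print(records)                 # [('a', 1), ('b', 2)] -> one row per element
```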
0 votes
1 answer
30 views

I have a dataset with two columns: sID and sum_count. Now, I need to divide the sID into 5 groups with the requirement that the sum of the sum_count column in each group should be as equal as ...
asked by Huang WeiFeng
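A greedy heuristic sketch for this kind of balanced partition: sort by sum_count descending and always assign the next sID to the currently lightest group. It is not guaranteed optimal, but it keeps the 5 group sums close; the sample pairs are hypothetical:

```python
# Greedy balanced grouping: next-largest item goes to the lightest group.
import heapq

def split_into_groups(pairs, k=5):
    """pairs: iterable of (sID, sum_count)."""
    heap = [(0, i) for i in range(k)]        # (running total, group index)
    groups = [[] for _ in range(k)]
    for sid, cnt in sorted(pairs, key=lambda p: -p[1]):
        total, idx = heapq.heappop(heap)
        groups[idx].append(sid)
        heapq.heappush(heap, (total + cnt, idx))
    return groups

print(split_into_groups([("a", 9), ("b", 7), ("c", 5), ("d", 4),
                         ("e", 3), ("f", 2), ("g", 1)]))
```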
1 vote
1 answer
91 views

I am using the following script to split names with a Google Sheet that is receiving submissions from a Squarespace RSVP form. function split() { const DELIMITER = " "; var ss = ...
asked by Christy Perez
0 votes
1 answer
56 views

I have a Python script that processes person data and appends the results to an Azure Blob Storage CSV file. However, the issue is that for each new patient the generated CSV is appended to the ...
asked by krishna sai
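A local-file sketch of the usual fix for repeated headers when appending: write the header only when the target file does not exist yet. The Azure Blob upload itself is a separate step not shown here, and the function name is hypothetical:

```python
# Append rows to a CSV, writing the header only on the first write.
import os
import pandas as pd

def append_patient(df: pd.DataFrame, path: str) -> None:
    df.to_csv(path, mode="a", index=False,
              header=not os.path.exists(path))
```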
3 votes
2 answers
94 views

Each row of my data looks something like this: 8,0 0 1 0.000000000 8082 A WS 24664872 + 8 <- (8,2) 23604576 I'd like to split the data into columns like this: col1 col2 col3 ...
asked by kai
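A sketch, assuming the fields are separated by runs of whitespace: pandas can split on the regex `\s+`, and the generated column names below are placeholders:

```python
# Split a whitespace-delimited row into columns.
import io
import pandas as pd

raw = "8,0 0 1 0.000000000 8082 A WS 24664872 + 8 <- (8,2) 23604576"
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)
df.columns = [f"col{i + 1}" for i in range(df.shape[1])]
print(df)
```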
0 votes
2 answers
88 views

Seeking help on how to extract data from rows of data similar to this Raw Data and convert the data placement to this Process Data. I'm having a problem extracting "Process X" and populating the ...
asked by noobita
0 votes
1 answer
236 views

I'm trying to save the scaling parameters of a dataset into a .npy file on the disk, so I avoid having to recalculate them every time I re-run the code. For now, I'm using MaxAbsScaler() from sklearn ...
asked by geani
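Two hedged options for persisting MaxAbsScaler state: dump the whole fitted object with joblib, or save just its learned array (the `max_abs_` attribute) with np.save; the fit data here is hypothetical:

```python
# Persist scaling parameters so they need not be recalculated on each run.
import joblib
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

scaler = MaxAbsScaler().fit(np.array([[1.0, -2.0], [3.0, 0.5]]))

joblib.dump(scaler, "scaler.joblib")      # option 1: whole fitted object
np.save("max_abs.npy", scaler.max_abs_)   # option 2: parameters only

restored = joblib.load("scaler.joblib")   # reuse without refitting
```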
0 votes
1 answer
68 views

My data source emits IoT data with the following structure: io_id,value,timestamp 232,1223,1718191205 321,671,1718191254 54,2313,1718191275 232,432,1718191315 321,983,1718191394 ........ There are 2 ...
asked by GrozaFry
0 votes
1 answer
160 views

I encountered an issue while trying to store JSON data as a Delta Lake table using PySpark and Delta Lake. Here's my code: from pyspark.sql import SparkSession from pyspark.sql.types import StructType,...
asked by NO2 SIIZEXL
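A minimal sketch of the JSON-to-Delta write, assuming a Spark session already configured with the delta-spark package; the paths are hypothetical:

```python
# Read JSON with PySpark and store it as a Delta Lake table.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("json-to-delta").getOrCreate()
df = spark.read.json("/tmp/input.json")
df.write.format("delta").mode("overwrite").save("/tmp/delta/events")
```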
-4 votes
1 answer
64 views

Sorry for the title, I know it might be pretty broad and not very informative. I am facing a problem regarding the analysis of a data set. The participants of my experiments were randomly assigned ...
asked by taboulet
1 vote
1 answer
168 views

I have coded a VBA macro to process downloaded data. The data has some junk rows that need to be deleted, but also has some rows where the data is off by a couple of columns and a couple of rows. The ...
asked by czw
1 vote
0 answers
59 views

I've been working on sign language recognition. I extracted landmarks with MediaPipe, saved them as .parquet files, then padded the data to create a uniform length. Each row of landmarks has 21 nodes with x, y, z ...
asked by karesosis
0 votes
0 answers
206 views

This is an opinion-based question: my use case is real-time and needs to be able to process everything at sub-second speed. I have an external Mongo DB which holds information about all of the users ...
asked by Or Keren
0 votes
1 answer
77 views

I have two services, a producer and a consumer. The producer has a big JSON file on the server. I want to serve it over the network through a REST API, and I used the Node.js stream technique to load bytes into memory ...
asked by Faizul Ahemed
0 votes
1 answer
56 views

I need a macro that will help me process data where they add an X to mark which group the row belongs to. For example: the data comes with many more columns, but that's just the gist of it. They mark ...
asked by user16201107
-1 votes
1 answer
47 views

I wrote the code ggplot(data = summary_datas) + geom_bar(mapping = aes(x=member_casual,fill=member_casual)) + labs(title = "Rider Membership data", subtitle= "Difference in the ...
asked by Shasha
1 vote
1 answer
675 views

I have a LabVIEW program which contains voltage, current and power data in the same waveform. I am planning to extract each of them one by one and put them into arrays. Currently, I have extracted ...
asked by Nh K
1 vote
1 answer
88 views

I made a book reading page with the Page_Flip widget, but when the user leaves the application and re-enters the book page, I want it to continue from where it left off. How can I keep the user's ...
asked by Muhammed Halil Demirci
1 vote
1 answer
131 views

I'm using the Flink 1.81.1 API on Java 11 and I'm trying to use a BroadcastProcessFunction to filter a Products DataStream with an authorized-brands DataStream as broadcast. So my first products DataStream ...
asked by Nabil Hadji
-1 votes
1 answer
63 views

I have a bunch (100+) of CSV files. Each of them can have blank rows, or rows I don't need (some filler like "Congrats, you all bla bla"). When reading in pandas I need to specify which row ...
asked by Yewgen_Dom
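A sketch: pandas skips blank lines by default, and junk rows can then be dropped by filtering after the read instead of tuning skiprows per file. The "Congrats" marker comes from the question; the assumption that it appears in the first column is mine:

```python
# Read many CSVs, skipping blanks and dropping known junk rows afterwards.
import glob
import pandas as pd

frames = []
for path in glob.glob("*.csv"):
    df = pd.read_csv(path, skip_blank_lines=True)
    mask = df.iloc[:, 0].astype(str).str.startswith("Congrats")
    frames.append(df[~mask])
combined = pd.concat(frames, ignore_index=True)
```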
0 votes
1 answer
129 views

I'm using Python to read a file of 5,000,000 rows but currently it only reads 1,000,000 rows. The file is around 125 MB. I'm using the pd.read_csv function but this only leads to reading 1,000,000 rows ...
asked by Nanhe Zou
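A sketch that reads in chunks and counts the true number of rows; one common cause of this symptom is viewing the output in Excel, which caps the display at 1,048,576 rows. The filename is hypothetical:

```python
# Count rows by reading the CSV in chunks.
import pandas as pd

total = 0
for chunk in pd.read_csv("big_file.csv", chunksize=500_000):
    total += len(chunk)          # process each chunk here
print(total)
```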
1 vote
1 answer
168 views

I am using a VLOOKUP function to bring over data from 4 different files into one sheet. I want to place the VLOOKUP results from the 1st file in Column 4, then the results of the 2nd file in Column 6 ...
asked by Eriknme
-1 votes
2 answers
102 views

I have two data frames: one has a Start Date and an End Date, the second has just a date. Basically, one frame has group data and the other has child data. So I want to join all the dates which come ...
asked by Pijush
0 votes
0 answers
34 views

I'm working on a project which stores data for tree-structured models, like file systems. In many cases the tree has a large number of leaves and an unknown depth. My project is ...
asked by pooriya
1 vote
1 answer
64 views

I am looking for an optimal way to perform simple data processing on a Django QuerySet. I would like to avoid installing heavyweight libraries like pandas or NumPy. The number of rows in ...
asked by Jacek
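A sketch of pandas-free aggregation over a QuerySet: `.values()` yields plain dicts that stdlib tools can group. The app, model and field names ("Order", "status", "amount") are hypothetical:

```python
# Group-and-sum over a QuerySet using only the standard library.
from collections import defaultdict

from myapp.models import Order   # hypothetical app and model

totals = defaultdict(float)
for row in Order.objects.values("status", "amount"):
    totals[row["status"]] += row["amount"]
```

For plain sums or counts, Django's own aggregate()/annotate() can push the work into the database instead of iterating in Python.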
0 votes
1 answer
919 views

I am working on a Python script to process large CSV files (ranging from 2GB to 10GB) and am encountering significant memory usage issues. The script reads a CSV file, performs various transformations ...
asked by Shahnoor
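A sketch of the two standard memory levers for large CSVs in pandas: narrow explicit dtypes and chunked processing, so only one slice of the file is in memory at a time. The column names, dtypes, and filter are hypothetical:

```python
# Process a large CSV chunk by chunk with compact dtypes.
import pandas as pd

dtypes = {"user_id": "int32", "score": "float32", "label": "category"}
parts = []
for chunk in pd.read_csv("big.csv", dtype=dtypes, chunksize=1_000_000):
    parts.append(chunk[chunk["score"] > 0])   # per-chunk transformation
result = pd.concat(parts, ignore_index=True)
```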
-1 votes
1 answer
124 views

I have a .rpt file that has two columns, like this: A column B column 990.E-03 -2.73654E-03 995.E-03 -2.75347E-03 1. ...
asked by hz z
0 votes
1 answer
58 views

How do I convert this Excel file for data processing using pandas? import pandas as pd df = pd.read_excel(r"c:/Users/vpullabh/Desktop/Meraci.Ec-NGIOSD.xlsx", sheet_name="...
asked by Vaishnavi Pullabhatla
4 votes
1 answer
295 views

My question: I need to understand the time complexity of dynamic forward filling and back filling in Spark. I have a Scala job that reads Delta Table A, transforms a Data Frame and writes to Delta ...
asked by Yun Xing
-1 votes
1 answer
2k views

I'm working on integrating OpenAI functionalities, specifically GPT-3.5 and embeddings, into a large system of Excel workbooks used for almost anything in my office. Our goal is to have GPT-3.5 take ...
asked by Pakoco
1 vote
0 answers
89 views

I'm dealing with a large file where each row has CHR and POS values (which are positional coordinates). I process this file using a tool, but it outputs only a subset of these positional coordinates ...
asked by binf-er
1 vote
1 answer
661 views

When I set up a Dataflow pipeline and created a job from the template ('Text Files on Cloud Storage to BigQuery'), I met this problem. Job creation failed: The workflow could not be created. ...
asked by MING
1 vote
0 answers
344 views

The findContours() function from the OpenCV library does not allow you to customize the selection of contours based on 4-connectivity. I checked on a test image: all the modes of this function that ...
asked by Walrus
1 vote
0 answers
75 views

I have a pretty straightforward JSON object that I am trying to parse into a list of objects for downstream processing and use. The JSON structure is dynamic, but here is an example of the structure I ...
asked by James Peruggia
1 vote
2 answers
101 views

I have a python dictionary as follows: ip_dict = {'GLArch': {'GLArch-0.png': ['OTHER', 'Figure 28 TAC '], 'GLArch-1.png': ['DCDFP', 'This insurance '], '...
asked by spectre
0 votes
1 answer
168 views

I have read through the idea "Behavior Tag Time Series" several times but couldn't understand it. Here is the explanation in the book, but it still doesn't make sense: "Almost all text in a data ...
asked by cloudscomputes
0 votes
2 answers
49 views

I need help with this task: Print data for locations that have two threes in the address.zip code. I tried: filtered_data <- df %>% filter(grepl("\\d{3}.*\\d{3}", address.zip)) ...
asked by Rokas
-2 votes
1 answer
239 views

I am attempting to predict a binary outcome based on 15 continuous sequences (except one which isn't a continuous line, but still a sequence). The dataset contains 933k datapoints for all 15 features ...
asked by Didlex
0 votes
0 answers
43 views

My valid data (Records.txt) keeps outputting onto the wrong case statement. Records.txt: AB12MP349 Fusion5 20 17000.00 33435KMOP324 BMW 40 25000.00 AB12MP349 Audi 100 4000.00 AB12MP349 Pagni 1 2000000....
asked by Zximy
2 votes
3 answers
68 views

I have a python dictionary as given below: ip = { "doc1.pdf": { "img1.png": ("FP", "text1"), "img2.png": ("NP", "...
asked by lowkey
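A sketch that flattens this kind of nested dict into (doc, img, label, text) records, a shape that is usually easier to filter or load into a DataFrame; the second entry is a hypothetical extension of the truncated example:

```python
# Flatten {doc: {img: (label, text)}} into a list of 4-tuples.
ip = {
    "doc1.pdf": {
        "img1.png": ("FP", "text1"),
        "img2.png": ("NP", "text2"),
    }
}
records = [(doc, img, label, text)
           for doc, imgs in ip.items()
           for img, (label, text) in imgs.items()]
print(records)
```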
1 vote
1 answer
56 views

I am trying to reproduce a paper that uses the tf-idf method. During the data preprocessing, there is a step that involves feature scaling. In the original paper, it says, "We restrict the words ...
asked by yi zhu
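A sketch of document-frequency-based restriction in scikit-learn: min_df and max_df drop words that appear in too few or too many documents. The corpus and thresholds here are hypothetical, not the paper's:

```python
# Restrict the vocabulary by document frequency before computing tf-idf.
from sklearn.feature_extraction.text import TfidfVectorizer

vec = TfidfVectorizer(min_df=2, max_df=0.9)
X = vec.fit_transform(["the cat sat", "the cat ran", "a dog ran"])
print(sorted(vec.vocabulary_))   # only words appearing in >= 2 docs survive
```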
1 vote
2 answers
836 views

I am facing an issue where the expected date partition folder should be named in the format date=yyyymmdd, but it is instead being written as - Sometimes for each parquet file created in the delta path, it's creating a ...
asked by Arindam Bhattacharjee
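A PySpark sketch of deriving the partition column explicitly so the folder is written as date=yyyyMMdd; it assumes a session configured with delta-spark, and the source column and paths are hypothetical:

```python
# Derive an explicit yyyyMMdd partition column before writing.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("2024-06-12 10:00:00",)], ["event_ts"])
df = df.withColumn("date", F.date_format("event_ts", "yyyyMMdd"))
df.write.format("delta").mode("append").partitionBy("date").save("/tmp/delta/out")
```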
0 votes
0 answers
318 views

I am working on data processing in which I have an EKS cluster in one account and do the processing in a second AWS account, so we are assuming an IAM role from one account to another and performing ...
asked by Rutik Lohade
0 votes
1 answer
62 views

I have observations that are formed using Run Length Encoding transform as Example set.seed(1) make_data <- function() { series <- rnorm(sample(10:50,1)) |> cumsum() |> sign() ...
asked by mr.T
1 vote
1 answer
31 views

I have data as objects like this set.seed(1) make_rle <- function() rnorm(10) |> cumsum() |> sign() |> accelerometry::rle2(indices = T) X <- lapply(1:10, \(x) make_rle()) X [[1]] ...
asked by mr.T
1 vote
3 answers
75 views

I'm trying to find an efficient way to remove rows of a NumPy array that contain duplicated elements. For example, the array below: [[1,2,3], [1,2,2], [2,2,2]] should keep [[1,2,3]] only. I know pandas ...
asked by Xe-
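A vectorized sketch: sort each row, then keep only rows where every adjacent pair of sorted values differs, i.e. no element repeats within the row. The example array is taken from the question:

```python
# Drop rows that contain any duplicated element.
import numpy as np

a = np.array([[1, 2, 3],
              [1, 2, 2],
              [2, 2, 2]])
s = np.sort(a, axis=1)
mask = (s[:, 1:] != s[:, :-1]).all(axis=1)
print(a[mask])   # [[1 2 3]]
```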
-1 votes
2 answers
838 views

I am working on a Java project where I need to handle CSV files. Specifically, I need to read and write CSV files and process the data into arrays for further manipulation. I have researched different ...
asked by DamianBautista
0 votes
0 answers
107 views

Some preface, I have been teaching myself python for the past few days for a project, with almost no history of coding beyond some dabbling with MATLAB, so I apologize if there is something very ...
asked by Luke M
1 vote
1 answer
96 views

The results from a data processor in the Fluid template are cached. My data processor determines a list of images and a maximum time until which the list can be cached. How do I forward this ...
asked by Dr. Dieter Porth
