0 votes
0 answers
26 views

I have been set a task by my manager to try and predict insurance premiums based on some categories such as job description, number of people employed and turnover. I am comparing between K-Nearest ...
Red_bull's user avatar
0 votes
0 answers
27 views

I am trying to use the official MongoDB Kafka connector to create topics automatically while creating a connector with the SQL command CREATE SOURCE CONNECTOR logistics_n WITH ( 'connector.class' = 'com....
Roll no1's user avatar
  • 1,423
0 votes
1 answer
58 views

I'm working with the Stack Overflow 2024 survey. In the CSV file there are several multivalued variables (separated by ;). I want to apply one-hot encoding to the variables Employment and LanguageAdmire by ...
Lev's user avatar
  • 843
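One possible approach to the question above, sketched under assumptions: pandas' `Series.str.get_dummies` splits each cell on a separator and emits one 0/1 column per distinct value. The sample rows and the helper name `one_hot_multivalued` are hypothetical; only the column names Employment and LanguageAdmire and the `;` separator come from the excerpt.

```python
import pandas as pd

# Hypothetical rows standing in for the survey CSV; only the column names
# and the ';' separator are taken from the question.
df = pd.DataFrame({
    "Employment": ["Employed, full-time;Student", "Student"],
    "LanguageAdmire": ["Python;Rust", "Python"],
})


def one_hot_multivalued(frame, column, sep=";"):
    """Return one 0/1 indicator column per distinct value found in `column`."""
    dummies = frame[column].str.get_dummies(sep=sep)
    # Prefix with the source column so names stay unique after concatenation.
    return dummies.add_prefix(f"{column}_")


encoded = pd.concat(
    [one_hot_multivalued(df, col) for col in ["Employment", "LanguageAdmire"]],
    axis=1,
)
print(encoded)
```

Missing cells come out as all-zero rows with this method, which may or may not be what the analysis needs.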
0 votes
0 answers
21 views

I am using IterativeImputer from sklearn.impute to fill missing values in my dataset. One of my columns, Education_Level, is a categorical feature, so I first applied LabelEncoder to convert it into ...
Mahdi Mashayekhi's user avatar
0 votes
0 answers
18 views

I am working on a binary classification task using an audio dataset, which is already divided into training and testing sets. However, I also need a validation set, so I split the training set into ...
GauravGiri's user avatar
0 votes
2 answers
54 views

I'm trying to combine several (>2) dataframes with the same rows and different columns in R. For example, I have 4 dataframes: df1 <- data.frame( x = c("A1", "A2", "A3&...
user avatar
0 votes
1 answer
49 views

I'm currently using MinMaxScaler() on my dataset. However, because my dataset is large, I'm doing a first pass in batches to compute the min and max values for my scaler. I'm using ...
Saffy's user avatar
  • 13
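For the batched min/max pass described above, scikit-learn's `MinMaxScaler.partial_fit` may be a fit: it updates the running per-feature minimum and maximum one batch at a time. This is a minimal sketch with random placeholder data and an arbitrary batch size of 100, not the asker's actual loading code.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Placeholder data; in practice each batch would be read from disk.
rng = np.random.default_rng(0)
data = rng.uniform(-5, 20, size=(1000, 3))
batch_size = 100  # arbitrary choice for this sketch

# First pass: accumulate per-feature min and max batch by batch.
scaler = MinMaxScaler()
for start in range(0, len(data), batch_size):
    scaler.partial_fit(data[start:start + batch_size])

# Second pass: transform each batch with the statistics gathered above.
scaled = np.vstack([
    scaler.transform(data[start:start + batch_size])
    for start in range(0, len(data), batch_size)
])
```

The result should match a single `fit_transform` on the full array, so the two-pass version only trades speed for lower peak memory.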
0 votes
0 answers
69 views

I am working on a deep learning project to forecast Sudden Cardiac Death (SCD) using ECG data from PhysioNet. Specifically, I need to download and preprocess the following databases: MIT-BIH Normal ...
lipano marte's user avatar
0 votes
0 answers
18 views

I'm working on a pandas DataFrame that contains columns with lists and am currently trying the explode method, but I'm not getting the desired output. Instead, it produces a Cartesian product, combining all ...
buzzo's user avatar
  • 1
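The Cartesian product described above is typical of exploding list columns one after another. A hedged sketch of the difference, using a made-up frame: passing several columns to a single `DataFrame.explode` call (supported since pandas 1.3) pairs the list elements row by row, provided the lists in each row have equal length.

```python
import pandas as pd

# Hypothetical frame; the question's actual columns are not shown.
df = pd.DataFrame({
    "id": [1, 2],
    "item": [["a", "b"], ["c"]],
    "qty": [[10, 20], [30]],
})

# Exploding one column at a time crosses every item with every qty.
cartesian = df.explode("item").explode("qty")

# Exploding both columns in one call keeps elements aligned by position.
paired = df.explode(["item", "qty"], ignore_index=True)
print(paired)
```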
0 votes
0 answers
54 views

I am working on a project where I need to preprocess multiple motion capture files stored in .npy format. I am able to load and preprocess individual files, but I am facing difficulties when trying to ...
Mathletes Choreo's user avatar
2 votes
0 answers
66 views

I am fine-tuning a SAM model on my dataset containing train_images and train_masks. I am able to create the dict, but when calling the last command, i.e. loading the dataset from the dict, the kernel dies. It happened ...
Sanju 's user avatar
  • 21
0 votes
1 answer
539 views

I want to apply log() to my DataFrame and MinMaxScaler() together. I want the output to be a pandas DataFrame() with indexes and columns from the original data. I want to use the parameters used to ...
Guilherme Parreira's user avatar
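A sketch of one way to chain the two steps from the question above: a scikit-learn pipeline applies `np.log` through `FunctionTransformer` and then `MinMaxScaler`, and the result is wrapped back into a DataFrame with the original index and columns. The sample frame and its labels are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler

# Hypothetical strictly positive data with a custom index.
df = pd.DataFrame(
    {"a": [1.0, 10.0, 100.0], "b": [2.0, 4.0, 8.0]},
    index=["x", "y", "z"],
)

# log first, then min-max scaling; inverse_func allows inverse_transform.
pipe = make_pipeline(
    FunctionTransformer(np.log, inverse_func=np.exp),
    MinMaxScaler(),
)

# Fit on the original data and restore index and column labels by hand.
scaled = pd.DataFrame(pipe.fit_transform(df), index=df.index, columns=df.columns)

# The fitted pipeline reuses the learned min/max on new rows.
new_rows = pd.DataFrame({"a": [10.0], "b": [4.0]}, index=["w"])
new_scaled = pd.DataFrame(
    pipe.transform(new_rows), index=new_rows.index, columns=new_rows.columns
)
```

Recent scikit-learn releases (1.2 and later) also offer `set_output(transform="pandas")`, which may remove the need for the manual wrapping.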
0 votes
1 answer
70 views

I want to train a simple neural network, which has embedding_dim as a parameter: class BoolQNN(nn.Module): def __init__(self, embedding_dim): super(BoolQNN, self).__init__() self....
samuel gast's user avatar
0 votes
0 answers
72 views

I am using shell commands in Jupyter with Python. While preparing a dataset, I cannot get a case-sensitive sort by column to work. The line is like this: !head -n 5 $...
md Almus Fuad's user avatar
-1 votes
1 answer
191 views

I'm currently working with customer review data for products from Sephora. My task is to classify the reviews by sentiment: negative, neutral, positive. A common technique of text preprocessing is to ...
read data's user avatar
2 votes
1 answer
74 views

I'm trying to clean up an ASCII dataset with inconsistent spacing (e.g. dataset = [1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 2 1 1 1 1 1 1 1]) but so far what I've ...
Daedalus's user avatar
1 vote
0 answers
27 views

When fitting the model in Google Colab there doesn't seem to be any problem. However, when I try to create an interface using Streamlit and pickle, the target encoder doesn't work and I am unable to solve ...
user25546188's user avatar
0 votes
1 answer
81 views

Why am I encountering the ModuleNotFoundError for the datachain.lib module? Are there any additional steps I need to take to properly use the datachain package in my project? I'm working on a Python ...
Rashid mehmood's user avatar
0 votes
0 answers
56 views

I have to preprocess a feature which is basically a list of number codes encoded as a string, and I want to encode it such that the output is an array of frequencies of each of these numbers. The ...
AKHIL GOPIKUMAR's user avatar
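The excerpt above does not show the string format, so the following is only a sketch under assumptions: codes are comma-separated, and a fixed vocabulary of possible codes is known in advance. The function name `code_frequencies` and the sample codes are hypothetical.

```python
from collections import Counter


def code_frequencies(encoded, vocabulary, sep=","):
    """Count how often each vocabulary code appears in a delimited string.

    Assumes `sep`-separated codes; codes outside the vocabulary are ignored.
    """
    tokens = (tok.strip() for tok in encoded.split(sep))
    counts = Counter(tok for tok in tokens if tok)
    return [counts.get(code, 0) for code in vocabulary]


# Hypothetical vocabulary and input row.
vocabulary = ["101", "205", "330"]
row = code_frequencies("101, 330, 101", vocabulary)
print(row)
```

Fixing the vocabulary order up front keeps the output arrays the same length and column meaning across rows.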
0 votes
1 answer
81 views

I have a dataset of tweets annotated by three people, so now I have three files. How can I combine them into one file for further processing? The data set is an ...
ZAIN UL ABIDIN QADRI's user avatar
1 vote
2 answers
707 views

I am trying to build a custom sigmoid-shaped function because I want to scale my data during preprocessing. Basically, the goal is to obtain a sigmoid shaped function that outputs from 0 to 1 and only ...
cercio's user avatar
  • 89
1 vote
1 answer
775 views

I am using a feature-column dataset in my code. In TensorFlow 2.16.1 and later there is no keras.layers.DenseFeatures class to prepare the input layer for the DNN. What is the ...
shahramy's user avatar
1 vote
0 answers
90 views

I have the following input: data = { 'Group_A': ['0&1', '1&5', '0&5', '1&7', '3&8', '4&8', '3&5', '4&4'], 'Group_B': ['1&0', '5&7', '0&5'...
deepcurious's user avatar
1 vote
3 answers
124 views

I have a table of observations, or rather 'grouped' observations, where each group represents a deal, and each row representing a product. But the prediction is to be done at a Deal level. Below is ...
Salih's user avatar
  • 399
0 votes
1 answer
868 views

I am working on a project involving Step Functions with SageMaker. I have an existing Step Function that I need to integrate SageMaker into, and I tried adding steps such as processing, model training,...
Gwenda Thomas's user avatar
-2 votes
1 answer
33 views

I have a problem while making a predictive model, so I'm leaving a question. I'm trying to create a predictive model using machine learning methodologies such as random forest, xgboost, etc. At this ...
최성렬's user avatar
-4 votes
1 answer
64 views

Sorry for the title, I know it might be pretty broad and not very informative. I am facing a problem regarding the analysis of a data set. The participants of my experiments were randomly assigned ...
taboulet's user avatar
0 votes
0 answers
95 views

I have a large csv file with about 7000 rows (files) with text entries consisting of the following columns in bold: filename title text author year 0 latin_xmls\10.xml De facto Ungarie ...
Phil's user avatar
  • 1
0 votes
1 answer
382 views

I'm trying to filter out rows in which the data of a specific column starts with a given substring. I have a pandas.DataFrame as shown below (simplified): price DRUG_CODE 123 A12D958 234 B564F3C ... ... I'm ...
Warren Chen's user avatar
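A minimal sketch of prefix filtering for the question above, using the pandas string accessor `Series.str.startswith`. The first two rows mirror the excerpt; the third row and the prefix "A" are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "price": [123, 234, 345],
    "DRUG_CODE": ["A12D958", "B564F3C", "A77K001"],  # third row is hypothetical
})

prefix = "A"  # hypothetical prefix to filter on

# na=False treats missing codes as non-matching instead of propagating NaN.
mask = df["DRUG_CODE"].str.startswith(prefix, na=False)

matching = df[mask]   # rows whose code starts with the prefix
kept = df[~mask]      # rows with the prefixed codes filtered out
print(kept)
```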
0 votes
0 answers
28 views

class TestFilterDF(unittest.TestCase): @patch('plugins.qa_plugins.preprocessing.read_df') def test_filter_df(self, read_df_mock): # Mocking read_df function to return a DataFrame ...
Uplabdhi Khare's user avatar
0 votes
1 answer
35 views

from sklearn.compose import ColumnTransformer from sklearn.preprocessing import StandardScaler, OneHotEncoder, OrdinalEncoder from sklearn.pipeline import Pipeline from sklearn.model_selection import ...
s213439's user avatar
0 votes
1 answer
164 views

I am training a model for crop yield prediction having a self-constructed dataset of 6 features and 2000 records. However, the dataset is biased and I am not getting accurate results. I have tried ...
Muhammad Bilal's user avatar
0 votes
0 answers
41 views

Trying to run an LSTM model where the data is separated into few columns in csv and i'm trying to prepare date from such csv's. Getting the error of ValueError: Failed to convert a NumPy array to a ...
Athul Srinivas's user avatar
0 votes
1 answer
49 views

I'm using Python and I have a dataset containing NaN values. To clean up these data, I replaced the NaN values with the mean or median of each column using the fillna() function from pandas. However, ...
AI enthousiast's user avatar
1 vote
1 answer
2k views

I've uploaded a dataset to Kaggle (approx. 73 GB), and I'm trying to preprocess this data for model training purposes. This dataset has a large number of missing values, which I am trying to interpolate ...
54m4gr4's user avatar
  • 13
0 votes
0 answers
142 views

I'm trying to make an ANN in Python to predict something from a dataset (in this case diabetes), and I'm struggling to figure out how to solve this error. Here is the full code: import pandas as pd ...
nyura45's user avatar
0 votes
1 answer
638 views

I'm trying to make a simple LSTM neural network. I've got time series data which I am splitting into sequences and batches using PyTorch's Dataset and DataLoader. To account for the variable lengths ...
D Danne's user avatar
  • 17
0 votes
0 answers
88 views

There is a known method for creating a dataset. The code snippet was borrowed from: https://www.tensorflow.org/tutorials/audio/simple_audio #Gather data from files ''' .....some code I see no need to paste, ...
Hell576's user avatar
0 votes
0 answers
61 views

I'm new to Python, so I'm sorry if this is a basic one. However, after I ran the code, I got this: TypeError: cannot do positional indexing on RangeIndex with these indexers [ Year Average of PM ...
Sofia's user avatar
  • 1
0 votes
1 answer
66 views

I'm working on a project where I have a set of longtail data that I want to transform into a Gaussian distribution. I'm looking to achieve something similar to scikit-learn's PowerTransformer, but ...
umut's user avatar
  • 1
0 votes
1 answer
107 views

I have 31 features to be input into an ML algorithm. Of these 22 feature values are in the range of 0 to 1 already. The remaining 9 features vary between 0 to 750. My doubt is if I choose to apply ...
rekha's user avatar
  • 7
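The question above is cut off before it states which scaler is under consideration, so this is only a sketch of one option: if the intent is to rescale just the nine wide-range features, `ColumnTransformer` with `remainder="passthrough"` can apply `MinMaxScaler` to those columns alone. The random data and the column positions are assumptions.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler

# Placeholder data: 22 features already in [0, 1], then 9 features in [0, 750].
rng = np.random.default_rng(0)
X = np.hstack([
    rng.uniform(0, 1, size=(50, 22)),
    rng.uniform(0, 750, size=(50, 9)),
])
wide_cols = list(range(22, 31))  # assumed positions of the wide-range features

ct = ColumnTransformer(
    [("scale", MinMaxScaler(), wide_cols)],
    remainder="passthrough",  # leave the other 22 columns untouched
)
X_scaled = ct.fit_transform(X)

# Note: the transformed columns come first in the output, then the passthrough ones.
```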
0 votes
1 answer
35 views

I scraped reviews from a website, and the pros and cons are separate from each other. I scraped them as a list because it looks like the best solution for not having the same review with user, date ...
averzeo's user avatar
1 vote
1 answer
38 views

I'm performing data analysis on a dataset whose categorical labels are interrelated. My labels track experimental conditions. In my case, labels track concentrations of combinations of two chemicals ...
WoolyThomas's user avatar
12 votes
2 answers
81 views

I defined student_sub_set dataframe as below: # select the subset of characteristics for the regression student_sub_set = student[['acad_lang_home', 'absent_freq','tired_freq','sex', ...
Narges Ghanbari's user avatar
0 votes
0 answers
100 views

from sklearn.preprocessing import MinMaxScaler values = df[['Close']] #values is floats ranging from 0.06 to 190.08 sc = MinMaxScaler() scaled_values = sc.fit_transform(values) descaled_values = sc....
haintaki's user avatar
0 votes
1 answer
71 views

I have a data file that has a geometrical combination as the heading and the following related data generated from the software. The data file has the following structure. The data file start from ...
Mad0731's user avatar
0 votes
0 answers
64 views

There are approximately 13,000 values for a given column. The function below takes a list of strings as input and does the NER tagging for each word in the list. On average there ...
srinivas muralidharan's user avatar
1 vote
0 answers
878 views

I've been exploring frameworks to integrate large language models (LLMs) into my applications, specifically focusing on data preprocessing, ingestion, and query capabilities. I've come across both ...
Arrmlet's user avatar
  • 121
0 votes
0 answers
93 views

I am now trying to run preprocessing tasks of DLRM with Apache Beam https://github.com/tensorflow/models/tree/master/official/recommendation/ranking/preprocessing. The dataset is Criteo Kaggle 10GB ...
Eric's user avatar
  • 1
2 votes
1 answer
8k views

I have downloaded the Open Images dataset to train a YOLO (You Only Look Once) model for a computer vision project. However, I am facing some challenges and I am seeking guidance on how to proceed. ...
Ameer Hamzah's user avatar
