1,715 questions
1
vote
1
answer
98
views
Transformer model outputs degrade after ONNX export — what could be causing this?
I’ve exported a fine-tuned BERT-based QA model to ONNX for faster inference, but I’m noticing that the predictions from the ONNX model are consistently less accurate than those from the original ...
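A useful first check (a minimal sketch; the checkpoint path and export file below are placeholders) is to compare the raw logits of the PyTorch and ONNX models on an identical input. Large deltas implicate the export itself (opset, fp16, dynamic axes) rather than the post-processing:
import numpy as np
import torch
import onnxruntime as ort
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("my-finetuned-qa")                  # placeholder
pt_model = AutoModelForQuestionAnswering.from_pretrained("my-finetuned-qa").eval()
sess = ort.InferenceSession("model.onnx")                                     # placeholder

inputs = tokenizer("Who wrote Hamlet?", "Hamlet was written by Shakespeare.",
                   return_tensors="pt")
with torch.no_grad():
    pt_out = pt_model(**inputs)
# Assumes the ONNX graph's input names match the tokenizer's keys.
onnx_out = sess.run(None, {k: v.numpy() for k, v in inputs.items()})
print(np.abs(pt_out.start_logits.numpy() - onnx_out[0]).max())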
3
votes
1
answer
241
views
Why does the ProtBERT model generate identical embeddings for all non-whitespace-separated (single token?) inputs?
Why do non-identical inputs to ProtBERT generate identical embeddings when non-whitespace-separated?
I've looked at answers here etc. but they appear to be different cases where the slicing of the out....
0
votes
1
answer
68
views
Model summary not picking up from CodeBERT model
I am fine-tuning a CodeBERT model using my custom dataset and tokenizers. I tried to unfreeze the last 4 layers of the model. When checking whether the layers are unfrozen, it shows me which layers are not ...
0
votes
1
answer
286
views
How to solve InvalidArchiveError while installing pytorch with Anaconda?
I have installed the latest Anaconda and updated everything. When I try to install BERTopic or PyTorch itself, I am getting this error:
InvalidArchiveError("Error with archive C:\Users\myuser\...
1
vote
1
answer
129
views
Can I use a custom attention layer while still leveraging a pre-trained BERT model?
In the paper “Using Prior Knowledge to Guide BERT’s Attention in Semantic Textual Matching Tasks”, they multiply a similarity matrix with the attention scores inside the attention layer. I want to ...
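The mechanism the paper describes can be sketched outside of BERT's internals. The function below is plain PyTorch, not Hugging Face's BertSelfAttention (which you would subclass and load pretrained weights into in practice); it only shows where the similarity prior multiplies the attention scores:
import torch
import torch.nn.functional as F

def attention_with_prior(q, k, v, sim_prior):
    # q, k, v: (batch, heads, seq, head_dim); sim_prior: (batch, 1, seq, seq)
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    scores = scores * sim_prior      # inject the prior before the softmax
    return F.softmax(scores, dim=-1) @ v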
1
vote
0
answers
70
views
What are the appropriate shape and values of the tensor expected by ModernBertSequentialClassification with Candle in Rust?
I don't understand the appropriate shape and values of the tensor expected when fine-tuning ModernBertSequentialClassification with Candle in Rust.
Is there a formula to determine the appropriate shape and ...
0
votes
3
answers
667
views
Error: A KerasTensor is symbolic: it's a placeholder for a shape and a dtype. It doesn't have any actual numerical value
I have been trying to recreate this tutorial from TensorFlow's docs. However, I've been getting an error I cannot solve that seems to be related to the tutorial's source code itself. Also, ...
1
vote
1
answer
83
views
Training a custom tokenizer with Huggingface gives weird token splits at inference
So I trained a tokenizer from scratch using Huggingface’s tokenizers library (not AutoTokenizer.from_pretrained, but actually trained a new one). Seemed to go fine, no errors. But when I try to use it ...
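A frequent cause is that the tokenizer reconstructed at inference time is not byte-for-byte the one that was trained, e.g. a missing pre-tokenizer or normalizer. A quick round-trip check (a sketch; tokenizer.json stands in for whatever file you saved):
from tokenizers import Tokenizer
from transformers import PreTrainedTokenizerFast

raw_tok = Tokenizer.from_file("tokenizer.json")
print(raw_tok.encode("some test sentence").tokens)

# If you wrap it for transformers, point at the same file; rebuilding the
# tokenizer with different pre-tokenizer settings changes the splits.
fast_tok = PreTrainedTokenizerFast(tokenizer_file="tokenizer.json")
print(fast_tok.tokenize("some test sentence"))
If the two printouts disagree, the problem is in how the tokenizer is loaded, not in how it was trained.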
0
votes
0
answers
134
views
PyTorch with Docker issues: torch.cuda.is_available() = False
I'm having an issue with PyTorch in a Docker container where torch.cuda.is_available() returns False, but the same PyTorch version works correctly outside the container.
Environment
Host: Debian 12
...
0
votes
1
answer
315
views
How can I decide how many epochs to train for when re-training a model on the full dataset without a validation set?
I have a BERT model that I want to fine-tune. Initially, I use a training dataset, which I split into a training and validation set. During fine-tuning, I monitor the validation loss to ensure that ...
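The usual recipe, sketched below with hypothetical train_one_epoch/evaluate/build_model helpers, is to record the epoch at which validation loss bottomed out, then retrain from scratch on the full dataset for that fixed budget:
best_epoch, best_val_loss = None, float("inf")
for epoch in range(max_epochs):
    train_one_epoch(model, train_loader)
    val_loss = evaluate(model, val_loader)
    if val_loss < best_val_loss:
        best_val_loss, best_epoch = val_loss, epoch

# Re-initialize, then train on train+val for the same number of epochs.
model = build_model()
for epoch in range(best_epoch + 1):
    train_one_epoch(model, full_loader)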
2
votes
1
answer
196
views
How to Identify Similar Code Parts Using CodeBERT Embeddings?
I'm using CodeBERT to compare how similar two pieces of code are. For example:
# Code 1
def calculate_area(radius):
    return 3.14 * radius * radius

# Code 2
def compute_circle_area(r):
    return 3.14159 * ...
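One common approach (a sketch, assuming mean pooling over CodeBERT's last hidden state; other pooling choices are equally defensible):
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base").eval()

def embed(code: str) -> torch.Tensor:
    inputs = tokenizer(code, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state      # (1, seq, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)         # mean pooling

a = embed("def calculate_area(radius):\n    return 3.14 * radius * radius")
b = embed("def compute_circle_area(r):\n    return 3.14159 * r * r")
print(torch.cosine_similarity(a, b).item())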
0
votes
0
answers
36
views
How to change the last layer in a fine-tuned model?
When fine-tuning the HuBERT model to detect phonemes, I chose a fine-tuned ASR HuBERT model, removed the last two layers, and added a linear layer sized to the phoneme vocab_size in the config. What is ...
0
votes
0
answers
71
views
How many observations per class are necessary? Transfer learning with BERT fine-tuning
I seek advice on a classification problem in industry.
The rows in a dataset must be classified/labeled; it lacks a target column (labels have dot-separated levels like 'x.x.x.x.x.x.x'). During every ...
0
votes
0
answers
45
views
How to detect out-of-vocabulary words in a prompt
I need to detect words an LLM has no knowledge about, to add a RAG-based definition of said word to the prompt, e.g.:
What is the best way to achieve slubalisme using the new fabridocium product?, ...
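One cheap heuristic (a sketch; the checkpoint and the threshold of 3 pieces are assumptions to tune) is to flag words the tokenizer fragments into many subword pieces, since heavy fragmentation signals the term was rare in the tokenizer's training corpus. Note this is a proxy: it measures tokenizer coverage, not the LLM's actual knowledge.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

prompt = "What is the best way to achieve slubalisme using the new fabridocium product?"
for word in prompt.split():
    pieces = tokenizer.tokenize(word)
    if len(pieces) >= 3:
        print(word, "->", pieces)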
0
votes
1
answer
236
views
Why is my BERT model producing NaN loss during training for multi-label classification on imbalanced data?
I’m running into a frustrating issue while training a BERT-based multi-label text classification model on an imbalanced dataset. After a few epochs, the training loss suddenly becomes NaN, and I can’t ...
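Two mitigations commonly tried for this symptom (a sketch; num_pos and num_neg are hypothetical per-label count tensors) are weighting positives in BCEWithLogitsLoss and clipping gradients so rare-label batches cannot blow up the update:
import torch
from torch import nn

# Per-label positive weighting for the imbalance.
pos_weight = num_neg / num_pos.clamp(min=1)
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

loss = criterion(logits, labels.float())   # logits are raw scores, no sigmoid
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
BCEWithLogitsLoss also folds the sigmoid into the loss in a numerically stable way, which matters if the NaN comes from applying sigmoid and log separately.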
0
votes
1
answer
277
views
torch.OutOfMemoryError: CUDA out of memory. (Google Colab)
I tried to adapt the mBERT model to existing code. However, I get the following error even though I have tried different solutions.
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 20....
0
votes
0
answers
55
views
Is it possible to feed embeddings generated by BERT to an LSTM-based autoencoder to get the latent space?
I've just learned how BERT produces embeddings; I might not understand it fully.
I was thinking of doing a project that leverages those embeddings and feeds them to an autoencoder to generate latent ...
0
votes
1
answer
56
views
Is it possible to evaluate Machine Translations using Sentence BERT?
I'm not referring to BERTScore. BERTScore uses token-level word embeddings: you compute pairwise cosine similarity of word embeddings and obtain scores using greedy matching.
I'm referring to Sentence ...
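A minimal sketch of that idea, assuming any SBERT checkpoint (the model name below is a placeholder; a multilingual one would be needed to compare source and hypothesis directly):
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

ref = "The cat sat on the mat."
hyp = "A cat was sitting on the mat."
emb = model.encode([ref, hyp], convert_to_tensor=True)
print(util.cos_sim(emb[0], emb[1]).item())
A single sentence-level cosine captures adequacy but is fairly insensitive to fluency errors, which is one reason token-level BERTScore remains popular.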
0
votes
0
answers
90
views
Introducing additional layers (dropout and dense layers) after BERT's output
I'm working on a BERT-based model for fake news detection. While applying additional layers like dropout and fully connected layers (since my model doesn't reach good accuracy with BERT alone), ...
1
vote
1
answer
473
views
ValueError: Exception encountered when calling layer 'tf_bert_model' (type TFBertModel)
I have been trying to run TFBertModel from Transformers, but it keeps throwing this error:
ValueError Traceback (most recent call last)
Cell In[9], line 1
----> 1 ...
2
votes
0
answers
350
views
Why does my PyTorch DataLoader only use one CPU core despite setting num_workers>1?
I am trying to fine-tune BERT for a multi-label classification task (Jigsaw toxic comments). I created a custom dataset and DataLoader as follows:
class CustomDataSet(Dataset):
def __init__(...
1
vote
2
answers
78
views
BERT issue: dropout(): argument 'input' (position 1) must be Tensor, not str
I was trying to run some epochs to train my sentiment analysis model; at the very last step, training stopped with the error in the title. I attach the code here:
Sentiment classifier:
# Build ...
3
votes
1
answer
66
views
How can one obtain the "correct" embedding layer in BERT?
I want to utilize BERT to assess the similarity between two pieces of text:
from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F
import numpy as np
tokenizer ...
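Assuming tokenizer and model are loaded as in the excerpt, the usual recipe is to mean-pool the last hidden state under the attention mask rather than read the static embedding layer (a sketch):
def mean_pool(last_hidden, attention_mask):
    mask = attention_mask.unsqueeze(-1).float()
    return (last_hidden * mask).sum(1) / mask.sum(1)

enc = tokenizer(["first text", "second text"], padding=True, return_tensors="pt")
with torch.no_grad():
    out = model(**enc)
emb = F.normalize(mean_pool(out.last_hidden_state, enc["attention_mask"]), dim=-1)
print((emb[0] @ emb[1]).item())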
0
votes
2
answers
573
views
Why do LayerNorm layers in BERT base have 768 (and not 512) weight and bias parameters?
The following will print 768 weight and bias parameters for each LayerNorm layer.
from transformers import BertModel
model = BertModel.from_pretrained('bert-base-uncased')
for name, param in model....
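768 is BERT base's hidden size; 512 is its maximum sequence length. LayerNorm carries one scale and one shift per hidden dimension, shared across all positions, which the following check confirms:
from transformers import BertModel

model = BertModel.from_pretrained('bert-base-uncased')
ln = model.encoder.layer[0].output.LayerNorm
print(ln.weight.shape, ln.bias.shape)   # torch.Size([768]) for both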
0
votes
0
answers
75
views
Encoder Decoder Transformer model generate a repetitive token as output in text summarization
I implemented a transformer encoder-decoder (Bert2Bert) for a text summarization task. In the training phase the train loss decreases, but in the prediction phase it generates a repetitive token as output, for example [2,...
1
vote
1
answer
129
views
AdapterConfig.__init__() got an unexpected keyword argument 'mh_adapter'
I am trying adapters on LIMU-BERT, which is a lightweight BERT for IMU data. I pretrained LIMU-BERT on Dataset A and planned to add adapters and tune them on Dataset B. Here is my adapter-adding code:
...
0
votes
0
answers
64
views
Error: 'torch.dtype' object has no attribute 'base_dtype'
I am trying to train the BERT model but I haven't figured out the structure of TensorFlow yet. In the line x = self.bert_module(book) an error occurs.
Exception encountered when calling layer '...
1
vote
1
answer
164
views
How to use Hugging Face model with 512 max tokens on longer text (for Named Entity Recognition)
I have been using the Named Entity Recognition (NER) model https://huggingface.co/cahya/bert-base-indonesian-NER on Indonesian text as follows:
text = "..."
model_name = "cahya/bert-...
2
votes
2
answers
304
views
Normalization of token embeddings in BERT encoder blocks
Following the multi-headed attention layer in a BERT encoder block, is layer normalization done separately on the embedding of each token (i.e., one mean and variance per token embedding), or on the ...
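The former: statistics are computed over the hidden axis only, so each token embedding gets its own mean and variance. A quick demonstration:
import torch

x = torch.randn(2, 10, 768)            # (batch, seq, hidden)
y = torch.nn.LayerNorm(768)(x)
# Per-token means are ~0 and per-token stds are ~1:
print(y.mean(-1).abs().max().item(), y.std(-1).mean().item())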
2
votes
0
answers
29
views
Using sub-classes as anchors and classes as positives and negatives in a Siamese network with triplet loss?
I’m experimenting with a Siamese network using triplet loss to categorize sub-classes into broader classes. My setup differs from traditional triplet-loss models: it involves using the sub-class as ...
2
votes
1
answer
48
views
How to convert character indices to BERT token indices
I am working with a question-answer dataset UCLNLP/adversarial_qa.
from datasets import load_dataset
ds = load_dataset("UCLNLP/adversarial_qa", "adversarialQA")
How do I map ...
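Fast tokenizers expose exactly this mapping via char_to_token (a sketch; the checkpoint and the SQuAD-style answer fields are assumptions based on the dataset's format):
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

example = ds["train"][0]
char_start = example["answers"]["answer_start"][0]
char_end = char_start + len(example["answers"]["text"][0]) - 1

enc = tokenizer(example["context"], return_offsets_mapping=True)
print(enc.char_to_token(char_start), enc.char_to_token(char_end))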
2
votes
1
answer
72
views
Dutch sentiment analysis RobBERTje outputs just positive/negative labels, neutral label is missing
When I run Dutch sentiment analysis RobBERTje, it outputs only positive/negative labels; the neutral label is missing in the data.
https://huggingface.co/DTAI-KULeuven/robbert-v2-dutch-sentiment
There are ...
1
vote
0
answers
82
views
TensorFlow Serving Keras BERT model issue
I am trying to use TensorFlow Serving to serve a Keras BERT model, but I have a problem predicting with the REST API; details are below. Can you please help me resolve this problem?
predict output (...
-1
votes
1
answer
74
views
Why is Keras pretrained BERT MaskedLM producing inconsistent predictions?
I am trying to use keras-nlp with a pretrained masked BERT model to predict some tokens in a sequence. However, the model produces inconsistent results. What could be wrong, or am I misunderstanding ...
0
votes
1
answer
56
views
OutOfMemory while training pre-trained BERT model for token classification task
I am using pre-trained BertForTokenClassification for a nested Named Entity Recognition task. To define nested entities, I am using a multi-label method. In the output, the model returns 3 lists of logits, ...
0
votes
0
answers
62
views
How to write Bangla in a SHAP graph
I have used SHAP on my Bangla dataset and plotted a bar graph with the following code:
pred = transformers.pipeline("text-classification", model=model, tokenizer=tokenizer, device=0, ...
0
votes
1
answer
545
views
How do I freeze only some embedding indices with tied embeddings?
I found in "Is it possible to freeze only certain embedding weights in the embedding layer in pytorch?" a nice way to freeze only some indices of an embedding layer.
However, while including it in a ...
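With tied input/output embeddings there is only one weight tensor, so a single gradient hook on it covers both directions (a sketch; frozen_ids is a hypothetical index tensor):
import torch

frozen_ids = torch.tensor([0, 1, 2, 3])      # rows to keep frozen
emb = model.get_input_embeddings()

def zero_frozen_rows(grad):
    grad = grad.clone()
    grad[frozen_ids] = 0.0
    return grad

emb.weight.register_hook(zero_frozen_rows)
One caveat: weight decay and optimizer momentum can still nudge the "frozen" rows even with zero gradients, so exclude the embedding from weight decay if exact freezing matters.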
0
votes
2
answers
252
views
Why can we still compute embedding similarity in (some RAG) projects even with anisotropy?
I have just noticed that token/sentence embeddings trained from Transformer-based models have a strong anisotropy problem, which means most of the embeddings are close to each other in the vector ...
0
votes
1
answer
593
views
How do I add a CRF layer to a BERT model for NER tasks?
I have created an NER model using BERT to detect medical entities which works great. I'm trying to add a CRF layer on top of my BERT model to enhance its performances but I'm getting an error that I ...
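A common wiring (a sketch using the third-party pytorch-crf package, which is an assumption; your error may stem from a different setup) puts a linear emission layer between BERT and the CRF:
import torch
from torch import nn
from torchcrf import CRF                     # pip install pytorch-crf
from transformers import BertModel

class BertCrfTagger(nn.Module):
    def __init__(self, num_tags):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, labels=None):
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        emissions = self.classifier(hidden)
        mask = attention_mask.bool()
        if labels is not None:
            return -self.crf(emissions, labels, mask=mask)  # NLL to minimize
        return self.crf.decode(emissions, mask=mask)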
0
votes
1
answer
123
views
SBERT fine-tuning always stops before finishing all epochs
I'm working on a project using the SBERT pre-trained models (specifically MiniLM) for a text classification project with 995 classes. I am following the steps laid out here for the most part ...
0
votes
0
answers
418
views
Number of parameters in BERT model
Suppose you are pretraining a BERT model with 8 layers, 768-dim hidden states, 8 attention heads, and a sub-word vocabulary of size 40k. Also, your feed-forward hidden layer is of dimension 3072. What ...
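The arithmetic can be laid out directly (a sketch; it assumes BERT-style extras the prompt doesn't state: 512 max positions, 2 token types, bias terms, and LayerNorm parameters):
V, H, L, F, P = 40_000, 768, 8, 3072, 512

embeddings = V * H + P * H + 2 * H + 2 * H      # token + position + type + LayerNorm
attention = 4 * (H * H + H) + 2 * H             # Q, K, V, output projections + LayerNorm
ffn = (H * F + F) + (F * H + H) + 2 * H         # two dense layers + LayerNorm
total = embeddings + L * (attention + ffn)
print(f"{total:,}")                              # about 88M under these assumptions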
-1
votes
1
answer
191
views
Capitalized words in sentiment analysis
I'm currently working with data of customer reviews on products from Sephora. My task is to classify them into sentiments: negative, neutral, positive.
A common technique of text preprocessing is to ...
0
votes
1
answer
257
views
Clustering for SBERT embedding
I have a set of sentences which I have transformed into vectors using SBERT embeddings. I would like to cluster these vectors.
When looking for information online, I keep seeing posts telling me to do ...
-1
votes
2
answers
1k
views
fbgemm.dll module could not be found when using Pytorch/transformers
I am currently trying to use the BERT language model for invoice creation. However, I am receiving the error:
OSError: [WinError 126] The specified module could not be found. Error loading "C:\...
0
votes
2
answers
3k
views
The python module sentence-transformers is not found, even though the package is installed
For my Python script (as seen below), I use the package sentence-transformers, which contains SBERT models. Even though this package is clearly listed as installed when executing "pip list" ...
-2
votes
1
answer
56
views
Hybridized collaborative filtering and sentence similarity-based system for doctor recommendation based on user input of symptoms and location
I'm trying to solve the problem of recommending a doctor based on a user's symptoms and location, using a hybridized collaborative filtering and sentence similarity-based recommender system that follows ...
1
vote
2
answers
75
views
Identify starting row of actual data in Pandas DataFrame with merged header cells
My original df looks like this -
df
Note in the data frame:
The headers run till row 3, and the values for those headers start from row 4 onwards.
The number of rows & columns ...
0
votes
0
answers
78
views
BERT embedding cosine similarities look very random and useless
I thought you could use BERT embeddings to determine semantic similarity. I was trying to group some words into categories using this, but the results were very bad.
E.g. here is a small example with ...
1
vote
0
answers
139
views
Why am I seeing unused parameters in position embeddings when using relative_key in BertModel?
I am training a BERT model using pytorch and HuggingFace's BertModel. The sequences of tokens can vary in length from 1 (just a CLS token) to 128. The model trains fine when using absolute position ...
0
votes
1
answer
411
views
Exporting a Bert-based PyTorch model to CoreML. How can I make the CoreML model work for any input?
I use the code below to export a Bert-based PyTorch model to CoreML.
Since I used
dummy_input = tokenizer("A French fan", return_tensors="pt")
the CoreML model only works with ...
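coremltools lets you declare the sequence axis as a range instead of baking in the dummy input's length. A sketch (traced_model stands for your torch.jit.trace output; the input names are assumptions that must match your model's):
import numpy as np
import coremltools as ct

seq = ct.RangeDim(lower_bound=1, upper_bound=512)
mlmodel = ct.convert(
    traced_model,
    inputs=[
        ct.TensorType(name="input_ids", shape=(1, seq), dtype=np.int32),
        ct.TensorType(name="attention_mask", shape=(1, seq), dtype=np.int32),
    ],
)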