1,715 questions
1
vote
1
answer
98
views
Transformer model outputs degrade after ONNX export — what could be causing this?
I’ve exported a fine-tuned BERT-based QA model to ONNX for faster inference, but I’m noticing that the predictions from the ONNX model are consistently less accurate than those from the original ...
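A useful first check (a minimal sketch; the checkpoint path and export file below are placeholders) is to compare the raw logits of the PyTorch and ONNX models on an identical input. Large deltas implicate the export itself (opset, fp16, dynamic axes) rather than the post-processing:
import numpy as np
import torch
import onnxruntime as ort
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("my-finetuned-qa")                  # placeholder
pt_model = AutoModelForQuestionAnswering.from_pretrained("my-finetuned-qa").eval()
sess = ort.InferenceSession("model.onnx")                                     # placeholder

inputs = tokenizer("Who wrote Hamlet?", "Hamlet was written by Shakespeare.",
                   return_tensors="pt")
with torch.no_grad():
    pt_out = pt_model(**inputs)
# Assumes the ONNX graph's input names match the tokenizer's keys.
onnx_out = sess.run(None, {k: v.numpy() for k, v in inputs.items()})
print(np.abs(pt_out.start_logits.numpy() - onnx_out[0]).max())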
3
votes
1
answer
241
views
Why does the ProtBERT model generate identical embeddings for all non-whitespace-separated (single token?) inputs?
Why do non-identical inputs to ProtBERT generate identical embeddings when non-whitespace-separated?
I've looked at answers here etc. but they appear to be different cases where the slicing of the out....
0
votes
1
answer
68
views
Model summary not picking up from CodeBERT model
I am fine-tuning a CodeBERT model using my custom dataset and tokenizers. I tried to unfreeze the last 4 layers of the model. When checking whether the layers are unfrozen, it shows me which layers are not ...
0
votes
1
answer
286
views
How to solve InvalidArchiveError while installing pytorch with Anaconda?
I have installed the latest Anaconda and updated everything. When I try to install BERTopic or PyTorch itself, I am getting this error:
InvalidArchiveError("Error with archive C:\Users\myuser\...
1
vote
1
answer
129
views
Can I use a custom attention layer while still leveraging a pre-trained BERT model?
In the paper “Using Prior Knowledge to Guide BERT’s Attention in Semantic Textual Matching Tasks”, they multiply a similarity matrix with the attention scores inside the attention layer. I want to ...
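The mechanism the paper describes can be sketched outside of BERT's internals. The function below is plain PyTorch, not Hugging Face's BertSelfAttention (which you would subclass and load pretrained weights into in practice); it only shows where the similarity prior multiplies the attention scores:
import torch
import torch.nn.functional as F

def attention_with_prior(q, k, v, sim_prior):
    # q, k, v: (batch, heads, seq, head_dim); sim_prior: (batch, 1, seq, seq)
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    scores = scores * sim_prior      # inject the prior before the softmax
    return F.softmax(scores, dim=-1) @ v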
1
vote
0
answers
70
views
What are the appropriate shape and values of the tensor expected by ModernBertSequentialClassification with Candle in Rust?
I don't understand the appropriate shape and values of the tensor expected when fine-tuning ModernBertSequentialClassification with Candle in Rust.
Is there a formula to determine the appropriate shape and ...
0
votes
3
answers
667
views
Error: A KerasTensor is symbolic: it's a placeholder for a shape and a dtype. It doesn't have any actual numerical value
I have been trying to recreate this tutorial from TensorFlow's docs. However, I've been getting an error I cannot solve that seems to be related to the tutorial's source code itself. Also, ...
1
vote
1
answer
83
views
Training a custom tokenizer with Huggingface gives weird token splits at inference
So I trained a tokenizer from scratch using Huggingface’s tokenizers library (not AutoTokenizer.from_pretrained, but actually trained a new one). Seemed to go fine, no errors. But when I try to use it ...
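A frequent cause is that the tokenizer reconstructed at inference time is not byte-for-byte the one that was trained, e.g. a missing pre-tokenizer or normalizer. A quick round-trip check (a sketch; tokenizer.json stands in for whatever file you saved):
from tokenizers import Tokenizer
from transformers import PreTrainedTokenizerFast

raw_tok = Tokenizer.from_file("tokenizer.json")
print(raw_tok.encode("some test sentence").tokens)

# If you wrap it for transformers, point at the same file; rebuilding the
# tokenizer with different pre-tokenizer settings changes the splits.
fast_tok = PreTrainedTokenizerFast(tokenizer_file="tokenizer.json")
print(fast_tok.tokenize("some test sentence"))
If the two printouts disagree, the problem is in how the tokenizer is loaded, not in how it was trained.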
0
votes
0
answers
134
views
PyTorch with Docker issues: torch.cuda.is_available() = False
I'm having an issue with PyTorch in a Docker container where torch.cuda.is_available() returns False, but the same PyTorch version works correctly outside the container.
Environment
Host: Debian 12
...
0
votes
1
answer
315
views
How can I decide how many epochs to train for when re-training a model on the full dataset without a validation set?
I have a BERT model that I want to fine-tune. Initially, I use a training dataset, which I split into a training and validation set. During fine-tuning, I monitor the validation loss to ensure that ...
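The usual recipe, sketched below with hypothetical train_one_epoch/evaluate/build_model helpers, is to record the epoch at which validation loss bottomed out, then retrain from scratch on the full dataset for that fixed budget:
best_epoch, best_val_loss = None, float("inf")
for epoch in range(max_epochs):
    train_one_epoch(model, train_loader)
    val_loss = evaluate(model, val_loader)
    if val_loss < best_val_loss:
        best_val_loss, best_epoch = val_loss, epoch

# Re-initialize, then train on train+val for the same number of epochs.
model = build_model()
for epoch in range(best_epoch + 1):
    train_one_epoch(model, full_loader)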
2
votes
1
answer
196
views
How to Identify Similar Code Parts Using CodeBERT Embeddings?
I'm using CodeBERT to compare how similar two pieces of code are. For example:
# Code 1
def calculate_area(radius):
    return 3.14 * radius * radius

# Code 2
def compute_circle_area(r):
    return 3.14159 * ...
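One common approach (a sketch, assuming mean pooling over CodeBERT's last hidden state; other pooling choices are equally defensible):
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base").eval()

def embed(code: str) -> torch.Tensor:
    inputs = tokenizer(code, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state      # (1, seq, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)         # mean pooling

a = embed("def calculate_area(radius):\n    return 3.14 * radius * radius")
b = embed("def compute_circle_area(r):\n    return 3.14159 * r * r")
print(torch.cosine_similarity(a, b).item())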
0
votes
0
answers
36
views
How to change the last layer in a fine-tuned model?
When fine-tuning the HuBERT model to detect phonemes, I chose a fine-tuned ASR HuBERT model, removed the last two layers, and added a linear layer sized to the phoneme vocab_size in the config. What is ...
0
votes
0
answers
71
views
How many observations per class are necessary? Transfer learning with BERT fine-tuning
I seek advice on a classification problem in industry.
The rows in a dataset must be classified/labeled; it lacks a target column (labels have dot-separated levels like 'x.x.x.x.x.x.x'). During every ...
0
votes
0
answers
45
views
How to detect out-of-vocabulary words in a prompt
I need to detect words an LLM has no knowledge about, to add a RAG-based definition of said word to the prompt, e.g.:
What is the best way to achieve slubalisme using the new fabridocium product?, ...
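One cheap heuristic (a sketch; the checkpoint and the threshold of 3 pieces are assumptions to tune) is to flag words the tokenizer fragments into many subword pieces, since heavy fragmentation signals the term was rare in the tokenizer's training corpus. Note this is a proxy: it measures tokenizer coverage, not the LLM's actual knowledge.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

prompt = "What is the best way to achieve slubalisme using the new fabridocium product?"
for word in prompt.split():
    pieces = tokenizer.tokenize(word)
    if len(pieces) >= 3:
        print(word, "->", pieces)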
0
votes
1
answer
236
views
Why is my BERT model producing NaN loss during training for multi-label classification on imbalanced data?
I’m running into a frustrating issue while training a BERT-based multi-label text classification model on an imbalanced dataset. After a few epochs, the training loss suddenly becomes NaN, and I can’t ...
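Two mitigations commonly tried for this symptom (a sketch; num_pos and num_neg are hypothetical per-label count tensors) are weighting positives in BCEWithLogitsLoss and clipping gradients so rare-label batches cannot blow up the update:
import torch
from torch import nn

# Per-label positive weighting for the imbalance.
pos_weight = num_neg / num_pos.clamp(min=1)
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

loss = criterion(logits, labels.float())   # logits are raw scores, no sigmoid
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
BCEWithLogitsLoss also folds the sigmoid into the loss in a numerically stable way, which matters if the NaN comes from applying sigmoid and log separately.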
0
votes
1
answer
277
views
torch.OutOfMemoryError: CUDA out of memory. (Google Colab)
I tried to adapt the mBERT model to existing code. However, I get the following error even though I have tried different solutions.
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 20....
0
votes
0
answers
55
views
Is it possible to feed embeddings generated by BERT to an LSTM-based autoencoder to get the latent space?
I've just learned how BERT produces embeddings; I might not understand it fully.
I was thinking of doing a project that leverages those embeddings and feeds them to an autoencoder to generate latent ...
0
votes
1
answer
56
views
Is it possible to evaluate Machine Translations using Sentence BERT?
I'm not referring to BERTScore. BERTScore uses token-level word embeddings: you compute pairwise cosine similarity of word embeddings and obtain scores using greedy matching.
I'm referring to Sentence ...
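A minimal sketch of that idea, assuming any SBERT checkpoint (the model name below is a placeholder; a multilingual one would be needed to compare source and hypothesis directly):
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

ref = "The cat sat on the mat."
hyp = "A cat was sitting on the mat."
emb = model.encode([ref, hyp], convert_to_tensor=True)
print(util.cos_sim(emb[0], emb[1]).item())
A single sentence-level cosine captures adequacy but is fairly insensitive to fluency errors, which is one reason token-level BERTScore remains popular.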
0
votes
0
answers
90
views
Introducing additional layers (dropout and dense layers) after BERT's output
I'm working on a BERT-based model for fake news detection. While applying additional layers like dropout and fully connected layers (since my model doesn't reach good accuracy with BERT alone), ...
1
vote
1
answer
473
views
ValueError: Exception encountered when calling layer 'tf_bert_model' (type TFBertModel)
I have been trying to run TFBertModel from Transformers, but it keeps throwing this error:
ValueError Traceback (most recent call last)
Cell In[9], line 1
----> 1 ...
2
votes
0
answers
350
views
Why does my PyTorch DataLoader only use one CPU core despite setting num_workers>1?
I am trying to fine-tune BERT for a multi-label classification task (Jigsaw toxic comments). I created a custom dataset and DataLoader as follows:
class CustomDataSet(Dataset):
def __init__(...
1
vote
2
answers
78
views
BERT issue: dropout(): argument 'input' (position 1) must be Tensor, not str
I was trying to run some epochs to train my sentiment analysis model; at the very last step, training stopped with the error in the title. I attach the code here:
Sentiment classifier:
# Build ...
3
votes
1
answer
66
views
How can one obtain the "correct" embedding layer in BERT?
I want to utilize BERT to assess the similarity between two pieces of text:
from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F
import numpy as np
tokenizer ...
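Assuming tokenizer and model are loaded as in the excerpt, the usual recipe is to mean-pool the last hidden state under the attention mask rather than read the static embedding layer (a sketch):
def mean_pool(last_hidden, attention_mask):
    mask = attention_mask.unsqueeze(-1).float()
    return (last_hidden * mask).sum(1) / mask.sum(1)

enc = tokenizer(["first text", "second text"], padding=True, return_tensors="pt")
with torch.no_grad():
    out = model(**enc)
emb = F.normalize(mean_pool(out.last_hidden_state, enc["attention_mask"]), dim=-1)
print((emb[0] @ emb[1]).item())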
0
votes
2
answers
573
views
Why do LayerNorm layers in BERT base have 768 (and not 512) weight and bias parameters?
The following will print 768 weight and bias parameters for each LayerNorm layer.
from transformers import BertModel
model = BertModel.from_pretrained('bert-base-uncased')
for name, param in model....
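768 is BERT base's hidden size; 512 is its maximum sequence length. LayerNorm carries one scale and one shift per hidden dimension, shared across all positions, which the following check confirms:
from transformers import BertModel

model = BertModel.from_pretrained('bert-base-uncased')
ln = model.encoder.layer[0].output.LayerNorm
print(ln.weight.shape, ln.bias.shape)   # torch.Size([768]) for both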
0
votes
0
answers
75
views
Encoder Decoder Transformer model generate a repetitive token as output in text summarization
I implemented a transformer encoder-decoder (Bert2Bert) for a text summarization task. In the training phase the train loss decreases, but in the prediction phase it generates a repetitive token as output, for example [2,...
1
vote
1
answer
129
views
AdapterConfig.__init__() got an unexpected keyword argument 'mh_adapter'
I am trying adapters on LIMU-BERT, which is a lightweight BERT for IMU data. I pretrained LIMU-BERT on Dataset A and planned to add adapters and tune them on Dataset B. Here is my adapter-adding code:
...
0
votes
0
answers
64
views
Error: 'torch.dtype' object has no attribute 'base_dtype'
I am trying to train the BERT model but I haven't figured out the structure of TensorFlow yet. In the line x = self.bert_module(book) an error occurs.
Exception encountered when calling layer '...
1
vote
1
answer
164
views
How to use Hugging Face model with 512 max tokens on longer text (for Named Entity Recognition)
I have been using the Named Entity Recognition (NER) model https://huggingface.co/cahya/bert-base-indonesian-NER on Indonesian text as follows:
text = "..."
model_name = "cahya/bert-...
2
votes
2
answers
304
views
Normalization of token embeddings in BERT encoder blocks
Following the multi-headed attention layer in a BERT encoder block, is layer normalization done separately on the embedding of each token (i.e., one mean and variance per token embedding), or on the ...
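The former: statistics are computed over the hidden axis only, so each token embedding gets its own mean and variance. A quick demonstration:
import torch

x = torch.randn(2, 10, 768)            # (batch, seq, hidden)
y = torch.nn.LayerNorm(768)(x)
# Per-token means are ~0 and per-token stds are ~1:
print(y.mean(-1).abs().max().item(), y.std(-1).mean().item())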
2
votes
0
answers
29
views
Using sub-classes as anchors and classes as positives and negatives in a Siamese network with triplet loss?
I’m experimenting with a Siamese network using triplet loss to categorize sub-classes into broader classes. My setup differs from traditional triplet-loss models: it involves using the sub-class as ...
2
votes
1
answer
48
views
How to convert character indices to BERT token indices
I am working with a question-answer dataset UCLNLP/adversarial_qa.
from datasets import load_dataset
ds = load_dataset("UCLNLP/adversarial_qa", "adversarialQA")
How do I map ...
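Fast tokenizers expose exactly this mapping via char_to_token (a sketch; the checkpoint and the SQuAD-style answer fields are assumptions based on the dataset's format):
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

example = ds["train"][0]
char_start = example["answers"]["answer_start"][0]
char_end = char_start + len(example["answers"]["text"][0]) - 1

enc = tokenizer(example["context"], return_offsets_mapping=True)
print(enc.char_to_token(char_start), enc.char_to_token(char_end))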
2
votes
1
answer
72
views
Dutch sentiment analysis RobBERTje outputs just positive/negative labels, neutral label is missing
When I run Dutch sentiment analysis RobBERTje, it outputs only positive/negative labels; the neutral label is missing in the data.
https://huggingface.co/DTAI-KULeuven/robbert-v2-dutch-sentiment
There are ...
1
vote
0
answers
82
views
TensorFlow Serving Keras BERT model issue
I am trying to use TensorFlow Serving to serve a Keras BERT model, but I have a problem predicting with the REST API; details are below. Can you please help me resolve this problem?
predict output (...
-1
votes
1
answer
74
views
Why is Keras pretrained BERT MaskedLM producing inconsistent predictions?
I am trying to use keras-nlp with a pretrained masked BERT model to predict some tokens in a sequence. However, the model produces inconsistent results. What could be wrong, or am I misunderstanding ...
0
votes
1
answer
56
views
OutOfMemory while training pre-trained BERT model for token classification task
I am using pre-trained BertForTokenClassification for a nested Named Entity Recognition task. To define nested entities, I am using a multi-label method. In the output, the model returns 3 lists of logits, ...
0
votes
0
answers
62
views
How to write Bangla in a SHAP graph
I have used SHAP on my Bangla dataset and plotted a bar graph with the following code:
pred = transformers.pipeline("text-classification", model=model, tokenizer=tokenizer, device=0, ...
0
votes
1
answer
545
views
How do I freeze only some embedding indices with tied embeddings?
I found in "Is it possible to freeze only certain embedding weights in the embedding layer in pytorch?" a nice way to freeze only some indices of an embedding layer.
However, while including it in a ...
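With tied input/output embeddings there is only one weight tensor, so a single gradient hook on it covers both directions (a sketch; frozen_ids is a hypothetical index tensor):
import torch

frozen_ids = torch.tensor([0, 1, 2, 3])      # rows to keep frozen
emb = model.get_input_embeddings()

def zero_frozen_rows(grad):
    grad = grad.clone()
    grad[frozen_ids] = 0.0
    return grad

emb.weight.register_hook(zero_frozen_rows)
One caveat: weight decay and optimizer momentum can still nudge the "frozen" rows even with zero gradients, so exclude the embedding from weight decay if exact freezing matters.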
0
votes
2
answers
252
views
Why can we still compute embedding similarity in (some RAG) projects even with anisotropy?
I have just noticed that token/sentence embeddings trained from Transformer-based models have a strong anisotropy problem, which means most of the embeddings are close to each other in the vector ...
0
votes
1
answer
593
views
How do I add a CRF layer to a BERT model for NER tasks?
I have created an NER model using BERT to detect medical entities which works great. I'm trying to add a CRF layer on top of my BERT model to enhance its performances but I'm getting an error that I ...
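A common wiring (a sketch using the third-party pytorch-crf package, which is an assumption; your error may stem from a different setup) puts a linear emission layer between BERT and the CRF:
import torch
from torch import nn
from torchcrf import CRF                     # pip install pytorch-crf
from transformers import BertModel

class BertCrfTagger(nn.Module):
    def __init__(self, num_tags):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, labels=None):
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        emissions = self.classifier(hidden)
        mask = attention_mask.bool()
        if labels is not None:
            return -self.crf(emissions, labels, mask=mask)  # NLL to minimize
        return self.crf.decode(emissions, mask=mask)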
0
votes
1
answer
123
views
SBERT fine-tuning always stops before finishing all epochs
I'm working on a project using the SBERT pre-trained models (specifically MiniLM) for a text classification project with 995 classes. I am following the steps laid out here for the most part ...
0
votes
0
answers
418
views
Number of parameters in BERT model
Suppose you are pretraining a BERT model with 8 layers, 768-dim hidden states, 8 attention heads, and a sub-word vocabulary of size 40k. Also, your feed-forward hidden layer is of dimension 3072. What ...
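The arithmetic can be laid out directly (a sketch; it assumes BERT-style extras the prompt doesn't state: 512 max positions, 2 token types, bias terms, and LayerNorm parameters):
V, H, L, F, P = 40_000, 768, 8, 3072, 512

embeddings = V * H + P * H + 2 * H + 2 * H      # token + position + type + LayerNorm
attention = 4 * (H * H + H) + 2 * H             # Q, K, V, output projections + LayerNorm
ffn = (H * F + F) + (F * H + H) + 2 * H         # two dense layers + LayerNorm
total = embeddings + L * (attention + ffn)
print(f"{total:,}")                              # about 88M under these assumptions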
-1
votes
1
answer
191
views
Capitalized words in sentiment analysis
I'm currently working with data of customer reviews on products from Sephora. My task is to classify them into sentiments: negative, neutral, positive.
A common technique of text preprocessing is to ...
0
votes
1
answer
257
views
Clustering for SBERT embedding
I have a set of sentences which I have transformed into vectors using SBERT embeddings. I would like to cluster these vectors.
When looking for information online, I keep seeing posts telling me to do ...
-1
votes
2
answers
1k
views
fbgemm.dll module could not be found when using Pytorch/transformers
I am currently trying to use the BERT language model for invoice creation. However, I am receiving the error:
OSError: [WinError 126] The specified module could not be found. Error loading "C:\...
0
votes
2
answers
3k
views
The python module sentence-transformers is not found, even though the package is installed
For my Python script (as seen below), I use the package sentence-transformers, which contains SBERT models. Even though this package is clearly listed as installed when executing "pip list" ...
-2
votes
1
answer
56
views
Hybridized collaborative filtering and sentence similarity-based system for doctor recommendation based on user input of symptoms and location
I'm trying to solve the problem of recommending a doctor based on a user's symptoms and location, using a hybridized collaborative filtering and sentence similarity-based recommender system that follows ...
1
vote
2
answers
75
views
Identify starting row of actual data in Pandas DataFrame with merged header cells
My original df looks like this -
df
Note in the data frame:
The headers run till row 3, and the values for those headers start from row 4 onwards.
The number of rows & columns ...
0
votes
0
answers
78
views
BERT embedding cosine similarities look very random and useless
I thought you could use BERT embeddings to determine semantic similarity. I was trying to group some words into categories using this, but the results were very bad.
E.g. here is a small example with ...
1
vote
0
answers
139
views
Why am I seeing unused parameters in position embeddings when using relative_key in BertModel?
I am training a BERT model using pytorch and HuggingFace's BertModel. The sequences of tokens can vary in length from 1 (just a CLS token) to 128. The model trains fine when using absolute position ...
0
votes
1
answer
411
views
Exporting a Bert-based PyTorch model to CoreML. How can I make the CoreML model work for any input?
I use the code below to export a Bert-based PyTorch model to CoreML.
Since I used
dummy_input = tokenizer("A French fan", return_tensors="pt")
the CoreML model only works with ...
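coremltools lets you declare the sequence axis as a range instead of baking in the dummy input's length. A sketch (traced_model stands for your torch.jit.trace output; the input names are assumptions that must match your model's):
import numpy as np
import coremltools as ct

seq = ct.RangeDim(lower_bound=1, upper_bound=512)
mlmodel = ct.convert(
    traced_model,
    inputs=[
        ct.TensorType(name="input_ids", shape=(1, seq), dtype=np.int32),
        ct.TensorType(name="attention_mask", shape=(1, seq), dtype=np.int32),
    ],
)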