134 questions
2 votes · 0 answers · 250 views · +50 bounty
OpenAI RAG LangChain tool calling is not working/compatible with withStructuredOutput()?
I have an OpenAI model with Retrieval-Augmented Generation (RAG):
import {OpenAIEmbeddingFunction} from "@chroma-core/openai";
import chromaClient from "../config/chromadb";
import {...
Advice · 0 votes · 0 replies · 110 views
Built a Continued Pretraining + Fine-Tuning pipeline for a Veterinary Drug LLM on BioGPT-Large — Looking for feedback on my approach
I've been working on adapting Microsoft's BioGPT-Large for veterinary pharmacology using Plumb's Veterinary Drug Handbook (2023) as my domain corpus. After going through a lot of trial and error, I ...
Best practices · 0 votes · 0 replies · 51 views
Implementing Deterministic Entity Resolution in a Multi-Agent RAG for Investigative Archiving
I am architecting a Forensic Data Audit system (Multi-Agent RAG) to analyze fragmented, large-scale archives. A critical bottleneck is maintaining Entity Resolution (ER) across millions of ...
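For the deterministic side of ER, one common baseline (a sketch under assumed requirements, not the asker's architecture) is rule-based normalization plus a content-derived stable ID, so every agent independently maps the same mention to the same key. The normalization rules below are illustrative:

```python
import hashlib
import re

def canonical_id(mention: str) -> str:
    """Deterministically map an entity mention to a stable ID.

    Rule-based normalization: lowercase, drop honorifics, strip
    punctuation, collapse whitespace, then hash the result so every
    agent derives the same key for the same entity with no shared state.
    """
    text = mention.lower()
    text = re.sub(r"\b(mr|mrs|ms|dr|prof)\.?\b", " ", text)  # drop honorifics
    text = re.sub(r"[^a-z0-9 ]+", " ", text)                 # strip punctuation
    text = re.sub(r"\s+", " ", text).strip()                 # collapse spaces
    return hashlib.sha256(text.encode()).hexdigest()[:16]

# The same entity written differently resolves to one ID:
assert canonical_id("Dr. John Smith") == canonical_id("john  smith")
```

Because the ID is a pure function of the normalized text, it stays consistent across millions of documents without a coordination service; fuzzy matches would still need a separate blocking/scoring pass.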
Advice · 0 votes · 1 reply · 68 views
How to perform asynchronous LLM inference on Kafka streams using Apache Spark, and handle high-throughput RAG ingestion?
I’m working on a streaming pipeline where data is coming from a Kafka topic, and I want to integrate LLM-based processing and RAG ingestion. I’m running into architectural challenges around latency ...
Advice · 0 votes · 4 replies · 72 views
An AI Assistant Chatbot based on RAG for University?
So one of the biggest hurdles I currently face is navigating my university website to find relevant data, like the fee structure for my course (which is updated biannually) and other ...
Advice · 4 votes · 8 replies · 219 views
Improve the RAG chatbot result
I am building a local RAG chatbot using LangChain and ChromaDB (PersistentClient). I’m encountering 'hallucinations' when the similarity search returns documents with a low relevance score. How can I ...
Best practices · 2 votes · 2 replies · 105 views
Best technique for retrieving large set of documents for local LLM
I need some help. I'm struggling to get RAG (or some other approach) working for feeding large documents to a local LLM (I'm using llama-server to run gpt-oss 20b).
My question is: how to implement such ...
1 vote · 0 answers · 91 views
Agentic RAG tool_calling issue: Groq + LangChain agent fails with tool_use_failed when calling custom tool (Llama 3.3)
I'm building a Streamlit app using LangChain (latest), LangGraph, and Groq with the model:
llama-3.3-70b-versatile
I'm using the modern create_agent() API (LangGraph-backed). The agent has two tools:
...
5 votes · 0 answers · 165 views
CUDA error: CUBLAS_STATUS_INVALID_VALUE in cublasGemmEx() with PyTorch, fp16=False
I am using an RTX 3060 (12GB VRAM) and implementing a RAG pipeline with the BGE-M3 embedding model.
Initially, I installed PyTorch with the CUDA 12.8 wheel (my NVIDIA driver supports CUDA 12.9). ...
0 votes · 1 answer · 66 views
LangChain.js createHistoryAwareRetriever with Ollama embeddings throws invalid input type error
What I am Working on
I’m building a conversational RAG pipeline using LangChain JS with Ollama (local models).
If I use a normal retriever created from a vector store, everything works fine and ...
0 votes · 1 answer · 86 views
Agentic RAG flow fails at Chroma retrieval
import os, asyncio, json
from dotenv import load_dotenv
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import DiGraphBuilder, GraphFlow
from chromadb import ...
3 votes · 0 answers · 283 views
Is there a way in MCP to stream a LLM response chunk by chunk back to the client?
I'm using FastMCP in Python to implement an MCP server. Currently I run into a problem with streaming the generated tokens from the LLM. I don't want to wait for the completed response ...
Tooling · 0 votes · 0 replies · 101 views
How to use SelfQueryRetriever in recent versions of LangChain?
I'm trying to use metadata in RAG systems using LangChain. I see a lot of tutorials using SelfQueryRetriever, but it appears that this was deprecated in recent versions. Is this correct? I couldn't ...
Advice · 2 votes · 2 replies · 123 views
RAG with Pinecone + GPT-5 for generating new math problems: incoherent outputs, mixed chunks, and lack of originality
I’m building a tool that generates new mathematics exam problems using an internal database of past problems.
My current setup uses a RAG pipeline, Pinecone as the vector database, and GPT-5 as the ...
Best practices · 1 vote · 2 replies · 175 views
Regarding RAG for telephony with Deepgram
I'm building a voice-based calling system where users can create AI agents that make outbound phone calls.
The agent uses Deepgram for real-time transcription and ElevenLabs/Cartesia for speech ...
Advice · 0 votes · 1 reply · 63 views
How can I group transcribed phrases into meaningful chunks without using complex models?
I have a large set of phrases obtained via Azure Fast Transcription, and I need to group them into coherent semantic chunks (to use later in a RAG pipeline).
Initially, I tried grouping phrases based ...
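A simple model-free heuristic for this kind of grouping (an illustration, not the asker's pipeline): start a new chunk whenever the silence gap between phrases is long, or the current chunk grows too big. The field names below are illustrative, not Azure Fast Transcription's actual schema:

```python
def group_phrases(phrases, max_chars=400, max_gap_ms=1500):
    """Greedy grouping without any model: a long pause or an oversized
    chunk starts a new group.

    phrases: list of dicts with 'text', 'start_ms', 'end_ms'
    (illustrative field names).
    """
    chunks, current = [], []
    for p in phrases:
        if current:
            gap = p["start_ms"] - current[-1]["end_ms"]
            size = sum(len(q["text"]) for q in current)
            if gap > max_gap_ms or size + len(p["text"]) > max_chars:
                chunks.append(" ".join(q["text"] for q in current))
                current = []
        current.append(p)
    if current:
        chunks.append(" ".join(q["text"] for q in current))
    return chunks

phrases = [
    {"text": "Hello everyone.", "start_ms": 0, "end_ms": 900},
    {"text": "Welcome to the call.", "start_ms": 1000, "end_ms": 2000},
    {"text": "Next topic: billing.", "start_ms": 6000, "end_ms": 7000},
]
assert group_phrases(phrases) == [
    "Hello everyone. Welcome to the call.",
    "Next topic: billing.",
]
```

Pause length is often a surprisingly strong topic-boundary signal in transcripts; the two thresholds are the only knobs to tune.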
0 votes · 0 answers · 50 views
How to exclude metadata from embedding?
I'm using LlamaIndex 0.14.7. I would like to embed document text without concatenating metadata, because I put a long text in metadata. Here's my code:
table_vec_store: SimpleVectorStore = ...
0 votes · 0 answers · 71 views
LangChain RAG is not retrieving any documents
This is my embedding code, which I run once only:
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
vector_store = MongoDBAtlasVectorSearch.from_connection_string(
...
1 vote · 1 answer · 240 views
Why does answer_relevancy return NaN when evaluating RAG with Ragas?
I’m trying to evaluate my Retrieval-Augmented Generation (RAG) pipeline using Ragas.
Here’s a complete version of my code:
"""# RAG Evaluation"""
from datasets import ...
0 votes · 1 answer · 108 views
Chroma not accepting lists in PersistentClient collection?
My objective is to do keyword filtering in Chroma. I have a field called keywords with a list of strings and I want to filter with it, but Chroma won't let me add lists as a field.
I checked my Chroma ...
0 votes · 1 answer · 117 views
RAG Chatbot does not answer paraphrased questions
I built a RAG chatbot in Python with LangChain and FAISS for the vector store.
The data is stored as JSON.
The chatbot sometimes refuses to answer when a question is rephrased.
Here are two ...
0 votes · 0 answers · 35 views
RAG Pipeline Memory Leak - Vector Embeddings Not Releasing After Context Switch in Memo AI
Question:
I'm building a memory-augmented AI system using RAG with persistent vector storage, but facing memory leaks and context contamination between sessions.
Problem:
Vector embeddings aren't ...
0 votes · 1 answer · 83 views
Module not found in Haystack 2.17.1
I am trying to create a small starter LLM RAG project using Haystack. My project packages are below (I use uv):
[project]
name = "llm-project"
version = "0.1.0"
description = "...
0 votes · 0 answers · 84 views
Why does LanceDB's full-text-search fail to find matches where the exact text is present?
I am trying to use lancedb to perform FTS, but I am getting spurious results.
Here is a minimal example:
# Data generation
import lancedb
import polars as pl
from string import ascii_lowercase
words = [...
0 votes · 0 answers · 195 views
Zep Graphiti - core - Adding Episode fails the LLM structured output
On the ingestion side to the graph DB, I pass a JSON file as an episode, with custom entities (and edges), using the Gemini API, but I get some discrepancy in the structured output, like so:
LLM generation ...
0 votes · 0 answers · 66 views
How to send extra headers from RAGFlow Agent to a Spring Boot MCP server tool call?
I am using RAGFlow
connected to a Spring Boot MCP server.
My agent flow is simple:
Begin node → collects inputs (auth_token, tenant_id, x_request_status)
Agent (gpt-4o) → connected to
MCP Tool (server)...
1 vote · 0 answers · 101 views
ragas with Ollama does not terminate
I am using the python package ragas with the goal of generating a testset for a RAG application.
I am defining my BaseRagasLLM as:
from langchain_ollama import OllamaLLM
from ragas.llms import ...
1 vote · 1 answer · 469 views
Firecrawl self-hosted crawler throws Connection violated security rules error
I set up a self-hosted Firecrawl instance and I want to crawl my internal intranet site (e.g. https://intranet.xxx.gov.tr/).
I can access the site directly both from the host machine and from inside ...
2 votes · 1 answer · 299 views
Why is FAISS document retrieval slow and inconsistent on EC2 t3.micro instance?
I'm building a document Q&A system using FAISS for vector search on an AWS EC2 t3.micro instance (1 vCPU, 1GB RAM). My FAISS index is relatively small (8.4MB .faiss + 1.4MB .pkl files), but I'm ...
0 votes · 0 answers · 162 views
How to Use Pytest Fixtures in a RAG-Based LangChain Streamlit App?
I'm building a RAG (Retrieval-Augmented Generation) chatbot using LangChain, Gemini API, and Qdrant, with a Streamlit frontend. I want to write unit tests for the app using pytest, and I’m trying to ...
0 votes · 1 answer · 172 views
How do I prevent duplicate messages in context window, when using rag and memory?
When using RAG and memory, multiple identical copies of the same information are sent to the AI when asking related questions.
I have
import java.util.ArrayList;
import java.util.List;
import dev....
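The question's code is Java, but the underlying idea is language-agnostic and can be sketched in Python: key each context snippet on a hash of its normalized content and keep only the first copy, since RAG retrieval and conversation memory often surface the same text twice. This is an illustrative approach, not the asker's framework API:

```python
import hashlib

def dedupe_context(messages: list[str]) -> list[str]:
    """Drop repeated context snippets before sending them to the model.

    Normalizing whitespace and case before hashing catches copies that
    differ only in formatting, while preserving the original order.
    """
    seen, unique = set(), []
    for msg in messages:
        key = hashlib.sha256(" ".join(msg.split()).lower().encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(msg)
    return unique

ctx = ["Invoice due in 30 days.", "invoice due in  30 days.", "Late fee is 2%."]
assert dedupe_context(ctx) == ["Invoice due in 30 days.", "Late fee is 2%."]
```

Running this over the merged retrieval + memory message list just before prompt assembly keeps the context window from paying twice for the same fact.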
0 votes · 1 answer · 403 views
Deleting data points in Qdrant DB
I am trying to delete all the data points that are associated with a particular email ID, but I am encountering the following error.
source code:
app.get('/cleanUpResources', async (req, res) => {
...
-1 votes · 1 answer · 329 views
ImportError: cannot import name 'Client' from 'pinecone' (unknown location)
The problem with this piece of code is that I am unable to import Client from the pinecone library. I tried uninstalling and reinstalling different versions; none of them worked. I also tried it ...
-1 votes · 1 answer · 63 views
How to ensure all documents contribute to summary context after merging indexes?
I'm building a LangChain RAG pipeline using the FAISS vector store. I'm merging multiple FAISS indexes — each representing one document — and then querying them to generate summaries or answers via ...
1 vote · 0 answers · 242 views
How to handle follow-up confirmations in Spring AI 1.0.0 without losing context during tool selection using RAG?
I'm building a web application using Spring Boot 3.4.5 and Spring AI 1.0.0 with Llama3.2(Ollama) model integration. I've implemented tool calling, and because I have many tools in the application, I'm ...
0 votes · 0 answers · 120 views
What's the reason I get a blank screen while uploading a Json to Flowise?
I have recently been working on a multi-agent project that, to summarize, consists of the following: through a user input (often a query), the first agent is dedicated to making the input more suitable for ...
-1 votes · 1 answer · 879 views
AttributeError: 'LlmAgent' object has no attribute 'invoke'
I am trying to call a Flask API which is already running on port 5000 on my system. I am designing agentic AI code which will invoke GET and then POST based on some condition, using google-adk. I ...
1 vote · 0 answers · 138 views
Sentence similarity pipeline with @huggingface/transformers
Wanted to use the pipeline api from @huggingface/transformers js for sentence-similarity - but I do not see a specific pipeline for it.
The closest thing is text classification and feature extractions ...
1 vote · 0 answers · 82 views
Scaling RAG QA with Large Docs, Tables, and 30K+ Chunks (No LangChain)
I'm building a RAG-based document QA system using Python (no LangChain), LLaMA (50K context), PostgreSQL with pgvector, and Docling for parsing. Users can upload up to 10 large documents (300+ pages ...
0 votes · 0 answers · 59 views
multi-intent queries in vector database retrieval
I'm working on a RAG pipeline using a vector database to search over a Q&A dataset. I'm using embedding-based dense retrieval to fetch relevant answers to user queries.
The issue I'm facing is ...
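A common pattern for multi-intent queries (sketched here with a toy keyword-overlap retriever standing in for dense retrieval, so none of this is the asker's setup) is query decomposition: split the query into sub-queries, retrieve per sub-query, then merge the result lists:

```python
import re

def split_intents(query: str) -> list[str]:
    """Naive multi-intent split on conjunctions; real systems often use
    an LLM or a classifier for this step."""
    parts = re.split(r"\band also\b|\band\b|;|\?", query)
    return [p.strip() for p in parts if p.strip()]

def retrieve(sub_query: str, docs: list[str], k: int = 1) -> list[str]:
    """Toy keyword-overlap scorer standing in for embedding similarity."""
    def score(doc: str) -> int:
        return len(set(sub_query.lower().split()) & set(doc.lower().split()))
    return sorted(docs, key=score, reverse=True)[:k]

docs = ["Reset your password from the login page.",
        "Invoices are emailed monthly."]
query = "how do I reset my password and where are my invoices"

merged = []
for sub in split_intents(query):          # retrieve per intent, then merge
    for doc in retrieve(sub, docs):
        if doc not in merged:
            merged.append(doc)

assert merged == ["Reset your password from the login page.",
                  "Invoices are emailed monthly."]
```

Retrieving once with the full mixed query tends to average the intents into one embedding and favor whichever intent dominates; per-sub-query retrieval gives each intent its own nearest neighbors.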
0 votes · 0 answers · 77 views
Using llama-index with the deployed LLM
I wanted to make a web app that uses llama-index to answer queries using RAG from specific documents. I have locally set up Llama3.2-1B-instruct llm and using that locally to create indexes of the ...
1 vote · 1 answer · 865 views
Why is the upload of files to GCP Vertex AI RAG corpora so slow?
I am experimenting with RAG on GCP/Vertex AI, and tried to create some simple example.
Here's what I came up with, creating small dummy files locally and then uploading them one by one to a newly-...
0 votes · 0 answers · 213 views
Llamaindex returns "Empty Response"
I have a RAG system using LlamaIndex. I am upgrading the library from 0.10.44 to 0.12.33.
I see different behaviour now.
Before, when there were no results from the vector store, it seems it called the LLM ...
0 votes · 0 answers · 102 views
How to loop through text chunks created using AzureOpenAI `client.vector_stores.create`
I checked Azure's documentation on this topic here but I do not see anything related to this. My goal is to create a question and answer dataset for my RAG solution based on each chunk for a good ...
1 vote · 1 answer · 170 views
Embedding model `all-mpnet-base-v2` not able to classify user prompt properly
I am using this model to embed a product catalog for a RAG pipeline. In the product catalog, there are no red shirts for men, but there are red shirts for women. How can I make sure the model doesn't output ...
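Embeddings alone can't encode "we have no red shirts for men": the query still lands near "red shirt (women)". One common fix (a sketch with a made-up catalog schema, not the asker's data) is a hard metadata pre-filter so that absent combinations return nothing, with semantic ranking applied only to the survivors:

```python
# Illustrative catalog; 'color' and 'gender' are assumed fields.
catalog = [
    {"name": "Red blouse", "color": "red", "gender": "women"},
    {"name": "Blue oxford shirt", "color": "blue", "gender": "men"},
]

def filtered_candidates(color: str, gender: str) -> list[dict]:
    """Hard pre-filter on structured attributes; embedding similarity
    would rank only the products that survive this step."""
    return [p for p in catalog
            if p["color"] == color and p["gender"] == gender]

assert filtered_candidates("red", "men") == []   # honest "no such product"
assert len(filtered_candidates("red", "women")) == 1
```

An empty candidate list lets the application answer "we don't carry that" deterministically instead of letting the nearest embedding neighbor masquerade as a match; most vector stores support this as a `where`/metadata filter on the search call.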
0 votes · 2 answers · 79 views
SitemapLoader(sitemap_url).load() hangs
from langchain_community.document_loaders import SitemapLoader

def crawl(self):
    print("Starting crawler...")
    sitemap_url = "https://gringo.co.il/sitemap.xml"
    ...
1 vote · 2 answers · 1k views
How to add S3 bucket objects metadata into bedrock knowledgebase?
I am using AWS bedrock for the first time. I have configured the data source which is S3 along with opensearch serverless cluster for embeddings. However, I do not have any control over the mappings ...
1 vote · 0 answers · 52 views
how to deal with evolving information in RAG?
I'm trying to index a series of articles to use in a RAG knowledge base, but I cannot find any documented best practice or recommendation about dealing with information that changes or evolves over time.
...
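One pattern that comes up for this (an illustration, not an official best practice) is to stamp every chunk with a validity window, close the old version's window when a newer one is ingested, and filter retrieval to chunks valid "as of" the question's reference date:

```python
from datetime import date

# Illustrative chunk records; 'valid_from'/'valid_to' are assumed fields.
chunks = [
    {"text": "Fee is $100.", "valid_from": date(2023, 1, 1), "valid_to": date(2024, 1, 1)},
    {"text": "Fee is $120.", "valid_from": date(2024, 1, 1), "valid_to": None},
]

def valid_chunks(as_of: date) -> list[str]:
    """Return only chunks whose validity window covers the given date;
    valid_to=None marks the currently-live version."""
    return [c["text"] for c in chunks
            if c["valid_from"] <= as_of
            and (c["valid_to"] is None or as_of < c["valid_to"])]

assert valid_chunks(date(2023, 6, 1)) == ["Fee is $100."]
assert valid_chunks(date(2024, 6, 1)) == ["Fee is $120."]
```

Keeping superseded versions with closed windows (instead of deleting them) also preserves the ability to answer historical questions like "what was the fee in 2023?"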
0 votes · 1 answer · 523 views
I am using LangChain4j to develop a knowledge base and encountered the "different vector dimensions 1024 and 384" error
I want to know if there are any other settings required for pgvector or what content needs to be set in the code to enable pgvector to support higher vector dimensions. I found on the official website ...
0 votes · 1 answer · 28 views
How to reduce time when formatting the Cypher result?
I'm retrieving results from a Cypher query, which includes the article's date and text.
After fetching the results, I'm formatting them before passing them to the LLM for response generation. ...