1,655 questions
-5
votes
0
answers
24
views
What AI model does Greptile use for its code review capabilities? [closed]
I am exploring AI-based code review tools and came across Greptile, which claims to provide deep, context-aware code reviews by analyzing entire repositories. However, I couldn't find definitive ...
1
vote
1
answer
113
views
ModuleNotFoundError when importing ConversationBufferMemory and ConversationalRetrievalChain from LangChain
I'm trying to import ConversationBufferMemory and ConversationalRetrievalChain in my Python notebook as follows:
from langchain.memory import ConversationBufferMemory
from langchain.chains import ...
Advice
0
votes
0
replies
62
views
Text-to-Speech function in a 3D Unity game
I am building a 3D game in Unity, and I need my NPC to communicate with players by voice.
I want my NPC to speak responses generated by the LLM, so the response will ...
0
votes
0
answers
42
views
Langchain RAG is not retrieving any document
This is my embedding code, which I run once only:
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
vector_store = MongoDBAtlasVectorSearch.from_connection_string(
...
0
votes
0
answers
60
views
How is the Model Context Protocol (MCP) secured when interacting with LLMs? [closed]
I’ve been exploring the Model Context Protocol (MCP) and its role in connecting external data or tools with large language models (LLMs).
However, I’m curious about how MCP ensures security, ...
0
votes
0
answers
16
views
Does SGLang’s OpenAI-compatible API support async/await non-streaming calls?
I’m using SGLang’s OpenAI-compatible server (e.g., --port 30000, /v1/chat/completions) and calling it via the openai SDK with an async client:
from openai import AsyncOpenAI
client = AsyncOpenAI(...
-3
votes
0
answers
66
views
Unsloth installation on Windows - Triton cannot be installed
I am trying to set up Unsloth on my Windows machine with an NVIDIA GeForce RTX 5090 GPU (Blackwell architecture), but I am running into an issue.
Environment details:
OS: Windows 11
Python: 3.12
Conda ...
0
votes
0
answers
30
views
Open WebUI pipeline only activates on action [closed]
I am using an Open WebUI pipeline to export traces from Open WebUI to Langfuse. Chats are recognized as traces, but the session and users remain unknown unless I load the code from GitHub ...
1
vote
0
answers
29
views
Redis - OpenAI not able to tune with actual Redis text retrieved from vector index
I am trying to create a simple vector index for a conversational AI application where I want to use Redis as long-term memory.
I configured Redis locally and created the index, which ideally stores
"...
0
votes
0
answers
47
views
Does setting torch_dtype=torch.float16 override 8-bit quantization in BitsAndBytes?
I'm trying to run the Qwen2.5-Coder-3B model locally with 8-bit quantization using BitsAndBytes.
While loading the model, I noticed that some examples also specify torch_dtype=torch.float16.
From my ...
1
vote
0
answers
55
views
Why does botocore raise ValidationException with pydantic BaseModel output constraint on variable length output types?
I am using Pydantic AI with a Pydantic BaseModel as the output_type of the answer. From time to time I encounter a ValidationException. I cannot accurately pinpoint the issue creating this ...
0
votes
0
answers
71
views
Torch example transformer with TransformerDecoder
In the torch example provided here https://github.com/pytorch/examples/tree/main/word_language_model, the transformer only uses torch.TransformerEncoder and torch.TransformerDecoder is overwritten with a ...
0
votes
0
answers
51
views
Granite 4.0 H-Micro with LangGraph create_react_agent not invoking tools
I'm trying to use IBM Granite 4.0 H-Micro-Q2_K.gguf locally with LangGraph’s create_react_agent() (or the newer create_agent()) to automatically call tools defined in Python.
I have two tools defined:
...
0
votes
0
answers
45
views
How do I get a conversation going with a MistralAI Agent using LangChain?
I am learning how to make agents using MistralAI and LangChain. I have been following the official LangChain documentation, but I got stuck when calling the LLM for the second time.
I ...
-1
votes
0
answers
45
views
Can agent simulation have an integrated LLM
I'm supposed to develop a mock processor that takes the following payload:
{
"scope": {"type": "folder", "name": "invoices-2025"},
"messages...
0
votes
1
answer
77
views
Cannot get token logprobs while using langchain structured output
I am using langchain to call an LLM and I want to get the logprobs for each token.
I want to get them after doing this:
from langchain_openai import ChatOpenAI
from pydantic import BaseModel
class ...
0
votes
1
answer
32
views
MLXLMCommon in Swift gives error when loading model: noModelFactoryAvailable
I am following the tutorial here:
https://github.com/ml-explore/mlx-swift-examples/tree/main/Libraries/MLXLMCommon
import MLXLMCommon
Task {
do {
print("Loading model...")
...
0
votes
0
answers
87
views
Firebase Genkit using prompt in chat
I am using chat with genKit library like below:
const session = agenticAi.createSession<any>({
initialState: {
uid: uid,
},
});
const chat = session.chat({
model: googleAI.model('...
1
vote
0
answers
40
views
How can I make an MCP tool ask the LLM to request missing parameters instead of sending empty strings?
I'm building an MCP server (with spring ai 1.1.0-M3) that exposes a tool for searching contacts in my internal system.
Here’s a simplified version of the tool method:
public class myTool {
@...
0
votes
0
answers
51
views
AgentDojo repo on GitHub: I can't reproduce the table results, because the Security results are always 0.00%
I tried running this command with GPT_4O_MINI_2024_07_18, so that the rate limits wouldn't block the execution before the results are printed
python -m agentdojo.scripts.benchmark -s workspace -ut ...
0
votes
1
answer
39
views
Getting "FATAL FIPS SELFTEST FAILURE" when importing qwen-vl-utils
When I run
from qwen_vl_utils import process_vision_info
in my Python environment, I get
crypto/fips/fips.c:154: OpenSSL internal error: FATAL FIPS SELFTEST FAILURE
Aborted
I'm using
OpenSSL 3.3.2
...
1
vote
0
answers
149
views
Installation error while installing GroundingDino
I am trying to install the GroundingDino as instructed in the README file of their official GitHub repo, but I am facing the error below:
Obtaining file:///home/kgupta/workspace/Synthetic_Data_gen/...
0
votes
0
answers
41
views
How to reduce latency in a context-aware chatbot with chart + dataset inputs
I’m building a chatbot for my research project that helps participants understand a chart. The chatbot runs on a website built with React.
My goal is to make it feel just like using ChatGPT in the ...
0
votes
1
answer
138
views
Error while deploying, but not in local: "crewai Failed to upsert documents: "Expected IDs to be unique, found 28 Duplicate IDs"
When I initialize a Crew in Azure, I get an error:
crewai Failed to upsert documents: "Expected IDs to be unique, found 28 Duplicate IDs"
followed by lots of uuids.
from crewai import ...
0
votes
1
answer
97
views
What does total_token_count mean in a Gemini response?
I'm trying to understand how total_token_count is calculated for the gemini-2.5-flash model.
The official documentation suggests total_token_count = prompt_token_count + candidates_token_count, but my ...
-1
votes
1
answer
57
views
How to reconstruct sentences from mean-pooled embeddings (embedding inversion) [closed]
I’m working on a research problem where I want to reconstruct or paraphrase sentences starting from synthetic embeddings.
The embeddings are global (mean-pooled), not token-level, so they lose ...
0
votes
0
answers
66
views
How to solve device mismatch issue when using offloading with QwenImageEditPlus pipeline and GGUF weights
After failing to make the QwenImageEditPlus run (https://huggingface.co/spaces/discord-community/README/discussions/9#68d260e32053323e6bfab30c), I tried a different approach (thanks to all the example ...
0
votes
0
answers
31
views
Running Ollama on a local computer and prompting from a Jupyter notebook - does the model recall prior prompts as if it were the same chat?
I am running some tests using Ollama on a local computer with Llama 3.2, which consist of prompting a task against a document.
I read that after reaching the maximum context, I should restart the ...
0
votes
1
answer
66
views
Schema Guided Reasoning to dynamically force LLM to select one of the values from list
I am trying to use a Schema Guided Reasoning approach to develop a system that will classify a product into one of the categories from a predefined list based on its name/description. For that I ...
1
vote
1
answer
1k
views
AttributeError: 'DynamicCache' object has no attribute 'seen_tokens'
I'm following the Hands-On Large Language Models book to learn more about LLMs. I'm trying to generate text using the "microsoft/Phi-3-mini-4k-instruct" model which is used in the book. ...
5
votes
1
answer
243
views
How to stream LLM responses in a Shiny app instead of waiting for full output?
I am creating a Shiny app in R. One of its features is to display processed text within the app, and then allow the user to click a button to send that text to an LLM (Large Language Model).
The ...
1
vote
0
answers
59
views
Why does the Hugging Face Trainer still detect different devices for my encoder and classifier head even after I manually map them to the same device?
I encountered this error while trying to run the Hugging Face Trainer on a multi-GPU setup.
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
I use a ...
3
votes
0
answers
47
views
Azure ML Endpoint Fails with HFValidationError even after using pathlib.Path
I am trying to deploy a fine-tuned Mistral-7B model on an Azure ML Online Endpoint. The deployment repeatedly fails during the init() phase of the scoring script with an huggingface_hub.errors....
0
votes
0
answers
132
views
Zep Graphiti - core - Adding Episode fails the LLM structured output
During ingestion into the graph DB, I pass a JSON file as an episode, along with custom entities (and edges), using the Gemini API, but I get discrepancies in the structured output, like so:
LLM generation ...
0
votes
0
answers
49
views
How to send extra headers from RAGFlow Agent to a Spring Boot MCP server tool call?
I am using RAGFlow connected to a Spring Boot MCP server.
My agent flow is simple:
Begin node → collects inputs (auth_token, tenant_id, x_request_status)
Agent (gpt-4o) → connected to MCP Tool (server)...
0
votes
0
answers
48
views
How to enforce Claude to respond with either text or tool_use, but not both at the same time?
I’m already familiar with the different options for the tool_choice parameter (auto, tool, any, none). That’s not what my question is about.
I always let Claude decide whether to use a tool, and if so,...
0
votes
0
answers
52
views
How to dynamically register Language Model Tools in a VS Code extension?
I want to register different Language Model Tools in my VS Code extension depending on a specific configuration.
Currently, the way VS Code works seems to be:
I need to declare the tool in package....
0
votes
0
answers
92
views
How to handle stateful MCP connections in a load-balanced agentic application?
I'm building an agentic application where users interact with AI agents. Here's my setup:
Current Architecture:
Agent supports remote tool calling via MCP (Model Context Protocol)
Each user ...
0
votes
0
answers
188
views
Cannot import `QwenForCausalLM` after installing `v4.51.3-Qwen2.5-Omni-preview` tag; pip installs 4.52.0.dev0 instead
Description:
I am trying to install the Hugging Face Transformers version that supports the Qwen2.5-Omni model. According to the official docs, the correct tag to install is v4.51.3-Qwen2.5-Omni-...
1
vote
1
answer
180
views
How to upsert documents to Flowise document store using API Loader instead of file upload?
I’m currently working with Flowise and trying to update my document store via API requests.
I followed the documentation here.
Specifically, I want to try Scenario 2: In the same document store, ...
0
votes
1
answer
84
views
Can't connect to Ollama hosted locally from python script
I am building an ETL pipeline that uses an LLM to extract some information.
I have Ollama installed locally. I am on a MacBook M4 Max.
I don't understand why I have this error from my worker.
ads-worker-1 | 2025-08-28 15:...
0
votes
0
answers
73
views
How to accelerate embedding my corpus into ChromaDB
I have a corpus.jsonl file that is 6.5 GB in size, and I use one H100 GPU to embed the corpus into ChromaDB, but it is very slow. I want to find out how I can accelerate the process (GPU, CPU, IO)....
1
vote
0
answers
274
views
Failed to build installable wheels for some pyproject.toml based projects llama-cpp-python
I tried to install llama-cpp-python via pip, but the installation fails with an error.
The command that I wrote:
CMAKE_ARGS="-DLLAMA_METAL_EMBED_LIBRARY=ON -DLLAMA_METAL=on" pip install ...
0
votes
1
answer
44
views
Agent not using both tool specs
I've created an agent using LlamaIndex. When I specify only one tool spec, it works correctly. However, when I try to use two, one is ignored.
import asyncio
import logging
import os
from dotenv ...
0
votes
0
answers
50
views
Smolagents CodeAgent gets error from correct code
The Smolagents CodeAgent is given a task to convert a string into markdown table format. It successfully captures the related part of the string and writes the code for markdown table formatting. ...
0
votes
1
answer
380
views
How to store Gemini 2.5 Flash + MCP multi-turn conversation data (including tool calls and responses)?
I’m building a multi-turn conversation system using Gemini 2.5 Flash with thinking and Model Context Protocol (MCP) tool calls.
With OpenAI models, I usually store conversation history as an array of ...
0
votes
0
answers
211
views
TypeError: PPOTrainer.__init__() got an unexpected keyword argument 'config'
I am trying to initialize a PPO_trainer but have issues.
from trl import PPOTrainer, PPOConfig
ppo_config = PPOConfig(
batch_size=4,
learning_rate=1e-5,
mini_batch_size=2,
use_cpu=...
2
votes
3
answers
139
views
FastAPI endpoint stream LLM output word for word
I have a FastAPI endpoint (/generateStreamer) that generates responses from an LLM model. I want to stream the output so users can see the text as it’s being generated, rather than waiting for the ...
0
votes
0
answers
73
views
Is there a way to refactor a Maven submodule's pom.xml to be completely "standalone" from its parent?
I'm conducting a study in which I'm examining how effective LLMs are at translating code between frameworks. Here is one of the datasets I'm using to test this: https://github.com/eclipse-ee4j/...
0
votes
0
answers
54
views
Always getting 0 in agent evaluation with Agent Goal Accuracy in the RAGAS AI framework
I am using ragas==0.2.15.
I have created an Investment Research Assistant, a LangChain-based conversational AI system designed to guide users in making informed investment decisions. It supports real-...