Skip to main content
Filter by
Sorted by
Tagged with
-5 votes
0 answers
24 views

What AI model does Greptile use for its code review capabilities? [closed]

I am exploring AI-based code review tools and came across Greptile, which claims to provide deep, context-aware code reviews by analyzing entire repositories. However, I couldn't find definitive ...
Kayalvizhi Palanisamy's user avatar
1 vote
1 answer
113 views

ModuleNotFoundError when importing ConversationBufferMemory and ConversationalRetrievalChain from LangChain

I'm trying to import ConversationBufferMemory and ConversationalRetrievalChain in my Python notebook as follows: from langchain.memory import ConversationBufferMemory from langchain.chains import ...
Shaffan's user avatar
  • 23
Advice
0 votes
0 replies
62 views

Text-to-Speech function in a 3D Unity game

I am doing a 3D game using a Unity, and I need my NPC can communicate with the players with voice I want my NPC can speak with a voice that response is generated by the LLM, so the response will ...
Alden Ling's user avatar
0 votes
0 answers
42 views

Langchain RAG is not retrieving any document

This is my embedding code, which I run once only: embeddings = OpenAIEmbeddings(model="text-embedding-3-large") vector_store = MongoDBAtlasVectorSearch.from_connection_string( ...
Mingruifu Lin's user avatar
0 votes
0 answers
60 views

How is the Model Context Protocol (MCP) secured when interacting with LLMs? [closed]

I’ve been exploring the Model Context Protocol (MCP) and its role in connecting external data or tools with large language models (LLMs). However, I’m curious about how MCP ensures security, ...
Dipanshu S's user avatar
0 votes
0 answers
16 views

Does SGLang’s OpenAI-compatible API support async/await non-streaming calls?

I’m using SGLang’s OpenAI-compatible server (e.g., --port 30000, /v1/chat/completions) and calling it via the openai SDK with an async client: from openai import AsyncOpenAI client = AsyncOpenAI(...
Erfan Mhi's user avatar
-3 votes
0 answers
66 views

Unsloth installation on windows - triton cannot install

I am trying to set up Unsloth on my Windows machine with an NVIDIA GeForce RTX 5090 GPU (Blackwell architecture), but I am running into an issue. Environment details: OS: Windows 11 Python: 3.12 Conda ...
Nisal Chandira De Zoysa's user avatar
0 votes
0 answers
30 views

Open WebUI pipeline only activates on action [closed]

I am using open-webui pipeline for exporting traces to langfuse from open-webui, chats will be recognized as traces but the session and users will remain unknown unless i load the code from github ...
noobie's user avatar
  • 19
1 vote
0 answers
29 views

Redis- OpenAI not able to tune with actual radis text retrieved from vector index

I am trying to create a simple vector index for conversation AI application where i want to use radis as long-term memory. i configured radis locally and created the index which ideally stores "...
Hari's user avatar
  • 11
0 votes
0 answers
47 views

Does setting torch_dtype=torch.float16 override 8-bit quantization in BitsAndBytes?

I'm trying to run the Qwen2.5-Coder-3B model locally with 8-bit quantization using BitsAndBytes. While loading the model, I noticed that some examples also specify torch_dtype=torch.float16. From my ...
SHresTho12's user avatar
1 vote
0 answers
55 views

Why does botocore raise ValidationException with pydantic BaseModel output constraint on variable length output types?

I am using pydantic AI and use pydantic BaseModel to set the output_type of the answer. From time to time I encounter a ValidationException. I cannot accurately pinpoint the issue creating this ...
DeerFreak's user avatar
  • 117
0 votes
0 answers
71 views

Torch example transformer with TransformerDecoder

In the torch example provided here https://github.com/pytorch/examples/tree/main/word_language_model, tansformer only uses torch.TransformerEncoder and torch.TransformerDecoder is overwritten with a ...
cuneyttyler's user avatar
  • 1,395
0 votes
0 answers
51 views

Granite 4.0 H-Micro with LangGraph create_react_agent not invoking tools

I'm trying to use IBM Granite 4.0 H-Micro-Q2_K.gguf locally with LangGraph’s create_react_agent() (or the newer create_agent()) to automatically call tools defined in Python. I have two tools defined: ...
jashan khangura's user avatar
0 votes
0 answers
45 views

How do I get a conversation going with a MistralAI Agent using LangChain?

I am learning how to make Agents using MistralAI and LangChain. I have been following the official documentation from LangChain but I got stuck at when I was calling the LLM for the second time. I ...
Gokul Krishna Balaji's user avatar
-1 votes
0 answers
45 views

Can agent simulation have an integrated LLM

I'm supposed to develop a mock processor that takes the following payload: { "scope": {"type": "folder", "name": "invoices-2025"}, "messages&...
Eyy boss's user avatar
  • 113
0 votes
1 answer
77 views

Cannot get token logprobs while using langchain structured output

I am using langchain to call an LLM and I want to get the logprobs for each token. I want to get them after doing this: from langchain_openai import ChatOpenAI from pydantic import BaseModel class ...
rikyeah's user avatar
  • 2,128
0 votes
1 answer
32 views

MLXLMCommon in Swift gives error when loading model: noModelFactoryAvailable

I am following the tutorial here: https://github.com/ml-explore/mlx-swift-examples/tree/main/Libraries/MLXLMCommon import MLXLMCommon Task { do { print("Loading model...") ...
sudoExclamationExclamation's user avatar
0 votes
0 answers
87 views

Firebase Genkit using prompt in chat

I am using chat with genKit library like below: const session = agenticAi.createSession<any>({ initialState: { uid: uid, }, }); const chat = session.chat({ model: googleAI.model('...
Moblize IT's user avatar
  • 1,342
1 vote
0 answers
40 views

How can I make an MCP tool ask the LLM to request missing parameters instead of sending empty strings?

I'm building an MCP server (with spring ai 1.1.0-M3) that exposes a tool for searching contacts in my internal system. Here’s a simplified version of the tool method: public class myTool { @...
gs fs's user avatar
  • 45
0 votes
0 answers
51 views

AgentDojo repo on Github: I can't reproduce the table results, because the Security results are always 0.00%

I tried running this command with GPT_4O_MINI_2024_07_18, so that the rate limits wouldn't block the execution before the results are printed python -m agentdojo.scripts.benchmark -s workspace -ut ...
Saif Farid's user avatar
0 votes
1 answer
39 views

Getting "FATAL FIPS SELFTEST FAILURE" when importing qwen-vl-utils

When I run from qwen_vl_utils import process_vision_info in my Python environment, I get crypto/fips/fips.c:154: OpenSSL internal error: FATAL FIPS SELFTEST FAILURE Aborted I'm using OpenSSL 3.3.2 ...
Anson Savage's user avatar
1 vote
0 answers
149 views

Installation error while installing GroundingDino

I am trying to install the GroundingDino as instructed in the README file of their official GitHub repo, but I am facing the error below: Obtaining file:///home/kgupta/workspace/Synthetic_Data_gen/...
Mahfuzur Mahim Rahman's user avatar
0 votes
0 answers
41 views

How to reduce latency in a context-aware chatbot with chart + dataset inputs

I’m building a chatbot for my research project that helps participants understand a chart. The chatbot runs on a website built with React. My goal is to make it feel just like using ChatGPT in the ...
Hesper's user avatar
  • 161
0 votes
1 answer
138 views

Error while deploying, but not in local: "crewai Failed to upsert documents: "Expected IDs to be unique, found 28 Duplicate IDs"

When I initialize a Crew in Azure, I get an error: crewai Failed to upsert documents: "Expected IDs to be unique, found 28 Duplicate IDs" followed by lots of uuids. from crewai import ...
Ray's user avatar
  • 3,994
0 votes
1 answer
97 views

What does total_token_count means from gemini response?

I'm trying to understand how total_token_count is calculated for the gemini-2.5-flash model. The official documentation suggests total_token_count = prompt_token_count + candidates_token_count, but my ...
Wonjune Shin's user avatar
-1 votes
1 answer
57 views

How to reconstruct sentences from mean-pooled embeddings (embedding inversion) [closed]

I’m working on a research problem where I want to reconstruct or paraphrase sentences starting from synthetic embeddings. The embeddings are global (mean-pooled), not token-level, so they lose ...
melissa mattos's user avatar
0 votes
0 answers
66 views

How to solve device mismatch issue when using offloading with QwenImageEditPlus pipeline and GGUF weights

After failing to make the QwenImageEditPlus run (https://huggingface.co/spaces/discord-community/README/discussions/9#68d260e32053323e6bfab30c), I tried a different approach (thanks to all the example ...
Siladittya's user avatar
  • 1,215
0 votes
0 answers
31 views

Running Ollama on local computer and prompting from jupyter notebook - does the model recall prior prompts like if it was the same chat?

I am doing some tests using Ollama on local computer, with Llama 3.2, which consists in prompting a task against a document. I read that after having reached maximum context, I should restart the ...
user305883's user avatar
  • 1,739
0 votes
1 answer
66 views

Schema Guided Reasoning to dynamically force LLM to select one of the values from list

I am trying to use a Schema Guided Reasoning approach to develop a system which will classify for me product into one of the categories from predefined list based on its name\description. For that I ...
Maksim Khaitovich's user avatar
1 vote
1 answer
1k views

AttributeError: 'DynamicCache' object has no attribute 'seen_tokens'

I'm following the Hands-On Large Language Models book to learn more about LLMs. I'm trying to generate text using the "microsoft/Phi-3-mini-4k-instruct" model which is used in the book. ...
Quinten's user avatar
  • 42.7k
5 votes
1 answer
243 views

How to stream LLM responses in a Shiny app instead of waiting for full output?

I am creating a Shiny app in R. One of its features is to display processed text within the app, and then allow the user to click a button to send that text to an LLM (Large Language Model). The ...
LIANG Chen's user avatar
1 vote
0 answers
59 views

Why does hugging face trainer still recognize different device between my encoder & classifier head even after I manually map it on the same device

I encounterd this error while trying to run hugging face trainer on a multi-gpu. RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! I use a ...
Dwi Rezky Fahlan's user avatar
3 votes
0 answers
47 views

Azure ML Endpoint Fails with HFValidationError even after using pathlib.Path

I am trying to deploy a fine-tuned Mistral-7B model on an Azure ML Online Endpoint. The deployment repeatedly fails during the init() phase of the scoring script with an huggingface_hub.errors....
User's user avatar
  • 157
0 votes
0 answers
132 views

Zep Graphiti - core - Adding Episode fails the LLM structured output

On the ingestion part to the graph db, I pass a json file, as an episode, custom entities (and edges), using gemini api, but I get some discrepancy on the structured output, like so: LLM generation ...
George Petropoulos's user avatar
0 votes
0 answers
49 views

How to send extra headers from RAGFlow Agent to a Spring Boot MCP server tool call?

I am using RAGFlow connected to a Spring Boot MCP server. My agent flow is simple: Begin node → collects inputs (auth_token, tenant_id, x_request_status) Agent (gpt-4o) → connected to MCP Tool (server)...
Ishan Garg's user avatar
0 votes
0 answers
48 views

How to enforce Claude to respond with either text or tool_use, but not both at the same time?

I’m already familiar with the different options for the tool_choice parameter (auto, tool, any, none). That’s not what my question is about. I always let Claude decide whether to use a tool, and if so,...
Alexander Popov's user avatar
0 votes
0 answers
52 views

How to dynamically register Language Model Tools in a VS Code extension?

I want to register different Language Model Tools in my VS Code extension depending on a specific configuration. Currently, the way VS Code works seems to be: I need to declare the tool in package....
leonard520's user avatar
0 votes
0 answers
92 views

How to handle stateful MCP connections in a load-balanced agentic application?

I'm building an agentic application where users interact with AI agents. Here's my setup: Current Architecture: Agent supports remote tool calling via MCP (Model Context Protocol) Each user ...
Atharva's user avatar
0 votes
0 answers
188 views

Cannot import `QwenForCausalLM` after installing `v4.51.3-Qwen2.5-Omni-preview` tag; pip installs 4.52.0.dev0 instead

Description: I am trying to install the Hugging Face Transformers version that supports the Qwen2.5-Omni model. According to the official docs, the correct tag to install is v4.51.3-Qwen2.5-Omni-...
Promit Dey Sarker Arjan's user avatar
1 vote
1 answer
180 views

How to upsert documents to Flowise document store using API Loader instead of file upload?

’m currently working with Flowise and trying to update my document store via API requests. I followed the documentation here. Specifically, I want to try Scenario 2: In the same document store, ...
傅靖茹's user avatar
0 votes
1 answer
84 views

Can't connect to Ollama hosted locally from python script

I am building ETL using LLM to extract some information. I have ollama installed locally. I am on Macbook M4 Max. I don't understand why I have this error from my worker. ads-worker-1 | 2025-08-28 15:...
Mael Fosso's user avatar
0 votes
0 answers
73 views

How to accelerate my corpus embedding to the chromadb

I have the corpus.jsonl which has 6.5gb storage.And i use the one h100 gpu to embedding the corpus to the chromadb,but it seems very slowly.I want to find how can i accelerate the progress(gpu,cpu,io)....
YiJun Sachs's user avatar
1 vote
0 answers
274 views

Failed to build installable wheels for some pyproject.toml based projects llama-cpp-python

I tried to install llama-cpp-python via pip, but I have an error with the installation The command that I wrote: CMAKE_ARGS="-DLLAMA_METAL_EMBED_LIBRARY=ON -DLLAMA_METAL=on" pip install ...
ZZISST's user avatar
  • 21
0 votes
1 answer
44 views

Agent not using both tool specs

I've created an Agent using llama index. When I specify only one tool spec, it works correctly. However, when I try to use two, one is ignored. import asyncio import logging import os from dotenv ...
UserX's user avatar
  • 101
0 votes
0 answers
50 views

Smolagents CodeAgent gets error from correct code

the Smolagents CodeAgent is given a task to convert a string into markdown table format. It successfully captures the related part of the string and writes the code for markdown table formatting. ...
aearslan's user avatar
  • 176
0 votes
1 answer
380 views

How to store Gemini 2.5 Flash + MCP multi-turn conversation data (including tool calls and responses)?

I’m building a multi-turn conversation system using Gemini 2.5 Flash with thinking and Model Context Protocol (MCP) tool calls. With OpenAI models, I usually store conversation history as an array of ...
6zL's user avatar
  • 21
0 votes
0 answers
211 views

TypeError: PPOTrainer.__init__() got an unexpected keyword argument 'config'

I am trying to initialize a PPO_trainer but have issues. from trl import PPOTrainer, PPOConfig ppo_config = PPOConfig( batch_size=4, learning_rate=1e-5, mini_batch_size=2, use_cpu=...
m0ss's user avatar
  • 472
2 votes
3 answers
139 views

FastAPI endpoint stream LLM output word for word

I have a FastAPI endpoint (/generateStreamer) that generates responses from an LLM model. I want to stream the output so users can see the text as it’s being generated, rather than waiting for the ...
sander's user avatar
  • 1,490
0 votes
0 answers
73 views

Is there a way to refactor a Maven submodule's pom.xml to be completely "standalone" from its parent?

I'm conducting a study in which I'm examining how effective LLMs are at translating code between frameworks. Here is one of the datasets I'm using to test this: https://github.com/eclipse-ee4j/...
Advait's user avatar
  • 46
0 votes
0 answers
54 views

Getting always 0 in agent evaluation with Agent Goal accuracy In RAGAS AI Framework

I am using ragas ==0.2.15. I have created an Investment Research Assistant in LangChain-based conversational AI system designed to guide users in making informed investment decisions. It supports real-...
Divya M's user avatar

1
2 3 4 5
34