Newest 'large-language-model' Questions

-5 votes

0 answers

24 views

What AI model does Greptile use for its code review capabilities? [closed]

I am exploring AI-based code review tools and came across Greptile, which claims to provide deep, context-aware code reviews by analyzing entire repositories. However, I couldn't find definitive ...

Kayalvizhi Palanisamy

1

asked 39 mins ago

1 vote

1 answer

113 views

ModuleNotFoundError when importing ConversationBufferMemory and ConversationalRetrievalChain from LangChain

I'm trying to import ConversationBufferMemory and ConversationalRetrievalChain in my Python notebook as follows: from langchain.memory import ConversationBufferMemory from langchain.chains import ...

Shaffan

23

asked Oct 31 at 19:40

Advice

0 votes

0 replies

62 views

Text-to-Speech function in a 3D Unity game

I am building a 3D game in Unity, and I need my NPC to communicate with players by voice. I want the NPC to speak a response generated by the LLM, so the response will ...

Alden Ling

15

asked Oct 30 at 13:30

0 votes

0 answers

42 views

Langchain RAG is not retrieving any document

This is my embedding code, which I run once only: embeddings = OpenAIEmbeddings(model="text-embedding-3-large") vector_store = MongoDBAtlasVectorSearch.from_connection_string( ...

Mingruifu Lin

161

asked Oct 29 at 17:00

0 votes

0 answers

60 views

How is the Model Context Protocol (MCP) secured when interacting with LLMs? [closed]

I’ve been exploring the Model Context Protocol (MCP) and its role in connecting external data or tools with large language models (LLMs). However, I’m curious about how MCP ensures security, ...

Dipanshu S

17

asked Oct 28 at 18:48

0 votes

0 answers

16 views

Does SGLang’s OpenAI-compatible API support async/await non-streaming calls?

I’m using SGLang’s OpenAI-compatible server (e.g., --port 30000, /v1/chat/completions) and calling it via the openai SDK with an async client: from openai import AsyncOpenAI client = AsyncOpenAI(...

Erfan Mhi

95

asked Oct 28 at 15:06

-3 votes

0 answers

66 views

Unsloth installation on Windows - Triton cannot be installed

I am trying to set up Unsloth on my Windows machine with an NVIDIA GeForce RTX 5090 GPU (Blackwell architecture), but I am running into an issue. Environment details: OS: Windows 11 Python: 3.12 Conda ...

Nisal Chandira De Zoysa

11

asked Oct 28 at 6:13

0 votes

0 answers

30 views

Open WebUI pipeline only activates on action [closed]

I am using an open-webui pipeline to export traces from open-webui to Langfuse. Chats are recognized as traces, but the session and users remain unknown unless I load the code from GitHub ...

noobie

19

asked Oct 27 at 7:27

1 vote

0 answers

29 views

Redis - OpenAI not able to tune with actual Redis text retrieved from vector index

I am trying to create a simple vector index for a conversational AI application where I want to use Redis as long-term memory. I configured Redis locally and created the index, which ideally stores "...

Hari

11

asked Oct 26 at 9:02

0 votes

0 answers

47 views

Does setting torch_dtype=torch.float16 override 8-bit quantization in BitsAndBytes?

I'm trying to run the Qwen2.5-Coder-3B model locally with 8-bit quantization using BitsAndBytes. While loading the model, I noticed that some examples also specify torch_dtype=torch.float16. From my ...

SHresTho12

147

asked Oct 24 at 22:47

1 vote

0 answers

55 views

Why does botocore raise ValidationException with pydantic BaseModel output constraint on variable length output types?

I am using pydantic AI and use pydantic BaseModel to set the output_type of the answer. From time to time I encounter a ValidationException. I cannot accurately pinpoint the issue creating this ...

DeerFreak

117

asked Oct 22 at 11:29

0 votes

0 answers

71 views

Torch example transformer with TransformerDecoder

In the torch example provided here https://github.com/pytorch/examples/tree/main/word_language_model, the transformer only uses torch.TransformerEncoder and torch.TransformerDecoder is overwritten with a ...

cuneyttyler

1,395

asked Oct 21 at 8:48

0 votes

0 answers

51 views

Granite 4.0 H-Micro with LangGraph create_react_agent not invoking tools

I'm trying to use IBM Granite 4.0 H-Micro-Q2_K.gguf locally with LangGraph’s create_react_agent() (or the newer create_agent()) to automatically call tools defined in Python. I have two tools defined: ...

jashan khangura

43

asked Oct 20 at 10:19

0 votes

0 answers

45 views

How do I get a conversation going with a MistralAI Agent using LangChain?

I am learning how to make Agents using MistralAI and LangChain. I have been following the official documentation from LangChain, but I got stuck when calling the LLM for the second time. I ...

Gokul Krishna Balaji

1

asked Oct 19 at 8:34

-1 votes

0 answers

45 views

Can agent simulation have an integrated LLM

I'm supposed to develop a mock processor that takes the following payload: { "scope": {"type": "folder", "name": "invoices-2025"}, "messages"...

Eyy boss

113

asked Oct 16 at 6:07

0 votes

1 answer

77 views

Cannot get token logprobs while using langchain structured output

I am using langchain to call an LLM and I want to get the logprobs for each token. I want to get them after doing this: from langchain_openai import ChatOpenAI from pydantic import BaseModel class ...

rikyeah

2,128

asked Oct 15 at 9:53

0 votes

1 answer

32 views

MLXLMCommon in Swift gives error when loading model: noModelFactoryAvailable

I am following the tutorial here: https://github.com/ml-explore/mlx-swift-examples/tree/main/Libraries/MLXLMCommon import MLXLMCommon Task { do { print("Loading model...") ...

sudoExclamationExclamation

9,021

asked Oct 14 at 18:01

0 votes

0 answers

87 views

Firebase Genkit using prompt in chat

I am using chat with genKit library like below: const session = agenticAi.createSession<any>({ initialState: { uid: uid, }, }); const chat = session.chat({ model: googleAI.model('...

Moblize IT

1,342

asked Oct 13 at 23:22

1 vote

0 answers

40 views

How can I make an MCP tool ask the LLM to request missing parameters instead of sending empty strings?

I'm building an MCP server (with spring ai 1.1.0-M3) that exposes a tool for searching contacts in my internal system. Here’s a simplified version of the tool method: public class myTool { @...

gs fs

45

asked Oct 13 at 13:46

0 votes

0 answers

51 views

AgentDojo repo on Github: I can't reproduce the table results, because the Security results are always 0.00%

I tried running this command with GPT_4O_MINI_2024_07_18, so that the rate limits wouldn't block the execution before the results are printed python -m agentdojo.scripts.benchmark -s workspace -ut ...

Saif Farid

1

asked Oct 13 at 10:08

0 votes

1 answer

39 views

Getting "FATAL FIPS SELFTEST FAILURE" when importing qwen-vl-utils

When I run from qwen_vl_utils import process_vision_info in my Python environment, I get crypto/fips/fips.c:154: OpenSSL internal error: FATAL FIPS SELFTEST FAILURE Aborted I'm using OpenSSL 3.3.2 ...

Anson Savage

351

asked Oct 8 at 17:08

1 vote

0 answers

149 views

Installation error while installing GroundingDino

I am trying to install the GroundingDino as instructed in the README file of their official GitHub repo, but I am facing the error below: Obtaining file:///home/kgupta/workspace/Synthetic_Data_gen/...

Mahfuzur Mahim Rahman

41

asked Oct 8 at 12:53

0 votes

0 answers

41 views

How to reduce latency in a context-aware chatbot with chart + dataset inputs

I’m building a chatbot for my research project that helps participants understand a chart. The chatbot runs on a website built with React. My goal is to make it feel just like using ChatGPT in the ...

Hesper

161

asked Oct 4 at 4:56

0 votes

1 answer

138 views

Error while deploying, but not in local: "crewai Failed to upsert documents: "Expected IDs to be unique, found 28 Duplicate IDs"

When I initialize a Crew in Azure, I get an error: crewai Failed to upsert documents: "Expected IDs to be unique, found 28 Duplicate IDs" followed by lots of uuids. from crewai import ...

Ray

3,994

asked Oct 3 at 23:15

0 votes

1 answer

97 views

What does total_token_count mean in a Gemini response?

I'm trying to understand how total_token_count is calculated for the gemini-2.5-flash model. The official documentation suggests total_token_count = prompt_token_count + candidates_token_count, but my ...

Wonjune Shin

31

asked Oct 1 at 13:11
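
The relation quoted in the excerpt above can be checked by comparing the reported total against the sum of the two named counts. This is a minimal sketch, assuming the usage metadata has been copied into a plain dict. The helper name `unaccounted_tokens` and the numbers are hypothetical and are not taken from the question or from any Gemini SDK.

```python
def unaccounted_tokens(usage: dict) -> int:
    """Return total_token_count minus (prompt_token_count + candidates_token_count)."""
    expected = usage["prompt_token_count"] + usage["candidates_token_count"]
    return usage["total_token_count"] - expected


# Hypothetical numbers, for illustration only.
usage = {
    "prompt_token_count": 12,
    "candidates_token_count": 30,
    "total_token_count": 100,
}
gap = unaccounted_tokens(usage)
print(gap)  # 58
```

A nonzero result means the total includes tokens that neither of the two fields reports.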

-1 votes

1 answer

57 views

How to reconstruct sentences from mean-pooled embeddings (embedding inversion) [closed]

I’m working on a research problem where I want to reconstruct or paraphrase sentences starting from synthetic embeddings. The embeddings are global (mean-pooled), not token-level, so they lose ...

melissa mattos

1

asked Sep 30 at 0:02

0 votes

0 answers

66 views

How to solve device mismatch issue when using offloading with QwenImageEditPlus pipeline and GGUF weights

After failing to make the QwenImageEditPlus run (https://huggingface.co/spaces/discord-community/README/discussions/9#68d260e32053323e6bfab30c), I tried a different approach (thanks to all the example ...

Siladittya

1,215

asked Sep 24 at 7:36

0 votes

0 answers

31 views

Running Ollama on a local computer and prompting from a Jupyter notebook - does the model recall prior prompts as if it were the same chat?

I am doing some tests using Ollama on local computer, with Llama 3.2, which consists in prompting a task against a document. I read that after having reached maximum context, I should restart the ...

user305883

1,739

asked Sep 23 at 23:35

0 votes

1 answer

66 views

Schema Guided Reasoning to dynamically force LLM to select one of the values from list

I am trying to use a Schema Guided Reasoning approach to develop a system that classifies a product into one of the categories from a predefined list, based on its name/description. For that I ...

Maksim Khaitovich

4,792

asked Sep 22 at 15:08
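
One common way to restrict a reply to a predefined list is to build a JSON Schema at runtime whose field carries an `enum` of the allowed values, then reject anything outside it when parsing. The following is a sketch under assumptions: the function names, the `reasoning` and `category` field names, and the example categories are all hypothetical, and how the schema is handed to a particular LLM API is not shown.

```python
import json


def build_category_schema(categories: list[str]) -> dict:
    """Build a JSON Schema whose `category` field only admits the given values."""
    if not categories:
        raise ValueError("categories must not be empty")
    return {
        "type": "object",
        "properties": {
            "reasoning": {"type": "string"},
            "category": {"type": "string", "enum": list(categories)},
        },
        "required": ["reasoning", "category"],
        "additionalProperties": False,
    }


def parse_classification(raw_json: str, categories: list[str]) -> str:
    """Parse a model reply and reject any category outside the list."""
    reply = json.loads(raw_json)
    category = reply.get("category")
    if category not in categories:
        raise ValueError(f"unexpected category: {category!r}")
    return category


# Hypothetical categories and reply, for illustration only.
categories = ["electronics", "grocery", "clothing"]
schema = build_category_schema(categories)
reply = '{"reasoning": "Name mentions a USB cable.", "category": "electronics"}'
print(parse_classification(reply, categories))  # electronics
```

Because the schema is rebuilt from the list on each call, changing the categories does not require changing any class definition.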

1 vote

1 answer

1k views

AttributeError: 'DynamicCache' object has no attribute 'seen_tokens'

I'm following the Hands-On Large Language Models book to learn more about LLMs. I'm trying to generate text using the "microsoft/Phi-3-mini-4k-instruct" model which is used in the book. ...

Quinten

42.7k

asked Sep 19 at 8:39

5 votes

1 answer

243 views

How to stream LLM responses in a Shiny app instead of waiting for full output?

I am creating a Shiny app in R. One of its features is to display processed text within the app, and then allow the user to click a button to send that text to an LLM (Large Language Model). The ...

LIANG Chen

149

asked Sep 17 at 15:41

1 vote

0 answers

59 views

Why does hugging face trainer still recognize different device between my encoder & classifier head even after I manually map it on the same device

I encountered this error while trying to run the Hugging Face Trainer on multiple GPUs. RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! I use a ...

Dwi Rezky Fahlan

11

asked Sep 15 at 3:45

3 votes

0 answers

47 views

Azure ML Endpoint Fails with HFValidationError even after using pathlib.Path

I am trying to deploy a fine-tuned Mistral-7B model on an Azure ML Online Endpoint. The deployment repeatedly fails during the init() phase of the scoring script with an huggingface_hub.errors....

User

157

asked Sep 12 at 5:05

0 votes

0 answers

132 views

Zep Graphiti - core - Adding Episode fails the LLM structured output

When ingesting into the graph DB, I pass a JSON file as an episode, with custom entities (and edges), using the Gemini API, but I get a discrepancy in the structured output, like so: LLM generation ...

George Petropoulos

448

asked Sep 7 at 21:11

0 votes

0 answers

49 views

How to send extra headers from RAGFlow Agent to a Spring Boot MCP server tool call?

I am using RAGFlow connected to a Spring Boot MCP server. My agent flow is simple: Begin node → collects inputs (auth_token, tenant_id, x_request_status) Agent (gpt-4o) → connected to MCP Tool (server)...

Ishan Garg

729

asked Sep 4 at 17:45

0 votes

0 answers

48 views

How to enforce Claude to respond with either text or tool_use, but not both at the same time?

I’m already familiar with the different options for the tool_choice parameter (auto, tool, any, none). That’s not what my question is about. I always let Claude decide whether to use a tool, and if so,...

Alexander Popov

25.5k

asked Sep 3 at 15:23

0 votes

0 answers

52 views

How to dynamically register Language Model Tools in a VS Code extension?

I want to register different Language Model Tools in my VS Code extension depending on a specific configuration. Currently, the way VS Code works seems to be: I need to declare the tool in package....

leonard520

1

asked Sep 3 at 14:56

0 votes

0 answers

92 views

How to handle stateful MCP connections in a load-balanced agentic application?

I'm building an agentic application where users interact with AI agents. Here's my setup: Current Architecture: Agent supports remote tool calling via MCP (Model Context Protocol) Each user ...

Atharva

1

asked Sep 3 at 10:56

0 votes

0 answers

188 views

Cannot import `QwenForCausalLM` after installing `v4.51.3-Qwen2.5-Omni-preview` tag; pip installs 4.52.0.dev0 instead

Description: I am trying to install the Hugging Face Transformers version that supports the Qwen2.5-Omni model. According to the official docs, the correct tag to install is v4.51.3-Qwen2.5-Omni-...

Promit Dey Sarker Arjan

1

asked Sep 3 at 10:17

1 vote

1 answer

180 views

How to upsert documents to Flowise document store using API Loader instead of file upload?

I’m currently working with Flowise and trying to update my document store via API requests. I followed the documentation here. Specifically, I want to try Scenario 2: In the same document store, ...

傅靖茹

51

asked Aug 29 at 14:54

0 votes

1 answer

84 views

Can't connect to Ollama hosted locally from python script

I am building an ETL pipeline that uses an LLM to extract some information. I have Ollama installed locally. I am on a MacBook M4 Max. I don't understand why I get this error from my worker. ads-worker-1 | 2025-08-28 15:...

Mael Fosso

400

asked Aug 28 at 15:24

0 votes

0 answers

73 views

How to accelerate embedding my corpus into ChromaDB

I have a 6.5 GB corpus.jsonl file, and I am using a single H100 GPU to embed the corpus into ChromaDB, but it is very slow. I want to find out how I can accelerate the process (GPU, CPU, I/O)....

YiJun Sachs

23

asked Aug 27 at 1:52

1 vote

0 answers

274 views

Failed to build installable wheels for some pyproject.toml based projects llama-cpp-python

I tried to install llama-cpp-python via pip, but the installation fails with an error. The command that I ran: CMAKE_ARGS="-DLLAMA_METAL_EMBED_LIBRARY=ON -DLLAMA_METAL=on" pip install ...

ZZISST

21

asked Aug 26 at 20:53

0 votes

1 answer

44 views

Agent not using both tool specs

I've created an Agent using llama index. When I specify only one tool spec, it works correctly. However, when I try to use two, one is ignored. import asyncio import logging import os from dotenv ...

UserX

101

asked Aug 24 at 18:36

0 votes

0 answers

50 views

Smolagents CodeAgent gets error from correct code

The Smolagents CodeAgent is given a task to convert a string into markdown table format. It successfully captures the related part of the string and writes the code for markdown table formatting. ...

aearslan

176

asked Aug 24 at 18:32

0 votes

1 answer

380 views

How to store Gemini 2.5 Flash + MCP multi-turn conversation data (including tool calls and responses)?

I’m building a multi-turn conversation system using Gemini 2.5 Flash with thinking and Model Context Protocol (MCP) tool calls. With OpenAI models, I usually store conversation history as an array of ...

6zL

21

asked Aug 13 at 2:09

0 votes

0 answers

211 views

TypeError: PPOTrainer.__init__() got an unexpected keyword argument 'config'

I am trying to initialize a PPO_trainer but have issues. from trl import PPOTrainer, PPOConfig ppo_config = PPOConfig( batch_size=4, learning_rate=1e-5, mini_batch_size=2, use_cpu=...

m0ss

472

asked Aug 6 at 15:43

2 votes

3 answers

139 views

FastAPI endpoint stream LLM output word for word

I have a FastAPI endpoint (/generateStreamer) that generates responses from an LLM model. I want to stream the output so users can see the text as it’s being generated, rather than waiting for the ...

sander

1,490

asked Aug 6 at 8:05
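
The word-by-word streaming described in the excerpt above can be modelled as an async generator that yields each chunk as soon as it is ready. This is only a sketch under assumptions: `stream_words` and `collect` are hypothetical names, and the generator splits a fixed string rather than calling a real model.

```python
import asyncio
from typing import AsyncIterator


async def stream_words(text: str, delay: float = 0.0) -> AsyncIterator[str]:
    """Yield `text` one word at a time, standing in for an LLM's incremental output."""
    for index, word in enumerate(text.split()):
        await asyncio.sleep(delay)  # placeholder for waiting on the model
        # Re-insert the separating space on every chunk after the first.
        yield word if index == 0 else " " + word


async def collect(chunks: AsyncIterator[str]) -> list[str]:
    """Consume a stream the way a client would, keeping each chunk."""
    return [chunk async for chunk in chunks]


received = asyncio.run(collect(stream_words("streamed word by word")))
print(received)  # ['streamed', ' word', ' by', ' word']
```

In FastAPI, an async generator of this shape can be returned through `StreamingResponse`; that wiring is omitted here so the sketch runs without extra dependencies.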

0 votes

0 answers

73 views

Is there a way to refactor a Maven submodule's pom.xml to be completely "standalone" from its parent?

I'm conducting a study in which I'm examining how effective LLMs are at translating code between frameworks. Here is one of the datasets I'm using to test this: https://github.com/eclipse-ee4j/...

Advait

46

asked Aug 3 at 19:52

0 votes

0 answers

54 views

Always getting 0 for Agent Goal Accuracy in agent evaluation with the RAGAS framework

I am using ragas==0.2.15. I have created an Investment Research Assistant, a LangChain-based conversational AI system designed to guide users in making informed investment decisions. It supports real-...

Divya M

9

asked Jul 31 at 1:20
