[Bug]: search memory likes to return more assistant facts, less user facts #1008

@tomw-mv

Describe the bug

Memory search tends to return more assistant facts and fewer user facts.

  • As a result it returns secondhand info instead of firsthand info, which leads to missing facts and wrong answers.
  • Timestamps are also wrong for temporal questions, because they come from assistant facts rather than from the user's own statements, which leads to wrong answers.
  • Consider running a separate search for user facts (firsthand info) and increasing their score, while decreasing the score of assistant facts (secondhand info).
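
A rough sketch of the score adjustment suggested above, not MemMachine's actual API. The `Fact` type, its `role` and `score` fields, and the boost and penalty values are all hypothetical, and it assumes positive scores where higher means more relevant:

```python
from dataclasses import dataclass


@dataclass
class Fact:
    text: str
    role: str     # hypothetical field: "user" or "assistant"
    score: float  # assumed: positive similarity score, higher is better


def rerank_by_role(facts, user_boost=1.2, assistant_penalty=0.8):
    """Re-sort facts after boosting user facts and penalizing assistant facts.

    The multipliers are illustrative defaults, not tuned values.
    """
    def adjusted(fact):
        if fact.role == "user":
            return fact.score * user_boost
        if fact.role == "assistant":
            return fact.score * assistant_penalty
        return fact.score  # unknown roles keep their original score

    return sorted(facts, key=adjusted, reverse=True)


if __name__ == "__main__":
    results = [
        Fact("Assistant suggested a B-29 bomber kit", "assistant", 0.90),
        Fact("User said they built a B-29 bomber model", "user", 0.70),
    ]
    for fact in rerank_by_role(results):
        print(fact.role, fact.text)
```

With these example multipliers the user fact (0.70 × 1.2) outranks the assistant fact (0.90 × 0.8). Whether a multiplicative boost or a separate user-only search works better would need to be checked against the LongMemEval results.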

Steps to reproduce

Run LongMemEval with the S dataset.

e.g. multi-session
Question gpt4_59c863d7

Issue:
search memory did not return facts on:
Revell F-15 Eagle
69 Camaro

search memory did not return user content for these, only assistant content:
B-29 bomber
German Tiger

Files on aep4:
cd "/automation/atf/tomw/tmp/lme_compare/s full_ctx vs mmai vs gold_facts q01-10/test4-mmai"
vi "memmachine_search_eval_results_gpt-4.1-mini.json"

========================================
e.g. multi-session
Question c4a1ceb8

Issue:
search memory returned only assistant facts on fresh lime juice, and none of the user facts that contained the answer
answer_llm found the word grapefruit in an assistant fact and incorrectly attributed it to the user

Files on aep4:
cd "/automation/atf/tomw/tmp/lme_compare/s full_ctx vs mmai vs gold_facts q11-20/test1-mmai"
vi "memmachine_search_eval_results_gpt-4.1-mini.json"

========================================
e.g. multi-session
Question 28dc39ac

Issue:
search memory did not return Hyper Light Drifter, which took me 5 hours
search memory did not return Celeste, which took me 10 hours
search memory returned assistant fact on The Last of Us Part II, but not any user facts

Files on aep4:
cd "/automation/atf/tomw/tmp/lme_compare/s full_ctx vs mmai vs gold_facts q11-20/test1-mmai"
vi "memmachine_search_eval_results_gpt-4.1-mini.json"

========================================
e.g. multi-session
Question 88432d0a

Issue:
search memory did not return any facts for rustic Italian bread
search memory did not return any facts for batch of cookies
search memory returned assistant facts but no user facts on sourdough starter

Files on aep4:
cd "/automation/atf/tomw/tmp/lme_compare/s full_ctx vs mmai vs gold_facts q11-20/test1-mmai"
vi "memmachine_search_eval_results_gpt-4.1-mini.json"

========================================
e.g. multi-session
Question d23cf73b

Issue:
search memory did not return any facts on Indian cuisine
search memory returned assistant facts about many suggestions
answer_llm took assistant suggestions as user facts

Files on aep4:
cd "/automation/atf/tomw/tmp/lme_compare/s full_ctx vs mmai vs gold_facts q21-30/test2-mmai"
vi "memmachine_search_eval_results_gpt-4.1-mini.json"

========================================
e.g. temporal-reasoning
Question gpt4_d6585ce8

Issue:
search memory returned assistant facts for outdoor concert but no user facts, which does not help answer the question
search memory returned assistant facts for Queen but no user facts, which does not help answer the question
search memory returned assistant facts for Brooklyn but no user facts, so the timestamp is wrong
search memory did not return the required fact for jazz night, so the timestamp is wrong

Files on aep4:
cd "/automation/atf/tomw/tmp/lme_compare/s full_ctx vs mmai vs gold_facts q31-40/test3-mmai"
vi "memmachine_search_eval_results_gpt-4.1-mini.json"

========================================
e.g. temporal-reasoning
Question gpt4_f420262c

Issue:
search memory returned only assistant facts and no user facts for American, so the timestamp is wrong
search memory returned only assistant facts and no user facts for JetBlue, so the timestamp is wrong
search memory returned only assistant facts and no user facts for Delta, so the timestamp is wrong
search memory did not return any facts for United

Files on aep4:
cd "/automation/atf/tomw/tmp/lme_compare/s full_ctx vs mmai vs gold_facts q31-40/test3-mmai"
vi "memmachine_search_eval_results_gpt-4.1-mini.json"

========================================
e.g. knowledge-update
Question 945e3d21

Issue:
search memory did not return any facts on yoga from the user, only returned facts on yoga from assistant

Files on aep4:
cd "/automation/atf/tomw/tmp/lme_compare/s full_ctx vs mmai vs gold_facts q01-10/test4-mmai"
vi "memmachine_search_eval_results_gpt-4.1-mini.json"

========================================

more examples are in the results spreadsheet

https://memverge-my.sharepoint.com/:x:/p/tom_wong/IQAHq1HfgGCaS568iLbHKRRtAa70uBHPF0LcHpIabrlxPNE?e=lTtXPp

Expected behavior

See the results spreadsheet linked directly above, tabs "mmai pass" and "mmai fail". Ratios below are user facts : assistant facts.

  • When the answer is correct, the returned facts are split 50:50.

  • When the answer is wrong, the returned facts are split 44:56.

  • So more assistant facts go along with more wrong answers.

  • When the answer is correct, the facts towards the top of the search memories list are split 43:57.

  • When the answer is wrong, the facts towards the top of the search memories list are split 0:100, i.e. all of them are assistant facts.

  • So user facts towards the top of the search memories list go along with more correct answers.
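
For anyone who wants to reproduce these ratios from a list of search results, here is a minimal sketch. It assumes each returned fact can be labeled "user" or "assistant"; the `role_share` helper and the `top_k` cutoff are hypothetical, and the issue does not say how "towards the top" was measured for the spreadsheet:

```python
def role_share(roles, top_k=None):
    """Return (user_pct, assistant_pct) over a ranked list of role labels.

    roles: labels in ranked order, e.g. ["assistant", "user", ...].
    top_k: if given, only the first top_k results are counted
           (an assumed stand-in for "towards the top of the list").
    """
    window = roles if top_k is None else roles[:top_k]
    user = sum(1 for role in window if role == "user")
    assistant = sum(1 for role in window if role == "assistant")
    total = user + assistant
    if total == 0:
        return (0.0, 0.0)  # no user or assistant facts in the window
    return (100 * user / total, 100 * assistant / total)


if __name__ == "__main__":
    ranked_roles = ["assistant", "assistant", "user", "user"]
    print(role_share(ranked_roles))           # share over the whole list
    print(role_share(ranked_roles, top_k=2))  # share over the top 2 only
```

In this made-up example the whole list is evenly split while the top two results are all assistant facts, which is the same pattern described for the wrong answers above.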

Environment

build is main 01/14 commit dacf9a6

Additional context

Edwin has a suggestion that may help surface more user facts:
"One way I found to increase the LongMemEval score is to prepend the question with "User: ".
So if a question is "What did I eat for breakfast?", then the query is "User: What did I eat for breakfast?""
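
Edwin's suggestion could be sketched as below. The helper name `as_user_query` is hypothetical, and how the resulting query is passed to search memory is not shown because it depends on the client being used:

```python
def as_user_query(question: str, prefix: str = "User: ") -> str:
    """Prepend the speaker prefix to a question before searching memory.

    The prefix is not added twice if the question already starts with it.
    """
    question = question.strip()
    if question.startswith(prefix):
        return question
    return prefix + question


if __name__ == "__main__":
    print(as_user_query("What did I eat for breakfast?"))
```

Keeping the prefix as a parameter makes it easy to compare runs with and without it.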
