[Bug]: search memory did not return facts even when there is exact word match between the question and the facts #1009

@tomw-mv

Describe the bug

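I ran LongMemEval and analyzed the wrong answers. In several of them, search memory did not return the relevant facts even though the question and the fact share exact words.
Examples are in the results spreadsheet: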

https://memverge-my.sharepoint.com/:x:/p/tom_wong/IQAHq1HfgGCaS568iLbHKRRtAa70uBHPF0LcHpIabrlxPNE?e=lTtXPp

Steps to reproduce

Run LongMemEval with the S dataset.

e.g.
Question ID: 118b2229

Question:
How long is my daily commute to work?

Facts:
[2023-05-22T21:17:57] User: I've been listening to audiobooks during my daily commute, which takes 45 minutes each way.

Issue:

  • Search memory did not return the fact.
  • The phrase "daily commute" exactly matches.
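
One way to check the exact-match claim for an example is to list the word pairs that occur verbatim in both the question and the fact. The sketch below is for triage only; `tokens` and `shared_ngrams` are hypothetical helper names and are not part of MemMachine or LongMemEval.

```python
import re


def tokens(text: str) -> list[str]:
    """Lowercase word tokens; punctuation other than apostrophes is dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shared_ngrams(question: str, fact: str, n: int = 2) -> set[str]:
    """Return the n-word phrases that occur verbatim in both texts."""
    def ngrams(words: list[str]) -> set[str]:
        return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

    return ngrams(tokens(question)) & ngrams(tokens(fact))


question = "How long is my daily commute to work?"
fact = ("I've been listening to audiobooks during my daily commute, "
        "which takes 45 minutes each way.")

# Two-word phrases shared by the question and the fact.
shared = shared_ngrams(question, fact)
print(sorted(shared))
```

For this example the shared phrases include "daily commute", which is the overlap reported above.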

Files on aep4:
cd "/automation/atf/tomw/tmp/lme_compare/s full_ctx vs mmai vs gold_facts q01-10/test4-mmai"
vi "memmachine_search_eval_results_gpt-4.1-mini.json"

========================================
e.g.
Question ID: gpt4_59c863d7

Question:
How many model kits have I worked on or bought?

Facts:
[2023-05-20T14:37:49] User: I'm looking for some tips on photo-etching for my new 1/72 scale B-29 bomber model kit. I've never tried it before, but I've seen some amazing results online. By the way, I just got this kit and a 1/24 scale '69 Camaro at a model show last weekend.

[2023-05-27T14:39:49] User: I'm looking for some tips on weathering techniques for my model kits. I've been getting back into model building and I recently finished a simple Revell F-15 Eagle kit that I picked up on a whim during a trip to the hobby store in late April.

[2023-05-29T18:18:49] User: I'm looking for some tips on weathering techniques for my model tanks. I've been using AK Interactive products, but I'm interested in trying out some new methods. By the way, I also started working on a diorama featuring a 1/16 scale German Tiger I tank, and I'm trying to get the terrain to look as realistic as possible.

[2023-05-30T03:21:51] User: I'm looking for some advice on painting metal surfaces for a model kit. I recently finished a Tamiya 1/48 scale Spitfire Mk.V and had to learn some new techniques, but I'm still not entirely happy with the results. Do you have any tips or recommended products that could help me achieve a more realistic finish?

[2023-05-30T03:21:59] User: I'm thinking of trying out enamel washes on my next project, a 1/72 scale B-29 bomber. Do you have any recommendations for enamel washes that work well with acrylic paints?

Model kits mentioned in the facts:
B-29 bomber
Revell F-15 Eagle
German Tiger
Spitfire Mk.V
'69 Camaro

Issue:

  • Search memory did not return any facts on:
    Revell F-15 Eagle
    '69 Camaro
  • Search memory returned only assistant content, not user content, for:
    B-29 bomber
    German Tiger
  • The phrase "model kit" exactly matches.

Files on aep4:
cd "/automation/atf/tomw/tmp/lme_compare/s full_ctx vs mmai vs gold_facts q01-10/test4-mmai"
vi "memmachine_search_eval_results_gpt-4.1-mini.json"

========================================
e.g.
Question ID: 06f04340

Question:
What should I serve for dinner this weekend with my homegrown ingredients?

Facts:
[2023-05-23T00:28:51] User: I've been using basil and mint in my cooking lately. I've even harvested some cherry tomatoes from my garden. Do you have any suggestions for companion plants that could help my cherry tomatoes grow better?

Issue:
Search memory did not return the cherry tomatoes fact, which mentions the only homegrown ingredient.

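  • The question's word "homegrown" does not appear verbatim in the fact; the overlap is in meaning (the fact says the tomatoes were harvested "from my garden").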

Files on aep4:
cd "/automation/atf/tomw/tmp/lme_compare/s full_ctx vs mmai vs gold_facts q11-20/test1-mmai"
vi "memmachine_search_eval_results_gpt-4.1-mini.json"

========================================
e.g.
Question ID: gpt4_a1b77f9c

Question:
How many weeks in total do I spent on reading 'The Nightingale' and listening to 'Sapiens: A Brief History of Humankind' and 'The Power'?

Facts:
[2022-01-01T20:17:49] User: I'm looking for some book recommendations. I started reading 'The Nightingale' by Kristin Hannah today and I'm really into historical fiction right now. Can you suggest some other books in the same genre that you think I'd enjoy?

[2022-01-15T11:55:49] User: I'm looking for some book recommendations. I just finished reading "The Nightingale" by Kristin Hannah today, and I'm still reeling from the emotional impact. I'm open to trying out different genres, so feel free to suggest anything that you think I might enjoy.

[2022-02-01T18:17:55] User: That's really helpful, thanks! I think I can apply this to some of my own habits. Speaking of habits, I just started listening to 'Sapiens: A Brief History of Humankind' by Yuval Noah Harari today, and it got me thinking about how humans have developed certain habits and behaviors over time.

[2022-03-01T22:44:49] User: I just finished listening to 'Sapiens: A Brief History of Humankind' by Yuval Noah Harari today, and it got me thinking about the impact of technology on human evolution. Can you tell me more about the latest advancements in AI and its potential applications in various industries?

[2022-03-06T19:29:57] User: I'm interested in learning more about the themes of identity, culture, and power in "The Fifth Season" by N.K. Jemisin. I've been thinking a lot about how these themes relate to my own experiences as a woman, and I started listening to "The Power" by Naomi Alderman today, which got me thinking about how I take certain things for granted. Do you think "The Fifth Season" would be a good fit for someone interested in exploring these themes further?

[2022-03-20T05:20:49] User: I'm looking for some book recommendations. I just finished listening to 'The Power' by Naomi Alderman today and it really made me think. I'm interested in exploring more books that challenge my perspectives. Do you have any suggestions?

Issue:

  • Search memory did not return any facts on 'Sapiens: A Brief History of Humankind'.
  • Search memory did not return any facts on 'The Power'.
  • For 'The Nightingale', search memory returned assistant facts but not the user facts that contain the required information.
  • The phrases "Sapiens: A Brief History of Humankind" and "The Power" exactly match.

Files on aep4:
cd "/automation/atf/tomw/tmp/lme_compare/s full_ctx vs mmai vs gold_facts q11-20/test1-mmai"
vi "memmachine_search_eval_results_gpt-4.1-mini.json"

========================================
e.g.
Question ID: 2318644b

Question:
How much more did I spend on accommodations per night in Hawaii compared to Tokyo?

Facts:
[2023-05-26T05:01:51] User: Thank you for the information. I stayed in a hostel in Tokyo that cost around $30 per night when I went solo last January, so it's possible for me to find good deals. I'm planning to visit some of the popular tourist spots in Tokyo, such as Shibuya Crossing and the Tokyo Tower. Are there any affordable transportation options available, or should I just take a taxi?

Issue:
Search memory did not return any facts about Tokyo.

  • The word "Tokyo" exactly matches.

Files on aep4:
cd "/automation/atf/tomw/tmp/lme_compare/s full_ctx vs mmai vs gold_facts q21-30/test2-mmai"
vi "memmachine_search_eval_results_gpt-4.1-mini.json"

========================================
e.g.
Question ID: gpt4_2f8be40d

Question:
How many weddings have I attended in this year?

Facts:
sister:
"content": "I think I've got a good idea of how to keep the conversation light and fun. Thanks for the tips! By the way, speaking of weddings, my sister's wedding was just amazing, and I'm still on a high from it. We had such a great time planning it together, from choosing the dress to deciding on the menu. It was really special to be a part of it."

jen and tom:
"content": "I'm planning my own wedding and I was wondering if you could give me some tips on how to choose the perfect venue. By the way, I just got back from a friend's wedding last weekend, and it was amazing - the bride, Jen, looked stunning in her bohemian-inspired dress, and her husband, Tom, was clearly smitten with her. It was at a rustic barn in the countryside, and it was so cozy and relaxed."

rachel:
"content": "That's really helpful, thanks! I'm thinking of having a smaller, intimate ceremony, and I was wondering if you could suggest some ways to make it feel more personal and special for our guests. My cousin Rachel's wedding at the vineyard was really lovely, and I think part of what made it so special was the fact that it was a family affair - I was a bridesmaid, and it was great to catch up with family members I hadn't seen in years."

Issue:

  • Search results are missing the fact about Rachel's wedding.
  • The word "wedding" exactly matches.

Files on aep4:
cd "/automation/atf/tomw/tmp/lme_compare/s full_ctx vs mmai vs gold_facts q11-20/test1-mmai"
vi "memmachine_search_eval_results_gpt-4.1-mini.json"

========================================
e.g.
Question ID: gpt4_d12ceb0e

Question:
What is the average age of me, my parents, and my grandparents?

Facts:
[2023-05-22T10:08:51] User: I'm considering going back to school to get a master's degree, but I'm not sure what field I want to pursue. My grandma is 75 and my grandpa is 78, and seeing them slow down has made me think about my own future and what I want to achieve in my career. Can you suggest some popular master's programs that would be a good fit for someone in their early thirties looking to make a career change?

[2023-05-23T21:27:51] User: I'm trying to get healthier and wondering if you can recommend some exercises that are suitable for people my age. By the way, my parents are getting older too - my mom is 55 and my dad is 58, so I'm trying to set a good example for them as well.

[2023-05-26T10:09:49] User: I'm trying to get back into a regular exercise routine, can you recommend some workouts that are suitable for someone my age? By the way, I just turned 32 on February 12th, so I'm feeling a bit more motivated to take care of myself now.

Ages: 75, 78, 55, 58, 32
Average: (75 + 78 + 55 + 58 + 32) / 5 = 59.6

Issue:

  • Search memory did not return the fact containing the parents' ages.
  • answer_llm used arbitrary numbers for the parents' ages.
  • The word "parents" exactly matches.

Files on aep4:
cd "/automation/atf/tomw/tmp/lme_compare/s full_ctx vs mmai vs gold_facts q21-30/test2-mmai"
vi "memmachine_search_eval_results_gpt-4.1-mini.json"

========================================
e.g.
Question ID: gpt4_d84a3211

Question:
How much total money have I spent on bike-related expenses since the start of the year?

Facts:
[2023-05-05T00:32:51] User: Actually, I remember taking my bike in for a tune-up on April 20th because the gears were getting stuck. The mechanic told me I needed to replace the chain, which I did, and it cost me $25. While I was there, I also got a new set of bike lights installed, which were $40. Can you help me create a bike maintenance schedule to ensure I don't miss anything important?

[2023-05-05T05:37:55] User: That's great information about bike insurance. I'll definitely look into it. Speaking of my bike, I recently got a new set of bike lights installed, which were $40. They're really bright and make me feel a lot safer on the roads, especially since I've been doing some early morning rides. Do you have any tips on how to stay safe while cycling in low-light conditions?

[2023-05-05T06:23:55] User: I've had good experiences with the local bike shop downtown where I bought my Bell Zephyr helmet for $120. They did a great job with the tune-up last time, and the mechanic was knowledgeable and friendly. I might just go back there for my next tune-up.

[2023-05-05T23:04:59] User: That's a great list of tips! I'd like to add that I recently got a new set of bike lights installed, which were $40, and it's made a huge difference for my early morning rides. It's something to consider if you'll be riding in low-light conditions during your trip.

Issue:
Search memory did not return the fact about the Bell Zephyr helmet ($120).

  • The word "bike" exactly matches.

Files on aep4:
cd "/automation/atf/tomw/tmp/lme_compare/s full_ctx vs mmai vs gold_facts q01-10/test4-mmai"
vi "memmachine_search_eval_results_gpt-4.1-mini.json"

========================================

Expected behavior

When a fact contains words or phrases that exactly match the question, search memory should consider that fact and give it a higher score so that it is returned in the search results.
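
A rough sketch of the kind of scoring I have in mind, assuming each candidate already has a similarity score from the existing search. Everything here is hypothetical: `rerank`, `content_words`, the stopword list, the `weight` value, and the example scores are made up for illustration and do not reflect MemMachine's actual API or data.

```python
import re

# Small illustrative stopword list (an assumption, not taken from any library).
STOPWORDS = {"a", "an", "the", "is", "my", "to", "how", "of", "in", "on",
             "i", "and", "for", "do", "have", "what"}


def content_words(text: str) -> set[str]:
    """Lowercase words in the text, minus stopwords."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def rerank(query: str, candidates: list[tuple[str, float]],
           weight: float = 0.5) -> list[tuple[str, float]]:
    """Add a bonus proportional to the share of query words found in each fact."""
    q = content_words(query)
    scored = []
    for text, similarity in candidates:
        overlap = len(q & content_words(text)) / len(q) if q else 0.0
        scored.append((text, similarity + weight * overlap))
    # Highest combined score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidates with made-up similarity scores.
candidates = [
    ("Assistant: Podcasts are a great way to pass the time on a train.", 0.62),
    ("User: I've been listening to audiobooks during my daily commute, "
     "which takes 45 minutes each way.", 0.55),
]
ranked = rerank("How long is my daily commute to work?", candidates)
print(ranked[0][0])
```

With these made-up scores, the user fact that shares "daily" and "commute" with the question moves ahead of the assistant message, even though its original similarity score was lower.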

Environment

Build: main branch, 01/14, commit dacf9a6

Additional context

No response
