fix: Adding metadata to document chunks #3184
Hey @dmartinol! In reference to your comment about
Thanks for pointing this out, @courtneypacheco!
@jwm4 updated the list of excluded fields as per your request
cdoern left a comment
Looks good to me. Do we want a test for this? Or is that already covered?
Signed-off-by: Daniele Martinoli <dmartino@redhat.com>
added UT, thanks
Adding metadata to document chunks, following guidance from the `docling-haystack` package. Reference code here.
Note: we cannot integrate the package as-is because it depends on `docling = "^2.9.0"`, while `instructlab-sdg` restricts us to `docling>=2.4.2,<=2.8.3`.

Metadata fields:
All the `DocMeta` fields apart from `schema_name`, `version` and `doc_items`.

Issue resolved by this Pull Request:
Closes #3192
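
For illustration, the field selection described above (all `DocMeta` fields except `schema_name`, `version` and `doc_items`) could be sketched roughly as follows. This is a minimal sketch, not the code from this PR: the helper name `filter_chunk_metadata` is hypothetical, and the sample dictionary is a hand-written stand-in for exported chunk metadata, with illustrative keys and values only.

```python
# Fields that the PR description lists as excluded from chunk metadata.
EXCLUDED_META_FIELDS = {"schema_name", "version", "doc_items"}


def filter_chunk_metadata(meta: dict) -> dict:
    """Return a copy of `meta` without the excluded fields.

    Hypothetical helper for illustration; not part of docling or this PR.
    """
    return {
        key: value
        for key, value in meta.items()
        if key not in EXCLUDED_META_FIELDS
    }


# Hand-written stand-in for an exported metadata dictionary (assumed shape).
sample_meta = {
    "schema_name": "example.DocMeta",
    "version": "1.0.0",
    "doc_items": [{"self_ref": "#/texts/0"}],
    "headings": ["Introduction"],
    "origin": {"filename": "example.pdf"},
}

filtered = filter_chunk_metadata(sample_meta)
print(sorted(filtered))
```

Keeping the excluded names in a single set means a later change to the list of excluded fields touches only one place.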
Verifying the generated schema:
Sequence of commands to validate the schema of the default in-memory store:
Sample output (edited to show only the relevant fields):
And a snippet of a chunk's metadata from the JSON document:
Checklist:
conventional commits.