Skip to content

Improvements and new features for indexing#1290

Open
sacerdot wants to merge 66 commits into
Deducteam:masterfrom
sacerdot:indexing_BO
Open

Improvements and new features for indexing#1290
sacerdot wants to merge 66 commits into
Deducteam:masterfrom
sacerdot:indexing_BO

Conversation

@sacerdot
Copy link
Copy Markdown
Collaborator

@sacerdot sacerdot commented Jul 17, 2025

Bugs to be fixed:

  • "lambdapi websearch --require HOLLight.theory_hol" triggers an assert failure due to changes to recursive require/open

Todo:

  • docs and changelog need to be updated
  • test if search in vscode still works with latest version
  • use lplib/color.ml in place of our color management of the output [ but do not lose our code to check if we are in lsp_mode or targeting a tty ]
  • restore the black color after the error message in case of Fatal exception
  • commit the very nice new look&feel of websearch

New features (to be documented too):

  1. new command "lambdapi deindex"
  2. the current development (i.e. the set of currently required files) is now indexed and deindexed automatically
  3. new filter | "regexp" to restrict the set of results using a regexp over the full qid
  4. "lambdapi websearch" now uses a rocqParser that accepts (also) the syntax of Rocq for fun/forall/exists; it also remaps /\ / ~ to the corresponding LP tokens for logical connectives
  5. new flag "--source" to load a map from LP identifiers to Rocq sources (file + line); the Rocq sources are also shown by websearch
  6. the "--require" flag can now be passed multiple times (no longer necessary since Frederic has implemented recursive open, but nice to have anyway; see bug to be fixed above though)
  7. when an identifier is ambiguous, but after normalization it is no more, the query is now considered non ambiguous
  8. "lambdapi search" now streams the result
  9. Plac in rewriting rule is now supported (handled like a variable... hoping that it makes sense)
  10. bug fix: many spurious justifications were included
  11. bug fix: avoid Stack_overflow by rewriting a few functions using tail recursion

sacerdot added 30 commits July 3, 2025 16:44
It deindex all constants whose path is a suffix of PATH.

Question: what are useful CLI args for deindex? E.g.
- a list of filenames
- a .pkg file (or the fact that it must use the .pkg file)
- a user-provided PATH like it is now?

Moreover notice that there is currently no check that PATH is a well-formed
PATH and A.B matches A.BC as a prefix.
Now the query 'concl >= _ | ".*vsum.*"' terminates on the medium-sized HOL
light extraction
a query like 'concl = _' was returning justifications of the form

_ found in hypothesis

Now only really relevant justifications are returned

Signed-off-by: Claudio Sacerdoti Coen <claudio.sacerdoticoen@unibo.it>
1. during indexing using --source=PATH one can add indexing information
   mapping LP identifiers to Coq identifiers and locations
2. when the search results are shown, the Coq code is also printed if
   the LP identifier is found in the map in point 1

The code to generate the file used in 1 should belong to HOL2DK. Right now
we have hacked some best-effort shell script that we will commit in the
HOL2DK_indexing repository
It used to match only prefixes of strings, which was quite weird
Note: there's no easy way to implement the same check for rules.
However it is unlikely to index twice a file that contains only
rewriting rules.
E.g. in last rewriting rule in tests/OK/coercions.lp
1. we used a Timed.ref to update the indexes with the local development
 (i.e. just the files that have been required, not the others) so that it
 is possible to query the local development
2. when doing "require ... as Foo" the user can use Foo.X in the queries;
 (but not regular expressions involving Foo, but that seems reasonable)

We have introduced some "hakish" code here and there to be reviewed. In
particular:

1. we store in Common/Mode if we are running LSP
2. we detected that the current LSP buffer has "fname" set to an URI
   "file:///". We use this feature to distinguish between the buffer
   and the filesystem so that, when showing query results to the user,
   if the text comes from the buffer we can retrieve it
3. we use a finely tuned combination of Stdlib.ref/Timed.ref to "remember"
   things that would be forgotten. E.g.:
   a) we remember the LSP buffer content *after* the end of the compilation
   b) we remember the index *after* the end of the indexing of a single
      file
4. we used a maison stream-like representation of file contents to be able
   to retrieve the text to be shown to the user both from the filesystem
   or from strings (i.e. the content of the LSP buffer)
5. since indexing is now performed even during compilation when things
   enter a signature in LSP mode, we created a huge circular dependency
   in the modules that we break using multiple callbacks and Stdlib.ref
   that are set here and there. Maybe one can do (much) better.
   In particular pure now depends on tool.
- doc and welcome page of websearch to be updated with the new syntax
- in "lambdapi search" and VScode -> and forall cannot be used any more
- the new syntax allows all the LP syntax plus:
  t ::= ... | forall binders, t | exists binders, t | fun binders => t
            | t -> t
- the binders used in the new entries allow also "x ... z : t" that is
  currently rejected by LP (but it is allowed in Coq)

TODO: but for exists, all the Coq's stdlib notations are NOT in place
(e.g. arithmetic/logic connectives, numbers, ...). How should we add them?
Maybe one possible path is to have "special" notation files that can be fed
to websearch when it starts. As for rewriting rules for normalization, the
notation files are supposed to add notation to symbols that have NOT been
put in the environment. It may be feasible: the file that calls Pratter
already uses our find_sym to resolve symbols!
- this basically removes ALL overloaded problems from the HOL export
- the code is written so that the two normalized symbols may be
  unequal (in the sense of =) but have the same path. Maybe one should
  investigate why this is necessary, but for the time being it is safe
  to ignore. [ We are probably comparing a real symbol with a bogus one
  or even two bogus ones that differ ]
…kept

Example: Real was no longer found because it must become R and not Real
In particular one can require (and open) theory_hol.lp and then
require (and open) a notation.lp file to add infix notation.
Copy link
Copy Markdown
Member

@fblanqui fblanqui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some first remarks.

Comment thread src/pure/pure.ml Outdated
Comment thread src/pure/pure.ml Outdated
Comment thread CHANGES.md
Comment thread doc/options.rst Outdated
Comment thread doc/options.rst Outdated
Comment thread doc/queries.rst Outdated

Runs a query between double quotes against the index file
``~/.LPSearch.db``. See :doc:`query_language` for the query language
``~/.LPSearch.db`` updated with current development and required files. See :doc:`query_language` for the query language
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"updated with current development and required files": what does that mean?
-> updated with the lp or dk files passed as argument, as well as their dependencies?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe needs to be reformulated :

  • current development is the set of items introduced in the file being currently under edition
  • required files : are the files required in the beginning of the file under edition (with the require command)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, note that queries do not need to be between double quotes anymore with the new parser.

Comment thread doc/query_language.rst Outdated
Comment thread src/cli/config.ml Outdated
Comment thread src/common/error.ml
Comment thread src/common/error.ml Outdated
Comment thread src/common/error.ml Outdated
built from the format [fmt] (provided the necessary arguments). *)
(** [fatal popt fmt] raises the [Fatal(popt,msg,more)] exception, in which [msg] is
built from the format [fmt] (provided the necessary arguments).
[more] continues the error message and is printed in normal format instead of
Copy link
Copy Markdown
Member

@fblanqui fblanqui Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no more argument in the following function. Only in fatal_no_pos there is one. Why?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the more (renamed to err_desc) is specifically useful in the case of a particular fatal_no_pos exception.
I believe there is no real need for such argument for the present moment. We can add it later if ever needed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, but you need to document that.

Comment thread src/common/error.ml Outdated
Comment thread src/common/mode.ml Outdated
Copy link
Copy Markdown
Member

@fblanqui fblanqui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Alidra. A few things still need to be fixed.

Comment thread doc/options.rst Outdated
-------

* ``--path``: indicates the prefix of symbols paths to be removed from the index.
The path must be dot (`.`) and is not checked for well formness (i.e. A.B matches A.BC).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear: why "the path must be dot"? Typo: 'well formness" -> well-formedness.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reformulated : The path must use dot ('.') notation

Comment thread doc/query_language.rst Outdated
Q ::= B | Q,Q | Q;Q | Q|PATH
B ::= WHERE HOW GENERALIZE? PATTERN
PATH ::= << string >>
PATH ::= << string >> | "<< regexp>>"
Copy link
Copy Markdown
Member

@fblanqui fblanqui Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<< string >> should be replaced by non-qualified identifier
<< regexp>> should be replaced by <<regexp>> (no spurious space)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure to understand what you mean by :<< string >> should be replaced by non-qualified identifier.

Here string could be math or math.arithmetics

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PATH ::= <identifier> | "<regexp>"

Comment thread src/handle/command.ml
| Fatal(Some(None) ,m) -> fatal pos "%s" m
| Fatal(Some(Some(_)),_, _) as e -> raise e
| Fatal(None ,m, _) -> fatal pos "%s" m
| Fatal(Some(None) ,m, _) -> fatal pos "%s" m
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the third argument of Fatal is ignored?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This argument is intended to provide extra information on the error which will be provided in a different format that the error itself. Presently it is only used by the search engine to show the symbols in the search query which are ambiguous.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand but this function should not ignore that argument. Why not doing the same thing as in the search engine?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I added the argument so it could be used in the future.

Comment thread src/lplib/color.ml Outdated
let cya fmt = colorize Cya fmt

let default (fmt:('a, 'b, 'c, 'd, 'e, 'f) format6) = fmt

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be in extra.ml (this has nothing to do with colors)?

Comment thread src/lsp/lp_doc.ml
let path = String.sub uri 7 (String.length uri - 7) in
Some(Pure.initial_state path), []
with Error.Fatal(_pos, msg) ->
with Error.Fatal(_pos, msg, _) ->
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the 3rd arg is ignored?

Comment thread src/parsing/pretty.ml
let filter ppf f =
match f with
| Path s -> out ppf " | %s" s
| RegExp _ -> out ppf "(;%s;)@." "regular expression" (*FIXME?*)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The output should be parsable. The argument of RegExp should be a string, and it should be output. See my comment on syntax.ml.

Comment thread src/parsing/syntax.ml
| Union
type filter =
| Path of string
| RegExp of Str.regexp
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The syntax should reflect the user input so that it can be easily printed. Str.regexp is an abstract data type and cannot be printed. The argument of RegExp should be a string (without the surrounding double quotes). See how are handled STRINGLIT.

Comment thread src/pure/pure.ml
| exception Fatal(Some(None) , _ , "") ->
assert false
| exception Fatal(None , _ ) ->
| exception Fatal(None , _ , "") ->
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why matching on the empty string? This looks like a bug as this does not catch all possible cases.

Comment thread src/pure/pure.ml
| Fatal(None , _ ) -> assert false
| Fatal(Some(Some(pos)), msg, "") -> List.rev Stdlib.(!cmds), Some(pos, msg)
| Fatal(Some(None) , _ , "" ) -> assert false
| Fatal(None , _ , "" ) -> assert false
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why matching on the empty string? This looks like a bug as this does not catch all possible cases.

Comment thread src/tool/indexing.ml

(* Tail recursive implementation of List.append for
OCaml < 5.1 *)
let (@) l1 l2 = List.rev_append (List.rev l1) l2
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be in lplib/list.ml.

Comment thread ressources/dune Outdated
; (rule
; (targets websearch.ml)
; (deps websearch.eml.ml)
; (action (run dream_eml %{deps} --workspace %{workspace_root})))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove if not used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants