Improvements and new features for indexing#1290
Conversation
It deindex all constants whose path is a suffix of PATH. Question: what are useful CLI args for deindex? E.g. - a list of filenames - a .pkg file (or the fact that it must use the .pkg file) - a user-provided PATH like it is now? Moreover notice that there is currently no check that PATH is a well-formed PATH and A.B matches A.BC as a prefix.
Now the query 'concl >= _ | ".*vsum.*"' terminates on the medium-sized HOL light extraction
a query like 'concl = _' was returning justifications of the form _ found in hypothesis Now only really relevant justifications are returned Signed-off-by: Claudio Sacerdoti Coen <claudio.sacerdoticoen@unibo.it>
1. during indexing using --source=PATH one can add indexing information mapping LP identifiers to Coq identifiers and locations 2. when the search results are shown, the Coq code is also printed if the LP identifier is found in the map in point 1 The code to generate the file used in 1 should belong to HOL2DK. Right now we have hacked some best-effort shell script that we will commit in the HOL2DK_indexing repository
It used to match only prefixes of strings, which was quite weird
Note: there's no easy way to implement the same check for rules. However it is unlikely to index twice a file that contains only rewriting rules.
E.g. in last rewriting rule in tests/OK/coercions.lp
1. we used a Timed.ref to update the indexes with the local development
(i.e. just the files that have been required, not the others) so that it
is possible to query the local development
2. when doing "require ... as Foo" the user can use Foo.X in the queries;
(but not regular expressions involving Foo, but that seems reasonable)
We have introduced some "hakish" code here and there to be reviewed. In
particular:
1. we store in Common/Mode if we are running LSP
2. we detected that the current LSP buffer has "fname" set to an URI
"file:///". We use this feature to distinguish between the buffer
and the filesystem so that, when showing query results to the user,
if the text comes from the buffer we can retrieve it
3. we use a finely tuned combination of Stdlib.ref/Timed.ref to "remember"
things that would be forgotten. E.g.:
a) we remember the LSP buffer content *after* the end of the compilation
b) we remember the index *after* the end of the indexing of a single
file
4. we used a maison stream-like representation of file contents to be able
to retrieve the text to be shown to the user both from the filesystem
or from strings (i.e. the content of the LSP buffer)
5. since indexing is now performed even during compilation when things
enter a signature in LSP mode, we created a huge circular dependency
in the modules that we break using multiple callbacks and Stdlib.ref
that are set here and there. Maybe one can do (much) better.
In particular pure now depends on tool.
- doc and welcome page of websearch to be updated with the new syntax
- in "lambdapi search" and VScode -> and forall cannot be used any more
- the new syntax allows all the LP syntax plus:
t ::= ... | forall binders, t | exists binders, t | fun binders => t
| t -> t
- the binders used in the new entries allow also "x ... z : t" that is
currently rejected by LP (but it is allowed in Coq)
TODO: but for exists, all the Coq's stdlib notations are NOT in place
(e.g. arithmetic/logic connectives, numbers, ...). How should we add them?
Maybe one possible path is to have "special" notation files that can be fed
to websearch when it starts. As for rewriting rules for normalization, the
notation files are supposed to add notation to symbols that have NOT been
put in the environment. It may be feasible: the file that calls Pratter
already uses our find_sym to resolve symbols!
- this basically removes ALL overloaded problems from the HOL export - the code is written so that the two normalized symbols may be unequal (in the sense of =) but have the same path. Maybe one should investigate why this is necessary, but for the time being it is safe to ignore. [ We are probably comparing a real symbol with a bogus one or even two bogus ones that differ ]
…kept Example: Real was no longer found because it must become R and not Real
In particular one can require (and open) theory_hol.lp and then require (and open) a notation.lp file to add infix notation.
…ase rewrite the query replacing...
fblanqui
left a comment
There was a problem hiding this comment.
Here are some first remarks.
|
|
||
| Runs a query between double quotes against the index file | ||
| ``~/.LPSearch.db``. See :doc:`query_language` for the query language | ||
| ``~/.LPSearch.db`` updated with current development and required files. See :doc:`query_language` for the query language |
There was a problem hiding this comment.
"updated with current development and required files": what does that mean?
-> updated with the lp or dk files passed as argument, as well as their dependencies?
There was a problem hiding this comment.
Maybe needs to be reformulated :
- current development is the set of items introduced in the file being currently under edition
- required files : are the files required in the beginning of the file under edition (with the require command)
There was a problem hiding this comment.
By the way, note that queries do not need to be between double quotes anymore with the new parser.
| built from the format [fmt] (provided the necessary arguments). *) | ||
| (** [fatal popt fmt] raises the [Fatal(popt,msg,more)] exception, in which [msg] is | ||
| built from the format [fmt] (provided the necessary arguments). | ||
| [more] continues the error message and is printed in normal format instead of |
There was a problem hiding this comment.
There is no more argument in the following function. Only in fatal_no_pos there is one. Why?
There was a problem hiding this comment.
Actually, the more (renamed to err_desc) is specifically useful in the case of a particular fatal_no_pos exception.
I believe there is no real need for such argument for the present moment. We can add it later if ever needed.
There was a problem hiding this comment.
Ok, but you need to document that.
| ------- | ||
|
|
||
| * ``--path``: indicates the prefix of symbols paths to be removed from the index. | ||
| The path must be dot (`.`) and is not checked for well formness (i.e. A.B matches A.BC). |
There was a problem hiding this comment.
Not clear: why "the path must be dot"? Typo: 'well formness" -> well-formedness.
There was a problem hiding this comment.
reformulated : The path must use dot ('.') notation
| Q ::= B | Q,Q | Q;Q | Q|PATH | ||
| B ::= WHERE HOW GENERALIZE? PATTERN | ||
| PATH ::= << string >> | ||
| PATH ::= << string >> | "<< regexp>>" |
There was a problem hiding this comment.
<< string >> should be replaced by non-qualified identifier
<< regexp>> should be replaced by <<regexp>> (no spurious space)
There was a problem hiding this comment.
I am not sure to understand what you mean by :<< string >> should be replaced by non-qualified identifier.
Here string could be math or math.arithmetics
There was a problem hiding this comment.
PATH ::= <identifier> | "<regexp>"
| | Fatal(Some(None) ,m) -> fatal pos "%s" m | ||
| | Fatal(Some(Some(_)),_, _) as e -> raise e | ||
| | Fatal(None ,m, _) -> fatal pos "%s" m | ||
| | Fatal(Some(None) ,m, _) -> fatal pos "%s" m |
There was a problem hiding this comment.
Why the third argument of Fatal is ignored?
There was a problem hiding this comment.
This argument is intended to provide extra information on the error which will be provided in a different format that the error itself. Presently it is only used by the search engine to show the symbols in the search query which are ambiguous.
There was a problem hiding this comment.
I understand but this function should not ignore that argument. Why not doing the same thing as in the search engine?
There was a problem hiding this comment.
Right. I added the argument so it could be used in the future.
| let cya fmt = colorize Cya fmt | ||
|
|
||
| let default (fmt:('a, 'b, 'c, 'd, 'e, 'f) format6) = fmt | ||
|
|
There was a problem hiding this comment.
Shouldn't this be in extra.ml (this has nothing to do with colors)?
| let path = String.sub uri 7 (String.length uri - 7) in | ||
| Some(Pure.initial_state path), [] | ||
| with Error.Fatal(_pos, msg) -> | ||
| with Error.Fatal(_pos, msg, _) -> |
| let filter ppf f = | ||
| match f with | ||
| | Path s -> out ppf " | %s" s | ||
| | RegExp _ -> out ppf "(;%s;)@." "regular expression" (*FIXME?*) |
There was a problem hiding this comment.
The output should be parsable. The argument of RegExp should be a string, and it should be output. See my comment on syntax.ml.
| | Union | ||
| type filter = | ||
| | Path of string | ||
| | RegExp of Str.regexp |
There was a problem hiding this comment.
The syntax should reflect the user input so that it can be easily printed. Str.regexp is an abstract data type and cannot be printed. The argument of RegExp should be a string (without the surrounding double quotes). See how are handled STRINGLIT.
| | exception Fatal(Some(None) , _ , "") -> | ||
| assert false | ||
| | exception Fatal(None , _ ) -> | ||
| | exception Fatal(None , _ , "") -> |
There was a problem hiding this comment.
Why matching on the empty string? This looks like a bug as this does not catch all possible cases.
| | Fatal(None , _ ) -> assert false | ||
| | Fatal(Some(Some(pos)), msg, "") -> List.rev Stdlib.(!cmds), Some(pos, msg) | ||
| | Fatal(Some(None) , _ , "" ) -> assert false | ||
| | Fatal(None , _ , "" ) -> assert false |
There was a problem hiding this comment.
Why matching on the empty string? This looks like a bug as this does not catch all possible cases.
|
|
||
| (* Tail recursive implementation of List.append for | ||
| OCaml < 5.1 *) | ||
| let (@) l1 l2 = List.rev_append (List.rev l1) l2 |
| ; (rule | ||
| ; (targets websearch.ml) | ||
| ; (deps websearch.eml.ml) | ||
| ; (action (run dream_eml %{deps} --workspace %{workspace_root}))) |
Bugs to be fixed:
Todo:
New features (to be documented too):