- Code
- Plans
- Try inserting outline paths into separators/overlays in occur buffers
- Try org-map-entries
- Clean up persistent-action buffers
- MAYBE [#B] Use text properties instead of plists for timestamping
- Remove duplicate/overlapping results from occur buffer
- Search files instead of buffers
- Match tags separately
- Case-sensitive if caps are present
- Substring matching
- MAYBE Use grep to find matching lines
- Look at how Deft searches files
- MAYBE flx sorting
- MAYBE Match only headings
- MAYBE Testing with Buttercup
- Support new Helm with input-idle-delay
- Bugs
- Checklists
These come in handy while coding.
Why aren’t macros like these in some default package? Sure beats having to type (mapcar (lambda (it) (...it...)) list) over and over.
(defmacro it (&rest body)
  `(lambda (it)
     ,@body))

(defmacro mapit (seq &rest body)
  `(mapcar (lambda (it)
             ,@body)
           ,seq))
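A quick usage sketch (throwaway values, just to illustrate the expansion):

(funcall (it (* it 2)) 21)    ; => 42
(mapit '(1 2 3) (* it it))    ; => (1 4 9)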
This makes it easy to profile code:
(defmacro profile-rifle (times &rest body)
  `(let (output)
     (dolist (p '("helm-" "org-" "string-" "s-" "buffer-" "append" "delq" "map" "list" "car" "save-" "outline-" "delete-dups" "sort" "line-" "nth" "concat" "char-to-string" "rx-" "goto-" "when" "search-" "re-"))
       (elp-instrument-package p))
     (dotimes (x ,times)
       ,@body)
     (elp-results)
     (elp-restore-all)
     ;; `elp-results' leaves the results buffer current; keep only the
     ;; first 20 lines of the report.
     (goto-char (point-min))
     (forward-line 20)
     (delete-region (point) (point-max))
     (setq output (buffer-substring-no-properties (point-min) (point-max)))
     (kill-buffer)
     (delete-window)
     output))
Prototype code, keeping for future reference.
(let* ((num-context-words 2)
       (needle "needle")
       (haystack "one two three needle four five six")
       (hay (s-split needle haystack))
       (left-hay (s-split-words (car hay)))
       (right-hay (s-split-words (nth 1 hay))))
  (concat "..."
          (s-join " " (subseq left-hay (- num-context-words)))
          " " needle " "
          (s-join " " (subseq right-hay 0 num-context-words))
          "..."))
;; Multiple needles
(let* ((needles '("needle" "pin"))
       (haystack "one two three \" needle not pin four five six seven eight pin nine ten eleven twelve"))
  (cl-loop for needle in needles
           append (cl-loop for re = (rx-to-string `(and (repeat 1 ,helm-org-rifle-context-words
                                                                 (and (1+ (not space))
                                                                      (or (1+ space)
                                                                          word-boundary)))
                                                         (group (eval needle))
                                                         (repeat 1 ,helm-org-rifle-context-words
                                                                 (and (or word-boundary
                                                                          (1+ space))
                                                                      (1+ (not space))))))
                            for m = (string-match re haystack end)
                            for end = (match-end 1)
                            while m
                            collect (concat "..." (match-string-no-properties 0 haystack) "..."))))
This code splits on word boundaries, but it’s very slow. Profiling it showed the vast majority of the time was in string-match. I’m guessing the regexp is too complicated or unoptimized.
;; Reduce matching lines to matched word with context
(setq matched-words-with-context
      (cl-loop for line in (map 'list 'car matching-lines-in-node)
               append (cl-loop for token in input
                               for re = (rx-to-string
                                         `(and (repeat 0 ,helm-org-rifle-context-words
                                                       (and (1+ (not space))
                                                            (or (1+ space)
                                                                word-boundary)))
                                               (group (eval token))
                                               (repeat 0 ,helm-org-rifle-context-words
                                                       (and (or word-boundary
                                                                (1+ space))
                                                            (1+ (not space))))))
                               ;; This one line uses about 95% of the runtime of this function
                               for m = (string-match re line end)
                               for end = (match-end 1)
                               when m
                               collect (match-string-no-properties 0 line))))
This version is much, much faster, but instead of matching on word boundaries, it just matches so-many characters before and after the token. It’s not quite as nice, but the speedup is worth it, and it seems good enough.
This is the version currently in use.
(setq matched-words-with-context
      (cl-loop for line in (map 'list 'car matching-lines-in-node)
               append (cl-loop for token in input
                               for re = (rx-to-string '(and (repeat 0 25 not-newline)
                                                            (eval token)
                                                            (repeat 0 25 not-newline)))
                               for m = (string-match re line end)
                               for end = (match-end 1)
                               when m
                               collect (match-string-no-properties 0 line))))
- State “DONE” from “TODO” [2016-04-01 Fri 22:55]
  Okay, it works now. Here’s hoping I don’t break it again.
- State “TODO” from “TODO” [2016-04-01 Fri 19:03]
[2016-04-01 Fri 19:03] Somehow I broke it. Now to fix it…
I don’t understand why this loop isn’t working like I want it to:
(cl-loop with end
         for line in (mapcar 'car matching-lines-in-node)
         for token in input
         for re = (rx-to-string `(and (repeat 0 ,helm-org-rifle-context-characters not-newline)
                                      (eval token)
                                      (repeat 0 ,helm-org-rifle-context-characters not-newline)))
         for match = (string-match re line end)
         for end = (match-end 0)
         when match
         collect (match-string-no-properties 0 line))
From what I can tell from the manual, it should do what I want. Let’s try this:
(cl-loop for line in '("1" "2" "3")
         for word in '("a" "b" "c")
         collect (list (format "Line:%s Word:%s" line word)))
Well, that does not behave like Python list comprehensions. So let’s try nested loops:
(cl-loop for line in '("1" "2" "3")
         collect (cl-loop for word in '("a" "b" "c")
                          collect (format "Line:%s Word:%s" line word)))
There. So this loop should work:
(cl-loop with end
         for line in (mapcar 'car matching-lines-in-node)
         for end = nil
         collect (cl-loop for token in input
                          for re = (rx-to-string `(and (repeat 0 ,helm-org-rifle-context-characters not-newline)
                                                       (eval token)
                                                       (repeat 0 ,helm-org-rifle-context-characters not-newline)))
                          for match = (string-match re line end)
                          for end = (match-end 0)
                          when match
                          collect (match-string-no-properties 0 line)))
(helm-org-rifle-get-candidates-in-buffer (get-file-buffer "~/org/inbox.org") "emacs :org:")
Hm…not quite. Well, this is the code from just before the commit that broke it:
(setq matched-words-with-context
      (cl-loop for line in (map 'list 'car matching-lines-in-node)
               append (cl-loop with end
                               for token in input
                               for re = (rx-to-string `(and (repeat 0 ,helm-org-rifle-context-characters not-newline)
                                                            (eval token)
                                                            (repeat 0 ,helm-org-rifle-context-characters not-newline)))
                               for match = (string-match re line end)
                               if match
                               do (setq end (match-end 0))
                               and collect (match-string-no-properties 0 line))))
(profile-rifle 10 (helm-org-rifle-get-candidates-in-buffer (find-file-noselect "~/org/inbox.org") "emacs helm !mail"))
Hm, that seems nearly twice as slow as before, compared to this. Let’s try without negation:
(profile-rifle 10 (helm-org-rifle-get-candidates-in-buffer (find-file-noselect "~/org/inbox.org") "emacs helm"))
Okay, that’s bad. But something is obviously wrong, because it’s calling rx-form and search-forward-regexp way too many times. Let’s see…
The problem is that the positive-re is matching anywhere, not just at word boundaries, so it’s matching way too many nodes. Well, that is a problem; I don’t know if it explains the entire slowdown.
For example, this matches “overwhelming” because of the “helm” in the middle:
"\\(\\(?:[ ]+\\(:[[:alnum:]_@#%%:]+:\\)\\)?\\| \\)emacs\\(\\(?:[ ]+\\(:[[:alnum:]_@#%%:]+:\\)\\)?\\| \\|$\\)\\|\\(\\(?:[ ]+\\(:[[:alnum:]_@#%%:]+:\\)\\)?\\| \\)helm\\(\\(?:[ ]+\\(:[[:alnum:]_@#%%:]+:\\)\\)?\\| \\|$\\)"
Okay, the problem now is that I changed helm-org-rifle-tags-re to fix tag matching, but that same regexp is used in helm-org-rifle-prep-token, and now that function is matching any token as a tag and giving the wrong result.
I do not understand why it’s doing that, because that regexp is only supposed to match tags…
Okay, the other regexp that I kept commented out appears to match actual tags, as in it’s useful for testing whether a string is a tag:
(org-re ":\\([[:alnum:]_@#%:]+\\):[ \t]*$")
While this one appears to match tags in a document, potentially in a list of tags:
(org-re "\\(?:[ \t]+\\(:[[:alnum:]_@#%%:]+:\\)\\)?")
Okay, I fixed it: I had an if match instead of a while match in the matched-words-with-context loop.
Now to profile and compare with the pre-fix-context version:
Pre-context-fixed version: master @ 5c30f38
(profile-rifle 50 (helm-org-rifle-get-candidates-in-buffer (find-file-noselect "~/org/inbox.org") "emacs helm"))
Context-fixed version: 2b5b12a
[2016-04-02 Sat 00:14] Well, that’s definitely worse, although it’s still probably fast enough, because the elp instrumentation makes it a lot slower.
I’m also noticing that when I eval the buffer of the old version, and then the new one, and back and forth, it’s giving different results than when I start a new Emacs session before eval’ing each buffer. The context-fixed version is still slower, but it’s annoying that they are somehow interfering with each other…
Oh, I know what it probably is: defvar not changing already-defined vars. Gah, I wish there were a “developer mode” that would automatically treat defvar as setq! That might also be causing different results to be returned.
And on that note, notice that the old version is running org-heading-components 9350 times and the new one 9750 times (divided by 50 runs, of course). That means the newer one is returning more results. That’s probably a good thing (better than returning fewer results), but it’s still an annoying discrepancy.
Well, anyway, it seems that the new version is working properly, even if it is a bit slower. I can probably optimize it some from here by profiling it some more. And it’s probably still fast enough anyway. I’m going to commit these test results and go from there.
[2016-04-02 Sat 00:24] I just noticed that the new version has search-forward-regexp while the old shows re-search-forward. I guess I accidentally used one instead of the other. And I didn’t have re- in the profile-rifle macro, so it wasn’t being instrumented. But I can’t even find out what the difference between those two functions is. Their docstrings are identical, but re-search-forward says it’s “an interactive built-in function in `C source code’” and search-forward-regexp says it’s an “interactive built-in function”. If one were an alias for the other, wouldn’t it say so, like other functions do? And I just googled it, and I can’t even find any discussions disambiguating them.
Well, I guess I will change all the search-forward-regexp to re-search-forward and profile it again, now with re- instrumented…
Well, that made it a bit slower… and re-search-forward is running 1915 times per run, which seems like a lot. Well, just for fun, let’s see if search-forward-regexp is any different…
Well, seems about the same. Some other functions are calling re-search-forward. I guess I’ll stick to re-search-forward for consistency.
Let’s see if I can optimize this regexp, because it’s the one used for finding the next matching node:
(positive-re (mapconcat 'helm-org-rifle-prep-token input "\\|"))
Wait…I think I can’t do that, because each token has to be handled separately in case it’s a tag. At least, that’s the way I found that works.
I just realized something: because re- wasn’t instrumented when I profiled the pre-context-fix code, that probably made the test runs a lot faster. I should rerun that test now that I’ve instrumented re-:
Uh…that’s a lot slower…even slower than the context-fixed version. And it’s running re-search-forward about 1/3rd fewer times, yet it’s still slower. That means the context-fixed version is faster…yet it doesn’t feel faster… This is getting really confusing.
…Or not! I ran it again, and this time it was back to 0.38 seconds per run, instead of the 0.88 that it showed. So the old version is faster. Argh, I even restarted Emacs between runs, but the results are still not always consistent.
(Haha, if anyone reads this on GitHub, they’re going to be confused, because GitHub doesn’t display results blocks in their Org renderer.)
Back to testing the context-fixed version:
Maybe the problem is here:
(s-matches? re target)
In the pre-context-fix version, I’m using:
(s-contains? token target t)
I think I changed to the regexp version because the s-contains? version was doing substring matching, which I don’t want. Let’s switch it real quick just to see if that’s the problem:
Eh, it’s only about 20ms faster per run, although s-contains? is more than twice as fast as s-matches?. But it’s still such a short time that it doesn’t make much difference.
This is probably where the next-gen branch would be easier to optimize. Even if all the extra function calls took their toll, at least I could profile each one separately. With this, I see all those re-search-forward calls listed, but it’s hard to figure out why that’s making it slower than the pre-context-fix version.
Okay, I think I see what the problem is, or almost:
Pre-context-fix:  re-search-forward  61250 calls   3.4628969270 s total   5.653...e-05 s/call
Post-context-fix: re-search-forward  78050 calls  10.705968030 s total   0.0001371680 s/call
The time per call to this function in the old version is much shorter, so the problem must be the regexp complexity. And that is a bit annoying, because I thought I was being careful to make it simpler, like by wrapping the whole regexp in the word-boundary matcher instead of each token in the or group.
It’s almost surely this one: (re-search-forward positive-re node-end t), because the other two are the negation one (which isn’t being called in this test), and the per-node matcher (re-search-forward positive-re nil t), which is only run once per partially-matching node, in the main loop, while the other one runs multiple times per partially-matching node. They both use the same regexp though. Maybe if I can optimize the regexp used in that one…
I’m not sure that I can, though, because IIRC I had to do it this way to avoid substring matching:
(positive-re (mapconcat 'helm-org-rifle-prep-token input "\\|"))
Maybe having each token wrapped with helm-org-rifle-prep-token is the problem, but I think if I change that, I’ll get substring matching, which I don’t want. Also there’s this: while before I thought I wasn’t getting substring matching, it might be that I actually was, but only for tokens after the first.
Sigh. I can see how having a testing framework for this would help a lot…
Well, I’m going to try a quick experiment: the faster version has this:
(setq matching-positions-in-node
      (or (cl-loop for token in all-tokens
                   do (goto-char node-beg)
                   while (re-search-forward (helm-org-rifle-prep-token token) node-end t)
                   when negations
                   when (cl-loop for negation in negations
                                 thereis (s-matches? negation
                                                     (buffer-substring-no-properties (line-beginning-position)
                                                                                     (line-end-position))))
                   return nil
                   collect (line-beginning-position) into result
                   do (end-of-line)
                   finally return (sort (delete-dups result) '<))
          ;; Negation found; skip node
          (throw 'negated (goto-char node-end))))
And the slower version has this:
(when (and negations
           (re-search-forward negations-re node-end t))
  (throw 'negated (goto-char node-end)))
(setq matching-positions-in-node
      (cl-loop initially (goto-char node-beg)
               while (re-search-forward positive-re node-end t)
               collect (line-beginning-position) into result
               do (end-of-line)
               finally return (sort (delete-dups result) '<)))
It’s hard for me to imagine how the first one is faster, even without negations, because it should be running more searches, about one for each token times the number of matching lines, rather than one for the number of matching lines. And helm-org-rifle-prep-token is being called…well it should be a lot of times, once per token per node, at least, so that should be much slower! But maybe the more complex regexp is that much slower, so that running more, simpler searches is faster. Let’s find out… one, ta-hoo-hoo, tha-ree…
(setq matching-positions-in-node
      (cl-loop for token in input
               do (goto-char node-beg)
               while (re-search-forward (helm-org-rifle-prep-token token) node-end t)
               collect (line-beginning-position) into result
               do (end-of-line)
               finally return (sort (delete-dups result) '<)))
Well, that’s basically the same. Even though helm-org-rifle-prep-token is being called 19,400 times now (whereas before it wasn’t even on the chart), the overall run is about the same speed. And re-search-forward is being called 110,600 times instead of 78,050 times, and that’s adding two seconds to the overall time, yet the overall time is only 1 second slower, and each run is only 0.02 seconds slower.
I really don’t know. It’s probably still acceptably fast, but I’m not happy that it’s 240 ms slower per run than it was before.
Wait…is it the context matching that’s slowing it down? That would seem to make sense, but I don’t see string-match or match-string-no-properties on the chart, which are called a lot in the context-getting part. Again, this is where the next-gen branch would be easier to profile, because that part would be in a separate function, which would show up on the benchmark.
Okay, so let’s try disabling the context-matching and see if that helps narrow it down.
Wow…nope. I set the context matches to a hardcoded string, and it actually took longer. That makes noooooo sense. I guess the context matching isn’t the problem.
Ok then, let’s see if avoiding substring matches is really the problem. Let’s change that back so that it does match substrings and see if it’s faster again:
Uh, before I do that… I see a discrepancy in the code:
(setq matching-positions-in-node
      (cl-loop initially (goto-char node-beg)
               while (re-search-forward positive-re node-end t)
               collect (line-beginning-position) into result
               do (end-of-line)
               finally return (sort (delete-dups result) '<)))

;; Get list of line-strings containing any token
;; (setq matching-lines-in-node
;;       (cl-loop for pos in matching-positions-in-node
;;                do (goto-char pos)
;;                ;; Get text of each matching line
;;                for string = (buffer-substring-no-properties (line-beginning-position)
;;                                                             (line-end-position))
;;                unless (org-at-heading-p) ; Leave headings out of list of matched lines
;;                ;; (DISPLAY . REAL) format for Helm
;;                collect `(,string . (,buffer ,pos))))

(setq matching-positions-in-node
      (cl-loop for token in input
               do (goto-char node-beg)
               while (re-search-forward (helm-org-rifle-prep-token token) node-end t)
               collect (line-beginning-position) into result
               do (end-of-line)
               finally return (sort (delete-dups result) '<)))
Somehow I put two of these loops in while commenting out the matching-lines-in-node part. So running that loop twice could explain the slowdown…but then how were any context lines being displayed at all? Wow…how did I manage to do that… Oh, I think I see: when I was testing the other matching-positions-in-node loop, I commented out and replaced the wrong one. So…let’s fix that and profile again:
Okay, that is slightly faster, but this matches substrings, which I don’t want. So if I kept this, it would be a slight improvement over the current master: it would fix the context matching, at the cost of being a little bit slower.
I wonder if I could compromise and match substrings but only at the beginning of words (or after punctuation). That could be useful anyway, because it would avoid the “did I use a plural” problem. Let’s see if I can try it…
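One possible shape for that, as a minimal sketch (not existing package code): anchor each token at a word start, but let it match as a prefix of a longer word.

(let ((token "pie"))
  (string-match-p (rx-to-string `(and word-start ,token))
                  "some apple pies"))
;; => 11 (the "pie" at the start of "pies" matches, but a "pie" buried
;; inside another word would not)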
Wait, if I do that, it might mess up the tags matching that took so long to fix.
I wonder if I should separate out the tags matching. I already have it getting a list of tags in a separate string. If I removed tags-matching tokens from the input and matched them separately, maybe it would let me use a simpler regexp for everything else and avoid the prep function. I should probably make another branch to test that idea…sigh. And I don’t even know if that would improve performance. I’d have to first separate out the tags matching, then verify that it works properly, and then simplify the main positive-re regexp, and then see if it is faster.
I think I’m going to stop here. It seems to work properly right now: context-matching, tag-matching, avoids substring matches, and negation works. And it seems fast enough, even if it is slower than before. Maybe there is some combination of these changes that makes everything work at about the same speed as before, but I think trying to figure it out is too complicated with this big candidates-getting function. I think it would be better to settle on this code that works correctly, and then go back to the next-gen branch and try to improve that, which is structured in a simpler way.
[2016-04-02 Sat 02:21] I decided to test in the MELPA sandbox before merging with master and pushing, and it’s a good thing I did, because I discovered another weird bug: if the show-tags setting is off, the results are way off. Probably a simpleish logic error in the code somewhere…but I think at this point I should just remove that setting. As it is, it’s off by default, and I wonder how many people have gotten bad results because of it and decided that this package is no good. I doubt anyone would want it off anyway, and it doesn’t seem to hurt performance. So let’s just remove that so it’s consistent…
This helps for debugging, in case I need it in the future:
(let ((inhibit-read-only t)
      (helm-org-rifle-show-full-entry t)
      (results-buffer (get-buffer-create helm-org-rifle-occur-results-buffer-name)))
  (with-current-buffer results-buffer
    (unless (eq major-mode 'org-mode)
      (read-only-mode)
      (visual-line-mode)
      (org-mode)
      (hi-lock-mode 1)
      (use-local-map helm-org-rifle-occur-keymap))
    (erase-buffer)
    (pop-to-buffer results-buffer))
  (helm-org-rifle-occur-process-input "today Dodie" (list (find-buffer-visiting "~/org/log.org")) results-buffer)
  (pop-to-buffer results-buffer))
See emacs-helm/helm#1806 (comment)
SOMEDAY [#B] Look into using Sallet
- State “SOMEDAY” from “TODO” [2018-04-16 Mon 13:30]
Not as full-featured as Helm, but might be an interesting alternative approach.
[2017-10-30 Mon 18:37] As suggested by Matus Goljer:
I looked at your code a bit and it looks quite good. I would try to enable lexical binding, I’ve noticed that you depend on dynamic lookup somewhere: instead pass the data as arguments, it’s going to be much faster still (dynamic lookup can be awful slow)
I thought I had already done this, but apparently I forgot. Might be an easy, nearly free speed boost.
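A minimal sketch of the two-part change Matus describes (the function and variable names here are made up for illustration): turn on lexical binding in the file’s header line, and pass data as arguments instead of relying on dynamic lookup.

;; -*- lexical-binding: t; -*-

;; Before (hypothetical): `tags' found by dynamic lookup.
;; (defun my/format-entry (heading) (concat heading " " tags))

;; After: pass it in explicitly.
(defun my/format-entry (heading tags)
  (concat heading " " tags))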
I forgot to add this when I rewrote the input handling.
- State “MAYBE” from [2017-09-11 Mon 10:27]
Matus mentioned that he’s experimenting with emacs-deferred (which also has concurrent.el in it) for his Sallet project, and that it’s working well so far. I wonder if I could use that to improve performance, maybe even use it with Helm (and/or Sallet eventually).
[2017-10-30 Mon 18:35] More discussion here.
Using overlays should prevent Org itself from re-fontifying the paths.
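A small sketch of the idea (the variable names are illustrative): put the path text in an overlay’s before-string, so it is displayed but never becomes buffer text that Org could fontify.

(let ((ov (make-overlay (point) (point)))
      (path "Emacs stuff/Packages of interest"))
  ;; The overlay is empty; only its before-string is shown.
  (overlay-put ov 'before-string
               (propertize (concat path "\n") 'face 'font-lock-comment-face)))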
I don’t know why I didn’t realize this sooner, but org-map-entries could likely do much of the logic in helm-org-rifle--get-candidates-in-buffer. I don’t know for certain if it would be faster, but since it has optional caching, it might very well be. And it makes it easy to get inherited tags, properties, etc., and to run on regions, subtrees, etc. Might even completely handle tag matching for me. Very powerful. I should definitely try it, and if the performance is good enough, use it.
e.g. this code from swiper/counsel/ivy:
(org-map-entries
 (lambda ()
   (let* ((components (org-heading-components))
          (level (make-string
                  (if org-odd-levels-only
                      (nth 1 components)
                    (nth 0 components))
                  ?*))
          (todo (nth 2 components))
          (priority (nth 3 components))
          (text (nth 4 components))
          (tags (nth 5 components)))
     (list (mapconcat 'identity
                      (cl-remove-if 'null
                                    (list level todo
                                          (if priority (format "[#%c]" priority))
                                          text tags))
                      " ")
           (buffer-file-name)
           (point))))
 nil
 'agenda)
(org-map-entries FUNC &optional MATCH SCOPE &rest SKIP)

Call FUNC at each headline selected by MATCH in SCOPE.

FUNC is a function or a lisp form. The function will be called without arguments, with the cursor positioned at the beginning of the headline. The return values of all calls to the function will be collected and returned as a list. The call to FUNC will be wrapped into a save-excursion form, so FUNC does not need to preserve point. After evaluation, the cursor will be moved to the end of the line (presumably of the headline of the processed entry) and search continues from there. Under some circumstances, this may not produce the wanted results. For example, if you have removed (e.g. archived) the current (sub)tree it could mean that the next entry will be skipped entirely. In such cases, you can specify the position from where search should continue by making FUNC set the variable ‘org-map-continue-from’ to the desired buffer position.

MATCH is a tags/property/todo match as it is used in the agenda tags view. Only headlines that are matched by this query will be considered during the iteration. When MATCH is nil or t, all headlines will be visited by the iteration.

SCOPE determines the scope of this command. It can be any of:

nil                   The current buffer, respecting the restriction if any
tree                  The subtree started with the entry at point
region                The entries within the active region, if any
region-start-level    The entries within the active region, but only those at
                      the same level than the first one
file                  The current buffer, without restriction
file-with-archives    The current buffer, and any archives associated with it
agenda                All agenda files
agenda-with-archives  All agenda files with any archive files associated with them
(file1 file2 ...)     If this is a list, all files in the list will be scanned

The remaining args are treated as settings for the skipping facilities of the scanner. The following items can be given here:

archive    skip trees with the archive tag
comment    skip trees with the COMMENT keyword
function or Emacs Lisp form
           will be used as value for ‘org-agenda-skip-function’, so whenever
           the function returns a position, FUNC will not be called for that
           entry and search will continue from the position returned

If your function needs to retrieve the tags including inherited tags at the *current* entry, you can use the value of the variable ‘org-scanner-tags’ which will be much faster than getting the value with ‘org-get-tags-at’. If your function gets properties with ‘org-entry-properties’ at the *current* entry, bind ‘org-trust-scanner-tags’ to t around the call to ‘org-entry-properties’ to get the same speedup. Note that if your function moves around to retrieve tags and properties at a *different* entry, you cannot use these techniques.
Thierry showed me this example which I should be able to use:
(condition-case _err
    (helm :sources my-source <etc...>)
  (quit (delete-my-buffers-or-whatever)))
Similar to this in Helm, text properties could be used to store timestamps for results in helm-org-rifle-get-candidates-in-buffer, and then it wouldn’t be necessary to transform the candidates list into a plist and back. Also, an arbitrary list of helper functions could be passed in and run on each node as the candidates list is built, making it easy to optionally record extra metadata.
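A minimal sketch of the idea (the property name and the timestamp value are made up for illustration): stash the timestamp directly on the candidate string instead of wrapping the candidate in a plist.

(let ((candidate (propertize "* Some matching heading"
                             'helm-org-rifle-timestamp 1459555200.0)))
  ;; Read it back later, e.g. when sorting candidates.
  (get-text-property 0 'helm-org-rifle-timestamp candidate))
;; => 1459555200.0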
Since entire entry contents are displayed by default in the occur commands, it can happen that some Org nodes are displayed twice in the results buffer, i.e. given a subtree like:
* Emacs stuff
** Packages of interest
*** ace-window
*** Helm
**** helm-info-emacs command
A search for emacs would first return the entire Emacs stuff subtree, including all 4 child nodes. But it would also return the helm-info-emacs command node as a separate result, since emacs appears in its heading.
Since the second result fits entirely inside the first result, the second should be discarded.
Alternatively, the whole command could be changed to only return each entry’s own text, i.e. not child headings. This seems like it might be more “correct,” but it also seems like a matter of preference: in the example above, if the user searches for emacs, should the ace-window node be displayed? It doesn’t mention emacs directly, but it is relevant to Emacs since it’s in that subtree.
This could probably be configurable without too much added complexity…
e.g. search agenda files, or files in a directory. Maybe write a with-unopened-file macro (or something like that) to find-buffer-visiting or find-file-noselect, and close the buffer afterward if it wasn’t already open.
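A minimal sketch of such a macro (the name comes from the note above; nothing like this exists in the package yet):

(defmacro with-unopened-file (file &rest body)
  "Evaluate BODY in a buffer visiting FILE.
If FILE was not already being visited, kill its buffer afterward."
  (declare (indent 1))
  `(let* ((already-open (find-buffer-visiting ,file))
          (buffer (or already-open (find-file-noselect ,file))))
     (unwind-protect
         (with-current-buffer buffer
           ,@body)
       (unless already-open
         (kill-buffer buffer)))))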
This would probably make it simpler and faster. Rather than trying to match a tags token across the entire node, it could just be matched against the tags string. Could probably do away with the complex and confusing tags regexp matching and simplify the prep-token function.
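For instance, a rough sketch (assuming the token has already been recognized as a tag token like ":emacs:", and that point is on an Org entry): check it against the entry’s own tag list rather than regexp-searching the node text.

(let ((token ":emacs:"))
  (member (substring token 1 -1)   ; strip the surrounding colons
          (org-get-tags)))         ; tags of the entry at point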
It would be easy to disable case-folding if caps are present in the search string.
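A minimal sketch of the check (the helper name is hypothetical):

(defun my/rifle-case-fold-p (input)
  "Return non-nil if searches for INPUT should ignore case.
Case is only respected when INPUT contains an uppercase character."
  (not (string-match-p "[[:upper:]]" input)))

;; e.g. bind it around the search:
;; (let ((case-fold-search (my/rifle-case-fold-p input))) ...)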
Does searching for “solution” match this subheading?
(helm-org-rifle-get-candidates-in-buffer (get-file-buffer "test.org") "solution")
…No, it does not. That will probably need to be an option, customizable and/or with a prefix arg.
Solutions
- State “DONE” from “TODO” [2016-04-02 Sat 04:48]
This seems to be fixed now.
incidentally, on the matter of searching for substrings… if i enter a single word to search for i get a results list. if i then start entering a second word helm filters the results for each character that i enter. so, i get substring searches for words after the first! (this is for headings…it gets more complicated if i do searches that return topic content.)
Hm, this is strange. I’ll have to check on it.
Now it’s doing substring matching again. I specifically tested this earlier and it was working correctly, not matching substrings. Now it’s doing it again. What.
It might be faster, especially for unopened files, to use grep -b to get matching lines in a file, and then backtrack to find the node’s heading, and then search the node.
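A rough sketch of the first step (the file name is just an example): grep -b prefixes each matching line with its byte offset, which gives a position to backtrack from.

(let ((file (expand-file-name "~/org/inbox.org"))
      (token "emacs"))
  (split-string
   (shell-command-to-string
    (format "grep -b -i %s %s"
            (shell-quote-argument token)
            (shell-quote-argument file)))
   "\n" t))
;; Each element looks like "12345:matching line text".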
Look at how Deft searches files
It probably has some good techniques for doing it quickly.
This swiper issue may have some good info about caching and such. It might be too slow for rifle, or at least it might be too slow with lots of results. Hmm…
- State “UNDERWAY” from “MAYBE” [2017-08-11 Fri 17:07]
It might be nice to only match against headings, but this is not as easy as it might seem. This whole package is made to search both headings and content.
This Org function might make this fairly easy: org-goto-local-search-headings
Underway in the heading-only-searches branch.
MAYBE Testing with Buttercup
Could be good for testing e.g. negation, to make sure I don’t break it.
Thanks to Thierry’s help, this should prevent flickering. This will be available in Helm 1.9.4 or in commits after [2016-04-01 Fri].
After reading about Emacs testing packages, it looks like the best way to test this package is with some combination of Assess, Buttercup, Ecukes, ERT, and Espuds. Espuds’s steps should help testing interactive things, like Helm (although this will still be difficult), and Buttercup should make unit testing easier, and Assess should help with everything. Buttercup is intended as an alternative to ERT, but ERT might be useful too.
With all the options we have now, we need a magit-popup-style UI, e.g. to temporarily enable an option like helm-org-rifle-test-against-path.
I’m not sure if we can, but if so, it should help performance.
MAYBE Use recoll for indexing
- State “MAYBE” from [2018-08-17 Fri 07:53]
It’s in Debian/Ubuntu, and there’s already a helm-recoll package. Maybe support for Org files could even be added to Recoll so it could present results as Org nodes.
It would be handy to have a built-in command to jump to the next match instance in the occur buffers, maybe something like M-g n. Suggested by washy99999.
Don’t know how I overlooked this for this long. Shouldn’t be too hard to implement searching for phrases in quotes. Should probably match multiple spaces (but probably not newlines or tabs) between words; wouldn’t want an accidental double-spacebar press in the searched file to prevent a match.
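A minimal sketch of turning a quoted phrase into such a regexp (not existing package code):

(let ((phrase "apple pie"))
  (mapconcat #'regexp-quote (split-string phrase " +" t) " +"))
;; => "apple +pie", which matches the words separated by one or more
;; spaces, but not by newlines or tabs.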
helm-follow-mode can be activated from within Helm already with C-c C-f, and on an individual-item basis with C-j, and anyone can define a custom command to set it themselves, but it might be worth having an argument to enable it too.
Along the lines of:
(defun my/helm-org-rifle-with-full-paths ()
  (interactive)
  ;; Toggle the path display for this call only, then run the command.
  (let ((helm-org-rifle-show-path (not helm-org-rifle-show-path)))
    (helm-org-rifle)))
Helm only seems to highlight the first match in each candidate.
It would be interesting to be able to search for timestamps, e.g. for nodes timestamped on a certain day, or within a certain date range. Might be a bit slow, because it would require comparing every timestamp in every result, but if it’s what you need, then it would probably be usable and worth it.
By setting a custom xfuncname for a git repo containing org files (see man 5 gitattributes), git diff will display the org heading as the hunk header in its output. Then running git grep -W shows entire org entries that match. And git grep has boolean operators. And git grep is very fast. Plug these into an async Helm source and boom, lightning-fast searching of org files, even if they aren’t open in an Emacs buffer. Well, as long as the files are in a git repo, but you are storing your org files in a git repo, aren’t you? =)
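A sketch of the configuration (the xfuncname regexp here is just a guess at one that matches Org headings):

# .gitattributes in the repo
*.org diff=org

# .git/config or ~/.gitconfig
[diff "org"]
	xfuncname = "^\\*+ .*$"

Then something like git grep -W emacs -- '*.org' prints the whole enclosing “function” (here, the Org entry) for each match.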
Sift sounds like it might be a perfect solution here, since it supports multi-line matching, replacements, etc.
ripgrep might also be useful, although I don’t think it supports multi-line yet.
[2020-11-26 Thu 02:20] It supports multiline search now, so it might be suitable now.
It might be interesting to use emacs-async to do matching in files that aren’t already open in the current Emacs process. I’m not sure if it would be worth it, because even if it were faster in some cases for unopened files, it wouldn’t be faster compared to searching already opened files. And even though loading large files can be slow, once they are opened, the price is paid, and searching is faster; doing it in external Emacs processes would be slow every time, not just the first.
But there might be some cases where it would be helpful. It might be possible to do it without loading the files in Org in the other processes, and it might be helpful to do all the searching in one process instead of one for each file. For the case of opening many small files that don’t need to be frequently accessed, that the user doesn’t want to keep open, doing it in another process might actually be good.
But it might also be complicated to keep the search process open while the user is changing the query; and without doing that, a new search process would be started every time the user changed the query, which would mean loading the files all over again. So I’m not sure this idea would be generally useful.
Currently matches are made against substrings, like most other commands in Helm. However, this might not always lead to the best results. For example, if someone were searching for “Sol”, referring to the sun, he probably wouldn’t want to match “solution” or “solvent” or “soliloquy”. But if someone were trying to dig up a note he made a while back about apple pie, did he write about “an apple pie” or “some apple pies”? Dessert hangs in the balance!
To solve this, matches could be made against word, punctuation, or symbol boundaries. However, this is less “Helm-like,” and it might not be what most users expect. So it would be good to make this a configurable default. A prefix could override the default, and/or it could be toggleable from within a Helm session.
Right now, if more than one term appears in the same range, parts of that range will show up more than once in the context. Not a big deal, but should be fixable.
helm-org-rifle-get-candidates-in-buffer might be able to be optimized more with elp. But the “low-hanging fruit” is probably gone, and performance seems good.
It would be nice to have a regexp mode…maybe.
org-search-goto had a match limit. I removed it to simplify things, but it might still be useful, depending on how big one’s org files are. However, performance seems good now, so this probably isn’t needed.
s-truncate truncates and adds "...", which means that the chosen length of entry text gets reduced by 3. Could fix this by using a setter for the defcustom that adds 3.
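A minimal sketch of that idea (the option name is hypothetical): have the setter store the chosen value plus 3, so the visible text keeps its full length despite the "..." that s-truncate appends.

(defcustom my/rifle-entry-text-length 50
  "Number of characters of entry text to display."
  :type 'integer
  :set (lambda (symbol value)
         ;; Compensate for the 3 characters `s-truncate' adds.
         (set-default symbol (+ value 3))))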
[2020-01-04 Sat 09:04]
[2018-05-26 Sat 00:38] I’m not sure that it works correctly or is consistent. Need a test for it.
When a search term is a to-do keyword, Helm overrides the face of the keyword in headings with the Helm highlight face. I’m not sure if this can be fixed outside of Helm. I wish we could remove the keyword from a list of terms that Helm would highlight, but there doesn’t seem to be such a list.
If only a negation pattern is given, an error happens. Not a big deal, doesn’t interfere with anything, just change the pattern and it goes away.
- State “DONE” from “TODO” [2017-03-13 Mon 16:52]
Fixed!
When matching multiple tags in a string, the order of the tags matters, e.g. :website:Emacs does not match entries that are tagged :Emacs:website: or :website:something:Emacs:. Not a big deal, but would be nice to fix it. I suppose it could be useful to have this behavior, because the tags can always be specified separately, but it might be unexpected for it to work this way.
Hmm, that seems like a long list. But I want stable releases to actually be stable.
It’s been used for a while now.
Use x.y.0, not x.y.
e.g. :tag1:tag2:
If a new minor version (not new patch version), make new x.x branch. Then tag the new branch, using x.x.0 for the first release in a minor version branch, not x.x.
Hmm, that seems like a long list. But I want stable releases to actually be stable.
Last MELPA release was on December 2, with a fix from a user. No problems since then, so I think it can be considered tested.
Use x.y.0, not x.y.
Nothing to do here AFAIK.
All Buttercup tests pass.
If a new minor version (not new patch version), make new x.x branch. Then tag the new branch, using x.x.0 for the first release in a minor version branch, not x.x.
Hmm, that seems like a long list. But I want stable releases to actually be stable.
It’s been 10 days since the last change to the code, and Z has said it’s working well.
Use x.y.0, not x.y.
The buttercup tests handle the important stuff, and the other stuff hasn’t changed, and I’ve tested it recently.
e.g. :tag1:tag2:
If a new minor version (not new patch version), make new x.x branch. Then tag the new branch, using x.x.0 for the first release in a minor version branch, not x.x.
Minimal changes, been sitting in non-stable MELPA for a while, no complaints.
Use x.y.0, not x.y.
Nothing’s changed that should affect this; only added two commands and they work.
e.g. :tag1:tag2:
If a new minor version (not new patch version), make new x.x branch. Then tag the new branch, using x.x.0 for the first release in a minor version branch, not x.x.
Got some good feedback from Jack and zeltak, seems to be working well.
Use x.y.0, not x.y.
e.g. :tag1:tag2:
Maybe in 1.3.
If a new minor version (not new patch version), make new x.x branch. Then tag the new branch, using x.x.0 for the first release in a minor version branch, not x.x.
Hmm, that seems like a long list. But I want stable releases to actually be stable.
I tried.
(:tag1:tag2:)
Pushing this back to 1.2.
Hmm, that seems like a long list. But I want stable releases to actually be stable.
Use x.y.0, not x.y.
e.g. :tag1:tag2:
If a new minor version (not new patch version), make new x.x branch. Then tag the new branch, using x.x.0 for the first release in a minor version branch, not x.x.