Allow code search by filename #32210
Force-pushed from 9c77450 to 05a5090
tests/gitea-repositories-meta/org42/search-by-path.git/description
Force-pushed from 89f67c6 to 12c542d
The year of copyright in new files should be 2024.
Force-pushed from 12c542d to 19b08c0
Done
Force-pushed from 19b08c0 to 0de61a1
Signed-off-by: Bruno Sofiato <bruno.sofiato@gmail.com>
Please do not force push, it's hard to follow changes this way
Sure. My bad 😔
Ah, right - for this to happen, we need to increase
Can you add a search test with filename as keyword?
I have included some. I'll provide some comments on the test cases to better describe their underlying scenarios. What do you think?
Hey @lunny, I've added some comments to the test cases to better describe their scenarios :)
go-gitea#32210)

This is a large and complex PR, so let me explain in detail its changes.

First, I had to create new index mappings for Bleve and Elasticsearch as the current ones do not support search by filename. This requires Gitea to recreate the code search indexes (I do not know if this is a breaking change, but I feel it deserves a heads-up).

I've used [this approach](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/analysis-pathhierarchy-tokenizer.html) to model the filename index. It allows us to efficiently search for both the full path and the name of a file. Bleve, however, does not support this out-of-box, so I had to code a brand new [token filter](https://blevesearch.com/docs/Token-Filters/) to generate the search terms.

I also did an overhaul in the `indexer_test.go` file. It now asserts the order of the expected results (this is important since matches based on the name of a file are more relevant than those based on its content).

I've added new test scenarios that deal with searching by filename. They use a new repo included in the Gitea fixture.

The screenshot below depicts how Gitea shows the search results. It shows results based on content in the same way as the current version does. In matches based on the filename, the first seven lines of the file contents are shown (BTW, this is how GitHub does it).

![image](https://github.com/user-attachments/assets/9d938d86-1a8d-4f89-8644-1921a473e858)

Resolves go-gitea#32096

---------

Signed-off-by: Bruno Sofiato <bruno.sofiato@gmail.com>
* giteaofficial/main:
  Make `owner/repo/pulls` handlers use "PR reader" permission (go-gitea#32254)
  make `show stats` work when only one file changed (go-gitea#32244)
  Update scheduled tasks even if changes are pushed by "ActionsUser" (go-gitea#32246)
  Support migrating GitHub/GitLab PR draft status (go-gitea#32242)
  Only rename a user when they should receive a different name (go-gitea#32247)
  Fix dropdown content overflow (go-gitea#31610)
  Make git push options accept short name (go-gitea#32245)
  Allow code search by filename (go-gitea#32210)
  Allow maintainers to view and edit files of private repos when "Allow maintainers to edit" is enabled (go-gitea#32215)
  Use per package global lock for container uploads instead of memory lock (go-gitea#31860)
This is a large and complex PR, so let me explain its changes in detail.
First, I had to create new index mappings for Bleve and Elasticsearch, as the current ones do not support search by filename. This requires Gitea to recreate the code search indexes (I do not know if this is a breaking change, but I feel it deserves a heads-up).
I've used Elasticsearch's path hierarchy tokenizer approach to model the filename index. It allows us to efficiently search for both the full path and the name of a file. Bleve, however, does not support this out of the box, so I had to code a brand new token filter to generate the search terms.
I also did an overhaul in the `indexer_test.go` file. It now asserts the order of the expected results (this is important since matches based on the name of a file are more relevant than those based on its content).

I've added new test scenarios that deal with searching by filename. They use a new repo included in the Gitea fixture.
The screenshot below depicts how Gitea shows the search results. It shows results based on content in the same way as the current version does. In matches based on the filename, the first seven lines of the file contents are shown (BTW, this is how GitHub does it).
Resolves #32096