WIP: feat(search): support code search by zoekt #33850

Draft: wants to merge 1 commit into base: main


adlternative

WIP: support zoekt code search

Try to support #33702

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Mar 11, 2025
@pull-request-size pull-request-size bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 11, 2025
@github-actions github-actions bot added modifies/go Pull requests that update Go code modifies/dependencies labels Mar 11, 2025
@adlternative adlternative changed the title WIP feat(search): support code search by zoekt WIP: feat(search): support code search by zoekt Mar 11, 2025
@wxiaoguang
Contributor

There are already so many search engines built into Gitea. Many of them have various bugs.

So the questions are:

  1. Will more search engines be added to Gitea, so that it ends up with plenty of built-in search engines?
  2. Will these search engines become unmaintained, with bugs that never get fixed?

@hiifong
Member

hiifong commented Mar 11, 2025

To be honest, I prefer this zoekt search engine to the existing search engine.

@lunny
Member

lunny commented Mar 11, 2025

Maybe this can replace bleve, but we need some comparison tests.

@wxiaoguang
Contributor

wxiaoguang commented Mar 11, 2025

> To be honest, I prefer this zoekt search engine to the existing search engine.

That's understandable. So a few months later, someone else feels "yoekt" is better and introduces "yoekt"; a few months after that, someone feels "xoekt" is better and introduces "xoekt"; and then "woekt", "voekt", "uoekt" ... "coekt", "boekt", "aoekt". Then Gitea contains all the search engines on the internet.

I do not object to introducing improvements. But it actually needs to:

  1. Clarify and fix the existing problems.
  2. Remove unnecessary search engines before introducing new ones.

So a clear roadmap about the "search engine plan" is necessary.

@wxiaoguang wxiaoguang marked this pull request as draft March 11, 2025 05:07
@adlternative
Author

> There are already so many search engines built into Gitea. Many of them have various bugs.
>
> So the questions are:
>
>   1. Will more search engines be added to Gitea, so that it ends up with plenty of built-in search engines?

In my opinion, supporting multiple search engines is a good thing, as users may have different needs. Even GitLab now supports both ES and Zoekt search engines. See https://docs.gitlab.com/user/search

>   2. Will these search engines become unmaintained, with bugs that never get fixed?

I'm not too worried about this; Gitea should have good community maintenance. It might be because the code search functionality is not exposed by default, so many bugs haven't been discovered.

@wxiaoguang
Contributor

wxiaoguang commented Mar 11, 2025

> In my opinion, supporting multiple search engines is a good thing, as users may have different needs. Even GitLab now supports both ES and Zoekt search engines. See https://docs.gitlab.com/user/search
>
> I'm not too worried about this; Gitea should have good community maintenance. It might be because the code search functionality is not exposed by default, so many bugs haven't been discovered.

Well, do you know how many search engines are in Gitea now? And what longstanding bugs do they have? https://github.com/go-gitea/gitea/issues?q=is%3Aissue%20state%3Aopen%20code%20search

And some bugs haven't been fixed in months, for example "Search Functionality Issues with Bleve Engine #31565". I don't see "good community maintenance".

@adlternative
Author

> > To be honest, I prefer this zoekt search engine to the existing search engine.
>
> That's understandable. So a few months later, someone else feels "yoekt" is better and introduces "yoekt"; a few months after that, someone feels "xoekt" is better and introduces "xoekt"; and then "woekt", "voekt", "uoekt" ... "coekt", "boekt", "aoekt". Then Gitea contains all the search engines on the internet.

You don't need to worry about this: zoekt is a popular code search engine currently used by code platforms like Gerrit, Sourcegraph, and GitLab; it was written by a Gerrit author and is maintained by Sourcegraph. Zoekt has advantages that traditional search engines (like ES) do not possess: support for regex matching, substring search, etc. I don't think any new open-source code search engines will be able to replace it in the short term.
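
The contrast drawn above between word-based engines and regex or substring matching can be illustrated with a toy sketch. It assumes a deliberately simplified word tokenizer and is not how ES, bleve, or zoekt are actually implemented; the function names `tokenMatch`, `substringMatch`, and `regexMatch` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode"
)

// tokenMatch mimics a simplified word-based index (an assumption for
// illustration): the query must equal one whole token of the line.
func tokenMatch(line, query string) bool {
	tokens := strings.FieldsFunc(line, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	for _, t := range tokens {
		if strings.EqualFold(t, query) {
			return true
		}
	}
	return false
}

// substringMatch reports whether query occurs anywhere in line,
// including in the middle of an identifier.
func substringMatch(line, query string) bool {
	return strings.Contains(line, query)
}

// regexMatch reports whether pattern matches anywhere in line.
func regexMatch(line, pattern string) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, err
	}
	return re.MatchString(line), nil
}

func main() {
	line := "func NewCodeIndexer(path string) (*Indexer, error) {"

	// "CodeIndex" is only part of the identifier NewCodeIndexer, so a
	// whole-token lookup misses it while a substring lookup finds it.
	fmt.Println("token:    ", tokenMatch(line, "CodeIndex"))
	fmt.Println("substring:", substringMatch(line, "CodeIndex"))

	ok, err := regexMatch(line, `New\w+Indexer\(`)
	if err != nil {
		fmt.Println("bad pattern:", err)
		return
	}
	fmt.Println("regex:    ", ok)
}
```

In code search, queries are often fragments of identifiers or patterns over punctuation, which is where whole-token matching tends to fall short.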

> I do not object to introducing improvements. But it actually needs to:
>
>   1. Clarify and fix the existing problems.
>   2. Remove unnecessary search engines before introducing new ones.
>
> So a clear roadmap about the "search engine plan" is necessary.

You are right. Where should the roadmap be written? I don't have experience with this. I will add to its documentation once the zoekt functionality is more complete.

@wxiaoguang
Contributor

> I don't think any new open-source code search engines will be able to replace it in the short term.

Yep, if zoekt wins, we need to drop some others.

@adlternative
Author

> > In my opinion, supporting multiple search engines is a good thing, as users may have different needs. Even GitLab now supports both ES and Zoekt search engines. See https://docs.gitlab.com/user/search
> >
> > I'm not too worried about this; Gitea should have good community maintenance. It might be because the code search functionality is not exposed by default, so many bugs haven't been discovered.
>
> Well, do you know how many search engines are in Gitea now? And what longstanding bugs do they have? https://github.com/go-gitea/gitea/issues?q=is%3Aissue%20state%3Aopen%20code%20search
>
> And some bugs haven't been fixed in months, for example "Search Functionality Issues with Bleve Engine #31565". I don't see "good community maintenance".

Sure, it's regrettable that this part of the content is unmaintained. However, for the zoekt code search, I can commit to maintaining it thoroughly.

@adlternative
Author

> > I don't think any new open-source code search engines will be able to replace it in the short term.
>
> Yep, if zoekt wins, we need to drop some others.

Yeah, I hope this can be divided into at least two steps:

  1. Support zoekt
  2. Deprecate other search engines

Zoekt may also have some issues, as GitLab has not completely deprecated ES and fully switched to Zoekt...

@adlternative adlternative force-pushed the adl/dev/search/support-zoekt-code-indexer branch from 17d7c30 to 212fc79 Compare March 11, 2025 11:16
@wxiaoguang
Contributor

To make the code clear, we need to refactor the related code first: Refactor issue & code search #33860

Each "indexer" should itself declare the "search modes" it supports. And we need to remove the "fuzzy" search for code.
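
One way to picture the idea of each indexer declaring its own search modes is the sketch below. Everything in it is hypothetical and is not taken from Gitea's code or from #33860: the `SearchMode` type, the mode names, the `Indexer` interface, the two example indexers, and the `ResolveMode` helper are all assumptions made for illustration.

```go
package main

import "fmt"

// SearchMode is a hypothetical identifier for a style of query.
type SearchMode string

const (
	ModeExact  SearchMode = "exact"
	ModeWords  SearchMode = "words"
	ModeRegexp SearchMode = "regexp"
)

// Indexer is a hypothetical interface: each backend reports the search
// modes it supports instead of callers hard-coding a global list.
type Indexer interface {
	Name() string
	SupportedSearchModes() []SearchMode
}

// wordIndexer stands in for a word-based backend (illustrative only).
type wordIndexer struct{}

func (wordIndexer) Name() string { return "word-based" }
func (wordIndexer) SupportedSearchModes() []SearchMode {
	return []SearchMode{ModeExact, ModeWords}
}

// regexpIndexer stands in for a regex-capable backend (illustrative only).
type regexpIndexer struct{}

func (regexpIndexer) Name() string { return "regexp-capable" }
func (regexpIndexer) SupportedSearchModes() []SearchMode {
	return []SearchMode{ModeExact, ModeRegexp}
}

// ResolveMode returns the requested mode if the indexer supports it, and
// otherwise falls back to the first mode the indexer declares.
func ResolveMode(idx Indexer, requested SearchMode) (SearchMode, error) {
	modes := idx.SupportedSearchModes()
	if len(modes) == 0 {
		return "", fmt.Errorf("indexer %q declares no search modes", idx.Name())
	}
	for _, m := range modes {
		if m == requested {
			return m, nil
		}
	}
	return modes[0], nil
}

func main() {
	for _, idx := range []Indexer{wordIndexer{}, regexpIndexer{}} {
		mode, err := ResolveMode(idx, ModeRegexp)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: requested %q, using %q\n", idx.Name(), ModeRegexp, mode)
	}
}
```

With this shape, a UI could build its mode selector from `SupportedSearchModes()` of the active indexer, so a mode such as "fuzzy" simply disappears for backends that do not declare it.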

5 participants