gitea/modules
Bruno Sofiato f64fbd9b74
Updated tokenizer to better matching when search for code snippets (#32261)
This PR improves the accuracy of Gitea's code search. 

Currently, Gitea does not treat statements such as
`console.log("hello")` as hits when the user searches for `log`. The
culprit is how both Elasticsearch (ES) and Bleve tokenize the file
contents: in both cases, `console.log` is indexed as a single token.
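To illustrate the failure, the sketch below uses a deliberately simplified tokenizer. It is a hypothetical stand-in, not the actual ES or Bleve implementation. Like Unicode word segmentation, it does not break on a `.` that sits between two letters. An exact term lookup for `log` then fails, because the only candidate token is `console.log`. The names `oldTokenize` and `containsTerm` are illustrative only.

```go
package main

import (
	"fmt"
	"unicode"
)

// oldTokenize is a hypothetical, simplified stand-in for the previous
// tokenizers: it keeps runs of letters/digits together and, like Unicode
// word segmentation, does not break on a '.' between two letters.
func oldTokenize(s string) []string {
	runes := []rune(s)
	var tokens []string
	start := -1
	for i, r := range runes {
		isWord := unicode.IsLetter(r) || unicode.IsDigit(r)
		// A '.' flanked by letters is treated as part of the word.
		if r == '.' && i > 0 && i+1 < len(runes) &&
			unicode.IsLetter(runes[i-1]) && unicode.IsLetter(runes[i+1]) {
			isWord = true
		}
		if isWord {
			if start < 0 {
				start = i
			}
			continue
		}
		if start >= 0 {
			tokens = append(tokens, string(runes[start:i]))
			start = -1
		}
	}
	if start >= 0 {
		tokens = append(tokens, string(runes[start:]))
	}
	return tokens
}

// containsTerm mimics an exact term lookup against the indexed tokens.
func containsTerm(tokens []string, term string) bool {
	for _, t := range tokens {
		if t == term {
			return true
		}
	}
	return false
}

func main() {
	tokens := oldTokenize(`console.log("hello")`)
	fmt.Println(tokens)                      // [console.log hello]
	fmt.Println(containsTerm(tokens, "log")) // false
}
```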

For ES, we changed the tokenizer to
[simple_pattern_split](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-simplepatternsplit-tokenizer.html#:~:text=The%20simple_pattern_split%20tokenizer%20uses%20a,the%20tokenization%20is%20generally%20faster.).
With it, tokens are runs of letters and digits. For Bleve, we use
the [letter](https://blevesearch.com/docs/Tokenizers/) tokenizer, whose
tokens are runs of letters.

Resolves #32220

---------

Signed-off-by: Bruno Sofiato <bruno.sofiato@gmail.com>
2024-11-06 20:51:20 +00:00
actions Fix wrong status of `Set up Job` when first step is skipped (#32120) 2024-09-24 18:34:08 +00:00
activitypub Remove SHA1 for support for ssh rsa signing (#31857) 2024-09-07 18:05:18 -04:00
analyze
assetfs
auth Add Passkey login support (#31504) 2024-06-29 22:50:03 +00:00
avatar
badge
base fix OIDC introspection authentication (#31632) 2024-07-23 12:43:03 +00:00
cache bump to go 1.23 (#31855) 2024-09-10 02:23:07 +00:00
charset refactor: remove redundant err declarations (#32381) 2024-10-30 19:36:24 +00:00
container Allow disabling authentication related user features (#31535) 2024-07-09 17:36:31 +00:00
csv Render embedded code preview by permlink in markdown (#30234) 2024-04-02 17:48:27 +00:00
dump Refactor "dump" sub-command (#30240) 2024-04-03 02:16:46 +00:00
emoji
eventsource
generate
git Fix git error handling (#32401) 2024-11-02 11:20:22 +00:00
gitgraph Fix milestone deadline and date related problems (#32339) 2024-11-05 07:46:40 +00:00
gitrepo Refactor markup package (#32399) 2024-11-04 10:59:50 +00:00
globallock Use global lock instead of NewExclusivePool to allow distributed lock between multiple Gitea instances (#31813) 2024-09-06 10:12:41 +00:00
graceful
hcaptcha
highlight
hostmatcher Support allowed hosts for migrations to work with proxy (#32025) 2024-09-11 05:47:00 +00:00
html
httpcache Fix wrong last modify time (#32102) 2024-09-21 21:56:25 +00:00
httplib Fix wrong last modify time (#32102) 2024-09-21 21:56:25 +00:00
indexer Updated tokenizer to better matching when search for code snippets (#32261) 2024-11-06 20:51:20 +00:00
issue/template bump to go 1.23 (#31855) 2024-09-10 02:23:07 +00:00
json
label
lfs Use 8 as default value for git lfs concurrency (#32421) 2024-11-05 13:10:57 +00:00
lfstransfer Add pure SSH LFS support (#31516) 2024-09-27 10:27:37 -04:00
log Refactor markup package (#32399) 2024-11-04 10:59:50 +00:00
markup Refactor markup package (#32399) 2024-11-04 10:59:50 +00:00
mcaptcha
metrics Rename project board -> column to make the UI less confusing (#30170) 2024-05-27 08:59:54 +00:00
migration Support migrating GitHub/GitLab PR draft status (#32242) 2024-10-13 22:58:13 +03:00
nosql
optional Resolve lint for unused parameter and unnecessary type arguments (#30750) 2024-04-29 08:47:56 +00:00
options
packages Refactor markup package (#32399) 2024-11-04 10:59:50 +00:00
paginator
pprof
private Make git push options accept short name (#32245) 2024-10-12 05:42:10 +00:00
process Update misspell to 0.5.1 and add `misspellings.csv` (#30573) 2024-04-27 08:03:49 +00:00
proxy
proxyprotocol
public
queue bump to go 1.23 (#31855) 2024-09-10 02:23:07 +00:00
recaptcha
references Refactor to use UnsafeStringToBytes (#31358) 2024-06-14 01:26:33 +00:00
regexplru
repository Make LFS http_client parallel within a batch. (#32369) 2024-11-04 04:49:08 +00:00
secret
session Improve oauth2 client "preferred username field" logic and the error handling (#30622) 2024-04-25 11:22:32 +00:00
setting Use 8 as default value for git lfs concurrency (#32421) 2024-11-05 13:10:57 +00:00
sitemap
ssh
storage Add artifacts test fixture (#30300) 2024-11-01 10:29:54 +08:00
structs Make admins adhere to branch protection rules (#32248) 2024-10-23 12:39:43 +08:00
svg Refactor markdown attention render (#29984) 2024-03-22 12:16:23 +00:00
sync Use global lock instead of NewExclusivePool to allow distributed lock between multiple Gitea instances (#31813) 2024-09-06 10:12:41 +00:00
system Refactor to use UnsafeStringToBytes (#31358) 2024-06-14 01:26:33 +00:00
templates Fix milestone deadline and date related problems (#32339) 2024-11-05 07:46:40 +00:00
test Remove sub-path from container registry realm (#31293) 2024-06-09 16:29:29 +08:00
testlogger Refactor tests to prevent from unnecessary preparations (#32398) 2024-11-01 23:18:29 +08:00
timeutil Refactor DateUtils and merge TimeSince (#32409) 2024-11-04 11:30:00 +00:00
translation Render embedded code preview by permlink in markdown (#30234) 2024-04-02 17:48:27 +00:00
turnstile
typesniffer
updatechecker
uri
user
util Refactor markup package (#32399) 2024-11-04 10:59:50 +00:00
validation
web Refactor names (#31405) 2024-06-19 06:32:45 +08:00
webhook
zstd Support compression for Actions logs (#31761) 2024-08-09 10:10:30 +08:00