Document dictionary compression #4760
Draft
Kerollmops wants to merge 11 commits into main from document-dictionnary-compression
Conversation
Kerollmops added the performance (related to performance in terms of search/indexation speed or RAM/CPU/Disk consumption) and disk space usage labels on Jul 2, 2024.
Kerollmops force-pushed the document-dictionnary-compression branch 2 times, most recently from b567c8b to 264baed on July 3, 2024 at 09:47.
/bench workloads/search/*.json
/bench workloads/*.json
/bench workloads/search/*.json
Kerollmops changed the title from "Document dictionnary compression" to "Document dictionary compression" on Jul 3, 2024.
Kerollmops force-pushed the document-dictionnary-compression branch from 11e4f9f to f73d95d on July 4, 2024 at 09:33.
/bench workloads/hackernews-ignore-first-100k.json
☀️ Benchmark invocation completed, please find the results for your workloads below:
Kerollmops force-pushed the document-dictionnary-compression branch from f73d95d to a63f202 on July 8, 2024 at 13:33.
Kerollmops force-pushed the document-dictionnary-compression branch from a63f202 to deee22b on July 10, 2024 at 14:42.
Labels: disk space usage, performance
This PR fixes #4750 by introducing document compression to Meilisearch.
I had to use the zstd library directly instead of lz4_flex because the latter doesn't provide an option to specify the compression level. According to the documentation, I used:
Note that the benchmarks only represent the first couple of hours of real usage: the user uploads some documents and settings, and the number of documents reaches 10k+. This PR compresses the documents once there are 10k+ of them, with a dictionary generated from those documents, and then never changes the dictionary (unless no documents are left).
The first results show between 2x and 3x compression of the documents database (data.ms shrinks from 25GiB to 16GiB, and the average document size goes from 305B to 126B). Still, we can see a performance regression because the compression is done on a single thread.
To do
- Use the experimental zstd feature to avoid copying 64k bytes in memory.