Limited cache and performance queries #38

Open
oliver-s-lee opened this issue Jan 25, 2023 · 1 comment

@oliver-s-lee

Hi there, really nice project you've got here! I'm using mongita as an embedded DB in one of my projects and it's great, but I've made a few tweaks that I thought you might be interested in.

Firstly, I see the on-disk storage engine currently has an unbounded cache, which naturally leads to runaway memory use during large read/write cycles. I've had a go at writing a simple limited cache that should work as a drop-in replacement (#36). Limiting the cache size to below the benchmarking set size will obviously hurt performance, but when it's set larger (or left unlimited) the change has little impact.
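To illustrate the general shape (a minimal sketch, not the actual code in #36; the class name and interface are illustrative), an LRU cache built on collections.OrderedDict that evicts the least-recently-used entry once a size limit is exceeded:

```python
from collections import OrderedDict

class LimitedCache:
    """Illustrative LRU cache; max_items=None keeps the old unbounded
    behaviour, which is what makes it a drop-in replacement."""

    def __init__(self, max_items=None):
        self.max_items = max_items
        self._data = OrderedDict()

    def __getitem__(self, key):
        value = self._data[key]
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def __setitem__(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if self.max_items is not None and len(self._data) > self.max_items:
            self._data.popitem(last=False)  # evict least-recently-used entry

    def __contains__(self, key):
        return key in self._data
```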

Secondly, I think one of the current major performance bottlenecks is the copy.deepcopy() calls on insertion and retrieval. For insertion, I'm fairly certain this can be replaced with a simple shallow copy, as the only change made to the document is the addition of the _id field. I've made a PR to test this out (#37) and all seems to work fine. On my system, insertion performance on the benchmarking set improves by ~50%.
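Roughly, the change looks like this (a hedged sketch, not the actual diff in #37; I'm assuming pymongo's bson.ObjectId for _id generation):

```python
import copy
from bson import ObjectId  # assuming pymongo's bson package

def insert_with_deepcopy(document):
    # Current behaviour: a full recursive copy of the document.
    doc = copy.deepcopy(document)
    doc.setdefault('_id', ObjectId())
    return doc

def insert_with_shallow_copy(document):
    # Proposed: a top-level copy only. The only mutation we make is
    # adding the _id key, so nested values can safely be shared with
    # the caller's dict.
    doc = dict(document)
    doc.setdefault('_id', ObjectId())
    return doc
```

The one caveat is that nested values stay shared with the caller, so the engine must not mutate them in place after insertion.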

For retrieval it looks like things are more complicated. Currently, the returned record is copied regardless of whether it's fetched from the cache or from disk, but of course the record returned from disk is already a unique object, so that copy is wasted. I don't see any easy way to change this at present without changing some other internals, most probably collection.__find_ids(). Is there a reason this function couldn't return the actual documents rather than just the IDs? It's already gone to the bother of fetching the records, so it seems wasteful to discard them only to retrieve them again later.
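The pattern I have in mind is something like this (purely illustrative; the real logic lives around collection.__find_ids() and the cache internals, and the names here are hypothetical):

```python
import copy

def get_document(cache, read_from_disk, _id):
    # Copy on the way out only when the record came from the shared
    # cache; a record freshly decoded from disk is already a unique
    # object and can be returned as-is. (For simplicity this sketch
    # doesn't populate the cache on the read path; if it did, the
    # cache would need to keep its own private copy.)
    doc = cache.get(_id)
    if doc is not None:
        return copy.deepcopy(doc)  # cache hit: protect the cached copy
    return read_from_disk(_id)     # disk hit: no copy needed
```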

Cheers, and I hope you don't mind the comments!

@scottrogowski
Copy link
Owner

Hi Oliver,

Thanks for the kind words and the PRs! I'm glad you're getting so much use out of this little project.

The cache is something I've been concerned about for a while, so you're right to flag it as a potential bottleneck.

I've been pretty deeply focused on a startup I've been working for and haven't had much time to devote to open source over the past few months. This will probably (sadly) continue for a while. So, while the PRs look good on the surface, it'll take some time before I'm able to properly review what you've submitted. In particular, I'd like to go over the unit tests that already exist and maybe think through some new ones.

Either way, I'll keep your PRs open until I do have the time. Thanks again!
