
use a waitgroup to wait for reserve holders of MFiles before unmapping #876

Merged
merged 7 commits into main from mfile-unmapping
Jan 9, 2025

Conversation

carlhoerberg
Member

WHAT is this pull request doing?

Please fill me in.

HOW can this pull request be tested?

Specs? Manual steps? Please fill me in.

@carlhoerberg carlhoerberg force-pushed the mfile-unmapping branch 2 times, most recently from 17e4d7a to c4df472 on December 12, 2024 16:51
src/lavinmq/mfile.cr: review thread (outdated, resolved)
src/lavinmq/clustering/actions.cr: review thread (outdated, resolved)
@spuun
Member

spuun commented Dec 13, 2024

Two things that we probably must remove

  • unmap from Queue#rm_consumer
  • unmap from USR2 handler

@carlhoerberg
Member Author

Two things that we probably must remove
* unmap from Queue#rm_consumer
* unmap from USR2 handler

I don't know, both call MessageStore#unmap_segments, which doesn't unmap the last segment that is still being written to.

def unmap_segments(except : Enumerable(UInt32) = StaticArray(UInt32, 0).new(0u32))
  @segments.each do |seg_id, mfile|
    next if mfile == @wfile
    next if except.includes? seg_id
    mfile.unmap
  end
end

@spuun
Member

spuun commented Dec 16, 2024

I don't know, both call MessageStore#unmap_segments, which doesn't unmap the last segment that is still being written to.

Isn't it possible that a segment that isn't yet fully replicated is unmapped? If it's a large message I think the unmap may occur in the middle of the lz4 compression.

Will these unmaps be necessary if we handle unmapping properly, which this PR hopefully solves?

@carlhoerberg
Member Author

I don't know, both call MessageStore#unmap_segments, which doesn't unmap the last segment that is still being written to.

Isn't it possible that a segment that isn't yet fully replicated is unmapped? If it's a large message I think the unmap may occur in the middle of the lz4 compression.

Will these unmaps be necessary if we handle unmapping properly, which this PR hopefully solves?

True, there's a chance of that, if the unmap is triggered just as we're rolling over to a new segment.

And yes, they shouldn't be needed anymore.

@carlhoerberg
Member Author

I also started to tinker with a ref counter as you did, but yes, it quickly gets messy: we return a BytesMessage from MessageStore, but it points into the mmapping, so we don't know when the client has been sent the message.

class MFile
  @counter = 0
  @counter_lock = Mutex.new(:unchecked)

  def borrow : self
    @counter_lock.synchronize do
      counter = @counter
      if counter.zero?
        mmap
      end
      @counter = counter + 1
    end
    self
  end

  def unborrow : Nil
    @counter_lock.synchronize do
      counter = @counter -= 1
      if counter.zero?
        unmap
      end
    end
  end
end
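
To illustrate where it gets messy, a read path using this counter would have to look roughly like the sketch below (hypothetical code; SegmentPosition, BytesMessage.from_bytes, MFile#to_slice and the @segments lookup are assumed names, not the actual MessageStore API):

def read(sp : SegmentPosition) : BytesMessage
  mfile = @segments[sp.segment].borrow
  begin
    # the returned message is just a view into the mmapped file
    BytesMessage.from_bytes(mfile.to_slice + sp.position)
  ensure
    # too early: the BytesMessage above still points into the mapping,
    # so we could really only unborrow once the client has been sent the message
    mfile.unborrow
  end
end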

@carlhoerberg
Member Author

I'm starting to long for the io_uring implementation; then we could skip mmappings altogether.

@spuun
Member

spuun commented Dec 16, 2024

I'm starting to long for the io_uring implementation; then we could skip mmappings altogether.

Yes! But I think this PR is working pretty well? I've done some runs without crashes.

@carlhoerberg carlhoerberg marked this pull request as ready for review December 23, 2024 23:08
@carlhoerberg carlhoerberg requested a review from a team as a code owner December 23, 2024 23:08
Can cause seg faults if a follower is still trying
to replicate it.
Add a random delay to the stream queue GC loops,
so that not all loops are executing at the same time.
@carlhoerberg
Member Author

@spuun
Member

spuun commented Jan 8, 2025

This is still a problem: https://github.com/cloudamqp/lavinmq/blob/e13e7af6a46ef75646184e4f3128ad562d562412/src/lavinmq/amqp/queue/message_store.cr/#L261

Do we need this unmap? Why not let a file be mapped as long as it's opened? I.e. only unmap in #close?

@carlhoerberg
Member Author

Do we need this unmap? Why not let a file be mapped as long as it's opened? I.e. only unmap in #close?

You can't mmap an unlimited number of files; e.g. on Linux you're limited by vm.max_map_count. If you have a queue that you only publish to, no segment would ever be unmapped without this.
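
(For context, the limit can be inspected on a Linux host; a minimal sketch in Crystal, assuming the standard /proc interface:)

# Linux caps the number of memory mappings per process (default 65530);
# every mapped segment file counts towards it
max_map_count = File.read("/proc/sys/vm/max_map_count").to_i
puts "vm.max_map_count = #{max_map_count}"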

@carlhoerberg
Member Author

This is still a problem: https://github.com/cloudamqp/lavinmq/blob/e13e7af6a46ef75646184e4f3128ad562d562412/src/lavinmq/amqp/queue/message_store.cr/#L261

This is not a problem anymore now that MFile#unmap is safe by default. MFile#unsafe_unmap is not; it doesn't check the waitgroup.
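
Something along these lines, as a minimal sketch of the idea (the @readers waitgroup and the reserve/release names are assumptions, not the actual implementation):

require "wait_group"

class MFile
  @readers = WaitGroup.new

  # hypothetical: anyone handing out pointers into the mapping
  # registers as a reserve holder before reading
  def reserve : self
    @readers.add(1)
    self
  end

  def release : Nil
    @readers.done
  end

  # safe by default: waits until all reserve holders are done
  # before the mapping is released
  def unmap : Nil
    @readers.wait
    unsafe_unmap
  end

  # does not check the waitgroup; only safe when nothing can
  # still be reading from the mapping
  def unsafe_unmap : Nil
    # munmap happens here in the real class
  end
end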

@carlhoerberg carlhoerberg merged commit 501ba4c into main Jan 9, 2025
24 of 25 checks passed
@carlhoerberg carlhoerberg deleted the mfile-unmapping branch January 9, 2025 19:12
@spuun spuun mentioned this pull request Jan 10, 2025