
flush() after writing to gzip_file #2753

Open · wants to merge 2 commits into master
Conversation

@vin commented Nov 15, 2024

Summary

To better support streaming responses whose chunks are smaller than the file buffer size, we flush after writing.

Without the explicit flush, writes are buffered and subsequent reads see an empty self.gzip_buffer until the file flushes automatically, either because (1) the 32KiB write buffer fills (see footnote 1) or (2) the file is closed when the streaming response completes.

Without flushing, GZipMiddleware doesn't work as expected for streaming responses, and especially not for Server-Sent Events, which are expected to be delivered to clients immediately. The code as written appears to intend to flush immediately rather than buffer, since it calls await self.send(message) right away, but in practice that message is often empty.
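
To make the mechanics concrete, here is a reduced model of the pattern the change introduces. The names mirror the streaming branch of GZipResponder in starlette/middleware/gzip.py, but this is an illustrative sketch, not the actual diff:

import gzip
import io

class StreamingGzipWriter:
    """Reduced model of GZipResponder's streaming branch; illustrative only."""

    def __init__(self):
        self.gzip_buffer = io.BytesIO()
        self.gzip_file = gzip.GzipFile(mode="wb", fileobj=self.gzip_buffer)

    def compress_chunk(self, body: bytes, more_body: bool) -> bytes:
        self.gzip_file.write(body)
        self.gzip_file.flush()  # the change: Z_SYNC_FLUSH pushes the compressed chunk out now
        if not more_body:
            self.gzip_file.close()
        chunk = self.gzip_buffer.getvalue()  # no longer empty for small writes
        self.gzip_buffer.seek(0)
        self.gzip_buffer.truncate()
        return chunk

Without the flush() line, chunk is usually empty for small writes, and that empty value is exactly the message body the middleware ends up sending.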

Checklist

  • I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.

Footnotes

  1. https://github.com/python/cpython/blob/main/Lib/gzip.py#L26

@Kludex (Member) commented Nov 15, 2024

Can we add a test to prove your point?

@vin (Author) commented Nov 15, 2024

I'll work on that today. I have a small repro case I'll share, but I need to make it run as a test.

@vin (Author) commented Nov 16, 2024

I've added a test, but it's a bit complicated. Without the flush, the full contents of the response still come through correctly, so to show that they are received incrementally rather than all at once, I use a wrapping middleware to assert that GZipMiddleware isn't sending empty message bodies, which is what it does without the flush.
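
For illustration (this is not the actual test added in the PR), the wrapping layer can be a small ASGI callable that records each body chunk the middleware emits:

class BodyChunkRecorder:
    """Illustrative ASGI wrapper: records every response body chunk so a test
    can assert that no empty chunk is sent while the stream is still open."""

    def __init__(self, app):
        self.app = app
        self.chunks = []

    async def __call__(self, scope, receive, send):
        async def recording_send(message):
            if message["type"] == "http.response.body":
                self.chunks.append((message.get("body", b""), message.get("more_body", False)))
            await send(message)

        await self.app(scope, receive, recording_send)

A test can then drive the streaming app through GZipMiddleware wrapped in this recorder and assert that every chunk with more_body=True has a non-empty body.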

@vin (Author) commented Nov 25, 2024

@Kludex could you take another look or recommend a good reviewer? Thanks!

@Kludex (Member) commented Dec 5, 2024

Not related.

@vin (Author) commented Dec 13, 2024

Any concerns here, or how can we best move this forward?

@Kludex (Member) commented Dec 13, 2024

Any concerns here, or how can we best move this forward?

The best way to move forward would be to present the problem first, with an MRE, and references to other issues where other people had the same problem.

I think the current behavior is intentional, so I need to gather more references before reviewing this.

@vin (Author) commented Dec 16, 2024

Best observed in a browser, but here's a small repro:

import time
from fastapi import FastAPI
from fastapi.middleware.gzip import GZipMiddleware
from fastapi.responses import StreamingResponse

app = FastAPI()
app.add_middleware(GZipMiddleware)

@app.get("/")
def streaming_response():

    def generate():
        for i in range(10):
            yield f'event: ping\ndata: {{"i": {i}}}\n\n'
            time.sleep(1)
    return StreamingResponse(
        generate(),
        media_type="text/event-stream"
    )

What it looks like before this PR and after (screenshots omitted): before, all the events appear together after 10 seconds; after, a new event appears every second.
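
For reference, assuming the snippet above is saved as repro.py, one way to observe this outside a browser is to run it with uvicorn repro:app --port 8000 and watch the stream with curl -N --compressed http://localhost:8000/: with the flush, decompressed events print roughly once per second; without it, they arrive in a single burst when the response ends.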

@vin (Author) commented Dec 16, 2024

If the behavior is intentional, we really need to update the documentation here:
https://fastapi.tiangolo.com/advanced/middleware/#gzipmiddleware

"The middleware will handle both standard and streaming responses."

to indicate that this will cause streaming responses to be buffered 32KiB at a time; this was certainly a surprising result for us, and one that caused our users to report that our app appeared broken because real-time status updates stopped working.

@Kludex (Member) commented Jan 23, 2025

This seems a valid PR.

It seems other web frameworks have the same issue. I'm confused as to how no one noticed... It's even hard to find references about people having issues with this. 🤔

@Kludex (Member) commented Jan 24, 2025

Okay. I looked around, and based on what you said, I understand we have a real problem with SSEs.

It seems we should either:

  • Ignore the middleware on SSE.
  • Flush the buffer on every message, otherwise we can't see the ping.

I'm not so sure about doing this for all streaming responses.


Reference to my future self: https://www.npmjs.com/package/compression#server-sent-events
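
For the first option (ignoring the middleware on SSE), the tricky part is that the Content-Type is only known once the application emits the "http.response.start" message, so the decision has to be made inside the responder's send wrapper. A minimal, illustrative check (not code from this PR):

from starlette.datastructures import Headers

def is_sse_response(start_message: dict) -> bool:
    # Illustrative helper: detect a Server-Sent Events response from the raw
    # ASGI "http.response.start" headers so the responder can skip compression.
    headers = Headers(raw=start_message["headers"])
    return headers.get("content-type", "").startswith("text/event-stream")

When this returns True, the responder would forward the original messages untouched instead of routing the body through the gzip file.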

vin added 2 commits January 24, 2025 12:55 (the commit messages repeat the PR description above).
@vin (Author) commented Jan 24, 2025

I think having both GZip + SSE should be supportable. I think the most correct behavior is to send each message as it becomes available, even if that sacrifices some compression. Still, enabling compression for streaming responses (including SSE) can be a reasonable choice; individual messages could theoretically be larger than 32k and may benefit from compression just as non-streaming responses do.

It seems the current implementation (without the flush) favors compression efficiency at the expense of timely delivery. Enabling the flush lets the developer accept suboptimal compression and get the expected timely delivery, or trade timely delivery for better compression by implementing their own buffering.
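
As an illustration of that last point (not part of this PR), application-side buffering can be as simple as batching chunks in the generator before they reach the response; the name and the 32 KiB threshold here are arbitrary:

from typing import Iterable, Iterator

def batched(chunks: Iterable[bytes], threshold: int = 32 * 1024) -> Iterator[bytes]:
    # Accumulate small chunks until the threshold is reached, so the compressor
    # sees larger writes and compresses more effectively, at the cost of latency.
    pending = []
    size = 0
    for chunk in chunks:
        pending.append(chunk)
        size += len(chunk)
        if size >= threshold:
            yield b"".join(pending)
            pending, size = [], 0
    if pending:
        yield b"".join(pending)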

@vin (Author) commented Jan 25, 2025

Similar conversation happening at tuffnatty/zstd-asgi#5. I argue that the least-surprising behavior of the middleware is to deliver small messages immediately rather than holding them in a buffer for an indefinite amount of time. As a user of the middleware, I'd rather read "compression won't be as effective for small messages; buffer them into larger chunks if you wish to maximize compression" vs "small messages will be held in a buffer and delivered 32kB at a time; if you don't want this, don't use this middleware"

@Kludex (Member) commented Jan 25, 2025

Is there any web framework, in any language, that does this right at the middleware level?

@tuffnatty

As I suggested in tuffnatty/zstd-asgi#5, why not allow both behaviours, selecting the desired one with a parameter?

@Kludex (Member) commented Jan 25, 2025

As I suggested in tuffnatty/zstd-asgi#5, why not allow both behaviours, selecting the desired one with a parameter?

Because that seems to be taking responsibility from our side and giving it to the developers. I'm still trying to understand the implications of flushing on every message. It seems like a lot.

But for SSE, we need to send the ping messages, so there's no need for the user to configure it.

@vin (Author) commented Jan 27, 2025

I agree that adding a parameter should be avoided if possible; I'd rather the middleware figure out the correct behavior and implement it, transparently and consistently.

Do we have any evidence that the current implementation was even intentional? Read-after-write without flushing is a common, easy-to-make programming error. Calling await self.send(message) at line 100, when message["body"] is empty, seems questionable.
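
The read-after-write behavior is easy to reproduce with gzip.GzipFile alone, independent of Starlette:

import gzip
import io

buffer = io.BytesIO()
gzip_file = gzip.GzipFile(mode="wb", fileobj=buffer)

gzip_file.write(b'event: ping\ndata: {"i": 0}\n\n')
print(len(buffer.getvalue()))  # roughly the 10-byte gzip header, no payload yet

gzip_file.flush()  # defaults to zlib.Z_SYNC_FLUSH, keeping the stream valid
print(len(buffer.getvalue()))  # now the compressed chunk is in the buffer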
