Unpredictable function execution/APIGW response times #418
I've been playing around with frameworkless Lambda + API Gateway and I'm noticing the exact same behavior with a standalone Lambda that I never pre-warm. This makes me wonder whether my Zappa-managed keep-warm is actually helping at all. Please chime in if you have any relevant data to share!

Edit: I must have glossed over this before, but I do see lines like this in CloudWatch every minute, so the keep-warm event is definitely firing. It just seems to have no effect on the startup time of the function, though:
I am having similar issues. Below (the left axis is in milliseconds), a keep-warm with a 3-minute rate seems to have spikes every three hours or so, which is interesting/weird by itself. Also, when I actually make a manual request (around 10:45), I still see a high response time. Does anyone have a clue about what is going on? Thanks
Some more feedback on this. It actually seems that the keep-warm from Zappa is not doing anything. However, if one warms up the webapp by making an actual request via the API Gateway (for example, from another service), it is possible to keep the webapp warm. So it seems the API Gateway is playing a role here.
I am seeing this too. We have one API that has irregular but bursty traffic: not much happens, then suddenly many requests arrive at once. If you run the burst batches one after the other, the second batch is super fast, but the first batch has many requests taking several seconds. I noticed it's due to several Lambdas spinning up (Instancing... in the logs), and that takes 2-5 seconds. One wild guess is that keep-warm only keeps one Lambda container warm, and a burst obviously requires concurrency... or it is indeed something to do with a difference in the way Lambda routes the event depending on whether it comes from API Gateway or is scheduled. 🤔
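If that single-container guess is right, one possible workaround is to fan out several concurrent synchronous invocations yourself, so that more than one container stays warm. A minimal sketch with boto3 (the function name and payload are hypothetical, and this is not something Zappa does out of the box):

```python
import json
from concurrent.futures import ThreadPoolExecutor

import boto3

lambda_client = boto3.client("lambda")

def warm(_):
    # A synchronous invoke routes the event to one container; N invokes
    # in flight at the same time touch up to N distinct containers.
    return lambda_client.invoke(
        FunctionName="my-zappa-function",          # hypothetical name
        InvocationType="RequestResponse",
        Payload=json.dumps({"keep_warm": True}),   # hypothetical payload
    )

# Fire 10 concurrent invocations ahead of an expected burst.
with ThreadPoolExecutor(max_workers=10) as pool:
    list(pool.map(warm, range(10)))
```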
Might be worth using https://uptimerobot.com/ or https://updown.io to send a request via API Gateway as an additional keep-warm. I do that and haven't seen this behaviour.
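Rolling your own version of that external ping takes only a few lines; a minimal sketch (the endpoint URL is a placeholder) that you could run from cron or any scheduler:

```python
import time
import urllib.request

# Placeholder URL; substitute your deployed API Gateway stage endpoint.
URL = "https://abc123.execute-api.us-east-1.amazonaws.com/dev/"

start = time.time()
with urllib.request.urlopen(URL, timeout=30) as resp:
    status = resp.status
    resp.read()
print(f"HTTP {status} in {(time.time() - start) * 1000:.0f}ms")
```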
So, today I investigated a long-standing issue we had with one of our Zappa-powered Lambdas. The Lambda is configured with a 30-second timeout and is expected to have an average runtime of 500ms. This doesn't look right: we're getting a ton of 30-second max durations, and we can see in the graph on the right that timeouts increase at peak time.

I had assumed the timeouts were related to some kind of database or external-API contention, but I found that they all occurred during cold starts. Some of the Lambda machines would need up to 40 seconds to do a cold start; if they hit the timeout, they would get stuck permanently trying to cold start and eventually give up and be rotated out. Increasing the default timeout of 30 seconds to 45 seconds actually fixed the issue. See the red line here:

Now, the real question is why a cold start takes so long. I profiled a Lambda that took 22 seconds to complete. Of those 22 seconds, the following two lines of code (in ...) accounted for the bulk:

```python
from django.core.wsgi import get_wsgi_application  # 6500ms
...
get_wsgi_application()  # 14500ms
```

This is a 128MB Lambda (Python 3.6 runtime), so it has the weakest CPU. I expected cold starts to be bad, but this is pretty brutal. Hope this helps someone.
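For anyone wanting to replicate that timeout bump: Zappa exposes it as the `timeout_seconds` key in `zappa_settings.json`. A minimal sketch (the stage name is illustrative):

```json
{
    "production": {
        "timeout_seconds": 45
    }
}
```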
This is really good information, kudos! I suspect it is probably something to do with dependency loading, since I noticed our cold starts are slower for Lambdas where we have more dependencies/larger deploy files (a factor of 2 to 3 from an API that mostly has Django, Django REST Framework and psycopg2 to one that also has numpy). For now we are using the brute force of a higher-CPU Lambda to bring it down to a few seconds so it doesn't affect our users as much, but it's obviously not a long-term solution.
P.S. How did you profile the lambda to put a number on those two lines?
I just added some timing calls in Zappa and deployed that instead.
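The instrumentation can be as crude as wrapping the two suspect lines; a sketch of the idea (not the commenter's exact patch), assuming Django is installed and `DJANGO_SETTINGS_MODULE` is configured:

```python
import time

t0 = time.time()
from django.core.wsgi import get_wsgi_application  # imported here so the import itself is timed
print(f"django.core.wsgi import: {(time.time() - t0) * 1000:.0f}ms")

t0 = time.time()
application = get_wsgi_application()
print(f"get_wsgi_application(): {(time.time() - t0) * 1000:.0f}ms")

# Inside Lambda, print() output lands in the function's CloudWatch log stream.
```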
I've done some work recently to reduce application startup times on cold start, and came up with the following conclusions. They might be specific to my application, but I'm posting them in case they can guide someone else:
Can you elaborate on that? I usually use ...
This is because you're bumping up the CPU of the Lambda every time you increase its RAM. Anything CPU-bound will speed up roughly linearly as a result.
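In Zappa terms, that brute-force fix is the `memory_size` setting: AWS allocates CPU in proportion to memory, so going from 128MB to 1024MB buys roughly 8x the CPU. A minimal sketch, with an illustrative stage name:

```json
{
    "production": {
        "memory_size": 1024
    }
}
```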
This is more of a question/investigatory issue than a bug report per se, but I've noticed this behavior on 2 different Zappa-based APIs I have started working on.
Basically, I've noticed that when I call an endpoint after not using it for a while, I will get a very long response time, e.g.
but if I call it again immediately afterward, I get a much faster response.
The "API" implementation is a simple print statement so it's not worth showing, but here are my Zappa settings:
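A representative `zappa_settings.json` for this kind of setup (all values here are placeholders, not the actual settings) would look something like:

```json
{
    "dev": {
        "app_function": "api.app",
        "aws_region": "us-east-1",
        "s3_bucket": "my-zappa-deployments",
        "keep_warm": true,
        "keep_warm_expression": "rate(1 minute)"
    }
}
```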
^ I increased the frequency on the `keep_warm_expression` key to see if it helps, but it doesn't seem to. I don't see anything in CloudWatch either; all of the log lines look like this:
Any ideas where to look for more info?