Analyzer DictProxy / KeyError #32
Comments
Should note this is what I saw causing #31. |
I'm not sure if this is a red herring, but if I change https://github.com/etsy/skyline/blob/master/src/analyzer/analyzer.py#L173 from:
logger.info('total analyzed :: %d' % (len(unique_metrics) - sum(self.exceptions.values())))
to:
e = self.exceptions.values()
logger.info('total analyzed :: %d' % (len(unique_metrics) - sum(e)))
my node now runs stably, for over 12 hours so far, rather than crashing. I'm wondering if this has to do with accessing the Manager().dict() structure directly for values() rather than copying it first? |
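For readers who want to try the "copy it first" idea in isolation, here is a minimal sketch. It assumes self.exceptions is a Manager().dict() proxy, as the comment above suggests; the helper names snapshot and total_analyzed and the logger name are hypothetical and not part of Skyline. The thread does not establish that this addresses the root cause, so treat it as an experiment rather than a fix.

```python
import logging
from multiprocessing import Manager

logger = logging.getLogger("analyzer_sketch")  # hypothetical logger name


def snapshot(shared):
    """Return a plain dict copy of a mapping such as a Manager().dict() proxy."""
    # For a DictProxy, items() is a single call to the manager process,
    # so everything after this line works on local data only.
    return dict(shared.items())


def total_analyzed(metric_count, exceptions):
    """Count of metrics analyzed: all metrics minus those that raised."""
    counts = snapshot(exceptions)
    return metric_count - sum(counts.values())


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    manager = Manager()
    exceptions = manager.dict()  # stands in for self.exceptions
    exceptions["Stale"] = 3
    exceptions["Boring"] = 2
    # Log the local copy rather than the proxy itself.
    logger.info("exceptions :: %s", snapshot(exceptions))
    logger.info("total analyzed :: %d", total_analyzed(10, exceptions))
    manager.shutdown()
```

The same pattern would apply to any other shared dictionary that is formatted into a log line: take one copy, then format the copy.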
Can you try logging self.exceptions and self.exceptions.values()? |
Also, does this happen when your metrics are at FULL_DURATION? |
Happens at FULL_DURATION and also when I run flushall in redis-cli before starting the analyzer daemon. |
Okay, I think I've narrowed down which code is really causing the issue. https://github.com/etsy/skyline/blob/master/src/analyzer/analyzer.py#L176
logger.info('anomaly breakdown :: %s' % self.anomaly_breakdown)
The above causes the eventual crash of the daemon. This appears to be an immediate stability fix:
ab = self.anomaly_breakdown
logger.info('anomaly breakdown :: %s' % ab) |
Interesting. And that happens both when anomaly_breakdown is an empty dictionary and when it contains the breakdown? |
Let me clear out Redis and run it both ways. |
For the future, it's probably easier to change your FULL_DURATION so that Incomplete gets raised for each one and no anomalies get run. |
Any update on this? |
FYI, we had this issue, too. We managed to fix it by rolling back and uninstalling some python packages. I can't be specific because we did quite a few. Hopefully this sends you in the right direction. |
Rolling back to an earlier version of Skyline? And does @bflad's fix work for you? |
No, we kept skyline. We just rolled back python package versions to match our test environment. @bflad's fix did not work. |
I have the exact same issue.
instead of:
Any suggestion to solve the root cause of this issue? |
Hello, Python is 2.6.6. Detailed python packages here: https://gist.github.com/lcoulet/6086911 Thanks, |
+1 for this being an issue. We're having the same issue, though it only seems to happen when metrics aren't at FULL_DURATION. If we start the horizon agent and let it fill redis prior to starting the analyzer, things work smoothly. Example fresh start logs with errors: https://gist.github.com/draco2003/7007503 With the suggested patch above, it seems to get rid of the more verbose python errors, but still prints out the |
Sorry - the lines similar to what? I have been unable to replicate so far which is pretty frustrating. |
Sorry the python error seems like it was mangled due to looking like html. Errors similar to this line: |
After further investigation it continues to happen even if metrics are at FULL_DURATION. |
@bflad @draco2003 @lcoulet Can you confirm that the above pull fixes the issue? |
This fixes the issue for us when running skyline with python 2.6.6. |
Closing this for now - please pipe up if you notice it again. Thanks, @ctpence! |
Thanks Abe and ctpence. Sorry for the late answer; I am travelling for work right now, so I have not been able to check this. I will give it a try ASAP. |
Please see this Gist for all of the information: https://gist.github.com/bflad/5863991