This repository has been archived by the owner on Dec 18, 2019. It is now read-only.

Analyzer DictProxy / KeyError #32

Closed
bflad opened this issue Jun 26, 2013 · 25 comments

Comments

@bflad
Contributor

bflad commented Jun 26, 2013

Please see this Gist for all of the information: https://gist.github.com/bflad/5863991

@bflad
Contributor Author

bflad commented Jun 26, 2013

I should note that this is what I saw causing #31.

@bflad
Contributor Author

bflad commented Jun 26, 2013

I'm not sure if this is a red herring, but I changed https://github.com/etsy/skyline/blob/master/src/analyzer/analyzer.py#L173 from

logger.info('total analyzed    :: %d' % (len(unique_metrics) - sum(self.exceptions.values())))

to:

e = self.exceptions.values()
logger.info('total analyzed    :: %d' % (len(unique_metrics) - sum(e)))

and my node has now been running stably for over 12 hours instead of crashing. I'm wondering whether this has to do with calling values() directly on the Manager().dict() structure rather than copying it first.
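A minimal sketch of computing that count from an explicit local copy, assuming `exceptions` is either a plain dict or the `DictProxy` returned by `Manager().dict()`. `DictProxy.copy()` fetches the whole dict from the manager process in a single call and returns an ordinary `dict`, so the arithmetic runs on local data. The function name `total_analyzed` and the sample counts are hypothetical, not Skyline's code.

```python
import logging
from multiprocessing import Manager

logger = logging.getLogger("analyzer_sketch")


def total_analyzed(unique_metrics, exceptions):
    """Number of metrics that were analyzed without raising an exception.

    `exceptions` maps exception name -> count and may be a plain dict or a
    multiprocessing DictProxy. .copy() returns an ordinary dict (for a proxy,
    via one call to the manager process), so the sum runs on a local snapshot.
    """
    snapshot = exceptions.copy()
    return len(unique_metrics) - sum(snapshot.values())


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    with Manager() as manager:
        # Hypothetical counts; a real run would fill these from worker processes.
        exceptions = manager.dict({"Incomplete": 2, "Stale": 1})
        unique_metrics = ["metric.%d" % i for i in range(5)]
        logger.info("total analyzed    :: %d",
                    total_analyzed(unique_metrics, exceptions))
```

Because `.copy()` exists on both `dict` and `DictProxy`, the same helper works whether or not the data is shared.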

@astanway
Contributor

Can you try logging self.exceptions and self.exceptions.values()?
Like this:
logger.info(self.exceptions)
logger.info(self.exceptions.values())
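A slightly more detailed version of that diagnostic, as a sketch: it also logs the object's type (so the log shows whether the attribute is a `DictProxy` or a plain `dict`), then takes one snapshot with `.copy()` and logs the contents and values from that snapshot. The helper name `log_shared_dict` is hypothetical.

```python
import logging

logger = logging.getLogger("analyzer_sketch")


def log_shared_dict(name, shared):
    """Log the type, contents and values of a dict or DictProxy.

    Hypothetical diagnostic helper: a single .copy() call fetches the data,
    and both content lines are formatted from that local snapshot.
    """
    logger.info("%s type :: %s", name, type(shared).__name__)
    snapshot = shared.copy()
    logger.info("%s :: %r", name, snapshot)
    logger.info("%s values :: %r", name, list(snapshot.values()))
    return snapshot
```

Inside the analyzer this would be called as `log_shared_dict('exceptions', self.exceptions)`.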

@astanway
Contributor

Also, does this happen when your metrics are at FULL_DURATION?

@bflad
Contributor Author

bflad commented Jun 26, 2013

Happens at FULL_DURATION and also when I run flushall in redis-cli before starting the analyzer daemon.

@bflad
Contributor Author

bflad commented Jun 26, 2013

Okay, I think I've narrowed down which code is really giving the issue.

https://github.com/etsy/skyline/blob/master/src/analyzer/analyzer.py#L176

logger.info('anomaly breakdown :: %s' % self.anomaly_breakdown)

The above causes the eventual crash of the daemon. This appears to be an immediate stability fix:

ab = self.anomaly_breakdown
logger.info('anomaly breakdown :: %s' % ab)
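Note that `ab = self.anomaly_breakdown` only binds a second name to the same object; formatting `ab` with `%s` still calls `str()` on the proxy, which asks the manager process for the dict's repr. A small sketch contrasting rebinding with `.copy()`; the function name and the sample breakdown are hypothetical.

```python
from multiprocessing import Manager


def alias_and_snapshot(shared):
    """Contrast rebinding a name with copying a (possibly proxied) dict.

    Returns (alias_is_same_object, snapshot_type_name).
    """
    alias = shared            # no copy: the very same object, proxy or not
    snapshot = shared.copy()  # plain dict; for a DictProxy, one manager call
    return alias is shared, type(snapshot).__name__


if __name__ == "__main__":
    with Manager() as manager:
        # Hypothetical breakdown: algorithm name -> number of anomalies flagged.
        breakdown = manager.dict({"histogram_bins": 2})
        print(alias_and_snapshot(breakdown))  # (True, 'dict')
```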

@astanway
Contributor

Interesting. And that happens both when anomaly_breakdown is an empty dictionary and when it contains the breakdown?

@bflad
Contributor Author

bflad commented Jun 26, 2013

Let me clear out Redis and run it both ways.

@astanway
Contributor

For the future, it's probably easier to change your FULL_DURATION so that Incomplete gets raised for every metric and no anomaly algorithms get run.

@astanway
Contributor

Any update on this?

@otac0n

otac0n commented Jul 18, 2013

FYI, we had this issue too. We managed to fix it by rolling back and uninstalling some Python packages. I can't be more specific because we changed quite a few. Hopefully this points you in the right direction.

@astanway
Contributor

Rolling back to an earlier version of Skyline? And does @bflad's fix work for you?

@otac0n

otac0n commented Jul 18, 2013

No, we kept Skyline. We just rolled back Python package versions to match our test environment. @bflad's fix did not work.

@lcoulet

lcoulet commented Jul 24, 2013

I have the exact same issue.
I worked around the crash by using the following code in analyzer.py, but it is still not perfect (don't ask me why it doesn't crash this way):

        ex = self.exceptions
        logger.info('total analyzed    :: %d' % (len(unique_metrics) - sum(ex.values())))
        logger.info('total anomalies   :: %d' % len(self.anomalous_metrics))
        logger.info('exception stats   :: %s' % ex)
        ab = self.anomaly_breakdown
        logger.info('anomaly breakdown :: %s' % ab)

instead of:

        logger.info('total analyzed :: %d' % (len(unique_metrics) - sum(self.exceptions.values())))
        logger.info('total anomalies :: %d' % len(self.anomalous_metrics))
        logger.info('exception stats :: %s' % self.exceptions)
        logger.info('anomaly breakdown :: %s' % self.anomaly_breakdown)

Any suggestion to solve the root cause of this issue?
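Both workarounds still format the same proxy objects, since `ex = self.exceptions` is a second name rather than a copy. A sketch of the same four log lines built from explicit snapshots, assuming `exceptions` and `anomaly_breakdown` are plain dicts or `Manager().dict()` proxies. `run_summary` and `log_run_summary` are hypothetical helpers, and this is not offered as the root-cause fix asked about above.

```python
import logging

logger = logging.getLogger("analyzer_sketch")


def run_summary(unique_metrics, anomalous_metrics, exceptions, anomaly_breakdown):
    """Build the four summary lines from one snapshot of each shared dict.

    `exceptions` and `anomaly_breakdown` may be plain dicts or DictProxy
    objects; .copy() gives a local dict before any arithmetic or formatting.
    """
    ex = exceptions.copy()
    ab = anomaly_breakdown.copy()
    return [
        "total analyzed    :: %d" % (len(unique_metrics) - sum(ex.values())),
        "total anomalies   :: %d" % len(anomalous_metrics),
        "exception stats   :: %s" % (ex,),
        "anomaly breakdown :: %s" % (ab,),
    ]


def log_run_summary(unique_metrics, anomalous_metrics, exceptions, anomaly_breakdown):
    """Log the summary lines; takes the same arguments as run_summary()."""
    for line in run_summary(unique_metrics, anomalous_metrics,
                            exceptions, anomaly_breakdown):
        logger.info("%s", line)
```

Returning the lines separately from logging them keeps the formatting testable without a running manager.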

@astanway
Contributor

@lcoulet @bflad @otac0n What version of Python are you all using?

@bflad
Contributor Author

bflad commented Jul 24, 2013

@lcoulet

lcoulet commented Jul 26, 2013

Hello, Python is 2.6.6.
Platform is CentOS 6.4.

Detailed python packages here: https://gist.github.com/lcoulet/6086911

Thanks,
Loic

@draco2003
Contributor

+1 for this being an issue.

We're having the same issue, though it only seems to happen when metrics aren't at FULL_DURATION.

If we start the Horizon agent and let it fill Redis before starting the analyzer, things work smoothly.
Otherwise we get an error similar to the ones above.

Example fresh start logs with errors: https://gist.github.com/draco2003/7007503

With the suggested patch above, the more verbose Python errors seem to go away, but it still prints lines similar to this one:

2013-10-16 08:58:59 :: 18445 :: anomaly breakdown :: <DictProxy object, typeid 'dict' at 0x7f61846f5090; 'str()' failed>
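The `<DictProxy object, ...; 'str()' failed>` text in that log line is the placeholder a multiprocessing proxy's `__str__` returns when it cannot get the referent's repr from the manager process; it does not raise, so the log line is still written. The sketch below reproduces the fallback by shutting the manager down before formatting the proxy. It only illustrates where the message comes from, not why the manager became unreachable here, and the exact wording of the suffix differs between Python versions. The function name and dict contents are hypothetical; the "fork" preference is an assumption made so the example runs as a plain script on POSIX.

```python
import multiprocessing as mp


def str_before_and_after_shutdown():
    """Return str() of a DictProxy while its manager runs and after shutdown."""
    # Prefer "fork" where available so this also runs as a plain script
    # without an import guard; otherwise use the platform default.
    method = "fork" if "fork" in mp.get_all_start_methods() else None
    manager = mp.get_context(method).Manager()
    shared = manager.dict({"grubbs": 1})  # hypothetical contents
    before = str(shared)  # proxied: the manager returns repr() of the dict
    manager.shutdown()
    after = str(shared)   # manager gone: the proxy returns a fallback string
    return before, after


if __name__ == "__main__":
    for line in str_before_and_after_shutdown():
        print(line)
```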

@astanway
Contributor

Sorry, lines similar to what? I have been unable to replicate this so far, which is pretty frustrating.

@draco2003
Contributor

Sorry, the Python error seems to have been mangled because it looked like HTML.

Errors similar to this line:
https://gist.github.com/draco2003/7007503#file-gistfile1-txt-L42

@draco2003
Contributor

After further investigation it continues to happen even if metrics are at FULL_DURATION.

@astanway
Contributor

@bflad @draco2003 @lcoulet Can you confirm that the above pull fixes the issue?

@draco2003
Contributor

This fixes the issue for us when running skyline with python 2.6.6.
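The thread does not describe what the pull changed. As one possible way to avoid shared `Manager().dict()` state altogether (an assumption, not necessarily the approach taken in that pull), each worker can count its own exceptions and anomalies in plain dicts and send a single result back over a `multiprocessing.Queue`, with the parent merging the counts. The worker rule, key names, and function names below are hypothetical, and preferring "fork" is only so the sketch runs as a plain script on POSIX.

```python
import multiprocessing as mp
from collections import Counter


def analyze_chunk(metrics, results_q):
    """Hypothetical worker: count outcomes locally, then send them once."""
    exceptions, breakdown = Counter(), Counter()
    for name in metrics:
        # Stand-in for the real checks; the rule and key names are made up.
        if name.endswith(".short"):
            exceptions["Incomplete"] += 1
        else:
            breakdown["histogram_bins"] += 1
    results_q.put((dict(exceptions), dict(breakdown)))


def analyze(chunks):
    """Fan chunks out to worker processes and merge their counts in the parent."""
    method = "fork" if "fork" in mp.get_all_start_methods() else None
    ctx = mp.get_context(method)
    results_q = ctx.Queue()
    procs = [ctx.Process(target=analyze_chunk, args=(chunk, results_q))
             for chunk in chunks]
    for p in procs:
        p.start()
    exceptions, breakdown = Counter(), Counter()
    for _ in procs:
        # One message per worker. Draining before join() avoids a worker
        # blocking on a full queue pipe while the parent waits for it to exit.
        ex, ab = results_q.get(timeout=60)
        exceptions.update(ex)
        breakdown.update(ab)
    for p in procs:
        p.join()
    return dict(exceptions), dict(breakdown)


if __name__ == "__main__":
    print(analyze([["a", "b.short"], ["c"]]))
```

The parent only ever touches plain dicts, so logging and summing them involves no proxy calls.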

@astanway
Contributor

Closing this for now - please pipe up if you notice it again. Thanks, @ctpence!

@lcoulet

lcoulet commented Oct 25, 2013

Thanks Abe and ctpence, and sorry for the late answer: I am travelling for work right now, so I have not been able to check this. I will give it a try ASAP.

5 participants