This repository has been archived by the owner on Jul 24, 2018. It is now read-only.

Bubbling up relevant error information during site deploy #17

Open
darrendejaeger opened this issue Jan 31, 2018 · 0 comments


I've run into a couple of scenarios where the site deploy toolset did not make it clear what had gone wrong during the deployment process. One example: I specified a particular package to be installed on the hosts during deployment via the site manifests. There was no issue at all on the Genesis host. However, when I got to the actual site deploy, I ran into trouble that was difficult to track down. The MaaS GUI showed the nodes as "Deployed", but they weren't joining my k8s cluster. Digging deeper showed the following:

Queried node's BMC - Power state queried: on | Fri, 19 Jan. 2018 17:13:37
Node post-installation failure - 'cloudinit' running modules for config | Fri, 19 Jan. 2018 17:13:32
Node post-installation failure - 'cloudinit' running config-apt-configure with frequency once-per-instance | Fri, 19 Jan. 2018 17:13:24
Node changed status - From 'Deploying' to 'Deployed' | Fri, 19 Jan. 2018 17:13:17

Next, I searched the cloud-init logs on that host (/var/log/cloud-init-output.log, /var/log/cloud-init.log), but came up empty-handed. It wasn't until I examined /var/log/syslog that I found my problem:

17:23:59 promjoin.sh[2230]: + apt-get install -y --no-install-recommends ceph-common=10.2.7-0ubuntu0.16.04.1 curl jq docker-engine=1.13.1-0~ubuntu-xenial socat=1.7.3.1-1
17:23:59 promjoin.sh[2230]: Reading package lists...
17:23:59 promjoin.sh[2230]: Building dependency tree...
17:23:59 promjoin.sh[2230]: Reading state information...
17:23:59 promjoin.sh[2230]: E: Version '10.2.7-0ubuntu0.16.04.1' for 'ceph-common' was not found
17:23:59 promjoin.sh[2230]: ++ date +%s
17:23:59 promjoin.sh[2230]: + now=1516382639
17:23:59 promjoin.sh[2230]: + [[ 1516382639 -gt 1516382635 ]]
17:23:59 promjoin.sh[2230]: + log Failed to install apt packages.
17:23:59 promjoin.sh[2230]: ++ date
17:23:59 promjoin.sh[2230]: + echo Fri Jan 19 17:23:59 UTC 2018 Failed to install apt packages.
17:23:59 promjoin.sh[2230]: Fri Jan 19 17:23:59 UTC 2018 Failed to install apt packages.
17:23:59 promjoin.sh[2230]: + exit 1
17:23:59 systemd[1]: promjoin.service: Main process exited, code=exited, status=1/FAILURE
17:23:59 systemd[1]: promjoin.service: Unit entered failed state.
17:23:59 systemd[1]: promjoin.service: Failed with result 'exit-code'.

Would it be possible, in some fashion, to make it easier to determine the root cause of these kinds of failures?
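
To illustrate the kind of summary I have in mind, here is a rough sketch, not anything the current toolset provides. It scans syslog-style lines like the ones above, notices when systemd reports that a unit's main process exited with a failure status, and collects the likely error lines that unit logged beforehand. The function name `summarize_failures`, the regular expressions, and the "starts with `E: ` or contains `Failed`" heuristic are all my own assumptions, tuned only to the excerpt above.

```python
import re

# Hypothetical helper, not part of any existing deploy tool.
# Matches "<tag>[<pid>]: <message>" anywhere in a line, so it tolerates both
# the trimmed lines shown above and lines with a date/host prefix.
TAGGED_LINE = re.compile(r"(?P<tag>[\w.\-]+)\[(?P<pid>\d+)\]: (?P<msg>.*)$")

# Matches systemd's report that a service's main process exited with a failure.
UNIT_FAILED = re.compile(
    r"(?P<unit>[\w.\-@]+)\.service: Main process exited, "
    r"code=exited, status=(?P<status>\d+)/FAILURE"
)


def looks_like_error(msg):
    """Heuristic (an assumption): skip shell xtrace lines, keep apt-style
    'E: ' lines and anything mentioning 'Failed'."""
    if msg.startswith("+"):
        return False
    return msg.startswith("E: ") or "Failed" in msg


def summarize_failures(lines):
    """Return one dict per failed unit with the error lines it logged earlier."""
    seen_errors = []  # (tag, message) pairs, in log order
    report = []
    for raw in lines:
        match = TAGGED_LINE.search(raw.rstrip("\n"))
        if match is None:
            continue
        tag, msg = match.group("tag"), match.group("msg")
        if tag == "systemd":
            failed = UNIT_FAILED.match(msg)
            if failed:
                unit = failed.group("unit")
                # Attribute errors to the unit when the log tag is the unit
                # name, optionally with a suffix such as ".sh".
                errors = [
                    m for t, m in seen_errors
                    if t == unit or t.startswith(unit + ".")
                ]
                report.append({
                    "unit": unit,
                    "status": int(failed.group("status")),
                    "errors": errors,
                })
        elif looks_like_error(msg):
            seen_errors.append((tag, msg))
    return report


if __name__ == "__main__":
    # Path taken from the report above; reading it may require root.
    with open("/var/log/syslog", errors="replace") as handle:
        for entry in summarize_failures(handle):
            print(f"{entry['unit']}.service exited with status {entry['status']}")
            for line in entry["errors"]:
                print(f"    {line}")
```

Run against the excerpt above, this would report `promjoin` with the `E: Version ... was not found` line as the first collected error, which is the piece of information I had to hunt for by hand.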
