This repository has been archived by the owner on Jul 24, 2018. It is now read-only.
I've encountered a couple of scenarios where the site deploy toolset did not make it clear what had gone wrong during the deployment process. For example: I specified a particular package to be installed on the hosts during deployment via the site manifests. There was no issue at all on the Genesis host. However, when I got to the actual site deploy, I ran into trouble that was difficult to track down. The MaaS GUI showed the nodes as "Deployed", but they weren't joining my k8s cluster. The MaaS event log for one of the nodes showed the following:
Queried node's BMC - Power state queried: on | Fri, 19 Jan. 2018 17:13:37
Node post-installation failure - 'cloudinit' running modules for config | Fri, 19 Jan. 2018 17:13:32
Node post-installation failure - 'cloudinit' running config-apt-configure with frequency once-per-instance | Fri, 19 Jan. 2018 17:13:24
Node changed status - From 'Deploying' to 'Deployed' | Fri, 19 Jan. 2018 17:13:17
Digging deeper, I searched the cloud-init logs on the affected host (/var/log/cloud-init-output.log, /var/log/cloud-init.log), but came up empty-handed. It wasn't until I examined /var/log/syslog that I found my problem:
17:23:59 promjoin.sh[2230]: + apt-get install -y --no-install-recommends ceph-common=10.2.7-0ubuntu0.16.04.1 curl jq docker-engine=1.13.1-0~ubuntu-xenial socat=1.7.3.1-1
17:23:59 promjoin.sh[2230]: Reading package lists...
17:23:59 promjoin.sh[2230]: Building dependency tree...
17:23:59 promjoin.sh[2230]: Reading state information...
17:23:59 promjoin.sh[2230]: E: Version '10.2.7-0ubuntu0.16.04.1' for 'ceph-common' was not found
17:23:59 promjoin.sh[2230]: ++ date +%s
17:23:59 promjoin.sh[2230]: + now=1516382639
17:23:59 promjoin.sh[2230]: + [[ 1516382639 -gt 1516382635 ]]
17:23:59 promjoin.sh[2230]: + log Failed to install apt packages.
17:23:59 promjoin.sh[2230]: ++ date
17:23:59 promjoin.sh[2230]: + echo Fri Jan 19 17:23:59 UTC 2018 Failed to install apt packages.
17:23:59 promjoin.sh[2230]: Fri Jan 19 17:23:59 UTC 2018 Failed to install apt packages.
17:23:59 promjoin.sh[2230]: + exit 1
17:23:59 systemd[1]: promjoin.service: Main process exited, code=exited, status=1/FAILURE
17:23:59 systemd[1]: promjoin.service: Unit entered failed state.
17:23:59 systemd[1]: promjoin.service: Failed with result 'exit-code'.
Would it be possible to make it easier to determine the root cause of failures like this one?
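As a stopgap, something like the following could pull the relevant lines out of syslog instead of reading it by hand. This is only a sketch: the two regular expressions are assumptions based on the excerpt above (an apt `E:` line and systemd's "Main process exited ... status=N/FAILURE" line), and `find_failures` and `scan_file` are hypothetical helper names, not part of any existing deploy tool.

```python
import re

# Assumed patterns, derived only from the syslog excerpt in this issue.
APT_ERROR = re.compile(r"\bE: (?P<msg>.+)$")
UNIT_FAILED = re.compile(
    r"systemd\[\d+\]: (?P<unit>\S+\.service): Main process exited, "
    r".*status=(?P<status>\d+)/FAILURE"
)


def find_failures(lines):
    """Return (kind, detail) tuples for apt errors and failed systemd units."""
    findings = []
    for line in lines:
        match = UNIT_FAILED.search(line)
        if match:
            detail = "{} exited with status {}".format(
                match.group("unit"), match.group("status")
            )
            findings.append(("unit-failed", detail))
            continue
        match = APT_ERROR.search(line)
        if match:
            findings.append(("apt-error", match.group("msg")))
    return findings


def scan_file(path="/var/log/syslog"):
    """Hypothetical convenience wrapper: scan a log file on the node."""
    with open(path, errors="replace") as handle:
        return find_failures(handle)
```

Running `scan_file()` on the affected node (reading /var/log/syslog usually needs root or membership in the `adm` group) should list the apt version error and the promjoin.service failure together, which is the pairing that took me a while to find manually.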