Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(MODULES-10653) Failed to upgrade agent using puppet task #494

Merged
merged 1 commit into from
Jun 10, 2020

Conversation

luchihoratiu
Copy link
Contributor

@luchihoratiu luchihoratiu commented May 18, 2020

Before this commit, when trying to run the puppet_agent::install_powershell task, with a specific version as parameter, it was failing with below error message:

Error: Timed out waiting for status response from <node_name>

On the node, both puppet agent and pxp-agent services got stopped and the event log viewer showed:

Application or service 'task_wrapper' could not be shut down.

Due to Windows agents becoming unresponsive when running this task and as per MODULES-10633 discussions, the puppet_agent::install_poweshell task should not be used for upgrading Puppet Agents while either ofpuppet agent or pxp-agent services are still running. This Puppet Agent module task will now fail immediately and output a message if any of the two affected services are still running.

The task will stop and output a message if the given version is already installed on the node, thus avoiding unnecessary msi download/installation. This behaviour is aligned with the existing implementation for no version given and Puppet Agent already installed on the node.

@luchihoratiu luchihoratiu requested a review from a team May 18, 2020 06:07
@puppetcla
Copy link

CLA signed by all contributors.

@luchihoratiu luchihoratiu force-pushed the MODULES-10653 branch 2 times, most recently from bda5def to ec12882 Compare May 18, 2020 09:51
@donoghuc
Copy link
Contributor

I dont think this matches the bash implementation. I also think we explicitly want this upgrade behavior and test that it works

# Upgrade from puppet5 to puppet6
results = run_task('puppet_agent::install', 'target', { 'collection' => 'puppet6', 'version' => 'latest' })
results.each do |res|
expect(res).to include('status' => 'success')
end
# Verify that it upgraded
results = run_task('puppet_agent::version', 'target', {})
results.each do |res|
expect(res).to include('status' => 'success')
expect(res['result']['version']).not_to match(%r{^5\.\d+\.\d+})
expect(res['result']['version']).to match(%r{^6\.\d+\.\d+})
expect(res['result']['source']).to be
end
end
end

I think the tasks are used more for ensuring library code that bolt apply uses is installed on an agent and it is expected for that use case that there is no service being used. If you are using the service you will be expected to upgrade with the manifest code.

@donoghuc donoghuc requested a review from a team May 18, 2020 14:39
@lucywyman lucywyman requested a review from beechtom May 21, 2020 23:03
@gimmyxd
Copy link
Contributor

gimmyxd commented May 22, 2020

@donoghuc after chatting with @npwalker I think that this task should be used only to install an agent on clean box(that does not have any agent installed). If someone wants to do an upgrade(an agent is already present on that box) the best way to do that would be to use the puppet code from this module to handle the upgrade(using bolt apply i guess).
I know Nick started to look into creating a plan: #483

We would like to have a single codepath that's used when doing upgrades to eliminate confusion for the users and also be easier for us to maintain.

@donoghuc
Copy link
Contributor

I am thinking of these tasks from a bolt only perspective. The big use case there is "I need to apply some puppet code and run ruby tasks" but i don't want to care about managing a puppet agent. To serve that need, the task just installing an agent seems fine.

If removing the functionality to upgrade/downgrade in the tasks is fully replaced by plans that use apply blocks to do the upgrade/downgrade that bolt users can have that sounds great. I do think it is important to have the different implementations of the tasks be as close as possible in functionality, so simply removing it from the powershell implementation is not idea.

I also think removing the functionality is a breaking change and will need to be rolled out properly.

@adreyer
Copy link
Contributor

adreyer commented May 22, 2020

This seems like this is removing functionality from bolt users and a breaking change to the module.

Allowing the "install" task to "ensure" that a new enough version of puppet is present is really useful in the context of bolt and plans. Removing code paths should be transparent to users. This is removing functionality and we should be very careful about doing so until there is an alternative we can recommend for users.

@npwalker
Copy link
Contributor

npwalker commented May 22, 2020

I feel like I'm missing something.

Trying to use the install script on windows to upgrade an agent doesn't work right? So we're just providing a good error message now instead of failing in some less useful way?

So we're not removing functionality we're just providing better error messages.

@donoghuc
Copy link
Contributor

My review and comments were based on the commit message and this implementation. dc318aa

@donoghuc
Copy link
Contributor

Here is my take on this situation:

At a high level there are two scenarios for an upgrade.

  1. The first is simply upgrading the package (in this case the use case is simply needing the ruby interpreter and the underlying puppet content for using bolt apply). In this case the the powershell task should work just fine (though there may be a bug in stop_service implementation

    function Cleanup {
    if($stop_service -eq 'true') {
    C:\"Program Files"\"Puppet Labs"\Puppet\bin\puppet resource service puppet ensure=stopped enable=false
    }
    Write-Output "Deleting $msi_dest and $install_log"
    Remove-Item -Force $msi_dest
    Remove-Item -Force $install_log
    where puppet is stopped but not pxp-agent). Similarly this would match what the bash implementation does.

  2. The second use case for an upgrade is to upgrade an agent that is actively being used for enforcement (regardless of whether it is being used with bolt). In this scenario the task should not be used.

One path forward may be to check that the service is running and refuse to upgrade with the task if it is and instead ask the user to execute the manifest code.

Open questions: Does the powershell implementation work if puppet/pxp-agent services are stopped (i think yes according to our acceptance tests)? When using the stop_service parameter are all services associated with the package (including pxp-agent) stopped? Can we move forward with task implementations that check the service status and then refuse to continue if running?

@luchihoratiu luchihoratiu force-pushed the MODULES-10653 branch 3 times, most recently from 9402efc to cd56e62 Compare May 27, 2020 12:49
Before this commit, when trying to run the
`puppet_agent::install_powershell` task, with a specific version as
parameter, it was failing with below error message:
Error: Timed out waiting for status response from <node_name>

On the node, both `puppet agent` and `pxp-agent` services got stopped
and the event log viewer showed:
Application or service 'task_wrapper' could not be shut down.

Due to Windows agents becoming unresponsive when running this task and
as per MODULES-10633 discussions, the `puppet_agent::install_poweshell`
task should not be used for upgrading Puppet Agents while either of
`puppet agent` or `pxp-agent` services are still running. This Puppet
Agent module task will now fail immediately and output a message if any
of the two affected services are still running.

The task will stop and output a message if the given version is already
installed on the node, thus avoiding unnecessary msi
download/installation. This behaviour is aligned with the existing
implementation for no version given and Puppet Agent already installed
on the node.
@luchihoratiu
Copy link
Contributor Author

luchihoratiu commented May 29, 2020

@donoghuc @adreyer I've updated the pull request with checks for installed version and for affected services. I've also added/modified the tests to cover these changes.

The powershell implementation indeed works if the services are stopped and the stop_service parameter being enabled stops only the puppet agent service (I don't think that we would want any other service to be stopped but also I don't think that using it would help our case in any way).

@luchihoratiu luchihoratiu requested review from donoghuc and adreyer May 29, 2020 07:56
@npwalker
Copy link
Contributor

npwalker commented Jun 2, 2020

@donoghuc @adreyer any concerns with the new implementation?

results = run_task('puppet_agent::install', 'target', { 'collection' => 'puppet6', 'version' => 'latest' })
results.each do |res|
expect(res).to include('status' => 'success')
end

# Verify that it upgraded
installed_version = nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this for?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this set to nil?

Copy link
Contributor Author

@luchihoratiu luchihoratiu Jun 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's there to simply avoid an undefined local variable or method error. Its role is to store the currently installed puppet agent version (since we've previously installed the latest puppet6 version) and it's used below, in the next step, when trying to install the same version again.

I have no doubt that there are better/more elegant approaches to testing this. I just tried to be minimalist with the changes and use the existing steps as much as possible. If there are any concerns, I'm always open to improvement ideas 😃.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I did not see that was used below.

Copy link
Contributor

@donoghuc donoghuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good for powershell. Is this the behavior for the shell implementation? I think they should be the same.

@npwalker
Copy link
Contributor

npwalker commented Jun 3, 2020

I don't see a reason to prevent linux users from using the linux task to upgrade. The problem only exists on windows if I'm understanding correctly.

@luchihoratiu
Copy link
Contributor Author

luchihoratiu commented Jun 9, 2020

@donoghuc, the behaviour is the same for running the task with the same puppet agent version specified as the one on the node and without specifying any version when it is already installed on the node. The output message also got aligned for both.

The behaviour is different when trying to upgrade. As @npwalker said, it wouldn't be beneficial for anyone to prevent Linux users from upgrading their agents (only because Windows users can't), especially since it would be just for the sake of consistency.

From my perspective and understanding, this PR is (or at least should be) just a small/quick patch to prevent Windows nodes from getting into a really messy state where no upgrade happens whatsoever, puppet services get killed and PE loses control over the node until the services get manually restarted. For a good consistent behaviour, the end goal should be to improve Windows nodes upgradeability (with a smarter/different approach), rather than the other way around but all of that doesn't seem possible (to me at least) only here, at this level.

Copy link
Contributor

@donoghuc donoghuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in general implementations for a task should try aim to be as close in behavior as possible. In this case it seems like the difference is warranted given the limitations of attempting to support such a wide range of OS. I think this is certainly an improvement for the powershell implementation and will avoid painful situation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants