(MODULES-10653) Failed to upgrade agent using puppet task #494

luchihoratiu · 2020-05-18T06:07:46Z

Before this commit, running the puppet_agent::install_powershell task with a specific version as a parameter failed with the error message below:

Error: Timed out waiting for status response from <node_name>

On the node, both puppet agent and pxp-agent services got stopped and the event log viewer showed:

Application or service 'task_wrapper' could not be shut down.

Due to Windows agents becoming unresponsive when running this task and as per MODULES-10633 discussions, the puppet_agent::install_powershell task should not be used for upgrading Puppet Agents while either of the puppet agent or pxp-agent services is still running. This Puppet Agent module task will now fail immediately and output a message if either of the two affected services is still running.

The task will stop and output a message if the given version is already installed on the node, thus avoiding unnecessary msi download/installation. This behaviour is aligned with the existing implementation for no version given and Puppet Agent already installed on the node.
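The two checks described above can be sketched as a small decision function. This is only an illustration in Ruby (the task itself is PowerShell): the method name `preflight`, the constant `BLOCKING_SERVICES`, the return shape, the messages, and the order of the checks are all assumptions, and special values such as `latest` are not handled.

```ruby
# Hypothetical sketch: names, messages and check order are illustrative,
# not taken from the module's PowerShell implementation.
BLOCKING_SERVICES = ['puppet', 'pxp-agent'].freeze

# installed_version: String, or nil when no agent is installed on the node
# requested_version: String, or nil when no version parameter was given
# running_services:  names of the services currently running on the node
def preflight(installed_version, requested_version, running_services)
  # No agent present: a plain install is allowed.
  return { action: :install } if installed_version.nil?

  # No version given, or the given version is already installed:
  # stop early and avoid the MSI download/installation.
  if requested_version.nil? || requested_version == installed_version
    return { action: :skip, message: "Version #{installed_version} is already installed" }
  end

  # An upgrade was requested: refuse while either affected service is running.
  blocking = BLOCKING_SERVICES & running_services
  unless blocking.empty?
    return { action: :fail, message: "Stop #{blocking.join(', ')} before upgrading" }
  end

  { action: :install }
end
```

For example, `preflight('5.5.0', '6.15.0', ['pxp-agent'])` yields the `:fail` action under these assumptions, while the same call with an empty service list yields `:install`.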

puppetcla · 2020-05-18T08:00:19Z

CLA signed by all contributors.

donoghuc · 2020-05-18T14:35:20Z

I don't think this matches the bash implementation. I also think we explicitly want this upgrade behavior and test that it works:

puppetlabs-puppet_agent/task_spec/spec/acceptance/init_spec.rb

Lines 103 to 118 in b83c5b9

    
        # Upgrade from puppet5 to puppet6
        results = run_task('puppet_agent::install', 'target', { 'collection' => 'puppet6', 'version' => 'latest' })
        results.each do |res|
          expect(res).to include('status' => 'success')
        end

        # Verify that it upgraded
        results = run_task('puppet_agent::version', 'target', {})
        results.each do |res|
          expect(res).to include('status' => 'success')
          expect(res['result']['version']).not_to match(%r{^5\.\d+\.\d+})
          expect(res['result']['version']).to match(%r{^6\.\d+\.\d+})
          expect(res['result']['source']).to be
        end
      end
    end

I think the tasks are used more for ensuring library code that bolt apply uses is installed on an agent and it is expected for that use case that there is no service being used. If you are using the service you will be expected to upgrade with the manifest code.

gimmyxd · 2020-05-22T11:15:20Z

@donoghuc after chatting with @npwalker I think that this task should be used only to install an agent on a clean box (one that does not have any agent installed). If someone wants to do an upgrade (an agent is already present on that box), the best way would be to use the Puppet code from this module to handle the upgrade (using bolt apply, I guess).
I know Nick started to look into creating a plan: #483

We would like to have a single codepath that's used when doing upgrades to eliminate confusion for the users and also be easier for us to maintain.

donoghuc · 2020-05-22T13:40:54Z

I am thinking of these tasks from a bolt-only perspective. The big use case there is "I need to apply some puppet code and run ruby tasks, but I don't want to care about managing a puppet agent." To serve that need, a task that just installs an agent seems fine.

If the functionality to upgrade/downgrade in the tasks is removed and fully replaced by plans that use apply blocks to do the upgrade/downgrade, and bolt users can have those, that sounds great. I do think it is important to have the different implementations of the tasks be as close as possible in functionality, so simply removing it from the powershell implementation is not ideal.

I also think removing the functionality is a breaking change and will need to be rolled out properly.

adreyer · 2020-05-22T15:46:09Z

This seems to remove functionality from bolt users and to be a breaking change to the module.

Allowing the "install" task to "ensure" that a new enough version of puppet is present is really useful in the context of bolt and plans. Removing code paths should be transparent to users. This is removing functionality and we should be very careful about doing so until there is an alternative we can recommend for users.

npwalker · 2020-05-22T16:51:25Z

I feel like I'm missing something.

Trying to use the install script on Windows to upgrade an agent doesn't work, right? So we're just providing a good error message now instead of failing in some less useful way?

So we're not removing functionality we're just providing better error messages.

donoghuc · 2020-05-22T17:07:01Z

My review and comments were based on the commit message and this implementation. dc318aa

donoghuc · 2020-05-22T20:44:09Z

Here is my take on this situation:

At a high level there are two scenarios for an upgrade.

The first is simply upgrading the package (in this case the use case is simply needing the ruby interpreter and the underlying puppet content for using bolt apply). In this case the powershell task should work just fine (though there may be a bug in the stop_service implementation

puppetlabs-puppet_agent/tasks/install_powershell.ps1

Lines 91 to 97 in ec12882

    
    function Cleanup {
        if($stop_service -eq 'true') {
          C:\"Program Files"\"Puppet Labs"\Puppet\bin\puppet resource service puppet ensure=stopped enable=false
        }
        Write-Output "Deleting $msi_dest and $install_log"
        Remove-Item -Force $msi_dest
        Remove-Item -Force $install_log

where puppet is stopped but not pxp-agent). Similarly this would match what the bash implementation does.

The second use case for an upgrade is to upgrade an agent that is actively being used for enforcement (regardless of whether it is being used with bolt). In this scenario the task should not be used.

One path forward may be to check that the service is running and refuse to upgrade with the task if it is and instead ask the user to execute the manifest code.

Open questions: Does the powershell implementation work if puppet/pxp-agent services are stopped (I think yes, according to our acceptance tests)? When using the stop_service parameter, are all services associated with the package (including pxp-agent) stopped? Can we move forward with task implementations that check the service status and then refuse to continue if running?

luchihoratiu · 2020-05-29T07:40:01Z

@donoghuc @adreyer I've updated the pull request with checks for installed version and for affected services. I've also added/modified the tests to cover these changes.

The powershell implementation does work if the services are stopped, and enabling the stop_service parameter stops only the puppet agent service (I don't think we would want any other service to be stopped, and I also don't think using it would help our case in any way).

npwalker · 2020-06-02T14:51:42Z

@donoghuc @adreyer any concerns with the new implementation?

donoghuc · 2020-06-02T15:01:09Z

task_spec/spec/acceptance/init_spec.rb

    results = run_task('puppet_agent::install', 'target', { 'collection' => 'puppet6', 'version' => 'latest' })
    results.each do |res|
      expect(res).to include('status' => 'success')
    end

    # Verify that it upgraded
+    installed_version = nil


What is this for?

Why is this set to nil?

It's there to simply avoid an undefined local variable or method error. Its role is to store the currently installed puppet agent version (since we've previously installed the latest puppet6 version) and it's used below, in the next step, when trying to install the same version again.

I have no doubt that there are better/more elegant approaches to testing this. I just tried to be minimalist with the changes and use the existing steps as much as possible. If there are any concerns, I'm always open to improvement ideas 😃.

I see. I did not see that was used below.
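The scoping rule behind that `installed_version = nil` line can be shown in isolation. This is a minimal sketch rather than the spec itself: the method names and the sample result hash are made up for illustration, and the real test gets its results from `run_task`.

```ruby
# Hypothetical stand-in for task results; the real spec obtains these from run_task.
SAMPLE_RESULTS = [{ 'status' => 'success', 'result' => { 'version' => '6.15.0' } }].freeze

# Declaring the variable before the block makes the block share it,
# so the value assigned inside is still available afterwards.
def version_with_predeclaration(results)
  installed_version = nil
  results.each do |res|
    installed_version = res['result']['version']
  end
  installed_version
end

# A variable first assigned inside the block is local to that block.
# Outside the block the name is unknown, so referencing it directly
# would raise an "undefined local variable or method" NameError.
def version_without_predeclaration(results)
  results.each do |res|
    block_only = res['result']['version']
  end
  defined?(block_only) ? block_only : :not_visible
end
```

With the sample data, the first method returns the version string and the second returns `:not_visible`, which is why the spec needs the variable to exist before the `each` block if a later step is going to reuse it.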

donoghuc

I think this looks good for powershell. Is this the behavior for the shell implementation? I think they should be the same.

npwalker · 2020-06-03T16:45:32Z

I don't see a reason to prevent linux users from using the linux task to upgrade. The problem only exists on windows if I'm understanding correctly.

luchihoratiu · 2020-06-09T14:55:57Z

@donoghuc, the behaviour is the same for running the task with the same puppet agent version specified as the one on the node and without specifying any version when it is already installed on the node. The output message also got aligned for both.

The behaviour is different when trying to upgrade. As @npwalker said, it wouldn't be beneficial for anyone to prevent Linux users from upgrading their agents (only because Windows users can't), especially since it would be just for the sake of consistency.

From my perspective and understanding, this PR is (or at least should be) just a small/quick patch to prevent Windows nodes from getting into a really messy state where no upgrade happens whatsoever, puppet services get killed and PE loses control over the node until the services get manually restarted. For good, consistent behaviour, the end goal should be to improve Windows nodes' upgradeability (with a smarter/different approach) rather than the other way around, but all of that doesn't seem achievable (to me at least) at this level alone.

donoghuc

I think in general implementations for a task should aim to be as close in behavior as possible. In this case it seems like the difference is warranted given the limitations of attempting to support such a wide range of OSes. I think this is certainly an improvement for the powershell implementation and will avoid a painful situation.

luchihoratiu requested a review from a team May 18, 2020 06:07

luchihoratiu force-pushed the MODULES-10653 branch 2 times, most recently from bda5def to ec12882 Compare May 18, 2020 09:51

donoghuc requested a review from a team May 18, 2020 14:39

lucywyman requested a review from beechtom May 21, 2020 23:03

luchihoratiu force-pushed the MODULES-10653 branch 3 times, most recently from 9402efc to cd56e62 Compare May 27, 2020 12:49

luchihoratiu force-pushed the MODULES-10653 branch from cd56e62 to eeb02cd Compare May 29, 2020 07:03

luchihoratiu requested review from donoghuc and adreyer May 29, 2020 07:56

donoghuc reviewed Jun 2, 2020

donoghuc approved these changes Jun 2, 2020

donoghuc approved these changes Jun 9, 2020

gimmyxd approved these changes Jun 10, 2020

gimmyxd merged commit 419ea40 into puppetlabs:master Jun 10, 2020

murdok5 mentioned this pull request Aug 9, 2020

(maint) Update readme for clarification on Windows agent updates #502

Merged
