🪲 Bug Report - Upgrade Fails for SF VMSS with KeyVault Extension #65

Open
mbrat2005 opened this issue Jun 15, 2023 · 5 comments
Labels: Basic LB Migration, bug (Something isn't working)

Comments

@mbrat2005
Contributor

mbrat2005 commented Jun 15, 2023

Describe the bug

The upgrade fails because provisioning of the Key Vault VM extension (KvVmExtension) times out while the VMSS is being updated:

2023-06-14T21:07:57+00 [Information]:############################## Initializing Start-AzBasicLoadBalancerUpgrade ##############################
2023-06-14T21:07:57+00 [Information]:[Start-AzBasicLoadBalancerUpgrade] PowerShell Version: **7.3.4**
2023-06-14T21:07:57+00 [Information]:[Start-AzBasicLoadBalancerUpgrade] AzureBasicLoadBalancerUpgrade **Version: 2.0.19**
...
2023-06-14T21:08:00+00 [Information]:[Test-SupportedMigrationScenario] Checking whether VMSS scale set 'quotavmssdevbn' is a Service Fabric cluster...
WARNING: 2023-06-14T21:08:00+00 [Warning]:[Test-SupportedMigrationScenario] **VMSS appears to be a Service Fabric** cluster based on extension profile. SF Clusters experienced potentically significant downtime during migration using this PowerShell module. In testing, a 5-node Bronze cluster was unavailable for about 30 minutes and a 5-node Silver cluster was unavailabile for about 45 minutes. Shutting down the cluster VMSS prior to initiating migration will result in a more consistent experience of about 5 minutes to complete the LB migration. For Service Fabric clusters that require minimal / no connectivity downtime, adding a new nodetype with standard load balancer and IP resources is a better solution.
Do you want to proceed with the migration of your Service Fabric Cluster's Load Balancer?
...
2023-06-14T21:08:00+00 [Information]:[PublicLBMigration] **Public Load Balancer Detected**. Initiating Public Load Balancer Migration
...
2023-06-14T21:24:34+00 [Information]:[NatRulesMigration] Waiting for saving standard load balancer LB-quota-cluster-dev-bn job to complete...
2023-06-14T21:24:34+00 [Information]:[NatRulesMigration] Nat Rules Migration Completed
2023-06-14T21:24:34+00 [Information]:[InboundNatPoolsMigration] Initiating Inbound NAT Pools Migration
2023-06-14T21:24:34+00 [Information]:[InboundNatPoolsMigration] Adding Inbound NAT Pool LoadBalancerBEAddressNatPool to Standard Load Balancer
2023-06-14T21:24:34+00 [Information]:[InboundNatPoolsMigration] Saving Standard Load Balancer LB-quota-cluster-dev-bn
2023-06-14T21:24:49+00 [Information]:[InboundNatPoolsMigration] Waiting for saving standard load balancer LB-quota-cluster-dev-bn job to complete...
2023-06-14T21:24:49+00 [Information]:[GetVmssFromBasicLoadBalancer] Initiating GetVmssFromBasicLoadBalancer
2023-06-14T21:24:49+00 [Information]:[GetVmssFromBasicLoadBalancer] Getting VMSS object '/subscriptions/.../resourcegroups/azure-quota-dev-eastus2/providers/microsoft.compute/virtualmachinescalesets/quotavmssdevbn' from Azure
2023-06-14T21:24:49+00 [Information]:[GetVmssFromBasicLoadBalancer] VMSS loaded Name quotavmssdevbn from RG azure-quota-dev-eastus2
2023-06-14T21:24:49+00 [Information]:[_MigrateNetworkInterfaceConfigurations] Adding InboundNATPool to VMSS quotavmssdevbn
2023-06-14T21:24:49+00 [Information]:[_MigrateNetworkInterfaceConfigurations] Checking if VMSS 'quotavmssdevbn' NIC 'NIC-azure-quota-dev-eastus2' IPConfig 'NIC-azure-quota-dev-eastus2' should be associated with NAT Pool 'LoadBalancerBEAddressNatPool'
2023-06-14T21:24:49+00 [Information]:[_MigrateNetworkInterfaceConfigurations] Adding NAT Pool 'LoadBalancerBEAddressNatPool' to IPConfig 'NIC-azure-quota-dev-eastus2'
2023-06-14T21:24:49+00 [Information]:[_MigrateNetworkInterfaceConfigurations] Migrate NetworkInterface Configurations completed
2023-06-14T21:24:49+00 [Information]:[InboundNatPoolsMigration] Saving VMSS quotavmssdevbn
2023-06-14T21:24:49+00 [Information]:[UpdateVmss] Updating configuration of VMSS 'quotavmssdevbn'
2023-06-14T21:25:04+00 [Information]:[UpdateVmss] Waiting for job (id: '5') updating VMSS 'quotavmssdevbn' to complete...
...
2023-06-14T23:10:50+00 [Information]:[UpdateVmss] Waiting for job (id: '5') updating VMSS 'quotavmssdevbn' to complete...
InvalidOperation: Long running operation failed with status 'Failed'. Additional Info:'Provisioning of VM extension **KvVmExtension** has timed out. Extension provisioning has taken too long to complete. The extension did not report a message. More information on troubleshooting is available at https://aka.ms/vmextensionwindowstroubleshoot'
ErrorCode: VMExtensionProvisioningTimeout
ErrorMessage: Provisioning of VM extension KvVmExtension has timed out. Extension provisioning has taken too long to complete. The extension did not report a message. More information on troubleshooting is available at https://aka.ms/vmextensionwindowstroubleshoot
ErrorTarget: 0
StartTime: 6/14/2023 9:24:52 PM
EndTime: 6/14/2023 11:10:27 PM
OperationID: 85ee53b5-9ce3-4458-9edd-f46e8c7baf02
Status: Failed
Write-Error: 2023-06-14T23:10:50+00 [Error]:[InboundNatPoolsMigration] An error occured when attempting to update VMSS network config on the new Standard LB backend pool membership. To recover address
the following error, and try again specifying the -FailedMigrationRetryFilePath parameter and Basic Load Balancer backup State file located either in this directory or the directory
specified with -RecoveryBackupPath
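
To see where the time went in a log like the one above, the following is a minimal Python sketch that sums, per component, the interval from each timestamped line to the next one. It assumes the line format visible in this excerpt (`<timestamp>+00 [Level]:[Component] message`); `LINE_RE` and `time_per_component` are hypothetical helper names and are not part of the AzureBasicLoadBalancerUpgrade module.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed line shape, taken from the excerpt only:
# "2023-06-14T21:24:49+00 [Information]:[UpdateVmss] Updating ..."
# search() is used so prefixes such as "WARNING: " or "Write-Error: " are tolerated.
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\+00 "
    r"\[(?P<level>\w+)\]:\[(?P<component>[^\]]+)\]"
)


def time_per_component(log_text):
    """Sum, per component, the time from each of its lines to the next timestamped line."""
    totals = defaultdict(timedelta)
    previous = None  # (timestamp, component) of the last matching line
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # banner lines, "..." elisions, and error detail lines
        ts = datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%S")
        if previous is not None:
            totals[previous[1]] += ts - previous[0]
        previous = (ts, match["component"])
    return dict(totals)


# Lines copied from the log in this issue.
sample = """\
2023-06-14T21:24:49+00 [Information]:[InboundNatPoolsMigration] Saving VMSS quotavmssdevbn
2023-06-14T21:24:49+00 [Information]:[UpdateVmss] Updating configuration of VMSS 'quotavmssdevbn'
2023-06-14T21:25:04+00 [Information]:[UpdateVmss] Waiting for job (id: '5') updating VMSS 'quotavmssdevbn' to complete...
2023-06-14T23:10:50+00 [Information]:[UpdateVmss] Waiting for job (id: '5') updating VMSS 'quotavmssdevbn' to complete...
Write-Error: 2023-06-14T23:10:50+00 [Error]:[InboundNatPoolsMigration] An error occured when attempting to update VMSS network config
"""

for component, elapsed in time_per_component(sample).items():
    print(component, elapsed)
```

On these lines, almost all of the elapsed time is attributed to `UpdateVmss`, which matches the roughly 1 hour 46 minutes the VMSS update job waited before the extension timeout was reported.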

To Reproduce

Steps to reproduce the behavior:

  1. VMSS
  2. Public LB
  3. KVVMExtension [in this case, the extension adds a cert to the local store; auto upgrade is disabled]
  4. SF Cluster [?]

Additional context - please include:

See the log above.

mbrat2005 added the bug and Basic LB Migration labels on Jun 15, 2023
mbrat2005 self-assigned this on Jun 15, 2023
@mbrat2005
Contributor Author

This issue is reportedly intermittent; still working to reproduce it.

@mbrat2005
Contributor Author

Closing due to lack of activity and because the issue could not be reproduced.

@The-DevBlog

@mbrat2005 I'm experiencing the same issue. Did you ever find a solution?

@mbrat2005
Contributor Author

mbrat2005 commented Jul 3, 2024 via email

@mbrat2005
Contributor Author

@AndrewCS149 I haven't made progress on this one, since I couldn't seem to repro it. Would you be able to share your upgrade log for details? Also, are you upgrading a basic LB for a Service Fabric Cluster?

mbrat2005 reopened this on Jul 10, 2024