Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(modify): Repair the issue of channel updating #1280

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

CubeZ2mDeveloper
Copy link

When I update channels in zigbee2mqtt and restart, I find that some devices fail to be controlled. I discovered that when changing channels, the value of the nwkUpdateId field remains at 0.

I referred to the ZigBee specification and found the instructions regarding the modification of the nwkUpdateId field when changing channels:

"The network manager should broadcast a Mgmt_NWK_Update_req notifying devices of the new channel. The broadcast shall be to all devices with RxOnWhenIdle equal to TRUE. The network manager is responsible for incrementing the nwkUpdateId parameter from the NIB and including it in the Mgmt_NWK_Update_req. The network manager shall set a timer based on the value of apsChannelTimer upon issue of a Mgmt_NWK_Update_req that changes channels and shall not issue another such command until this timer expires. However, during this period, the network manager can complete the above analysis. However, instead of changing channels, the network manager would report to the local application using Mgmt_NWK_Update_notify and the application can force a channel change using the Mgmt_NWK_Update_req."

let nwkUpdateId: number = 0x00;
const isSupportsBackup = await this.adapter.supportsBackup();
if (isSupportsBackup) {
const backup = await this.adapter.backup(this.getDeviceIeeeAddresses());
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think making a new backup is necessary here? Can't we get the value from the existing backup?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.
Probably the property should be added to the adapter type NetworkParameters, since it's part of that, and then can be used throughout controller without issue (cached on start).
Remains to see if all adapters provide it though.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I will try to change it to the implementation method you suggested.

@Koenkk Koenkk requested a review from Nerivec December 30, 2024 13:54
@Nerivec
Copy link
Collaborator

Nerivec commented Dec 30, 2024

Was this tested against the described problem?
Some devices are expected to not switch (bad design), some devices require activation to wake them up, then they switch after a few seconds (as described in docs).
I'd expect if this was required, all devices would fail to switch, which they do not (I've not found a device, even crappy Tuya, that doesn't switch with current implementation, on my end, several users also tested it and reported no issue).

From what I can tell, that quoted paragraph of the spec is for Upon receipt of a Mgmt_NWK_Unsolicited_Enhanced_Update_notify message, which this is not.

In the paragraph about the reception of nwk update req by a device, that value doesn't seem to matter.

3) If the ScanDuration parameter is equal to 0xfe, the message is a command to change channels. The device SHALL do the following.
  a) If the nwkNextChannelChange value in the NIB is non-zero, do the following. Compare the channel to change received over the air to the value in the NIB. If the values do not match, do the following:
    i) Follow the Error Response procedure setting the status to NOT_AUTHORIZED.
    ii) The request SHALL be dropped and no more processing SHALL take place.
  b) If there is more than 1 channel indicated in the ScanChannels bitmap (if the message is Mgmt_NWK_Update_req) or in the ScanChannelsListStructure (if the message is Mgmt_NWK_Enhanced_Update_req), then this is an invalid request. Do the following:
    i) Follow the Error Response procedure setting the status to INV_REQUESTTYPE.
    ii) The request SHALL be dropped and no more processing SHALL take place.
  c) The receiving device SHALL determine if the channel is one within the range of all supported channels.
    i) Examine the SupportedChannels element for each entry in the nwkMacInterfaceTable, and determine if there is a match within the received ScanChannels bitmap or ScanChannelsListStructure.
    ii) If no match is found, do the following:
      (1) Follow the Error Response procedure setting the status to INV_REQUESTTYPE.
      (2) The request SHALL be dropped and no more processing SHALL take place.
    iii) If a match is found, perform a channel change.
      (1) Execute a MLME-SET.request for the PIB value phyCurrentPage.
      (2) Execute a MLME-SET.request for the PIB value phyCurrentChannel.
      (3) No further processing SHALL be done.

As far as I can tell, the case where the nwkUpdateId would be used would be if a router decides to rejoin (for some reason), and it finds the same network on at least two different nodes, then it should pick the node with the higher nwkUpdateId (if any). But that scenario should not matter for a channel change, since routers receive the broadcast, and switch immediately. Leaving any offline router eventually powered back on, with only the new network available to rejoin.

The spec is a bit lacking on that nwkUpdateId, which I noted in the implementation a while back:

// TODO: What does "This value is set by the Network Channel Manager prior to sending the message." mean exactly??
// (isn't used/mentioned in EmberZNet, confirmed working if not set at all for channel change)
// for now, allow to bypass with undefined, otherwise should throw if undefined and duration passes below conditions (see NwkEnhancedUpdateRequest)
if (nwkUpdateId !== undefined && (duration === 0xfe || duration === 0xff)) {
this.writeUInt8(nwkUpdateId);
}

Short version: I don't expect it would solve the problem at hand?
I'm also a little worried about changing this considering the various stacks involved, because it was tested rather extensively during implementation, and after by several users and no particular issue was found that couldn't be attributed to devices being "too cheap". Hopefully, at worse, it does nothing... but I half expect that most devices just don't use the value at all...

@@ -56,6 +94,7 @@ const mocksendZclFrameToGroup = vi.fn();
const mocksendZclFrameToAll = vi.fn();
const mockAddInstallCode = vi.fn();
const mocksendZclFrameToEndpoint = vi.fn();
const mockApaterBackup = vi.fn().mockReturnValue(mockDummyBackup);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const mockAdapterBackup = vi.fn(() => Promise.resolve(mockDummyBackup));

should provide more accurate typing/testing (and fix typo).
(A lot of others also need refactoring, but might as well add new ones properly.)

@CubeZ2mDeveloper
Copy link
Author

I added the same device to Home Assistant, then changed the channel. I found that it can be controlled normally in Home Assistant, and I also discovered that Home Assistant will increase the value of nwkUpdateID.

@CubeZ2mDeveloper
Copy link
Author

Hi, @Koenkk.
I would like to ask about the progress of this PR.
We currently have a project that needs to use this feature in zigbee2mqtt.
I can send the problematic device to you if necessary.

@Nerivec
Copy link
Collaborator

Nerivec commented Jan 20, 2025

Was this tested in Z2M against the initial problem?
And, if it does fix it, we need to implement this for all adapters (state management for nwkUpdateId, make sure it's properly reported by adapters, etc.).

@CubeZ2mDeveloper
Copy link
Author

I have already tested the code on z2m, my device can be controlled normally after the fix.
Currently, the NwkUpdateId state has been maintained on both the Ember and EZSP adapters, but I am not familiar with the protocol of other adapters, so it has not been implemented yet.

@CubeZ2mDeveloper
Copy link
Author

After pulling the latest zigbee2mqtt code from the master branch, I started z2m using a Dongle that supports the zstack protocol stack and modified the channel. After changing the channel, the device could be controlled normally on zstack, and the nwkUpdateId incremented by 1. However, when using an Ember Dongle, the device control failed after changing the channel, and the nwkUpdateId remained unchanged, still at 0.

Since our users are currently encountering related issues, this PR might help resolve the problem. Could you let me know about your plans for this PR? For example, are there any further changes needed, or do you have an estimated timeline for merging it? If there’s anything we can do to assist, such as providing devices for testing, please feel free to let us know. Looking forward to your reply, thank you!

@@ -74,4 +74,5 @@ export interface NetworkParameters {
panID: number;
extendedPanID: string; // `0x${string}` same as IEEE address
channel: number;
nwkUpdateID?: number;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be optional, and support should added be in all drivers.


The change should also be tested at least in zstack and ember (the two officially supporting change channel) with at least a few devices that worked before, to make sure they still work after this change. Since it doesn't appear to be required for a change channel (based on spec & previous testing with many devices), it might have negative impact on other aspects (like some devices not expecting a changed nwkUpdateID when receiving a change channel request, and ignoring the request entirely).
In that regard, can you confirm which devices (models) are having the issue in the first place?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the main devices we have seen issues with are S31ZB, S40ZBTPB and S26R2ZB.

Regarding the necessity of changing channels, I’d like to share some real-world user experiences: in some households with complex network conditions (e.g., high Wi-Fi interference), changing channels has significantly improved the operational experience of devices. That said, I fully acknowledge that this might introduce some negative impacts, such as potential incompatibility with certain devices.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment was not about changing channel (which, of course, is very useful), it was about the changes from this PR, which might fix a couple of oddly working devices, but create issues with others (since it's working without this PR for most devices, and ZigBee spec seems to mostly target PAN change for ID increment, not channel change). This is exacerbated by the fact that the 3 mentioned devices are likely the same internally (all basic sonoff plugs).

I'd like to confirm that other brands/models still work with this PR, at least Tuya, Hue, Inovelli, Ikea, Ledvance for a good sample, and including routers & end devices.
If this PR breaks any brand/model/firmware provider, it will break any according network that tries to change channel after this is merged... We need a good degree of certainty that it won't, hence actual tests with various networks.

Copy link
Author

@CubeZ2mDeveloper CubeZ2mDeveloper Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the change of nwkUpdateID during channel changes, based on my understanding of the Zigbee specification (which I’ve uploaded as an image), incrementing the nwkUpdateID value is necessary. Additionally, when I performed channel changes using the same dongle on ZHA, devices that failed to work properly after a channel change in Z2M were able to function normally after a channel change in ZHA.

I fully understand your concerns about the potential impact of this PR on devices that are already working correctly. The brands you mentioned (Tuya, Hue, Inovelli, Ikea, Ledvance) indeed require comprehensive testing. However, due to limited device availability, I can only test with the devices I currently have. I will provide you with a detailed list of tested devices shortly, so you can see the coverage of my testing.

05-3474-23-csg-zigbee-specification-compressed_Page_1
05-3474-23-csg-zigbee-specification-compressed_Page_2

Copy link
Collaborator

@Nerivec Nerivec Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated images to latest rev.
I find that part of the spec is not very clear:

  • could be specific: Upon receipt of a Mgmt_NWK_Unsolicited_Enhanced_Update_notify message
  • logic is optional/not strictly defined: MAY do the following

Which, at best explains why it's not needed in most cases, most stacks likely just ignore it, because it's easier and not required.
There are two scenario that concern me:

  • a device stack could ignore the update request if the nwkUpdateID is different from current due to some weirdness in implementation (or some leftovers from older spec revisions)
  • a coordinator stack could be overriding the nwkUpdateID logic internally, resulting in mismatch

Also, it would seem the devices you mentioned have a firmware quirk no matter what, because it shouldn't matter for routers that are online at the time of broadcast (they should just switch channel).

I don't have enough devices either for something like this. I had a few users test it out when it was first implemented to confirm with a bigger sample.
Koenkk will confirm zstack impl (no other way than retrieving from NV?) and run some tests too.
Once we confirm with a few (non-affected) brands, it should be fine to merge.
Need to fix the zeroes for other stacks though:

And also, this is likely to require some tests to be updated in Z2M repo since it changes the return type of https://github.com/Koenkk/zigbee2mqtt/blob/dev/lib/zigbee.ts#L216

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to confirm that other brands/models still work with this PR, at least Tuya, Hue, Inovelli, Ikea, Ledvance for a good sample, and including routers & end devices.

I tried changing channel with this change and the following devices 3 times and it worked every time, so looks OK to me (Elivco = Tuya)

Screenshot 2025-02-07 at 21 35 03

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants