Backup script causing VLAN issue #606
Hi, You haven't really given me enough to go on. Specifically, it isn't clear how your Raspberry Pi is connected. Is it:
It also isn't clear which device is implementing your firewall rules. Is that the Pi or something else? A diagram of how you have things set up would really help.

In the meantime, I'm going to make some (hopefully) educated guesses about what might be going on; at least to the extent of pointing to what might explain the observed behaviour of the problem resolving itself when you run the backup.

I'm assuming you mean the backup script which is supplied with IOTstack, and not my IOTstackBackup scripts. One of the significant differences between the supplied backup script and IOTstackBackup is the supplied script takes your IOTstack down while the backup is running, whereas my IOTstackBackup scripts don't need to do that (one of the reasons why I wrote IOTstackBackup in the first place).

With that in mind, assume my stack is running:
How many lines of net filter rules are in place while the stack is running?
Let's simulate the effect of a backup with the "supplied script" by bouncing the stack and checking the filter tables as we go along:
So, without getting into which filter rules are added/removed around stack up/down, let's just suppose a net filter rule in the Pi is going wonky for some reason that we don't yet understand, and assume that explains occasional non-reachability. I can imagine the stack down removing the wonky rule, and the subsequent up restoring the rule to a working state.

My knowledge of what actually happens on a reboot while the stack is up is limited but some other behaviours have made me think Docker snap-freezes the state as the machine goes down and thaws the frozen state when the machine comes back. In other words, closer to a "pause" and "unpause" than a "down" and "up". If that's true, I can kinda imagine net filter tables being saved and restored "as is" rather than being withdrawn and recreated.

The stack going down/up also changes the Pi's routing table so I'd also be running

The other thing I'd be doing is running tcpdump to capture traffic at various points while the problem was present.

In thinking about your recent rebuild, did you also rebuild your IOTstack with up-to-date service definitions or did you just do a restore and pick up with your existing
The only other possibility in that list should be If you're interested, I actually have my
That is, I put the Predictable subnets mean I never run into Docker doing a random allocation that collides with something else I'm doing. My rule is 10/8 for ZeroTier, 172.16/12 for Docker, 192.168/16 for me. Having a known subnet for
If your compose file is really ancient, I'd also advise a bit of compare/contrast between your active service definitions and those in the

Anyway, hope this helps.
OK. It happened again today and I did some tests before and after taking the stack down and bringing it back up again. Doing this did result in the network working again. I didn't do the tcpdump but the netstat -rn showed a difference between the working and non-working system:

Not working: Kernel IP routing table

Working:

I'm out of my depth on what is going on here but the bridge at the end appears to be different. My LAN is on 192.168.1.x, the VLAN is 192.168.30.x. I have no idea what 192.168.32.0 is but it is working when it is present. Any thoughts would be much appreciated.
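A before/after comparison like the one described above can be made repeatable by saving the `netstat -rn` output in each state and diffing the routes. What follows is only a minimal sketch, assuming the usual 8-column Linux layout (Destination, Gateway, Genmask, Flags, MSS, Window, irtt, Iface). The two sample tables, including the gateway, netmasks and the `br-0123456789ab` interface name, are hypothetical placeholders and not the output posted in this thread.

```python
def parse_routes(netstat_output: str) -> dict:
    """Map (destination, genmask) to interface from Linux `netstat -rn` text."""
    routes = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip the "Kernel IP routing table" title, the column header and blanks.
        if len(fields) != 8 or fields[0] == "Destination":
            continue
        routes[(fields[0], fields[2])] = fields[7]
    return routes


def diff_routes(before: str, after: str):
    """Return (added, removed) route keys going from `before` to `after`."""
    b, a = parse_routes(before), parse_routes(after)
    added = sorted(key for key in a if key not in b)
    removed = sorted(key for key in b if key not in a)
    return added, removed


# Hypothetical sample data, for illustration only.
NOT_WORKING = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         192.168.30.1    0.0.0.0         UG        0 0          0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U         0 0          0 docker0
192.168.30.0    0.0.0.0         255.255.255.0   U         0 0          0 eth0
"""

WORKING = NOT_WORKING + (
    "192.168.32.0    0.0.0.0         255.255.240.0   U         0 0"
    "          0 br-0123456789ab\n"
)

if __name__ == "__main__":
    added, removed = diff_routes(NOT_WORKING, WORKING)
    print("routes only in the working table:", added)
    print("routes only in the non-working table:", removed)
```

Keying on destination plus netmask means a route that merely moves to a different interface will not show up as added or removed; comparing the interface values for keys common to both tables would catch that case.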
I think I know what might be going on but I won't know for certain until I can get some more information. Please provide:
Please wrap your ``` Code-fences use monospaced font and respect end-of-lines so the original layout is preserved and everything is much easier to read. |
Thanks for your support with this @Paraphraser . Requested information is below:
Nothing from
OK. I think I can explain the problem and tell you how to fix it.

tl;dr

Add the following line to
Then take down your stack and reboot your machine. Taking the stack down ensures that docker cleans up the networks properly so please don't skip that bit.

The details

What follows is me responding to your "I'm out of my depth on what is going on here". I'll try to flesh out the picture with my understanding of how it all hangs together. I might be wrong about some of the details, of course. But I always hope that, if I say something wrong or dumb, people with more knowledge might read it and elect to share.

Here's the equivalent (relevant) output from my system:
Here's my routing table:
Stitching this together into what I hope will be a (somewhat) coherent story:
Which brings me back to the original problem. You'll note that, even though my system is running 7 containers, my routing table doesn't have any You're running 10 containers and have 9
So, why does your routing table have
It is because I have the following line in my
You don't have that line. What that line does is tell the DHCP client daemon to do the equivalent of:
This neatly avoids both

What seems to be going on is that docker (for

I didn't come up with this solution but if you read through Issue 219 you'll see that I faced a similar problem. Basically, if I did a reboot without first taking the stack down, the Pi would appear to freeze on the way up. The solution was provided by GB Smith in Issue 253. It subsequently became one of IOTstack's recommended patches.

postscript

The content of
Will any of this change when Network Manager becomes the go-to solution for Raspberry Pis (instead of dhcpcd)? No idea.

Hope this helps.
WOW! What a reply. Thank you so much @Paraphraser. I will complete your suggestions at the first opportunity. It will however likely take several re-reads to fully understand all of the background information you have provided but I will definitely learn from it and hopefully others will too.
I've been trying to get to the bottom of a problem that has plagued me for some time. I have my IOT devices on a separate VLAN with a firewall rule to let me access them from the main LAN. Every so often (once a month or so?) I can't access the Raspberry Pi from the LAN, but if I put myself on the IOT VLAN then I can. I can access other devices from the LAN that are on the VLAN so it doesn't look like a firewall issue. It tends to "fix itself" the next day.
As my IOTstack install was quite old and I'd learned quite a bit since the first install, I recently did a clean install on the latest Pi OS thinking that would be the end of my network issues. Today however the access to the VLAN was again lost. After the usual log checks and reboots of the Pi, firewall and network switch, I finally had a thought that the backup script runs at 2am every day. I therefore ran it manually and my network access came back!!!!
Any idea what in the script is causing this issue, especially when a reboot of the Pi doesn't fix it, but rerunning the script does?
I've checked the backup log from 2am this morning and the one that I ran this afternoon and there isn't anything that jumps out in the way of errors.
I won't be able to confirm this is definitely the issue until it happens again at some point, but it is the first time I've done something which appears to have immediately fixed the problem so the backup script appears to be the key.