no route to host #221

Open
ajkessel opened this issue Jan 11, 2025 · 22 comments
Labels
bug Something isn't working

Comments

@ajkessel

Running the bridge on MacOS and connecting to another MacOS box running the Bluebubbles server, I periodically get the following error:

2025-01-11T12:35:01-05:00 ERR Error making GET request error="Get \"http://[redacted]\": dial tcp [redacted]: connect: no route to host" component=bluebubbles
2025-01-11T12:35:01-05:00 ERR Failed to get server info from BlueBubbles error="Get \"[redacted]\": dial tcp [redacted]: connect: no route to host" component=bluebubbles

When this happens, the Bluebubbles server appears to be running fine, and I can reach it from the box running the bridge by telnetting to the Bluebubbles port (here, telnet server-ip-address 1234). But for some reason the bridge reports "no route to host" for the same IP address and port.

I'm not sure how to isolate the cause of this error but am happy to run further diagnostics to identify it.

@ajkessel ajkessel added the bug Something isn't working label Jan 11, 2025
@dltacube
Collaborator

Are those private or public IP addresses? Would this still happen if you ran, say, caffeinate -d on both machines? I'm just trying to rule out your Mac falling asleep and then being woken up by new network connections.

@ajkessel
Author

ajkessel commented Jan 13, 2025

I've run into this with different topologies, but currently I have a regular Mac running the bridge and a guest VM on that same Mac running Bluebubbles. Both the host Mac and the guest Mac are on the LAN with local IP addresses. Sleep is completely disabled on both.

I suspect the issue may be that the bridge is occasionally/randomly picking the wrong network interface. I can telnet from the host to the guest IP address and Bluebubbles port, and I can ssh from the host to the guest IP address, at the same time that the bridge is reporting no route to host. So I think the bridge is incorrectly and inconsistently determining how to reach this IP address, maybe because it doesn't have logic for selecting the right NIC?

But to answer your question directly: both the bridge and Bluebubbles/iMessages are running on private/LAN internal IP addresses (192.168...). Those two boxes can ping, telnet, and ssh each other, even when the Bridge is reporting no route to host.

@dltacube
Collaborator

I would expect the bridge (i.e., the Go networking libraries) to use whatever interface your route table selects. You could run arp -a the next time it goes down to see if there's an ambiguous path, or disable every interface except the one you need.

Another thing to check, and I'm sure you already did this: you're using IP addresses rather than domain names, local or otherwise, right? And there are no routers sitting between the two Macs (though I'm sure you probably have some switches)? Lastly, maybe your computers are switching from 5 GHz to 2.4 GHz and that's breaking the websocket. There is a retry mechanism, but I know for a fact we haven't tested that kind of edge case.

@ajkessel
Author

In the current configuration, the two Macs are literally the same box. There is the host (real) Mac running the bridge, and then the guest (VM) Mac running iMessages/Bluebubbles. So there is no network infrastructure (WiFi, Ethernet, or otherwise) in between them. And the bridge is only referencing the virtual Mac via hard-coded IP address. I can try running arp -a next time it happens although when it does happen, I'm unable to replicate the error with telnet, ping, or ssh -- it seems particular to the bridge (or, presumably, the go networking libraries).

@ajkessel
Author

I should also mention: I think when this happens if I disable WiFi on the Mac and then restart the Bridge, it's fixed. But the problem is inconsistent enough that I'm not sure if it's a coincidence. The Mac has both an Ethernet and a WiFi connection (the latter is necessary for features like unlock-with-Apple-Watch even if you have a hardwired Ethernet connection), but the Bridge should only be communicating with the VM Mac that is running on the same device with its own IP address in the same subnet.

@dltacube
Collaborator

Oh that's a completely different setup than I imagined and renders all of my clarifying questions moot. The host<->vm route is definitely not going to randomly change. This might be out of my depth. Is there any hint of the retry mechanism kicking in at least? If not then that might be something we could add. Right now it'll retry if the connection is lost which I assume differs from your case in the sense that the bridge still has a valid route for its next attempts.

You haven't timed how long it takes to lose the connection, have you? Could it be the VM going into sleep mode? I know UTM has options like that, and I also noticed people saying caffeinate doesn't work as well as Amphetamine... maybe try that?

utmapp/UTM#4963 (comment)

@ajkessel
Author

I have sleep disabled on both the host and the guest. I'm pretty sure neither is sleeping, because they both respond instantly to ssh. Also, if it were really a sleep thing, shouldn't it start working after they wake up, i.e. once I'm logged into both with ssh? In my case, the "no route to host" seems to persist indefinitely, at least until I kill and relaunch the bridge.

And as to your retry question: it doesn't seem like retry ever succeeds. It just keeps reporting no route to host.

@ajkessel
Author

ajkessel commented Jan 15, 2025

Per your suggestion above, I just ran arp -a while the problem occurred. Does this provide any clues? As usual, I can connect fine to the guest VM box (192.168.35.131) from the host machine via ssh or telnet; it's only the bridge that reports no route to host.

The host machine is 192.168.35.1 on the VM bridge interface, 192.168.4.3 on the LAN (Ethernet). WiFi is currently disabled, and the 172.x.x.x subnets are for Docker (should be unrelated to the Bridge which is not running in Docker).

? (169.254.90.89) at b0:41:6f:10:60:88 on en0 [ethernet]
? (169.254.123.159) at (incomplete) on en0 [ethernet]
? (169.254.186.94) at 2:55:ed:b4:d9:c6 on en0 [ethernet]
? (169.254.231.12) at 80:6d:97:1e:ac:23 on en0 [ethernet]
? (172.16.134.1) at 22:e2:a8:7b:ea:64 on bridge100 ifscope permanent [bridge]
? (192.168.2.1) at 82:ba:4b:a6:d4:0 on bridge0 ifscope permanent [bridge]
? (192.168.4.1) at 48:dd:c:c2:b1:ad on en0 ifscope [ethernet]
server (192.168.4.2) at b0:41:6f:10:60:88 on en0 ifscope [ethernet]
? (192.168.4.3) at 20:e2:a8:b7:d4:a7 on en0 ifscope permanent [ethernet]
? (192.168.4.195) at 6c:4a:85:3:8e:80 on en0 ifscope [ethernet]
? (192.168.4.211) at 0:c:8a:4e:4e:b0 on en0 ifscope [ethernet]
? (192.168.4.216) at f2:15:12:18:f3:fe on en0 ifscope [ethernet]
? (192.168.4.220) at a0:80:69:6e:1a:db on en0 ifscope [ethernet]
? (192.168.4.234) at 4c:11:ae:c5:bf:d0 on en0 ifscope [ethernet]
? (192.168.4.243) at 7c:57:58:2a:dc:e3 on en0 ifscope [ethernet]
? (192.168.4.250) at b0:10:41:dd:c4:4e on en0 ifscope [ethernet]
? (192.168.4.251) at f0:d1:a9:a:23:43 on en0 ifscope [ethernet]
? (192.168.35.1) at 22:e2:a8:7b:ea:65 on bridge101 ifscope permanent [bridge]
? (192.168.35.2) at 0:50:56:e9:e3:7e on bridge101 ifscope [bridge]
? (192.168.35.131) at 0:c:29:5f:47:11 on bridge101 ifscope [bridge]
mdns.mcast.net (224.0.0.251) at 1:0:5e:0:0:fb on en0 ifscope permanent [ethernet]
mdns.mcast.net (224.0.0.251) at 1:0:5e:0:0:fb on bridge101 ifscope permanent [ethernet]
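The arp output puts 192.168.35.131 on bridge101, where the host is 192.168.35.1. As a cross-check from Go's point of view, the sketch below lists the host's IPv4 interface addresses with `net.Interfaces`. It then reports which interface's subnet contains the Bluebubbles address. The helper `interfaceFor` is hypothetical. The target address is hard-coded from the arp output, and the netmasks come from the real interfaces, since arp -a does not show them.

```go
package main

import (
	"fmt"
	"net"
)

// interfaceFor returns the name of the interface whose subnet contains
// target, given a map of interface name to CIDR, or "" if none match.
func interfaceFor(target string, subnets map[string]string) (string, error) {
	ip := net.ParseIP(target)
	if ip == nil {
		return "", fmt.Errorf("invalid target IP %q", target)
	}
	for name, cidr := range subnets {
		_, ipnet, err := net.ParseCIDR(cidr)
		if err != nil {
			return "", err
		}
		if ipnet.Contains(ip) {
			return name, nil
		}
	}
	return "", nil
}

func main() {
	ifaces, err := net.Interfaces()
	if err != nil {
		fmt.Println(err)
		return
	}
	subnets := map[string]string{}
	for _, ifc := range ifaces {
		addrs, err := ifc.Addrs()
		if err != nil {
			continue
		}
		for _, a := range addrs {
			// Keep IPv4 addresses only; the last one wins per interface.
			if ipnet, ok := a.(*net.IPNet); ok && ipnet.IP.To4() != nil {
				fmt.Printf("%-10s %s\n", ifc.Name, ipnet)
				subnets[ifc.Name] = ipnet.String()
			}
		}
	}
	name, err := interfaceFor("192.168.35.131", subnets)
	switch {
	case err != nil:
		fmt.Println("error:", err)
	case name == "":
		fmt.Println("192.168.35.131 is not on any local subnet")
	default:
		fmt.Println("192.168.35.131 is on", name)
	}
}
```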

@dltacube
Collaborator

Your arp output looks good.

When you say they both respond to ssh, are you initiating a new connection? Because that can and should trigger wake-on-lan and I believe behaves differently from trying to re-establish an existing one, which is what the bridge does.

Any chance you could run Amphetamine on the host and VM for a day to see if that has any effect?

Otherwise I need to brush up on my networking and dig into how the retry mechanism works under the hood.

@ajkessel
Author

Yes, I'm establishing a new connection via ssh or telnet to the bridge IP & port.

This has happened while I'm actually logged in to both boxes and watching the logs on each, so I don't think it could possibly be sleep, but I will try Amphetamine just to rule that out definitively.

I also noticed occasionally a different error:

2025-01-15T13:56:49-05:00 ERR Error polling messages from WebSocket error="read tcp 192.168.35.1:54525->192.168.35.131:1234: use of closed network connection" component=bluebubbles

@dltacube
Collaborator

It's not sleep then. Must be VM related.

@ajkessel
Author

I suppose it could be VM related although I was getting this same problem a while back when I had the bridge and BB running on two real devices. It seems like there must be something with the go network stack, no?

@ajkessel
Author

My new working theory is a recently-added MacOS Sequoia security feature that blocks applications from LAN access unless they specifically request an entitlement. I'll dig some more and report back.

@ajkessel
Author

I think I'm getting close. It appears if I launch the bridge from the terminal on the Mac, it's fine.

If I launch the bridge as a cron job or in a screen session, it is eventually blocked from accessing the local network, which includes the hosted VM but would also occur if the bridge were running on another box in my LAN.

Sequoia introduced a security feature that blocks apps from accessing the local network without first requesting and receiving user permission.

So I think to eliminate this problem, mautrix-imessages needs to trigger the permissions request. Otherwise at least in some circumstances the OS will prevent it from accessing local IP addresses.

Anyone know if that is even possible when running a locally compiled go executable, as opposed to a .app type package?

@ajkessel
Author

Here's a Reddit thread about the overall issue.

@dltacube
Collaborator

dltacube commented Jan 16, 2025

That's very strange that it would let you access it for a while and then revoke the permission or cut off the connection.

I have seen this issue with users using systemctl to turn their bridge into a service, so you might be right that it breaks when not run directly.

Out of curiosity, what are you using to run your VM?

Edit: BTW, I run a VM as well, but everything, including Bluebubbles and the bridge, runs on it. The GitHub page for the QEMU tooling around it also recommends using Ventura, but I can't say for sure why: https://github.com/kholia/OSX-KVM

Here are some tweaks for running a vm on a macos host: https://github.com/kholia/OSX-KVM/blob/master/notes.md#tweaks-for-macos

@ajkessel
Author

ajkessel commented Jan 16, 2025

That's very strange that it would let you access it for a while and then revoke the permission or cut off the connection.
I have seen this issue with users using systemctl to turn their bridge into a service, so you might be right that it breaks when not run directly.

Based on further experimentation, it seems to be something about how the bridge (and perhaps other go apps) seek local network access.

When I get this error, if I start a screen session on the host device, I can replicate the no route to host error by attempting to telnet to the guest IP address and Bluebubbles port. Interestingly, if I run an identical telnet command not in screen, it works fine. Or if I use nc instead of telnet in screen, it also works fine.

So my takeaway is MacOS is looking at the context from which a LAN connection request is made and deciding to selectively block it, and when I run mautrix-imessages from a visible terminal window, it's fine, but when it's launched in the background, it gets blocked.

The fix, if possible, would be for mautrix-imessages to get full LAN access permissions in Sequoia, possibly by requesting and receiving it from the user. Anyone know how to do that?

Out of curiosity, what are you using to run your VM?

VMware Fusion, although this issue appears to be unrelated to the VM or even Bluebubbles. I have MacOS/iMessages/Bluebubbles running in a VM to minimize exposure on a SIP-disabled device. I've used QEMU in the past, but it stopped working for me at some point after a MacOS upgrade, and then I started getting kicked out of iCloud with my virtual device. The VMware Fusion box, which I installed from a fresh Sonoma image and then upgraded to Sequoia, has been stable so far with no iCloud issues.


@dltacube
Collaborator

dltacube commented Jan 16, 2025

I think you nailed it. Are you running the bridge manager or the imessage bridge directly? Because that'll add a layer. No firewall either, I suppose?

I just looked at the Privacy & Security tab in System Settings, and unfortunately you can't manually add a binary.

I'm confused by the outcome of mjtsai's post because it seems to imply that accepting the prompt (if there is one) doesn't work but also that it needs to request local network permissions.

And, apparently, even approving the prompt doesn’t work.

A third-party app or launch agent that wants to interact with devices on a user’s local network must ask for permission the first time that it tries to browse the local network.

Otherwise, yeah, maybe we can add this.

@ajkessel
Author

I was running the bridge either directly from the command line or as a cron job. It seems to work fine from a Terminal window but not in a screen session or from cron. Presumably Terminal.app at some point sought and received LAN permission, and a process run from a "bare" terminal inherits those permissions, but not in screen or from a cron job.

I have firewall completely disabled.

@dltacube
Collaborator

That article you posted also mentioned using launchd as root but that's probably not something you want to do.

@ajkessel
Author

Yeah, I'm trying to keep the pieces as isolated as possible--running BB in VM, and then running the bridge on a separate unprivileged account.

For now, I can just move the bridge to a Linux box as that seems to work fine. It can still talk directly over the LAN to BB in the Mac VM running on the real Mac. But hopefully we can fix the underlying permissions issue so others don't get stuck.


2 participants