dongleman_daemon.py stuck #9

Open
mo3r31337 opened this issue Nov 10, 2022 · 20 comments

@mo3r31337

Hello,
I have run into a problem: after about 12 hours of idling, dongleman_daemon hangs and does nothing until I restart the daemon. Is it just me, or is someone else having this problem?
Thanks for any advice that helps fix this.

@sergei-mironov
Owner

Hi, dongleman_daemon runs two async tasks at https://github.com/grwlf/asterisk-dongle-setup/blob/master/python/dongleman_daemon.py#L174 and waits for events. If no events arrive, it is expected to do nothing. Are you sure that what you are seeing is not the intended behavior?
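
To illustrate the "two async tasks waiting for events" structure, here is a minimal sketch using asyncio. It is not the code from dongleman_daemon.py: the names `listen_queue`, `run_daemon`, and `demo`, and the `None` stop sentinel, are hypothetical and exist only for this example.

```python
import asyncio


async def listen_queue(events: asyncio.Queue, handled: list, name: str) -> None:
    """Wait for events from one source and record them; idle when none arrive."""
    while True:
        item = await events.get()   # suspends here while there is nothing to do
        if item is None:            # hypothetical sentinel that stops the sketch
            return
        handled.append((name, item))


async def run_daemon(spool_events: asyncio.Queue, telegram_events: asyncio.Queue) -> list:
    """Run both listeners concurrently, the way a two-task daemon would."""
    handled: list = []
    await asyncio.gather(
        listen_queue(spool_events, handled, "spool"),
        listen_queue(telegram_events, handled, "telegram"),
    )
    return handled


async def demo() -> list:
    """Feed one event into each source, then stop both listeners."""
    spool, telegram = asyncio.Queue(), asyncio.Queue()
    spool.put_nowait("sms-1")
    telegram.put_nowait("call-1")
    spool.put_nowait(None)
    telegram.put_nowait(None)
    return await run_daemon(spool, telegram)
```

With this shape, an idle daemon is indistinguishable from a stuck one unless each task logs what it receives, which is why extra prints are the first debugging step.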

@mo3r31337
Author

mo3r31337 commented Nov 10, 2022

No, I don't think this is the expected behavior. I check by sending a USSD code from Asterisk, and at such moments the daemon does not forward the output to Telegram; but if the daemon is restarted, the message arrives instantly.
I suspect that the problem will not occur if there is some activity from time to time. I need to experiment with crontab and issue a USSD request every few hours.

@sergei-mironov
Owner

sergei-mironov commented Nov 12, 2022

The code doesn't have debug facilities, so you will probably want to add appropriate prints and try to debug the problem.

The path you need to check is as follows:

  1. Asterisk receives an SMS and launches a hangup handler here. Does it log the message you are missing?
  2. dongleman_send.py puts the message in the queue and triggers a filesystem notify event. I should say that this dongleman_spool library is error-prone, since I wrote it ad hoc. Can you see the file corresponding to the message?
  3. dongleman_daemon.py listens for inotify events here https://github.com/grwlf/asterisk-dongle-setup/blob/63cfdd99da8ebef97aa9157413e140c0563a6506/python/dongleman_daemon.py#L77 It should notice the new file in the spool and process it. What does it do in reality?
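
For step 2, a spool writer typically writes the message to a temporary file and then renames it into the queue directory, so a watcher sees a single "moved to" event for a complete file. The sketch below shows that general pattern only; it is not the project's spool library, and `spool_put`, the `tmp` sibling directory, and the eight-digit file naming are assumptions made for illustration.

```python
import json
import os
import tempfile
from pathlib import Path


def spool_put(queue_dir: Path, message: dict) -> Path:
    """Place `message` into `queue_dir` as NNNNNNNN.json via write-then-rename."""
    queue_dir.mkdir(parents=True, exist_ok=True)
    # Assumed layout: a sibling "tmp" directory on the same filesystem,
    # so that the final rename is atomic.
    tmp_dir = queue_dir.parent / "tmp"
    tmp_dir.mkdir(parents=True, exist_ok=True)

    fd, tmp_name = tempfile.mkstemp(dir=tmp_dir, suffix=".json")
    with os.fdopen(fd, "w") as f:
        json.dump(message, f)

    # Pick the first free number. This is not safe with several concurrent
    # writers; a real implementation would need locking or unique names.
    n = 0
    while (queue_dir / f"{n:08d}.json").exists():
        n += 1
    target = queue_dir / f"{n:08d}.json"

    os.rename(tmp_name, target)  # the watcher only ever sees a finished file
    return target
```

If the file shows up in the queue directory but the daemon never prints anything for it, the problem is on the listening side rather than on the sending side.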

One thing bothers me: you are talking about USSD messages. I didn't test anything besides SMS and voice calls, so I am not sure what Asterisk does upon receiving USSD.

@mo3r31337
Author

Let me try to explain what it looks like. I use this setup on an arm microcomputer with one huawei e1550 modem to serve only one of my mobile numbers while I'm in roaming. Since I don't get many calls or SMS messages, it looks like the Telethon library session is being dropped by the Telegram servers due to inactivity. When that happens, I can see the queue's JSON files in the dongleman spool directory: they are created successfully, but they are not sent to the Telegram account. If I restart dongleman_daemon.py, it immediately sends the entire queue to the Telegram account. I'm using USSD because it's a free way to test the Telegram forwarding.
For testing, I created a systemd timer that runs the command asterisk -x "dongle ussd dongle0 *100#" every two hours. I have been testing with this timer for the last two days, and the Telegram session has not been lost. In other words, some activity is needed to keep the Telegram session from freezing.
This is how I see the problem.

@mo3r31337
Author

For USSD, I added this to the extensions.conf file, right after the SMS section.

exten => ussd,1,Verbose(USSD-IN ${CALLERID(num)} ${USSD_BASE64})
same => n,Set(MSG=--message-base64=${USSD_BASE64})
same => n,Hangup()

@mo3r31337
Author

@grwlf
It seems you are right; you pointed me in the right direction. After some time, dongleman_daemon.py stops responding to the creation of new files in the /tmp/dongleman/spool/queue directory, while the connection to the Telegram servers stays established and dongleman answers voice calls if I call it via Telegram. I can also make an outgoing call from Telegram via Asterisk and chan_dongle.

@sergei-mironov
Owner

sergei-mironov commented Nov 18, 2022

Interesting. I've reviewed the code, and of course the listen_system_commands handler almost certainly has problems:

  1. Files in /queue are removed only if the control flow returns to spool_iterate without exceptions.
  2. If processing a file leads to an exception, then
    • It is not removed
    • Other files will not be processed
    • The inotify event is not repeated

As a consequence, a single problematic file may cause the daemon to stall. Could you please try the latest commit and/or monitor the logs? The exception text should appear in the console due to this print
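
One way to avoid the stall described above is to isolate failures per file: catch the exception, log it, move the offending file aside, and carry on with the rest of the queue. The sketch below shows that idea under stated assumptions; it is not the fix from the repository, and `drain_queue`, `failed_dir`, and `handler` are hypothetical names.

```python
from pathlib import Path
from typing import Callable, List


def drain_queue(queue_dir: Path, failed_dir: Path,
                handler: Callable[[Path], None]) -> List[Path]:
    """Process every JSON file in queue_dir; one bad file must not block the rest.

    Files that are handled successfully are deleted. Files whose handler raises
    are moved to failed_dir (overwriting a same-named file there) for later
    inspection. Returns the new paths of the failed files.
    """
    failed_dir.mkdir(parents=True, exist_ok=True)
    failures: List[Path] = []
    for path in sorted(queue_dir.glob("*.json")):
        try:
            handler(path)
        except Exception as err:  # deliberately broad: keep the loop alive
            print(f"Exception while processing {path}: {err}")
            target = failed_dir / path.name
            path.replace(target)
            failures.append(target)
        else:
            path.unlink()
    return failures
```

Moving failed files aside instead of deleting them keeps undelivered messages available for a retry once the underlying problem is fixed.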

@mo3r31337
Author

OK, I've updated the script.

<WS (connecting as dongleman-ari-app)
WS> Connected!
Event(wd=1, mask=<Mask.MOVED_TO: 128>, cookie=5183, name=PosixPath('00000000.json'))
Processing path /tmp/dongleman/spool/queue/00000000.json
Event(wd=1, mask=<Mask.DELETE: 512>, cookie=0, name=PosixPath('00000000.json'))

I left the daemon running. But I don't think the problem is a bad file: for the last few days I have been checking whether the daemon is working by copying a known-correct JSON file into the queue directory. When the daemon is stuck, it does nothing when a new file appears; even inotify does not report that the file was created on the file system.
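
If inotify events can go missing, a periodic rescan of the queue directory is a common safety net that runs alongside the event-driven path. The following is only a sketch of that fallback, not something the daemon does today; `rescan_queue`, `process`, and `max_rounds` are hypothetical names, and `max_rounds` exists mainly so the loop can be bounded in tests.

```python
import asyncio
from pathlib import Path
from typing import Callable, Optional


async def rescan_queue(queue_dir: Path, process: Callable[[Path], None],
                       interval: float, max_rounds: Optional[int] = None) -> int:
    """Poll queue_dir every `interval` seconds and hand each file to `process`.

    `process` is expected to remove or move the file once it is done with it,
    otherwise the same file is picked up again on the next round.
    Returns the number of files handed over (useful when max_rounds is set).
    """
    seen = 0
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        rounds += 1
        for path in sorted(queue_dir.glob("*.json")):
            process(path)
            seen += 1
        await asyncio.sleep(interval)
    return seen
```

A task like this would also show whether the event loop itself is still alive: if even the periodic rescan stops printing, the whole loop is blocked rather than just the inotify listener.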

@mo3r31337 mo3r31337 reopened this Nov 21, 2022
@mo3r31337
Author

It is strange, but with the fix from commit e055c46, dongleman_daemon has been running fine for the last two days.

@sergei-mironov
Owner

It is strange, but with the fix from commit e055c46, dongleman_daemon has been running fine for the last two days.

I realized that the reason could be simpler: before the commit, the daemon may have raised an unhandled exception that terminated it. With the commit, I now catch all exceptions, so the daemon should print an error but continue to work.

I would be glad if you could share some logs to help me figure out which exceptions you are getting.

@mo3r31337
Author

Hello, now I see only these errors, and the daemon is stuck again.

Processing path /tmp/dongleman/spool/queue/00000000.json                         
Event(wd=1, mask=<Mask.DELETE: 512>, cookie=0, name=PosixPath('00000000.json'))  
Event(wd=1, mask=<Mask.MOVED_TO: 128>, cookie=3205, name=PosixPath('00000000.json'))
Processing path /tmp/dongleman/spool/queue/00000000.json                         
Event(wd=1, mask=<Mask.DELETE: 512>, cookie=0, name=PosixPath('00000000.json'))  
Attempt 1 at connecting failed: TimeoutError:                                    
Attempt 2 at connecting failed: TimeoutError:                                    
Attempt 3 at connecting failed: TimeoutError:                                    
Attempt 4 at connecting failed: TimeoutError:                                    
Attempt 5 at connecting failed: TimeoutError:                                    
Attempt 6 at connecting failed: TimeoutError:                                    
Attempt 1 at connecting failed: TimeoutError:                                    
Attempt 2 at connecting failed: TimeoutError:                                    
Attempt 3 at connecting failed: TimeoutError:                                    
Attempt 4 at connecting failed: TimeoutError:                                    
Attempt 5 at connecting failed: TimeoutError:                                    
Attempt 6 at connecting failed: TimeoutError:                                    
Attempt 1 at connecting failed: TimeoutError:                                    
Attempt 2 at connecting failed: TimeoutError:                                    
Attempt 3 at connecting failed: TimeoutError:                                    
Attempt 4 at connecting failed: TimeoutError:                                    
Attempt 5 at connecting failed: TimeoutError:                                    
Attempt 6 at connecting failed: TimeoutError: 

After losing the internet connection, the daemon got stuck again. It also does not react to the creation of new files in the queue directory.

@mo3r31337
Author

mo3r31337 commented Nov 29, 2022

Attempt 5 at connecting failed: TimeoutError: 
Attempt 6 at connecting failed: TimeoutError:
Attempt 1 at connecting failed: TimeoutError: 
Attempt 2 at connecting failed: TimeoutError: 
Attempt 3 at connecting failed: TimeoutError:
Attempt 4 at connecting failed: TimeoutError: 
Attempt 5 at connecting failed: TimeoutError:
Attempt 6 at connecting failed: TimeoutError: 
Attempt 1 at connecting failed: TimeoutError: 
Attempt 2 at connecting failed: TimeoutError: 
Attempt 3 at connecting failed: TimeoutError: 
Attempt 4 at connecting failed: TimeoutError: 
Attempt 5 at connecting failed: TimeoutError: 
Attempt 6 at connecting failed: TimeoutError: 
Automatic reconnection failed 5 time(s)
Future exception was never retrieved
future: <Future finished exception=ConnectionError('Connection to Telegram failed 5 time(s)')>
ConnectionError: Connection to Telegram failed 5 time(s)
Event(wd=1, mask=<Mask.MOVED_TO: 128>, cookie=9157, name=PosixPath('00000000.json'))
Processing path /tmp/dongleman/spool/queue/00000000.json
Exception while processing JSON '/tmp/dongleman/spool/queue/00000000.json':
Cannot send requests while disconnected
Event(wd=1, mask=<Mask.DELETE: 512>, cookie=0, name=PosixPath('00000000.json'))

@sergei-mironov
Owner

OK, it looks like in your case Telethon loses the connection and gives up re-establishing it.
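
One possible workaround is a small watchdog task that checks the connection periodically and reconnects when needed. The sketch below is written against a duck-typed `client` that offers `is_connected()` and an awaitable `connect()`, which is the interface Telethon's TelegramClient provides; `keep_connected`, `check_interval`, and `max_checks` are hypothetical names, and this has not been tested against a real Telegram session.

```python
import asyncio
from typing import Optional


async def keep_connected(client, check_interval: float,
                         max_checks: Optional[int] = None) -> int:
    """Reconnect `client` whenever it reports being disconnected.

    Assumes `client.is_connected()` returns a bool and `client.connect()` is
    awaitable. Runs forever unless max_checks is given.
    Returns the number of successful reconnects.
    """
    reconnects = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if not client.is_connected():
            try:
                await client.connect()
                reconnects += 1
            except (ConnectionError, OSError, asyncio.TimeoutError) as err:
                # Keep the watchdog alive and try again on the next check.
                print(f"Reconnect failed, will retry: {err}")
        await asyncio.sleep(check_interval)
    return reconnects
```

Such a task would run next to the existing ones, so that a "Connection to Telegram failed" error ends one reconnection attempt rather than the daemon's ability to send.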

@mo3r31337
Author

I have added some parameters to TelegramClient. Now I don't get connection loss errors, but once after a few days the script stopped without any errors in the console.

  tclient=TelegramClient(session=SESSION,
                         api_id=TELEGRAM_API_ID,
                         api_hash=TELEGRAM_API_HASH,
                         connection_retries=-1,
                         retry_delay=2,
                         auto_reconnect=True)

@sergei-mironov
Owner

  tclient=TelegramClient(session=SESSION,
                         api_id=TELEGRAM_API_ID,
                         api_hash=TELEGRAM_API_HASH,
                         connection_retries=-1,
                         retry_delay=2,
                         auto_reconnect=True)

Makes sense! I'll add this to the code, thanks.

but once after a few days the script stopped without any errors in the console.

Could it be a segfault from one of the C/C++ libraries involved? Could you please check your system's segfault log?
One could also run the script under strace to get very verbose logs of system calls.
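
Another low-effort check is Python's standard faulthandler module, which dumps the Python tracebacks of all threads when the interpreter receives a fatal signal such as SIGSEGV. A minimal sketch, assuming it is called once at daemon startup; `enable_crash_traces` and the log path are hypothetical:

```python
import faulthandler


def enable_crash_traces(log_path: str):
    """Dump Python tracebacks to log_path on fatal signals such as SIGSEGV.

    The returned file object must stay open for as long as the handler is
    enabled, because faulthandler writes to its file descriptor.
    """
    log_file = open(log_path, "a")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

If the daemon then disappears silently and the file stays empty, a crash inside a native library becomes less likely, and an external cause such as the OOM killer or a service manager stopping the process is worth checking.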

@sergei-mironov
Owner

I use this setup on an arm microcomputer with one huawei e1550 modem to serve only one of my mobile numbers while I'm in roaming

@mo3r31337, could you please share some information on your ARM setup? Do you use a Raspberry Pi for this? What kind of Nixpkgs/NixOS do you use? I would like to build this project on a smaller device than the one I use now.

@mo3r31337
Author

@grwlf Of course, I can share info about my setup. The Raspberry Pi is too big for this. I use a Rock Pi S with 512Mb of RAM and 8Gb of NAND storage. The board has an RK3308s CPU with 4 cores.
I installed Debian 11 (bullseye) and didn't use nixpkgs; I don't like this kind of package manager. I cross-compiled all the needed software and built .deb packages. If you need them, I can share the prebuilt .deb packages with their dsc files.
I actually use it with a PoE HAT, and everything is powered over an Ethernet patch cord from my MikroTik router.
photo_2022-12-12_13-55-26
photo_2022-12-12_13-55-14
photo_2022-12-12_13-55-21
https://user-images.githubusercontent.com/24485702/206980820-f99846b6-d258-4d3b-b273-295ca0c57bab.mp4

@sergei-mironov
Owner

sergei-mironov commented Dec 13, 2022

Actually I use it with PoE hat and everything powered via ethernet patchcord from my mikrotik rou

PoE is amazing, I think I should try this setup! But do I understand correctly that you still use nixpkgs on the host for cross-compilation? I mean, I think it should be possible to use Nix to build a .deb package that installs all the dependencies of this project as /nix/store/... files on the Debian system. Do you use this approach?

@mo3r31337
Author

No, I haven't really used the nixpkgs package system to build .deb packages. I used nixpkgs only to generate a JSON file for dongleman_daemon, because I did not quite understand which variable is responsible for what. I tried building with Nix on the host machine, but it takes a lot of time and space. Moreover, it is hardly possible to build this setup for the ARM architecture using nixpkgs: there are some nuances in the process, and not all packages built in the standard way. In total, I spent about two weeks building all the packages, maybe even more.

@sergei-mironov
Owner

Thank you, now I understand. It is not an easy job you are doing, good luck with it! I think that building for ARM using nixpkgs might be possible. I was able to cross-compile some of the system parts of Pinephone 64, but not the GUI user space. AFAIK compiling mobile GUIs currently requires setting up a virtual machine or a real device, which is indeed a troublesome task. I made some notes about it here https://github.com/grwlf/mobile-nixos-cfg

Regarding Asterisk, I still plan to run it under Nix on some ARM device, but for now I do not want to use a computer as tiny as yours. I think I will buy a regular Raspberry Pi, but I'd like to test PoE, which some of them also support. New modems should arrive in January, so I hope to build an ARM setup for this project after that.
