v290 reprocessing killed #783
Comments
Possibly related to the multiprocessing options once more, but I cannot find those options anymore. Have there been changes on this topic since v289?
Yes, these "Killed: 9" messages can be caused by anything... from a command like [...]. As for the multiprocessing, the options are still there (they were never there by default, I believe).
You'll have to check your old setup, but I think the [...]
My past notes are not precise enough, unfortunately. I thought that it worked with default options in the last version but I may be wrong. I restarted with the pool option (val). If it fails again I'll try linear.
I'm also moving to 290 and was wondering if REPROCESS_MP_TYPE_VAL should be set to 'linear', as it's 'process' by default. 'process' seems to work for the mini dataset, but what about the full set of data? I'll set the keyword to linear.
I had set it to linear from the beginning with 288, as it failed with 'process'. What does 'pool' do?
Status here: I tried all three options, and had a memory leak with all of them, even with:
@luc, is this with any of the options for REPROCESS_MP_TYPE_VAL.value?
linear
My crashes come during the validation process, like yours. This was already the case before (v284) with the "process" option, but I thought it had been fixed in version 288.
process and pool are just two different ways of multiprocessing: https://stackoverflow.com/questions/18176178/python-multiprocessing-process-or-pool-for-what-i-am-doing
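To make the three modes discussed in this thread concrete, here is a minimal sketch of what 'linear', 'process' and 'pool' typically mean in plain Python `multiprocessing`. It is not APERO's implementation: `reduce_one`, `run_linear`, `run_process` and `run_pool` are hypothetical names, and the "fork" start method (Unix only) is an assumption chosen so the sketch runs without a `__main__` guard.

```python
import multiprocessing as mp

# Assumption: use the "fork" start method (Unix only) so this sketch
# works without an `if __name__ == "__main__":` guard.
CTX = mp.get_context("fork")


def reduce_one(night):
    """Hypothetical stand-in for one reprocessing job."""
    return f"done:{night}"


def run_linear(nights):
    # 'linear': one job after another in the current process.
    return [reduce_one(night) for night in nights]


def _worker(night, queue):
    # Runs in a child process and sends its result back to the parent.
    queue.put((night, reduce_one(night)))


def run_process(nights):
    # 'process': one multiprocessing.Process per job, all started at once.
    queue = CTX.Queue()
    procs = [CTX.Process(target=_worker, args=(night, queue)) for night in nights]
    for proc in procs:
        proc.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    results = dict(queue.get() for _ in procs)
    for proc in procs:
        proc.join()
    return [results[night] for night in nights]


def run_pool(nights, cores=2):
    # 'pool': a fixed number of workers that pull jobs until none are left.
    with CTX.Pool(processes=cores) as pool:
        return pool.map(reduce_one, nights)
```

In this sketch the 'process' variant starts as many children as there are jobs, while 'pool' caps the number of simultaneous workers at `cores` and 'linear' never creates a child at all, which is why it tends to be the slowest but lightest on memory.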
There have been no changes related to this, but it's a complex web (one which I'm definitely simplifying for v0.8).
This was never "fixed", as I still never got to the bottom of what was causing it. The "linear" option used to fix it for you, so it seemed "enough" to have that (slower) option. @larnoldgithub and @clairem789, can you both try with v0.7.289 and v0.7.288 and verify that the problem comes from v0.7.290? I'll have to go through all changes line by line to see what changed that could possibly affect it.
So I guess that the issue is not seen on UdM machines...?
You should never be changing the [...]. Please use the [...]; you can read the [...], i.e. add the following to [...]
I haven't seen any such issues with NIRPS or SPIRou, though both machines have 300+ GB of RAM. NIRPS is only doing daily processing and I haven't done a large run for either NIRPS or SPIRou.
OK, sorry for my mistake in changing the wrong file. I'm doing this 2-3 times a year, not enough to remember all the small details. It would be nice if it could explain it all, actually!
Not a full reduction. The last runs have been done with v0.7.290 (though this is not as recommended as doing a full re-run), but again, processing a single run may not show this issue as badly as redoing everything.
I made the same mistake in the past... You should have somewhere a folder like .../config/myprofile/, where myprofile is the name of your 'installation' of apero, like offline290. In this folder you have a bunch of files: the *.org are the original files, which I copied just in case. The way apero works is that it first reads the defaults, then updates the values with the user values, then starts the processing.
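The "read the defaults, then update with the user values" order described above can be sketched as follows. This is only an illustration of the layering idea, not APERO's actual loader: `DEFAULTS`, `USER_VALUES` and `load_config` are hypothetical names, and the only values taken from this thread are the key REPROCESS_MP_TYPE_VAL, its default 'process', and the user choice 'linear'.

```python
# Values shipped with the installation (hypothetical container; the thread
# says 'process' is the default for REPROCESS_MP_TYPE_VAL).
DEFAULTS = {"REPROCESS_MP_TYPE_VAL": "process"}

# Values from the user's profile folder (hypothetical container).
USER_VALUES = {"REPROCESS_MP_TYPE_VAL": "linear"}


def load_config(defaults, user_values):
    """Return a new dict where user values override the defaults."""
    merged = dict(defaults)      # copy so the defaults stay untouched
    merged.update(user_values)   # user values win on any shared key
    return merged


config = load_config(DEFAULTS, USER_VALUES)
print(config["REPROCESS_MP_TYPE_VAL"])
```

Because the user layer is applied last, an override belongs in the user files; editing the defaults directly is the mistake discussed in this thread.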
I did run 288 with 'process' months ago, and it crashed. I set it to 'linear' and it has been very stable, regarding the PROC at least. I didn't see any memory leak. For my apero_precheck last night with 290, the memory usage increased linearly during the db update, then came back to the 'background level' of the machine. @njcuk9999 do you think this is expected behavior? apero_precheck ended with no error. I'll not be able to make a test with 289 before next week.
So to close this issue, it was due to my mistake of modifying the REPROCESS_MP_TYPE_VAL option value in the wrong file (the default file instead of the user's files). Incidentally, I confirm that the linear option for this parameter is the one that works on the NewWorlds machine.
Thanks for clearing this up, I'd rather this than trying to figure out what caused it to break in newer versions and not older ones!
After about 6-12h of processing, I found the terminal with a "Killed: 9" and all terminated.
According to Google, it means that the application has received a signal...
Not sure what to do and when this will show up again. Any clue? Maybe a memory leak?
I'll start it again with an eye on the activity panel to check memory.
thanks
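As a complement to watching the activity panel, peak memory can also be logged from inside a Python process. The sketch below uses only the standard `resource` module (Unix only); `peak_rss_mb` is a hypothetical helper and is not part of APERO. "Killed: 9" corresponds to SIGKILL, which the operating system may send when memory runs out, so a steadily growing peak is a useful hint.

```python
import resource
import sys


def peak_rss_mb():
    """Peak resident memory of the current process, in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


print(f"peak memory so far: {peak_rss_mb():.1f} MB")
```

Calling this at regular points of a long run (for example after each night) shows whether memory keeps climbing. Note that it only measures the calling process, not any child processes it has started.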