WorkShop12_WorkshopSummary
Problem: while one app is doing I/O-intensive work, other apps miss heartbeats and exit.
- I changed the client/API so that the client passes its PID to the app, and the app periodically checks whether the client is alive, instead of using heartbeat messages. This mechanism will be used only with new (7.0.37+) clients and new app versions. Other combinations will continue to use heartbeats.
- We discussed having the client send heartbeat messages in a separate thread. I propose not doing this because the problem should be solved by the above.
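The PID-based liveness check described above can be illustrated with a minimal sketch. This is not the BOINC API code; the function name `client_is_alive` is hypothetical, and the approach shown (sending signal 0) works on POSIX systems only. Other platforms would need a different mechanism.

```python
import os


def client_is_alive(client_pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only).

    Hypothetical helper for illustration; not taken from the BOINC source.
    """
    if client_pid <= 0:
        # 0 and negative values address process groups, not a single process.
        return False
    try:
        # Signal 0 performs the existence and permission checks
        # without actually delivering a signal.
        os.kill(client_pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True   # process exists but belongs to another user
    return True
```

An app would call such a check at a fixed interval and exit once it returns False, so a client that is merely slow to send messages no longer causes the app to quit.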
Problem: need a mechanism for sending long jobs that don't checkpoint only to hosts that are likely to finish them.
- Have the client send its current uptime and the duration of its previous session's uptime in the scheduler request message.
- On the server, allow flagging app versions as non-checkpointing.
- Scheduler: if an app version is non-checkpointing, send a job to a host only if the job's expected runtime is less than the host's current uptime or its previous session's uptime.
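The scheduler rule above can be sketched as a small predicate. The function and parameter names are hypothetical, and reading "uptime or previous uptime" as "the larger of the two" is an assumption about the intended rule, not something the notes spell out.

```python
def should_send_job(est_runtime_s: float,
                    non_checkpointing: bool,
                    host_uptime_s: float,
                    host_prev_uptime_s: float) -> bool:
    """Decide whether a job may go to a host (illustrative sketch only)."""
    if not non_checkpointing:
        # Checkpointing apps can resume after a restart, so no uptime test.
        return True
    # Assumption: the job qualifies if it fits within either the current
    # uptime or the previous session's uptime.
    return est_runtime_s < max(host_uptime_s, host_prev_uptime_s)
```

For example, a one-hour non-checkpointing job would be sent to a host that has been up for two hours, but not to a host whose current and previous sessions both lasted under an hour.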
Goals include:
- Increase the quality and frequency of server software releases.
- Increase the stability of the server software in trunk.
We discussed the following:
- Automated system-level testing of server software. We used to have frameworks for this (boinc/test/) but they're not maintained. We lack the manpower to do this; volunteers are needed.
- How to test server software? When to do releases? Automated testing would help, but a large number of features can feasibly be tested only in live use. I think we need projects to help as follows:
- Operate test projects for testing new server software.
- Use these projects to beta-test server software.
- When we have a release candidate, create a new branch, test it using these projects, and release it when all bugs are fixed.
- Unit testing of server software. I'm not sure this has a good cost/benefit ratio; few if any bugs would be detected. But if a volunteer wants to write unit tests, I'd be happy to add them to the tree.
- Automated nightly builds. Rom will look into this. How to do this for Win and Mac? Jenkins (http://jenkins-ci.org) supports build slaves running on any OS that supports Java.
- Automated system testing of web software. We lack the manpower to do this; volunteer help is needed. Hint: take a look at Selenium (http://seleniumhq.org).
- Improved SCM workflow: We need to introduce code branches to isolate ongoing development from release and maintenance processes in order to stabilise the codebase and facilitate stable releases.
- Develop new features in dedicated "feature branches", branching off master. Merge back into master once developer testing succeeds (features can be pretty small; merge often).
- Create a "next" or "release candidate" branch for the upcoming release, branching off master. Test and fix it until it is ready for release, and merge the fixes back to master.
- Maintain each release in its own dedicated branch to allow for maintenance. Merge fixes back to master.
- Alternatively, go for the "real thing" using gitflow (https://github.com/nvie/gitflow).
Some changes were proposed but I forget what they were. Wenjing?
Francisco Sanz described the system developed by Ibercivis. Key features:
- "Subproject": the unit of access control; a set of apps
- "Scientist" and "batch" tables
- Scientists submit/control jobs using "mini-shell"
- The WU generator limits the number of outstanding WUs per batch (to limit DB size)
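The last point, capping outstanding WUs per batch, can be sketched as follows. This is not the Ibercivis code; `release_workunits` and its parameters are hypothetical names, and the batch is modelled simply as a queue of job descriptions waiting to be turned into WUs.

```python
from collections import deque


def release_workunits(pending: deque, outstanding: int, max_outstanding: int) -> list:
    """Move jobs from the pending queue into the DB until the cap is reached.

    pending:         jobs of one batch not yet created as WUs
    outstanding:     WUs of this batch currently in the DB and unfinished
    max_outstanding: per-batch cap (hypothetical setting)
    Returns the jobs released in this pass.
    """
    released = []
    while pending and outstanding + len(released) < max_outstanding:
        released.append(pending.popleft())
    return released
```

Running such a pass periodically keeps the number of DB rows per batch bounded, however many jobs a scientist submits at once.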
Several people expressed interest in these features. We will work on them, hopefully in the 2-3 month timeframe. Design docs are here: JobPrioritization, PortalFeatures
Comments (on boinc_dev) are welcome.
David Coss worked on documentation for this. David, please add to the Wiki or send to me.
David Coss presented this. I think it would be a useful feature, although no project other than David's had an immediate need for it. We should document it and add it to the source tree.
In David's system, the DAG is generated from a command file, with dependencies determined by the names of input/output files. We discussed the ideas of:
- Deciding when some of these (small jobs) can be done on the server
- Deciding when jobs that are small but have large intermediate files can be grouped together and done (using the wrapper) on a single client
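As an illustration of deriving dependencies from file names, here is a minimal sketch. It is not David's implementation: the job table format and the names `build_dag`, `align`, `filter`, and `stats` are invented for the example. A job depends on whichever job produces one of its input files.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def build_dag(jobs: dict) -> dict:
    """Map each job name to the set of jobs it depends on.

    jobs: name -> (input_files, output_files); a hypothetical stand-in
    for whatever the command file actually contains.
    """
    # Record which job produces each file.
    producer = {}
    for name, (_inputs, outputs) in jobs.items():
        for f in outputs:
            producer[f] = name
    # A job depends on the producer of each of its inputs, if any.
    deps = {name: set() for name in jobs}
    for name, (inputs, _outputs) in jobs.items():
        for f in inputs:
            if f in producer and producer[f] != name:
                deps[name].add(producer[f])
    return deps


jobs = {
    "align": (["raw.dat"], ["aligned.dat"]),
    "filter": (["aligned.dat"], ["filtered.dat"]),
    "stats": (["aligned.dat", "filtered.dat"], ["report.txt"]),
}
deps = build_dag(jobs)
# A valid execution order: every job appears after the jobs it depends on.
order = list(TopologicalSorter(deps).static_order())
```

With the dependency map in hand, grouping decisions such as the two discussed above could be made per edge, for instance by looking at the size of the file that links two jobs.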
Oliver demonstrated this. My impression is that it's about 90% complete. When done we can potentially add it to BOINC.
This is on hold until someone (e.g. Einstein@home) needs it.
More info: LocalityNew
Current work items:
- Make sure that everything needed to build BOINC/Android, and test apps, is in the BOINC tree and documented (Rom).
- Finish the GUI. Main items:
- Add interface for adding/removing projects and account managers.
- Show graphics of some sort (BOINC and/or project-specific)
- Get some projects to add Android/ARM app versions.
Nils Hoimyr expressed a wish for including VBox in the BOINC installer.