
Extremely slow workflow run time (not including EnergyPlus runtime) for large models with many info messages in one or more measures #4821

Open
DavidGoldwasser opened this issue Mar 10, 2023 · 7 comments


@DavidGoldwasser (Collaborator)

DavidGoldwasser commented Mar 10, 2023

Issue overview

Typically runtime issues are related to EnergyPlus, but here, for workflows that take more than 24 hours to run (on my M1 Max Mac; roughly twice as slow on Windows, and I can share those times later), less than 2 hours of that is spent in the EnergyPlus simulation.

This is an URBANopt project using large buildings (200-400 units) with an HPXML > OSM approach merged into a single model. The OSM file is about 500 MB, and the out.osw file in some cases is larger than 40 MB with more than 150k lines.

Current Behavior

Here is a plot of the full runtime on top and just the EnergyPlus runtime below. Of the four largest and slowest models, two (around 50k sqft in size) take much longer than linear scaling from the smaller models would predict. I can try to produce this plot for Windows as well. I don't know what is unique about those two models compared to the larger models that run in less than half the time (it could be that they have more units even though less sqft, but maybe something else as well).

The Build Residential measure does take a few hours, but a number of downstream measures are also slower. Also, in one case I saw an odd multi-hour gap between when one measure finished and the next started; that may have been an anomaly. I already have a rake task that gets me the total runtime from the OSW. Maybe it can be updated to isolate each measure and look for any gaps in time between measures.
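A minimal sketch of that per-measure breakdown, reading the `started_at`/`completed_at` timestamps from out.osw (the key names here are assumptions based on typical OSW step results, so verify against an actual out.osw before relying on it):

```ruby
require 'json'
require 'time'

# Report each measure's duration and the idle gap before it started,
# given the contents of an out.osw file as a JSON string.
# Key names ('steps', 'result', 'started_at', 'completed_at',
# 'measure_dir_name') are assumed from typical OSW step results.
def measure_timings(osw_json)
  steps = JSON.parse(osw_json).fetch('steps', [])
  prev_end = nil
  steps.map do |step|
    result  = step['result'] || {}
    started = Time.parse(result['started_at'])
    ended   = Time.parse(result['completed_at'])
    gap     = prev_end ? started - prev_end : 0.0
    prev_end = ended
    { measure: step['measure_dir_name'],
      duration_s: ended - started,
      gap_before_s: gap }
  end
end
```

A multi-hour gap between measures would then show up as a large `gap_before_s` on the step that started late.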

[Screenshot (2023-03-09): plot of total runtime and EnergyPlus-only runtime by model]

Expected Behavior

Unless sizing runs are part of a measure, measures should generally run in a few seconds or a few minutes. Maybe that isn't reasonable for large models, but at worst the impact should be linear: a model twice as big should be twice as slow, not 4x slower. If there are ways to speed up processing for larger models, that would be helpful.

Steps to Reproduce

  1. Check out the temp_slow_only branch https://github.com/urbanopt/urbanopt-prototype-district-templates/tree/temp_slow_only
  2. Run bundle install
  3. You can remove all but one weather file from here so it only goes through this once:
    https://github.com/urbanopt/urbanopt-prototype-district-templates/tree/temp_slow_only/example_files/weather
  4. Run the urbanopt_climate_sweep rake task
  5. Wait 24-48 hours for the 4 datapoints to run in parallel. It is currently set up to run just the slower of the two scenarios.

Possible Solution

Some of this may fall on measure writing, to minimize looping through objects. Logging is a complex question: I generally encourage good logging of warning and info messages for transparency and diagnostics. I'm wondering if we should have an enhancement request for running an OSW in the CLI that does not write info messages, and only writes warnings and runner.register values.
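Until something like that exists in the CLI, a stopgap could be post-processing an existing out.osw to drop the per-step info messages. This is only a sketch: the `step_info` key name is an assumption based on the OSW result schema, and this shrinks the file after the fact without helping measure runtime at all.

```ruby
require 'json'

# Drop per-step info messages from an out.osw JSON string, keeping
# warnings, errors, and registered values intact.
# 'step_info' is an assumed key name; check your out.osw first.
def strip_info_messages(osw_json)
  osw = JSON.parse(osw_json)
  (osw['steps'] || []).each do |step|
    step['result'].delete('step_info') if step['result']
  end
  JSON.generate(osw)
end
```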

Possible steps outside of core OpenStudio

  1. Update BuildResidential so that when it merges OSMs together it makes a cleaner model with fewer duplicate objects. A more advanced approach for large buildings is to start using multipliers so that not every single unit has to be modeled.
  2. Update reduce_epd_by_percentage_for_peak_hours to log fewer info statements; they shouldn't have to be removed altogether, but they should be scaled back.
  3. Improve the script/rake task that steps through out.osw to list the duration of each measure and check for time between measures (on one occasion I saw a multi-hour gap between the end of one measure and the start of another, but I have not confirmed whether that was an isolated case).
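For item 2, one low-effort pattern (a sketch, not the measure's actual code) is to log the first few objects individually and summarize the rest, so the measure still functions and stays transparent without emitting one info line per schedule. Here `runner` stands in for the measure runner (anything responding to `registerInfo`), and the helper name is hypothetical:

```ruby
# Log up to `limit` schedule names individually, then one summary line
# for the remainder, instead of one info message per schedule.
def log_schedule_summary(runner, schedule_names, limit: 10)
  schedule_names.first(limit).each do |name|
    runner.registerInfo("Adjusted schedule: #{name}")
  end
  extra = schedule_names.size - limit
  runner.registerInfo("...and #{extra} more schedules adjusted.") if extra > 0
end
```

For a model with thousands of always-on schedules, this caps the measure's contribution to out.osw at eleven lines instead of thousands.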

Possible steps within core OpenStudio

  1. I think a good quick fix could be a special flag for the CLI that tries to bypass info statements. It can't avoid calling that code in the measure, but it can exclude the output from the runner and from out.osw, similar to the verbose flag. This would allow a quick fix without having to alter measures.
  2. Improve the performance of large models with a lot of logging. Identify why two specific points are outliers, 2x or more slower than would be expected.

Details

The way URBANopt works is that all features in a scenario are run in parallel. At one point I was running 2 scenarios across 16 file locations. When I ran the default way, I had all cores but one waiting for the last slow feature to finish. To speed up the process, I ended up running without the four slowest features. I then did a separate run on a separate computer with just the slow features, with just the right number of cores dedicated so that I could run multiple scenarios in parallel instead of just one.

Related to the long out.osw: one measure (reduce_epd_by_percentage_for_peak_hours) is largely responsible for that, because it is logging all schedules and other objects, combined with a model that appears to have an excessive number of objects, including multiple always-on schedules.

[Screenshot (2023-03-08): out.osw log excerpt showing repeated schedule info messages]

Environment

Some additional details about your environment for this issue (if relevant):

  • Mac 12.6 M1 Max 32GB memory
  • OpenStudio 3.5.1+22e1db7be5
  • URBANOpt 0.9.1

Context

This has made running the analysis for a paper very slow and complex. It is not something we would expect external URBANopt users to do.

@DavidGoldwasser DavidGoldwasser added the Triage Issue needs to be assessed and labeled, further information on reported might be needed label Mar 10, 2023
@tijcolem tijcolem added component - Workspace Enhancement Request and removed Triage Issue needs to be assessed and labeled, further information on reported might be needed labels Mar 31, 2023
@DavidGoldwasser DavidGoldwasser added this to the OpenStudio SDK 3.7.0 milestone May 18, 2023
@joseph-robertson (Collaborator)

@DavidGoldwasser Could #4919 help at all with "bypassing statements"?

@joseph-robertson (Collaborator)

@DavidGoldwasser FYI I tried running a "Multifamily" building with number_of_residential_units=100 using urbanopt-cli v0.9.3. Indeed, the workflow takes an incredibly long time to run. Some things to note:

  • Re "cleaner model with less duplicate objects" above: I do already have code in BuildResidentialModel for removing unique model objects for unit=2,...,n. This ensures we don't duplicate unique model objects.

  • Each iteration of the units loop takes longer than the previous. I've sort of narrowed it down (I think) to the model.addObjects(unit_model_objects, true) line. Here model grows larger and larger (while unit_model_objects remains pretty constant in size).

  • The following measures aren't being skipped in the residential workflow:

    • add_ems_emissions_reporting
    • export_time_series_modelica
    • export_modelica_loads

    Perhaps they could/should be?

  • After all model measures have been run, in.osm (as well as in.idf) takes a long time to write out.

There are definitely a lot of log lines in in.osw.log, but my hunch is that the model is quite huge and is the main culprit here.

I definitely could see updates/improvements here in terms of grouping like units (i.e., sort of a multiplier approach) and applying OS-HPXML to a smaller subset of unique units. Perhaps that would help reduce the size of the model. But we're talking pretty major refactor efforts here...
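The per-iteration slowdown in the `units` loop is consistent with a simple quadratic cost model. This is an assumption about the behavior, not OpenStudio internals: if merging unit i compares its new objects against everything already in the model, total work grows with the square of the unit count.

```ruby
# Toy cost model (assumption, not the OpenStudio implementation):
# merging unit-by-unit, where each new object is checked against the
# current model contents before insertion.
def merge_cost(unit_count, objs_per_unit)
  total = 0
  model_size = 0
  unit_count.times do
    total += objs_per_unit * model_size # checks for this unit's objects
    model_size += objs_per_unit         # model grows after each merge
  end
  total
end
```

Under this model, doubling the number of units roughly quadruples the merge cost (e.g. `merge_cost(4, 10)` is 600 while `merge_cost(8, 10)` is 2800), which would match the observed super-linear scaling for the largest buildings.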

@shorowit (Contributor)

> Each iteration of the `units` loop takes longer than the previous. I've sort of narrowed it down (I think) to the `model.addObjects(unit_model_objects, true)` line. Here `model` grows larger and larger (while `unit_model_objects` remains pretty constant in size).

@joseph-robertson Perhaps it is faster to store all of the model objects to be added in an array and then call model.addObjects once at the very end (after the loop)?

@joseph-robertson (Collaborator)

> > Each iteration of the `units` loop takes longer than the previous. I've sort of narrowed it down (I think) to the `model.addObjects(unit_model_objects, true)` line. Here `model` grows larger and larger (while `unit_model_objects` remains pretty constant in size).
>
> @joseph-robertson Perhaps it is faster to store all of the model objects to be added in an array and then call model.addObjects once at the very end (after the loop)?

@DavidGoldwasser I tried out the suggestion from @shorowit. It does appear to be faster, maybe 10-20% (although I'm coming to this conclusion with very limited testing). It probably doesn't solve the larger hurdles with the huge model, but it does seem to take at least a chunk out.

@shorowit (Contributor)

Do you have to use model.addObjects(unit_model_objects, true)? Can you get away with model.addObjects(unit_model_objects, false) instead?

@joseph-robertson (Collaborator)

I'm not really sure what the second argument is used for (check_names or something), but even with it set to false the run time is about the same (and in.osm looks to be the same size).

@chriswmackey

Given that this issue is still open and there are potentially multiple reasons for the slowness of this particular case, I opened a new issue that specifically targets one of the causes, which has to do with adding new objects to the model when Model.size is large:

#5340

I don't want to hinder the improvements to logging that were mentioned here, but given that the OSM in this case is 500 MB, I sense that a significant fraction of the slowness might be the result of what I raised in that issue.
