Restrict Angular SSR to paths in the sitemap #3682

Open
wants to merge 1 commit into
base: main

alanorth
Contributor

@alanorth alanorth commented Nov 22, 2024

References

Description

Only enable Angular SSR for paths in the DSpace sitemap and the home page. This is a compromise after analyzing high CPU usage in DSpace 7+ and discussion with the Google Scholar team. We do not need to be wasting CPU and memory to generate and store SSR pages in the cache for request paths that are not "primary" DSpace objects, for example search and browse—these request paths contain data derived from the primary objects themselves and bots can spend endless time crawling them.

This solution was originally proposed by @vitorsilverio in #3110 (comment).

Some notes:

  • This will require manual porting to DSpace 7
  • We should keep our eye on upstream work related to inlineCriticalCss because it improves the user experience. We disabled it in DSpace 7.6.2 and 8.1 because it made SSR perform even more poorly

Instructions for Reviewers

Please add a more detailed description of the changes made by your PR. At a minimum, providing a bulleted list of changes in your PR is helpful to reviewers.

List of changes in this PR:

  • Restrict SSR to request paths for primary DSpace objects like bitstreams, items, entities, communities, and collections, as well as the home page

Include guidance for how to test or review your PR.
Try browsing the repository to see if all pages work as expected.

Checklist

This checklist provides a reminder of what we are going to look for when reviewing your PR. You do not need to complete this checklist prior to creating your PR (draft PRs are always welcome).
However, reviewers may request that you complete any actions in this list if you have not done so. If you are unsure about an item in the checklist, don't hesitate to ask. We're here to help!

  • My PR is created against the main branch of code (unless it is a backport or is fixing an issue specific to an older branch).
  • My PR is small in size (e.g. less than 1,000 lines of code, not including comments & specs/tests), or I have provided reasons as to why that's not possible.
  • My PR passes ESLint validation using npm run lint
  • My PR doesn't introduce circular dependencies (verified via npm run check-circ-deps)
  • My PR includes TypeDoc comments for all new (or modified) public methods and classes. It also includes TypeDoc for large or complex private methods.
  • My PR passes all specs/tests and includes new/updated specs or tests based on the Code Testing Guide.
  • My PR aligns with Accessibility guidelines if it makes changes to the user interface.
  • My PR uses i18n (internationalization) keys instead of hardcoded English text, to allow for translations.
  • My PR includes details on how to test it. I've provided clear instructions to reviewers on how to successfully test this fix or feature.
  • If my PR includes new libraries/dependencies (in package.json), I've made sure their licenses align with the DSpace BSD License based on the Licensing of Contributions documentation.
  • If my PR includes new features or configurations, I've provided basic technical documentation in the PR itself.
  • If my PR fixes an issue ticket, I've linked them together.

@alanorth alanorth added bug high priority performance / caching Related to performance, caching or embedded objects port to dspace-7_x This PR needs to be ported to `dspace-7_x` branch for next bug-fix release port to dspace-8_x This PR needs to be ported to `dspace-8_x` branch for next bug-fix release labels Nov 22, 2024
@alanorth
Contributor Author

Tests are failing because CI is checking for SSR on /home. We can fix this by:

  1. Adding /home to the SSR paths, or
  2. Using another path

The first option is probably the best because /home is one of the only paths that is guaranteed to work by default in DSpace. On the other hand, I just realized our list of SSR-enabled paths will include endless tarpits such as:

https://demo.dspace.org/entities/person/3b087e38-cd6b-4d85-9409-99a9f6f03425?spc.page=1&query=search

With entity search pages we have many combinations of pages depending on filters and number of items similar to /search. Bots will crawl those and get SSR pages, which is a massive waste of CPU and memory.

Perhaps this requires a re-think. What about inverting the logic and enabling SSR for everything, but disabling it on certain paths?
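
The inverted approach floated above could look roughly like the following sketch. It is only an illustration: the `excludedPrefixes` list and the `ssrAllowed` helper are hypothetical names, not part of DSpace, and the excluded prefixes are examples taken from the search and browse paths mentioned in the description.

```typescript
// Hypothetical deny-list: SSR is on for everything except these prefixes.
// The entries are illustrative assumptions, not a DSpace default.
const excludedPrefixes: string[] = ['/search', '/browse'];

// Returns true when the request path should be server-side rendered.
function ssrAllowed(requestPath: string): boolean {
  return !excludedPrefixes.some((prefix) => requestPath.startsWith(prefix));
}
```

Note that a check on the path alone cannot tell the entity URL above apart from the same entity page without filters, because the difference is entirely in the query string.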

@ybnd
Member

ybnd commented Nov 22, 2024

On the other hand, I just realized our list of SSR-enabled paths will include endless tarpits such as:

https://demo.dspace.org/entities/person/3b087e38-cd6b-4d85-9409-99a9f6f03425?spc.page=1&query=search

With entity search pages we have many combinations of pages depending on filters and number of items similar to /search. Bots will crawl those and get SSR pages, which is a massive waste of CPU and memory.

@alanorth #3231 should cover that

@tdonohue
Member

tdonohue commented Nov 22, 2024

@alanorth : Thank you so much for getting this PR created! I was just asking someone to do this in yesterday's Developers Meeting.

Regarding the failing tests, I'd recommend adding /home to the list of SSR paths, because many bots/harvesters will start at your homepage (especially if they don't use sitemaps). So, I think that the homepage should always provide SSR.

One other suggestion. I think it'd be better to make these paths configurable instead of hardcoding them in the server.ts. It could look something like this:

ssr:
    paths: [ '/items/', '/entities/', '/collections/', '/communities/', '/bitstream/', '/bitstreams/' ]

(You'd have to update the existing ssr-config.interface.ts to support this new option)

Then in the code use environment.ssr.paths.

I'd argue that there also should be a way to enable SSR for everything (to retain current behavior). Perhaps that's the default behavior if this environment.ssr.paths configuration is unspecified or empty.

Overall, I do like this PR & support adding it quickly. I just want to add more flexibility to the configuration, as there's a chance that different sites will want to add additional paths (or keep the default behavior of SSR enabled for every path).
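
A minimal sketch of the configurable option described above, including the fallback to "SSR for everything" when the list is unspecified or empty. The interface name `SsrConfigSketch` and the helper `shouldUseSsr` are hypothetical; the real ssr-config.interface.ts may be shaped differently.

```typescript
// Hypothetical shape; the real SSR config interface may differ.
interface SsrConfigSketch {
  enabled: boolean;
  // Path prefixes that should be server-side rendered.
  // Assumption: undefined or empty means "render every path" (current behavior).
  paths?: string[];
}

// Decide whether a request path should go through SSR.
function shouldUseSsr(config: SsrConfigSketch, requestPath: string): boolean {
  if (!config.enabled) {
    return false;
  }
  if (!config.paths || config.paths.length === 0) {
    // No restriction configured: keep SSR enabled for every path.
    return true;
  }
  return config.paths.some((prefix) => requestPath.startsWith(prefix));
}
```

With `paths: ['/items/', '/communities/']`, a request for `/items/123` would be rendered on the server and `/search` would fall through to client-side rendering.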

@alanorth
Contributor Author

@alanorth : Thank you so much for getting this PR created! I was just asking someone to do this in yesterday's Developers Meeting.

You're welcome. I saw the meeting notes and was surprised that there wasn't already a PR, since I've been using versions of this patch for a few months already.

Regarding the failing tests, I'd recommend adding /home to the list of SSR paths...

Yes, agreed.

One other suggestion. I think it'd be better to make these paths configurable

Oh good idea, I didn't know about ssr-config.interface.ts. I will be offline for a few days but can work on this soon.

@alanorth alanorth force-pushed the angular-ssr-sitemap-3110 branch from 3d544f6 to ccd0449 Compare December 8, 2024 17:57
@alanorth
Contributor Author

alanorth commented Dec 8, 2024

I've updated this to use a configurable array of paths, including /home. I think I've done it correctly (my testing appears to show it works).

Duplicating the configuration of the ssr.paths array in each of the environment configurations feels strange to me. I don't know how we decide which default configurations get to go into src/config/default-app-config.ts or if there is a better way.


github-actions bot commented Dec 8, 2024

Hi @alanorth,
Conflicts have been detected against the base branch.
Please resolve these conflicts as soon as you can. Thanks!

@alanorth alanorth force-pushed the angular-ssr-sitemap-3110 branch from 34c38b6 to eeccff2 Compare December 8, 2024 18:22
@ybnd ybnd self-requested a review December 10, 2024 08:52
@nwoodward
Contributor

@alanorth This PR looks good. To make the paths list more easily configurable, would it make more sense to add them to the ssr section of config/config.example.yml? I'm afraid they will be harder to configure in the src/environments/*.ts files.

# Angular Server Side Rendering (SSR) settings
ssr:
  # Whether to tell Angular to inline "critical" styles into the server-side rendered HTML.
  # Determining which styles are critical is a relatively expensive operation; this option is
  # disabled (false) by default to boost server performance at the expense of loading smoothness.
  inlineCriticalCss: false

@alanorth
Contributor Author

alanorth commented Jan 8, 2025

To make the paths list more easily configurable, would it make more sense to add them to the ssr section of config/config.example.yml

@nwoodward I wasn't sure about the interaction between these defaults. I think the ones in src/environments/*.ts are the defaults, and we can put them in the example config YAML files as well. I see others like inlineCriticalCss defined in both so I assume there is some inheritance or defaulting the values initialized in src/environments/*.ts? I will try to test this week.

@tdonohue
Member

tdonohue commented Jan 8, 2025

@alanorth : To answer your question, the config.example.yml is simply for documentation purposes. It provides examples & comments of how to configure available settings. It is not used anywhere though. But in our Installation docs we recommend you create a config.prod.yml based on the existing config.example.yml.

Any settings you set in your config.*.yml will override any default values set in src/environments/*.ts or in src/config/default-app-config.ts. So, with this PR, it should already be possible to configure this setting in your config.*.yml to override the defaults

That said, I would also recommend we add this setting to the config.example.yml with a brief explanation (in comments) along with the default value. This just makes the configuration more visible to installers...without them having to search the documentation.

@alanorth
Contributor Author

alanorth commented Jan 8, 2025

Great, thanks @tdonohue. I will add the SSR paths to config.example.yml too so people can easily customize for their environment. I forgot that the dev and prod YAML files are not in git.

@alanorth alanorth force-pushed the angular-ssr-sitemap-3110 branch from eeccff2 to 1174068 Compare January 10, 2025 06:54
@alanorth
Contributor Author

Thanks for the feedback 🙇 . I've updated the patch to use the startsWith() method instead of includes() and added the paths to config.example.yml to help users know how to override them. I also did basic tests to make sure customizing the paths works for serving CSR for certain paths.
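
To illustrate why the choice between includes() and startsWith() matters, here is a small sketch. The `ssrPaths` list and the `nestedPath` value are illustrative assumptions rather than values copied from the patch.

```typescript
// Illustrative list of SSR-enabled prefixes (an assumption for this sketch).
const ssrPaths: string[] = ['/items/', '/collections/'];

// Substring match: true if the prefix appears anywhere in the path.
const matchesByIncludes = (path: string): boolean =>
  ssrPaths.some((prefix) => path.includes(prefix));

// Prefix match: true only if the path begins with the prefix.
const matchesByStartsWith = (path: string): boolean =>
  ssrPaths.some((prefix) => path.startsWith(prefix));

// Hypothetical path that contains '/items/' in the middle rather than at the start.
const nestedPath = '/statistics/items/1234';

const includesResult = matchesByIncludes(nestedPath);     // true: substring found
const startsWithResult = matchesByStartsWith(nestedPath); // false: not a prefix
```

With includes(), any path that merely contains a configured segment would be rendered on the server; startsWith() limits SSR to paths that actually begin with it.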

@nwoodward
Contributor

Thanks @alanorth! Everything looks good. I'm running into an issue testing it that may be due to my ignorance about how SSR and CSR work. I'm trying to test it locally on http://localhost:4000 with the backend on http://localhost:8080/server, and I have config.dev.yml and config.prod.yml files in /config, both of which have the paths list copied over from config.example.yml. I've tested with npm run start:dev and npm start to try both YAML config files.

I tested all the paths on the list, and they all were rendered by SSR. Then I removed /items from the list and rebuilt Angular. But it's still getting rendered by SSR. I did the same test with removing /communities and got the same results. But other paths that aren't in the list, such as /search, are not being rendered by SSR. So these changes appear to be working, but for some reason I can't remove a path from the list. I wonder if it's not a caching problem, even though I'm doing a hard refresh on every page. I'll look into it.

@alanorth
Contributor Author

alanorth commented Jan 10, 2025

@nwoodward I only tested in production mode with npm run start. Looking at the scripts in package.json now I think that dev mode doesn't use SSR so that might be what you are seeing.

Also, it helps to enable cache.serverSide.debug in the config so you get the log of hits and misses in the console. Try with a browser, then with curl for example.

@tdonohue
Member

@nwoodward : @alanorth is correct. SSR only works if you are running in Production Mode. (See how to do that in that README link. To do Production mode on 8.x/7.x you need to use yarn instead of npm obviously)

The best way to test SSR is by starting the UI in production mode on localhost:4000, access it, and then turn off Javascript in your Browser. For example: https://developer.chrome.com/docs/devtools/javascript/disable

With Javascript disabled, you can ONLY see parts of the User Interface that have gone through SSR. Essentially, you are browsing the site like a crawler would. All links/buttons should work, and SSR pages should load properly. However, anything that requires Javascript (e.g. some animations or dropdowns) or CSR will not work.

You could test this PR by comparing it to the https://sandbox.dspace.org. The Sandbox will use SSR on every page, while this PR should not (so some pages should not load with Javascript disabled)

@nwoodward
Contributor

@tdonohue @alanorth OK, thanks for the additional information. As I mentioned, I believe I was testing this in production mode with npm start, though it was with the frontend and backend running locally. I'll take another look.

Member

@tdonohue tdonohue left a comment


@alanorth : I gave this a test today. Mostly it's working great. I can verify that paths not listed in the new ssr.paths configuration do not undergo SSR (the pages just appear blank when Javascript is disabled). Conversely, those paths listed in that configuration are all undergoing SSR... so the page will still load the main content with Javascript disabled.

However, I've found a small bug in the logic for the homepage when Javascript is disabled. If you access the homepage via http://localhost:4000/home, then it loads via SSR. However, if you access it via http://localhost:4000/ then it will not load (because the root path doesn't undergo SSR).

I think we may want to see if there's a way to simply hardcode that the root path (/) always undergoes SSR. I initially tried adding '/' to the list of ssr.paths, but that causes all paths to use SSR, because startsWith('/') will always pass for every path.

We may need to add an "OR" clause next to the startsWith logic in server.ts which checks if it's the root path (/), and if so, executes SSR.

Because Angular SSR is not very efficient, after discussion with
the Google Scholar team we realized a compromise would be to only
use SSR for pages in the DSpace sitemap (and the home page).
@alanorth alanorth force-pushed the angular-ssr-sitemap-3110 branch from 1174068 to 451b262 Compare January 14, 2025 06:55
@alanorth
Contributor Author

alanorth commented Jan 14, 2025

Thanks @tdonohue! Good catch. I added an explicit check for request to the root path:

  if (environment.ssr.enabled && req.method === 'GET' && (req.path === '/' || environment.ssr.paths.some(pathPrefix => req.path.startsWith(pathPrefix)))) {
...

I tested and it's working with curl and with Javascript disabled in the browser for requests to the root.
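
For unit testing, the condition above could be pulled out into a small pure function. This is only a sketch: `SsrSettings`, `RequestLike`, and `isSsrRequest` are hypothetical names, and the types are simplified stand-ins for the real environment config and request objects.

```typescript
// Simplified stand-ins for the real config and request types (assumptions).
interface SsrSettings {
  enabled: boolean;
  paths: string[];
}

interface RequestLike {
  method: string;
  path: string;
}

// Mirrors the condition quoted above: SSR must be enabled, the request must be
// a GET, and the path must be exactly the root or start with a configured prefix.
function isSsrRequest(ssr: SsrSettings, req: RequestLike): boolean {
  const isRoot = req.path === '/';
  const matchesPrefix = ssr.paths.some((pathPrefix) => req.path.startsWith(pathPrefix));
  return ssr.enabled && req.method === 'GET' && (isRoot || matchesPrefix);
}
```

The exact comparison against `'/'` is what keeps the root exception from matching every path, which is what happens if `'/'` is placed in the prefix list instead.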

@alanorth alanorth requested a review from tdonohue January 14, 2025 12:26
@alanorth alanorth dismissed tdonohue’s stale review January 14, 2025 17:49

Added exception for root path.

Projects
Status: 👀 Under Review
Development

Successfully merging this pull request may close these issues.

(Discussion) High CPU usage in DSpace frontend related to Angular Server Side Rendering (SSR)
5 participants