Sanitizer built-ins document #244

Open
wants to merge 9 commits into
base: main

otherdaniel
Collaborator

@otherdaniel otherdaniel commented Nov 29, 2024

This is meant as a starting point for our built-ins. It's a "mostly free-form" document, in that it's just text with one line per element, plus markdown-style headings. The idea is to classify elements and attributes into groups.

I started out by putting everything into "other" and then moved items into better-defined groups. The idea is to work down the "other" list until it's empty.

I copied all the elements from the "proposed allow lists" doc into the "harmless" category. Will do the same for attributes.

Source(s):

  • Element and attribute lists from Chrome (which may include legacy elements that are no longer supported).
  • "Proposed allow lists -- Sanitizer API - 2024-03" document prepared by Frederik.

@mozfreddyb
Collaborator

FYI, this PR is still called "Not ready yet". Let me know when you are seeking review.

@otherdaniel otherdaniel changed the title Sanitizer built-ins document [Not ready yet] Sanitizer built-ins document Dec 13, 2024
@otherdaniel
Collaborator Author

> FYI, this PR is still called "Not ready yet". Let me know when you are seeking review.

About now would be good, so I renamed it. :)

I added more "sections" with spec links, also for attributes. I think that should cover all of HTML, while SVG + MathML coverage is still a bit of a mess. There are a lot more leftover ("other") attributes than there were with elements.

Changes so far are:

  • I copied the per-element attributes from the HTML spec, for any element that I expect to be default-allowed. (It's a manual process, so I was trying to save myself some time.)
  • I also copied global HTML attributes + aria attributes from the spec(s), including spec links.
  • From the global list, I removed all attributes that are also used locally on some element. I'm not sure this is quite correct.
  • All of this was manual, so I wouldn't be surprised if there are some omissions somewhere. I'm unsure how to do QA here.

Collaborator

@annevk annevk left a comment

Thanks for making this list. Very useful to have something based on the HTML standard. Here's an initial very conservative safelist for "default":

  • html
  • head
  • title
  • All of Sections (body, article, ...)
  • All of Grouping Content (p, hr, ...)
  • Most of Text-level Semantics:
    • I would omit the a attributes target, referrerpolicy, download, and ping
  • All of Edits (ins, del)
  • All of Tabular Data
  • SVG & MathML: TBD
  • Global attributes:
    • dir
    • lang
    • title

@mozfreddyb
Collaborator

👍 to what @annevk says, that we should build upon the HTML spec rather than casting the widest net.

As written in #245, I could see us being a bit more iterative by using his relatively small list for now and discussing additions individually as they come up (which they are bound to anyway).

@otherdaniel
Collaborator Author

otherdaniel commented Dec 18, 2024

  • Updated list according to Anne's suggestion.
  • Changed the list format a little, and added a Python script that turns it into JSON.
  • Included this and baseline config from the spec.
  • Moved the builtin files (json, text, script) to a builtins/ directory.
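
For readers following along, here is a minimal sketch of what a text-to-JSON conversion of this kind might look like. It is not the script in this PR: the `#` heading syntax, the function name `parse_builtins`, and the fallback group name `other` are assumptions for illustration, based only on the description of the list as one name per line plus markdown-style headings.

```python
import json


def parse_builtins(text: str) -> dict[str, list[str]]:
    """Group one-name-per-line entries under the nearest preceding heading."""
    groups: dict[str, list[str]] = {}
    current = "other"  # assumed fallback group for entries before any heading
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate groups visually
        if line.startswith("#"):
            # Assumed heading syntax: one or more '#' followed by the group name.
            current = line.lstrip("#").strip()
            groups.setdefault(current, [])
        else:
            groups.setdefault(current, []).append(line)
    return groups


# Hypothetical sample input using group names mentioned in this thread.
sample = """\
# Grouping Content
p
hr

# Edits
ins
del
"""

result = parse_builtins(sample)
print(json.dumps(result, indent=2))
```

Keeping the groups as an ordered mapping from heading to names preserves the grouping in the JSON output, which should make a diff against the text list easier to review.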

Comment on lines 12 to 145
"ondragenter",
"ondragleave",
"ondragover",
"ondragstart",
"ondrop",
"ondurationchange",
"onemptied",
"onend",
"onended",
"onerror",
"onfocus",
"onfocusin",
"onfocusout",
"onformdata",
"ongotpointercapture",
"onhashchange",
"oninput",
"oninvalid",
"onkeydown",
"onkeypress",
"onkeyup",
"onlanguagechange",
"onload",
"onloadeddata",
"onloadedmetadata",
"onloadstart",
"onlostpointercapture",
"onmessage",
"onmessageerror",
"onmousedown",
"onmouseenter",
"onmouseleave",
"onmousemove",
"onmouseout",
"onmouseover",
"onmouseup",
"onmousewheel",
"onmove",
"onoffline",
"ononline",
"onorientationchange",
"onoverscroll",
"onpagehide",
"onpageshow",
"onpaste",
"onpause",
"onplay",
"onplaying",
"onpointercancel",
"onpointerdown",
"onpointerenter",
"onpointerleave",
"onpointermove",
"onpointerout",
"onpointerover",
"onpointerrawupdate",
"onpointerup",
"onpopstate",
"onprogress",
"onratechange",
"onrepeat",
"onreset",
"onresize",
"onresolve",
"onscroll",
"onscrollend",
"onscrollsnapchange",
"onscrollsnapchanging",
"onsearch",
"onsecuritypolicyviolation",
"onseeked",
"onseeking",
"onselect",
"onselectionchange",
"onselectstart",
"onshow",
"onslotchange",
"onstalled",
"onstorage",
"onsubmit",
"onsuspend",
"ontimeupdate",
"ontimezonechange",
"ontoggle",
"ontouchcancel",
"ontouchend",
"ontouchmove",
"ontouchstart",
"ontransitionend",
"onunload",
"onvalidationstatuschange",
"onvolumechange",
"onwaiting",
"onwebkitanimationend",
"onwebkitanimationiteration",
"onwebkitanimationstart",
"onwebkitfullscreenchange",
"onwebkitfullscreenerror",
"onwebkittransitionend",
"onwheel"
Collaborator

What should we do here? In terms of spec purity, I believe we should stick to those in the HTML standard, make a big note that many engines support non-standardized ones, and add those as a hint or such?
But in reality, I can see this going wrong.

@evilpie: How would we best identify the list of supported event handler attributes in Gecko?

Collaborator

We should probably just check whether an attribute is an event handler content attribute (https://html.spec.whatwg.org/#event-handler-content-attributes). We could then maybe non-normatively list all of them (they're also in an index in HTML). Implementations can do roughly the same thing they do for Trusted Types.
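
As a rough illustration of the membership check described above, the following sketch filters an element's attributes against a set of event handler names. It is not any engine's implementation: the set contains only a few names copied from the excerpt quoted in this thread, the function names are hypothetical, and lowercasing the attribute name before the lookup is an assumption about how names would be normalized.

```python
# Illustrative subset only; a real list would be derived from the HTML standard.
EVENT_HANDLER_ATTRIBUTES = frozenset({
    "onerror",
    "onfocus",
    "onload",
    "onmouseover",
    "onwheel",
})


def is_event_handler_attribute(name: str) -> bool:
    """Return True if name is in the (illustrative) event handler set."""
    return name.lower() in EVENT_HANDLER_ATTRIBUTES


def strip_event_handlers(attrs: dict[str, str]) -> dict[str, str]:
    """Return a copy of attrs without event handler content attributes."""
    return {k: v for k, v in attrs.items() if not is_event_handler_attribute(k)}


cleaned = strip_event_handlers(
    {"href": "/x", "onerror": "alert(1)", "OnLoad": "run()", "title": "t"}
)
print(cleaned)
```

A set lookup rather than a prefix test on `on` keeps the check tied to a defined list, which is the property the comment above is asking for.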


@evilpie evilpie Dec 19, 2024

In Gecko, Trusted Types currently uses EventNameList.h.

Collaborator Author

@otherdaniel otherdaniel Jan 8, 2025

I've now removed the list of event handlers, instead adding a rule to remove event-handler-content-attributes. I'm iterating over those as if they were a list. Not sure if that's legitimate.

I've also added a note and a script that merges in a copy of the event handlers, so it's easier to see what this does. This should make it easy to modify and, eventually, to just use a list directly derived from the HTML spec text.

Unfortunately, the preview doesn't run the scripts, so that particular bit isn't easy to review.
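
Since the preview does not show the merged result, here is a minimal sketch of what such a merge step could look like. It is not the script from this PR: the function name `merge_event_handlers`, the config keys `removeElements` and `removeAttributes`, and the sample values are assumptions for illustration.

```python
import json


def merge_event_handlers(config: dict, handlers: list[str]) -> dict:
    """Return a copy of config with handlers appended to 'removeAttributes'.

    The key name 'removeAttributes' is an assumption; existing entries keep
    their order and duplicates are skipped.
    """
    merged = dict(config)
    remove = list(config.get("removeAttributes", []))
    seen = set(remove)
    for name in handlers:
        if name not in seen:
            remove.append(name)
            seen.add(name)
    merged["removeAttributes"] = remove
    return merged


# Hypothetical baseline config and a short handler list for demonstration.
baseline = {"removeElements": ["script"], "removeAttributes": ["onload"]}
handlers = ["onerror", "onload", "onwheel"]

expanded = merge_event_handlers(baseline, handlers)
print(json.dumps(expanded, indent=2))
```

Returning a new dictionary leaves the input config untouched, so the same baseline can be merged again once the handler list is regenerated from the spec text.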

Collaborator

I think iterating over them is okay. We might have to revisit this when upstreaming.

4 participants