-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposal: encoding/xml: option to treat unknown fields as an error #30301
Comments
Change https://golang.org/cl/191962 mentions this issue: |
I think we should consider returning all unknown elements/attributes (or perhaps the first N). This is useful for cases where xml is being used as a file format and you want to find all element typos/mistakes at once (in the same way the go compiler returns more than one error at once). Behavior could be controlled using an integer instead of a bool. The downside is that the style would be different from the encoding/json package. While consistency is ideal, there are already several differences between encoding/json and encoding/xml so maybe one more would be ok? |
It would be really helpful to have this - the CL mentioned here is fairly compact and goes in style with JSON decoder. |
Change https://golang.org/cl/362243 mentions this issue: |
It appears this has stagnated. I have resolved the merge conflicts and otherwise presented the original CL as-is. |
Update: no update, no traction, no response, no review. Time moves on, another day without strictly enforced XML. Once again, XML is treated as the redheaded stepchild compared to JSON. I tire. 🕓 |
Should this be added to the review meeting minutes? |
If you think it'd help! It's past the freeze so.. |
This is an API change so I'll move it into the proposal process. It's definitely true that there is very little maintenance on XML and also for that matter JSON. |
This proposal has been added to the active column of the proposals project |
@rsc Much thanks! |
It sounds like there are three things to disallow:
Do we need the third? Probably? |
That's an interesting idea. I didn't see any mention of the need in any of the original discussion when Eddie Scholtz submitted his original CL. What would be the potential use case for that? My personal use case for DisallowUnknownElements/DisallowUnknownAttributes is so I can know if my structs match the input rather than vice versa. I use W3C XSD for the actual data input validation (via https://pkg.go.dev/github.com/lestrrat-go/libxml2/xsd but there are a couple other implementations out there as well) - which I'm certainly not suggesting be stdlib (as there are multiple data validation/schema formats out there for XML besides XSD - DTD, Schematron, RELAX NG, etc.). All that to say that typically XML validation itself is done via schema implementation, but as there isn't a way for golang to "self-generate" a struct type definition from an XSD (or et. al.), this is where the Disallow* would come into play. Typically in XML, the element contents would be validated by the schema mechanism so I'm having difficulty figuring out where DisallowUnknownCDATA plays into that. Does that make sense? I'm not opposed to DisallowUnknownCDATA, it just seems to be incongruous to the other two proposed additions in function -- at least in my own mind. |
We needed DisallowUnknownElements and DisallowUnknownAttributes. But adding all three sounds reasonable. Our specific use case is service configuration via xml files. Sometimes customers hand edit the configuration. Catching typos during unmarshalling is a better user experience than silently using default values for a given field. Ideally we would be able to provide line/column number (or atleast byte offset which can be converted) for unknown fields and attributes. It would also be ideal to gather all of the unknown fields and attributes instead of just the first one. I think much of this quickly grows beyond the scope of a standard library package and reaches the point where it would be better served by a more general purpose xml parser. I suspect everyone's needs and use cases will be slightly different. However, at least providing similar standard library functionality to the json package seems reasonable. |
Right; I'm just trying to keep in mind feature creep -- would specifying unknown CDATA inside an element be considered structure non-conformance or data non-conformance? Keeping in mind feature-parity with JSON, is it closer to an improperly formatted JSON key, or is it closer to improper JSON value? This is where things start to get a little philosophical and weird.
Agreed, for sure.
Yeah, and that's my big worry here -- feature creep. Some of the above is already implemented in the various schemas I mentioned, and have analogues in corresponding libraries that DO already exist:
But I think there needs to be some method of Golang recognizing that there are certain structural components that were otherwise silently ignored, which the DisallowUnknownElements and DisallowUnknownAttributes both would serve nicely. The question I'm wrestling with is whether an unknown CDATA qualifies as (formatted/contained) data or structure. The former which I think Golang stdlib oughtn't concern with, since there's no JSON analogue and it's going to be probably specific to implementation. Theoretically CDATA should be allowed anywhere regular data is since it's just that -- character data; not intended to be parsed as actual XML/SGML/etc. by the parser. So if this would be implemented, it probably ought to be implemented as a more generic "must be empty" struct tag rather than a But that's why I'm interested in a use case @rsc had in mind because I don't know if I'm glossing over something mentally. |
I would suggest reframing it ever so slightly. It is about data 'which has no location in the structure'. It's not that CDATA is bad, it's that any data that we can't actually store in the given structure is potentially bad. It most definitely means that it is data that you can't do a round trip between XML -> Go data structures -> XML on. The key here is, of course, the 'unknown' part. This is all about unknown data. And unknown CDATA is just as much of a thing to want to reject as unknown tags or attributes. And for many of the same reasons, it is very likely to be an error by whoever or whatever generated the XML, and using just the tools provided by the standard library you not only can't do anything with it, right now you can't even know it exists. Sure, stepping outside of stdlib gives you more options, but sometimes what you want is the ability to easily handle XML just like you handle JSON, including whatever content validation that you use on the resulting data structures. And that's just not practical if there are ways to insert data that is just silently dropped. |
@zelch Hrmm. Can you provide me with an example of what this would look like with some sample XML and the behavior you're looking for? I'm still having difficulty visualizing this. I think by and large it's because I'm unclear what constitutes exactly what "unknown CDATA" actually is. I've been understanding it to mean "content within an element that is not intended to be understood as structure", which is what CDATA's intent is, in which case this can be captured with the
(https://pkg.go.dev/encoding/xml#Unmarshal) Data "which has no location in the structure" ("structure" here meaning the Golang struct) would either be an element or an attribute and nothing else, would it not? Conceptually I'm unclear what data which has no location in the structure, where structure is that of the XML document, would actually be as that'd be just plain invalid XML from my understanding, e.g. somethinghereisnotvalid
<root>
<child/>
</root> Can you give me some sample XML you're describing and the behavior you want to see? |
Entirely from memory:
Where does the random text go in a Go Structure? |
It isn't captured by Root.Cdata and Sub.Cdata? package main
import (
"encoding/xml"
"fmt"
"log"
)
type Root struct {
Name xml.Name `xml:"root"`
Cdata []byte `xml:",chardata"`
SubItem Sub `xml:"subitem"`
}
type Sub struct {
Name xml.Name `xml:"subitem"`
Attr string `xml:"attr,attr"`
Cdata []byte `xml:",chardata"`
TagItem string `xml:"tag"`
}
var xmlDoc []byte = []byte(`
<root>
random text
<subitem attr='value'>
more random text
<tag>text we want</tag>
</subitem>
</root>`)
func main() {
var err error
var r Root = Root{}
if err = xml.Unmarshal(xmlDoc, &r); err != nil {
log.Panicln(err)
}
fmt.Printf("%#v\n", r)
} |
Yes, and the question is whether, if the |
@rsc AH, okay, I think I see. My confusion was stemming from a couple things:
It seems then that this particular Disallow is desired to catch content that exists that "shouldn't" due to malformatting in the original data rather than attempting to disallow mismatching structure (as I resubmitted the original CL with the intent of getting closer feature parity with (encoding/json).Decoder (i.e. structural-only). While I'm certain that some people may find a use for a Is this acceptable? |
Yes, If |
Still would like to hear what people think in response to the questions in the previous comment. |
I think the The second approach would be to accumulate all unknown elements, attributes, and chardata in a custom error type that gets returned at the end of unmarshalling (assuming the flag is set). This might be easier for callers with a large tree of structs (10s or 100s of different types) since they don't need to add |
Okay, so given that we have |
Based on the discussion above, this proposal seems like a likely decline. |
No change in consensus, so declined. |
Towards the end of 2013, issue #6901 was opened requesting support to error on unknown fields instead of silently dropping the data.
In 2016, that issue was closed in favor of issue #15314, where DisallowUnknownFields was implemented for JSON, however that issue was JSON specific, and as a result the issue was never really resolved for encoding/xml.
As such, I would like to propose either adding DisallowUnknownFields to the encoding/xml Decoder type, to exactly mirror the JSON API, or adding DisallowUnknownElements and DisallowUnknownAttributes to the encoding/xml Decoder type.
The former would provide more consistency between the JSON and XML interfaces, the latter would be a moderately better fit for XML.
From a look through the code, it looks like it should be reasonably trivial to implement either version, and I would be happy to do the implantation work if there is agreement on how to go about it.
The text was updated successfully, but these errors were encountered: