schema validation turns xml into ml
posted in xml, apr 28, 2008
i’ve come to the conclusion that xml schema validation is bad. it’s bad in the same way that javascript is bad: it’s often misused and/or used for evil. specifically, validating xml against a schema essentially removes the x in xml (which stands for extensible), because the schema doesn’t allow new tags to be added unless its author explicitly left room for them. thus, by always validating xml against a schema, xml becomes simply ml.
i propose that schema validation should not fail when the xml contains additional tags or attributes that are not defined in the schema.
a schema essentially defines which tags are allowed in an xml document, which tags can contain which other tags, and how many times each tag can appear. it does the same for attributes.
it may seem that an xml file containing tags or attributes that are not in the schema should fail validation, but it is normally better if it does not. the simplest reason is that when accepting input, it’s best to handle the widest possible variety of formats. an xml file is a form of input, and it’s certainly possible to accept xml that contains information beyond what the schema expects.
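as a rough illustration of the proposal, here is a minimal python sketch of lenient validation. the SCHEMA dictionary and the validate_lenient function are made up for this example (this is not xsd, and not how any real validator works): the check reports tags that are missing or repeated too often, and never complains about tags it doesn't know.

```python
import xml.etree.ElementTree as ET

# hypothetical mini-"schema": child tag -> (min, max) occurrences,
# where None means "no upper limit". this is not xsd, just an illustration.
SCHEMA = {
    "name": (1, 1),
    "email": (1, None),
}

def validate_lenient(root, schema):
    """return a list of problems; unknown tags are never a problem."""
    problems = []
    for tag, (lo, hi) in schema.items():
        count = len(root.findall(tag))
        if count < lo:
            problems.append(f"<{tag}> appears {count} times, expected at least {lo}")
        elif hi is not None and count > hi:
            problems.append(f"<{tag}> appears {count} times, expected at most {hi}")
    return problems

doc = ET.fromstring(
    "<person>"
    "<name>ann</name>"
    "<email>ann@example.com</email>"
    "<nickname>annie</nickname>"  # not in the schema, and that's fine
    "</person>"
)
print(validate_lenient(doc, SCHEMA))  # prints []
```

a document that leaves out <name> would still be reported, so the required parts stay required; only the extras get a pass.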
i see schemas as most useful when they are treated like an interface in object-oriented programming: an xml file must provide the information the schema says is required, but it is free to also provide additional information for other purposes. of course, whatever reads the file using the schema would ignore that additional information. given the functions that exist for reading xml files, it’s easy to simply skip over anything that isn’t expected to be there.
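to show how little work the skipping takes, here is a small python sketch using the standard xml.etree.ElementTree module. the read_person function and the <person> document are invented for the example; the reader asks only for the tags it cares about, so the extra attribute and the <pet> tag are never looked at.

```python
import xml.etree.ElementTree as ET

def read_person(xml_text):
    """pull out only the fields this reader knows about; skip the rest."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.findtext("name"),
        "emails": [e.text for e in root.findall("email")],
    }

# the favorite-color attribute and the <pet> tag are extras
# that this reader was never told about
xml_text = """<person favorite-color="green">
  <name>ann</name>
  <email>ann@example.com</email>
  <pet species="cat">tom</pet>
</person>"""

print(read_person(xml_text))
# prints {'name': 'ann', 'emails': ['ann@example.com']}
```

a different program could read the same file and use only the <pet> information, which is the interface idea in practice.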
the reason this came up is that i was working on a c# .net project and needed to add a setting to the application configuration file (which is an xml file with a schema). this setting can be present any number of times (including zero), and since it would probably appear somewhere in the range of 20 to 50 times, i didn’t think repeating <add key="setting" value="value1"/> that many times was the best approach. so i made up my own tags and tucked them out of the way near the bottom of the xml file.
the file was still well-formed xml, but now .net throws an exception when i try to create an AppSettingsReader object to read the non-custom parts of the configuration file. how annoying: now i have to change existing code to stop using AppSettingsReader simply because i added other settings. thankfully, since AppSettingsReader requires a lot of repetitive work to handle potential errors, i already had a function that everything was calling instead of reading directly from the AppSettingsReader. of course, i had stopped using the whole application configuration mechanism on new projects a while before this, but this project was started before that.
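for comparison, here is a python sketch of the behavior i would have liked. it is not .net code and says nothing about how AppSettingsReader works internally; the <watchedFolders> and <folder> tags, the "timeout" key, and both functions are hypothetical. the point is only that the key/value settings and the made-up repeated tags can sit in one file and be read independently.

```python
import xml.etree.ElementTree as ET

# hypothetical config file: <watchedFolders> and <folder> are made-up tags
CONFIG = """<configuration>
  <appSettings>
    <add key="timeout" value="30"/>
  </appSettings>
  <watchedFolders>
    <folder>c:/in/a</folder>
    <folder>c:/in/b</folder>
  </watchedFolders>
</configuration>"""

def read_app_settings(root):
    """key/value pairs from <appSettings>; everything else is ignored."""
    return {a.get("key"): a.get("value") for a in root.findall("appSettings/add")}

def read_repeated(root, path):
    """text of every element at path; an empty list if there are none."""
    return [e.text for e in root.findall(path)]

root = ET.fromstring(CONFIG)
print(read_app_settings(root))                       # the ordinary settings
print(read_repeated(root, "watchedFolders/folder"))  # the custom, repeated ones
```

because findall returns an empty list when nothing matches, the "zero times" case needs no special handling.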





