August 16, 2008
New SgmlReader Release(s)
Steve Bjorg @ 7:51 am
With the launch of MindTouch Deki 8.08 RC1, we’re also releasing updated versions of SgmlReader, the versatile .NET library written in C# for parsing HTML/SGML files. The benefit of SgmlReader is that it can cope with fairly loosely formatted documents and convert most of their content into valid XML.
SgmlReader is being released in two versions: 1.7.5 and 1.8.0. 1.7.5 marks the last version compiled for .NET 1.1. Starting with 1.8.0, .NET 2.0 is required. Both can be downloaded from SourceForge.net and include compiled binaries, as well as the source code.
Improvements in 1.7.5 (.NET 1.1)
- Detect ending quote in attributes (e.g. <p class="para>...</p>)
- Each unknown prefix is mapped to a unique namespace, allowing duplicate local names (e.g. <p o:x="foo" m:x="bar">...</p>)
Improvements in 1.8.0 (.NET 2.0)
- Major code review & clean-up to use generics by jamesgmbutler (thanks!)
- Support XML-only entity &apos; in HTML/SGML documents (e.g. <p>It&apos;s ok</p>)
To parse an HTML document into valid XML:
using System.IO;
using System.Xml;

XmlDocument FromHtml(TextReader reader) {
    // set up SgmlReader to read the input as HTML
    Sgml.SgmlReader sgmlReader = new Sgml.SgmlReader();
    sgmlReader.DocType = "HTML";
    sgmlReader.WhitespaceHandling = WhitespaceHandling.All;
    sgmlReader.CaseFolding = Sgml.CaseFolding.ToLower;
    sgmlReader.InputStream = reader;

    // create the document; keep whitespace and skip external resource resolution
    XmlDocument doc = new XmlDocument();
    doc.PreserveWhitespace = true;
    doc.XmlResolver = null;
    doc.Load(sgmlReader);
    return doc;
}
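Once the HTML is in an XmlDocument, the usual System.Xml tools apply. Here is a minimal sketch of pulling link targets out of the result with XPath. The PrintLinks helper and the LinkExample class are hypothetical names, not part of SgmlReader, and the sketch assumes the elements come out lower-cased (as CaseFolding.ToLower requests) and without a namespace. So that it compiles with nothing but the framework, Main loads a small well-formed stand-in document; in real use you would pass in the result of FromHtml instead.

```csharp
using System;
using System.Xml;

static class LinkExample {

    // Hypothetical helper: prints the href of every <a> element in the document.
    // Assumes lower-case element names and no XML namespace on the elements.
    static void PrintLinks(XmlDocument doc) {
        foreach (XmlNode href in doc.SelectNodes("//a/@href")) {
            Console.WriteLine(href.Value);
        }
    }

    static void Main() {
        // Stand-in for the output of FromHtml; with SgmlReader referenced, use e.g.:
        //   XmlDocument doc = FromHtml(new System.IO.StringReader(html));
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<html><body><p>See <a href=\"http://example.com/\">this link</a></p></body></html>");

        PrintLinks(doc); // prints http://example.com/
    }
}
```

Keeping the query step separate from the parsing step means the same helper works whether the document came from SgmlReader or from a plain XML file.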
All in all, some good improvements. If you have any recommendations on how to make it even better, please leave a comment or join us on the forums.
categories: Community, Dream, MindTouch