The
Problem
What are "microformats"?
In other words, microformats are conventions for using standard HTML to indicate the meaning of markup.
Frequently used attributes are @class, @title and @rel.
The conventions are unobtrusive - they don't have to change the look of your pages - and play well with web standards such as JavaScript and CSS.
hNews = hATOM + news specific fields
Publishers
How to use hNews on your website
Let's mark up a byline first
Other common
microformat idioms
Using hNews in your
publishing system
Here's an example NewsML-G2 + XHTML payload. The body of the story is in XHTML, and the newslines and metadata are carried in NewsML-G2.
How can the publisher and the recipients
apply hNews to the web display?
The NewsML-G2 + XHTML has everything
the recipients need to create hNews
The entry content is the XHTML payload in contentSet/inlineXML
Other hNews fields can be populated from the NewsML-G2
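As a rough illustration of how a recipient might do this, here is a minimal Python sketch using only the standard library. The payload is a trimmed copy of the example in this deck. The mapping from NewsML-G2 elements to hNews fields (headline to entry-title, versionCreated to updated, infoSource to source-org, subject to tag) is an assumption made for this sketch, not something defined by either specification, and the function name hnews_fields is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the paths below (the prefixes are local to this sketch).
NS = {
    "nar": "http://iptc.org/std/nar/2006-10-01/",
    "x": "http://www.w3.org/1999/xhtml",
}

# Trimmed version of the example payload; the story body is shortened.
SAMPLE = """<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/">
  <itemSet>
    <newsItem guid="tag:example.com,0000:newsml_L10491990" standard="NewsML-G2"
              standardversion="2.1" conformance="power" xml:lang="en">
      <itemMeta>
        <versionCreated>2009-10-16T11:25:32.000Z</versionCreated>
        <firstCreated>2009-10-16T11:25:32.000Z</firstCreated>
      </itemMeta>
      <contentMeta>
        <infoSource role="cRole:origProv" literal="Example"/>
        <subject literal="Entertainment"/>
        <headline>Monty Python reunite in NY on 40th anniversary</headline>
      </contentMeta>
      <contentSet>
        <inlineXML contenttype="application/xhtml+xml">
          <html xmlns="http://www.w3.org/1999/xhtml">
            <body>
              <p>By Michelle Nichols</p>
              <p>NEW YORK, (Example) - The Monty Python comedy team was honored.</p>
            </body>
          </html>
        </inlineXML>
      </contentSet>
    </newsItem>
  </itemSet>
</newsMessage>"""


def hnews_fields(newsml):
    """Return a dict of hNews field values pulled from one NewsML-G2 newsItem."""
    item = ET.fromstring(newsml).find(".//nar:newsItem", NS)

    def text(path):
        node = item.find(path, NS)
        return node.text if node is not None else None

    def literal(path):
        node = item.find(path, NS)
        return node.get("literal") if node is not None else None

    body = item.find("nar:contentSet/nar:inlineXML/x:html/x:body", NS)
    return {
        # Assumed mapping, chosen for illustration only.
        "entry-title": text("nar:contentMeta/nar:headline"),
        "updated": text("nar:itemMeta/nar:versionCreated"),
        "published": text("nar:itemMeta/nar:firstCreated"),
        "source-org": literal("nar:contentMeta/nar:infoSource"),
        "tag": literal("nar:contentMeta/nar:subject"),
        # The entry content is the XHTML payload; here, its paragraphs as text.
        "entry-content": ["".join(p.itertext()) for p in body.findall("x:p", NS)],
    }


fields = hnews_fields(SAMPLE)
print(fields["entry-title"])
```

A real recipient would carry the XHTML body through as markup rather than flattening it to text; the flattening here only keeps the sketch short.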
The recipients can style their web displays using the hNews microformat, just as TownNews has done in their CMS.
Or the publisher could add hNews as an alternative NewsML-G2 payload.
But how can the publisher add hNews to the XHTML payload, without requiring the recipients to do anything?
And without changing the look and feel of the articles?
The publisher can include all the fields for an entire hNews article, such as headline and publish datetime, within the XHTML payload.
The "trick" is to use the CSS style "display: none"
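One way a publisher's system might apply this is sketched below in Python. It wraps an existing XHTML body in an hNews container and adds the headline and update time as hidden elements, following the shape of the marked-up example in this deck. The function name wrap_as_hnews and its parameters are hypothetical, and the choice of h4 and span elements is an assumption of this sketch.

```python
from html import escape

HIDDEN = 'style="display: none"'


def wrap_as_hnews(body_xhtml, headline, updated_iso, updated_text):
    """Wrap an XHTML fragment in hNews markup without changing how it renders.

    body_xhtml is assumed to be trusted, well-formed XHTML from the publisher;
    the other values are escaped before being written into the markup.
    """
    return (
        '<div class="hnews hentry item">\n'
        # Hidden headline: present for parsers, not displayed to readers.
        f'<h4 {HIDDEN} class="entry-title">{escape(headline)}</h4>\n'
        # Hidden timestamp: machine-readable value in @title, readable text inside.
        f'<span {HIDDEN} class="updated" title="{escape(updated_iso)}">'
        f'{escape(updated_text)}</span>\n'
        f'<div class="entry-content">\n{body_xhtml}\n</div>\n'
        '</div>'
    )


print(wrap_as_hnews(
    "<p>By Michelle Nichols</p>",
    "Monty Python reunite in NY on 40th anniversary",
    "2009-10-16T08:13:32.000Z",
    "Fri Oct 16, 2009 8:13am",
))
```

Because the added elements are hidden, recipients who ignore hNews see the same article as before, while hNews-aware tools can read the extra fields.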
Does hNews replace NewsML-1 or NewsML-G2 (or ATOM or RSS)?
NewsML, NewsML-G2, ATOM and RSS are ways of enveloping content, to provide structure and metadata. They can all be used to convey HTML and therefore any of them would be an ideal way to deliver hNews. There may be some overlap in the news metadata that is marked up (e.g. you can indicate a headline in NewsML and within hNews).
An XSLT for transforming hAtom to Atom
When a person
looks at a web page...
However, there is a lot of
news on the web.
Reverse engineering
each news web site requires
quite a bit of work to get right,
with consequent risks of
making unfortunate mistakes.
But if an ecosystem of tools
applied to news on the web
makes news more valuable...
... they find it easy to pick out the news story components
and to distinguish them from related content, lists of most-read articles, adverts, etc.
How do we help more people
create better, more accurate tools
for working with news on the web
with less effort?
We believe microformats
and specifically hNews
are a good approach.
Someone writing a tool to work with news on the web, however, has to work with the raw HTML to pick out the various components.
It is often not too hard
to figure out how to "scrape"
one particular news site,
since most websites use
a fairly small set of templates.
Particularly if you are willing
to risk making a mistake once
in a while.
(Just cross your fingers that
they don't redesign their
website *too* often...)
Format
"Microformats are a way of adding simple markup to ... web pages, so that the information in them can be extracted by software and indexed, searched for, saved, cross-referenced or combined."
- from "Introduction to Microformats" http://microformats.org/wiki/introduction
An hCard example:
<div class="vcard">
<a class="url fn" href="http://tantek.com/">
Tantek Çelik
</a>
<div class="org">Technorati</div>
</div>
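To show how software might extract information from the hCard above, here is a minimal Python sketch built on the standard library's html.parser. It is not a conforming microformats parser: it assumes a single, simple hCard, reads only the fn, org and url properties, and does not handle void elements such as br. The names HCardParser and parse_hcard are hypothetical.

```python
from html.parser import HTMLParser

HCARD_HTML = """
<div class="vcard">
  <a class="url fn" href="http://tantek.com/">Tantek Çelik</a>
  <div class="org">Technorati</div>
</div>
"""


class HCardParser(HTMLParser):
    """Collects the fn, org and url properties of one simple hCard."""

    TEXT_PROPERTIES = ("fn", "org")

    def __init__(self):
        super().__init__()
        self.card = {}
        # One entry per open element: the text properties that element carries.
        self._open = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        # For a link, the url property's value is the href, not the text.
        if "url" in classes and attrs.get("href"):
            self.card["url"] = attrs["href"]
        self._open.append([c for c in classes if c in self.TEXT_PROPERTIES])

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Text belongs to every enclosing element that declares a property.
        for properties in self._open:
            for prop in properties:
                self.card[prop] = (self.card.get(prop, "") + " " + text).strip()


def parse_hcard(html):
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.card


print(parse_hcard(HCARD_HTML))
```

Note that the class attribute is split on whitespace, so the single anchor element supplies both the url and the fn properties.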
http://microformats.org/wiki/hnews
http://microformats.org/wiki/hatom
A microformat (uF) for anything that can be represented as ATOM, such as blog posts and news articles
Some hNews Fields
hATOM Fields
Another hNews Field
I'm just using Yahoo's presentation to illustrate. I don't believe they currently use hNews
hNews
http://microformats.org/wiki/hnews
From hATOM:
- title
- author
- published
- updated
- content / summary
- rel-tag
Plus:
- source-org
- dateline
- geo
- item-license
- principles
A microformat to add some machine-readable
news-specific semantics to display-ready HTML.
http://newscredit.org/development/newscredit-specification/rel-principles-specification/
http://microformats.org/wiki/principles-brainstorming
Many organizations publish a "statement of principles" or a "code of ethics" that documents the practices that their journalists are to adhere to. The rel-principles proposal is a way to associate a given piece of content with those principles.
See http://en.wikipedia.org/wiki/Journalism_ethics_and_standards for a good discussion of journalism ethics and standards
Another hNews Field
Microformats can be broadly divided into "elemental" and "compound".
The elemental microformats such as rel-tag are typically aimed at solving a single, minimal problem. They generally use a single @rel or @class attribute.
Elemental microformats are often used as building blocks in compound microformats, such as hCard. Compound microformats typically use several @rel or @class attributes.
See http://microformats.org/wiki/elemental-microformat and http://microformats.org/wiki/compound-microformat for more discussion.
News organizations have been experimenting with microformats for a while
Here's a 2007 website from the BBC World Service
"Microformats are new, developing standards for adding extra meaning to the HTML of a web page. They create all sorts of possibilities for software (from search engines to browsers) to interact with the content in new and useful ways.
The HTML for each Twitter, Flickr and diary post in the Bangladesh River Journey is written using the hAtom microformat. This means, for example, that an RSS feed can be generated directly from the HTML on the page.
If you use the Firefox browser, you can explore other microformats on the Bangladesh Boat site, with the excellent Operator extension. You'll find xFolk bookmarks, geo locations, hCard contacts and tagged links."
Quoted from http://dharmafly.com/bangladeshboat
Microformats used:
hAtom posts
xFolk bookmarks
geo locations
hCard contacts
link tags
Site Address: http://www.bbc.co.uk/worldservice/bangladeshboat
Blogged about at: http://dharmafly.com/blog/bangladeshboat
Process
"the best way ... is simply start using it. With any microformat, we can discuss all day what should/shouldn't be added/changed, but until people are marking-up their HTML we won't find those pain points. Real-world experience and expertise ... would be great. Please add any and all example URLs you find on the wiki. That will really help move things forward."
- Brian Suda (in an email discussing hListing)
So, try out hNews, provide your feedback and help move news on the web forward.
The microformats process: http://microformats.org/wiki/process
hNews reached draft 0.1 in early October 2009. An example of a recent change, based on feedback, was to relax both rel-principles and item-license from "must" to "should".
hNews is mature enough to use, but is still open to feedback and tuning based on experience.
<?xml version="1.0" encoding="UTF-8"?>
<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<header>
<sent>2003-02-10T11:25:32.000Z</sent>
<sender>example.com</sender>
</header>
<itemSet>
<newsItem guid="tag:example.com,0000:newsml_L10491990" standard="NewsML-G2" standardversion="2.1" conformance="power" xml:lang="en">
<rightsInfo>
<copyrightHolder literal="Example"/>
</rightsInfo>
<itemMeta>
<itemClass qcode="icls:text"/>
<provider literal="example.com"/>
<versionCreated>2009-10-16T11:25:32.000Z</versionCreated>
<firstCreated>2009-10-16T11:25:32.000Z</firstCreated>
<pubStatus qcode="stat:usable"/>
<role qcode="itemRole:N"/>
</itemMeta>
<contentMeta>
<infoSource role="cRole:origProv" literal="Example"/>
<creator literal="Example"/>
<language tag="en"/>
<subject literal="Entertainment"/>
<headline>Monty Python reunite in NY on 40th anniversary</headline>
</contentMeta>
<contentSet>
<inlineXML contenttype="application/xhtml+xml">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title/>
</head>
<body>
<div class="hnews hentry item">
<h4 style="display: none" class="entry-title">Monty Python reunite in NY on 40th anniversary</h4>
<p class="author vcard">By <span class="fn">Michelle Nichols</span></p>
<div class="entry-content">
<p><span class="dateline">NEW YORK</span>, <span style="display: none" class="updated dtstamp" title="2009-10-16T08:13:32.000Z">Fri Oct 16, 2009 8:13am</span>
(<span class="source-org vcard"><span class="fn">Example</span></span>) - The Monty Python comedy team, the world-renowned British troupe celebrating its 40th anniversary, was honored with a special award on Thursday for its contribution to film and television.</p>
<p>"If you want to get a better view, this will be on eBay tomorrow," joked John Cleese as he accepted the award from the British Academy of Film and Television Arts. Monty Python also included Terry Gilliam, Eric Idle, Terry Jones, Michael Palin and the late Graham Chapman.</p>
<p>The presentation was made at the official 40th anniversary Monty Python reunion event in New York co-hosted by the Independent Film Channel, and followed a screening of a new documentary, "Monty Python: Almost the Truth (The Lawyer's Cut)."</p>
<p>Monty Python created the influential British television show "Monty Python's Flying Circus," which first aired in 1969, and went on to make popular movies including "Monty Python and the Holy Grail" and "Monty Python's Life of Brian."</p>
<p>The five remaining members of Monty Python took questions from the audience at the event and reminisced. Cleese recalled his most embarrassing moment as "when the queen came down to watch and my trousers fell down."</p>
<p>BAFTA, which hands out Britain's equivalent of the Oscars each year, last honored the Monty Python team in 1987 when it received the Michael Balcon Award for Outstanding British Contribution to Cinema.</p>
</div>
<div style="display: none">
<a href="http://example.com/entertainment" rel="tag">Entertainment</a>
</div>
<div style="display: none">Copyright <a href="http://www.example.com/info/copyright" rel="item-license">Example News</a> 2009.</div>
</div>
</body>
</html>
</inlineXML>
</contentSet>
</newsItem>
</itemSet>
</newsMessage>
<?xml version="1.0" encoding="UTF-8"?>
<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<header>
<sent>2003-02-10T11:25:32.000Z</sent>
<sender>example.com</sender>
</header>
<itemSet>
<newsItem guid="tag:example.com,0000:newsml_L10491990" standard="NewsML-G2" standardversion="2.1" conformance="power" xml:lang="en">
<rightsInfo>
<copyrightHolder literal="Example"/>
</rightsInfo>
<itemMeta>
<itemClass qcode="icls:text"/>
<provider literal="example.com"/>
<versionCreated>2009-10-16T11:25:32.000Z</versionCreated>
<firstCreated>2009-10-16T11:25:32.000Z</firstCreated>
<pubStatus qcode="stat:usable"/>
<role qcode="itemRole:N"/>
</itemMeta>
<contentMeta>
<infoSource role="cRole:origProv" literal="Example"/>
<creator literal="Example"/>
<language tag="en"/>
<subject literal="Entertainment"/>
<headline>Monty Python reunite in NY on 40th anniversary</headline>
</contentMeta>
<contentSet>
<inlineXML contenttype="application/xhtml+xml">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title/>
</head>
<body>
<p>By Michelle Nichols</p>
<p>NEW YORK, (Example) - The Monty Python comedy team, the world-renowned British troupe celebrating its 40th anniversary, was honored with a special award on Thursday for its contribution to film and television.</p>
<p>"If you want to get a better view, this will be on eBay tomorrow," joked John Cleese as he accepted the award from the British Academy of Film and Television Arts. Monty Python also included Terry Gilliam, Eric Idle, Terry Jones, Michael Palin and the late Graham Chapman.</p>
<p>The presentation was made at the official 40th anniversary Monty Python reunion event in New York co-hosted by the Independent Film Channel, and followed a screening of a new documentary, "Monty Python: Almost the Truth (The Lawyer's Cut)."</p>
<p>Monty Python created the influential British television show "Monty Python's Flying Circus," which first aired in 1969, and went on to make popular movies including "Monty Python and the Holy Grail" and "Monty Python's Life of Brian."</p>
<p>The five remaining members of Monty Python took questions from the audience at the event and reminisced. Cleese recalled his most embarrassing moment as "when the queen came down to watch and my trousers fell down."</p>
<p>BAFTA, which hands out Britain's equivalent of the Oscars each year, last honored the Monty Python team in 1987 when it received the Michael Balcon Award for Outstanding British Contribution to Cinema.</p>
</body>
</html>
</inlineXML>
</contentSet>
</newsItem>
</itemSet>
</newsMessage>
It is a common practice to use the CSS style "display: none" in microformats. See, for example, http://microformats.org/wiki/hcard-example1-steps
But it is somewhat against the microformat philosophy of "visible metadata" so should be used with caution.
See http://www.w3.org/TR/CSS2/visuren.html#display-prop for more information on the style.
hNews
By simply editing their HTML templates, TownNews is able to offer hNews as an option to its customers.
Let's look at how a publisher that publishes
NewsML-G2 + XHTML news articles to multiple recipients
could adopt hNews.
An example
Townnews.com is an early hNews adopter
Let's see how they added hNews into their CMS templates
Recipients of the NewsML-G2 + XHTML assemble the story for the web
For example, by displaying the headline and update date and time above the body
In the original HTML, the byline is
<p class="byline">By Bill Dolan bill.dolan@nwi.com, (219) 662-5328</p>
An Article as Rendered by TownNews
Let's turn it into an hATOM author. First, we wrap it in a class="vcard"
<p class="byline">By <span class="vcard">Bill Dolan bill.dolan@nwi.com, (219) 662-5328</span></p>
Now we pick out the formatted name via the "fn" property and we have a valid hCard
I'm just using this Reuters.com article to illustrate. I don't believe that they currently use hNews.
<p class="byline">By <span class="vcard">
<span class="fn">Bill Dolan</span> bill.dolan@nwi.com, (219) 662-5328</span>
</p>
Let's add the optional hCard properties for email and telephone number
<p class="byline">By <span class="vcard">
<span class="fn">Bill Dolan</span> <span class="email">bill.dolan@nwi.com</span>,
<span class="tel">(219) 662-5328</span></span>
</p>
Finally, let's add the class="author", which is used by hATOM to indicate that this hCard is the author of the article
<p class="byline">By <span class="author vcard">
<span class="fn">Bill Dolan</span> <span class="email">bill.dolan@nwi.com</span>,
<span class="tel">(219) 662-5328</span></span>
</p>
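The byline steps above could be automated in a CMS template or filter. The Python sketch below is one possible approach, assuming bylines always follow the pattern "By NAME EMAIL, (NNN) NNN-NNNN" seen in this example; real bylines vary far more. The function name byline_to_hcard is hypothetical.

```python
import re
from html import escape

# Matches bylines shaped like the example: name, then email, then a US phone number.
BYLINE = re.compile(
    r"^By\s+(?P<name>.+?)\s+(?P<email>\S+@\S+?),\s*"
    r"(?P<tel>\(\d{3}\)\s*\d{3}-\d{4})$"
)


def byline_to_hcard(byline):
    """Return the byline as a paragraph marked up as an hATOM author hCard.

    Bylines that do not match the assumed pattern are returned as a plain
    byline paragraph, without any microformat classes.
    """
    match = BYLINE.match(byline.strip())
    if match is None:
        return f'<p class="byline">{escape(byline)}</p>'
    name, email, tel = (escape(match.group(k)) for k in ("name", "email", "tel"))
    return (
        '<p class="byline">By <span class="author vcard">'
        f'<span class="fn">{name}</span> '
        f'<span class="email">{email}</span>, '
        f'<span class="tel">{tel}</span></span></p>'
    )


print(byline_to_hcard("By Bill Dolan bill.dolan@nwi.com, (219) 662-5328"))
```

Falling back to unmarked output when the pattern does not match avoids publishing an hCard with wrongly labelled fields.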
Notice that the class attribute in HTML accepts a space-separated list of values, so one element can carry both "author" and "vcard".
Check out http://microformats.org/wiki/hcard for more on hCard
As well as class attributes, it is common to use the rel attribute
to label a link.
For example rel-tag, item-license and rel-principles
<a rel="tag" href="/topic/merrillville">Merrillville</a>
<a rel="item-license" href="http://townnews.example.com/license/">
All rights reserved</a>
<a href="http://groucho.example.net/statement" rel="principles">
Those are my principles, and if you don't like them... well, I have others.</a>
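A tool can find these labelled links by looking only at the rel attribute. The following Python sketch, using the standard library's html.parser, groups link targets by rel value for the three examples above. The names RelLinkParser and collect_rel_links are hypothetical, and the sketch does not resolve relative URLs.

```python
from html.parser import HTMLParser

LINKS_HTML = """
<a rel="tag" href="/topic/merrillville">Merrillville</a>
<a rel="item-license" href="http://townnews.example.com/license/">
All rights reserved</a>
<a href="http://groucho.example.net/statement" rel="principles">
Those are my principles, and if you don't like them... well, I have others.</a>
"""


class RelLinkParser(HTMLParser):
    """Groups the href of every <a> element by each of its rel values."""

    def __init__(self):
        super().__init__()
        self.links = {}  # rel value -> list of hrefs, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        # Like class, rel holds a space-separated list of values.
        for rel in (attrs.get("rel") or "").split():
            self.links.setdefault(rel, []).append(attrs.get("href"))


def collect_rel_links(html):
    parser = RelLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


print(collect_rel_links(LINKS_HTML))
```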
Also, it is common to put the machine readable version of dates and times in a title attribute
Here we use @title for the hATOM updated property
<span title="2009-08-09T17:32-0500" class="updated">5:32 pm, Sun Aug 9, 2009.</span>
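On the consuming side, the @title value can be turned into a timezone-aware timestamp. The Python sketch below handles just the two timestamp shapes that appear in this deck; it is not a general ISO 8601 parser, and the name parse_updated is hypothetical. It assumes Python 3.7 or later, where %z accepts a trailing "Z".

```python
from datetime import datetime, timedelta

# The two shapes used in this deck's examples.
FORMATS = (
    "%Y-%m-%dT%H:%M%z",        # e.g. 2009-08-09T17:32-0500
    "%Y-%m-%dT%H:%M:%S.%f%z",  # e.g. 2009-10-16T08:13:32.000Z
)


def parse_updated(title_value):
    """Parse the machine-readable date carried in a title attribute."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(title_value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp: {title_value!r}")


stamp = parse_updated("2009-08-09T17:32-0500")
print(stamp.isoformat())
```

Keeping the UTC offset matters for news: the visible text "5:32 pm" is ambiguous on its own, while the @title value is not.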
An introduction
by Stuart Myles
smyles@ap.org
X2V
http://suda.co.uk/projects/X2V/
An XSLT to transform hCard / hCalendar into vCard/iCalendar
Browser Plugins
Oomph for IE
Operator for FF
Tools
Several tools work with microformats
And it is easy enough to write your own
hAtom2Atom
http://rbach.priv.at/Microformats/hAtom2Atom/
rel-lint
http://tools.microformatic.com/help/xhtml/rel-lint/
A lint for rel-tag and other microformats that make use of the rel attribute of links.
Picture credits
All photos are CC licensed on Flickr:
http://www.flickr.com/photos/kirklau/1357780086/
http://www.flickr.com/photos/eggplant/10440398/
http://www.flickr.com/photos/dharmasphere/1848329163/
http://www.flickr.com/photos/falsalama/2216112021
http://www.flickr.com/photos/karramarro/2400739038/
http://www.flickr.com/photos/atoach/3934285771/
http://www.flickr.com/photos/nickwheeleroz/2391631937/
http://www.flickr.com/photos/santarosa/405155915/
http://www.flickr.com/photos/davidandnalini/389537711/
http://www.flickr.com/photos/vernhart/1574355240/
http://microformats.org/about
Many thanks to all