28 February 2008

Attention Profile

Liferea is a news aggregator, and each day it lets its users read perhaps hundreds of new blog posts, news articles or podcasts. Many of those are tagged by their authors with descriptive categories. So if it knows what the user likes to read most, why can't it preselect those favourite "types" of articles?

The new 1.5 code now keeps track of the absolute number of items read per category. Under the "Tools" menu you can now find a new option "Attention Profile" to view the per-category count.
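To illustrate the idea of per-category counting, here is a minimal sketch in Python. It is not Liferea's actual code: the class name `AttentionProfile`, the method names and the choice to count each category at most once per item are all assumptions made for this example.

```python
from collections import Counter


class AttentionProfile:
    """Hypothetical per-category read counter (not Liferea's real implementation)."""

    def __init__(self):
        # Maps category name -> number of read items carrying that category.
        self.counts = Counter()

    def item_read(self, categories):
        # Assumption: a category listed twice on one item only counts once.
        self.counts.update(set(categories))

    def top(self, n=5):
        # The n most frequently read categories, most frequent first.
        return self.counts.most_common(n)


profile = AttentionProfile()
profile.item_read(["linux", "gnome"])
profile.item_read(["linux"])
print(profile.top(1))
```

A ranking like `top()` is the kind of statistic that features such as search folders for favourite categories could build on.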

While this might not be very useful yet, keeping these statistics opens up the possibility of more sophisticated features, for example search folders for your favourite categories, feed and item rating, APML exporting...

Be warned: this is experimental. It might work out, it might not. It might hurt performance, or not. It also raises ethical questions about creating user profiles. All of these things still need to be thought about.

Update: Due to performance problems, the Attention Profile has been disabled for Liferea 1.6

27 February 2008

"All Rules Match" Search Folders

Until now search folder rules were either "additive" or "removing". This meant that an item was displayed by the search folder when at least one of the "additive" rules and none of the "removing" rules matched it. User feedback over time showed that this is not always intuitive and does not fit every use case.

To improve this, the search folder properties have changed for 1.5:

Instead of the long explanation of the logic there are now two radio buttons that let you define the intended logic. With "Any Rule Matches" you can create search folders that, for example, match any of several rare terms. With "All Rules Must Match" you can filter all feeds for items on a specific topic identified by one or more keywords.

To give proper credit I must mention that this change was motivated by the search dialog of RSSOwl (a great platform-independent Java-based aggregator), which has even more nice features like instant preview and live updating.
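The difference between the two modes can be sketched in a few lines of Python. This is only an illustration under assumptions: the rule format (a field name plus a search term), the case-insensitive substring test and the function names are made up for the example and are not taken from Liferea.

```python
def rule_matches(rule, item):
    # Hypothetical rule format: (field name, search term).
    field, term = rule
    return term.lower() in item.get(field, "").lower()


def folder_matches(rules, item, match_all):
    # match_all=True  -> "All Rules Must Match"
    # match_all=False -> "Any Rule Matches"
    results = (rule_matches(rule, item) for rule in rules)
    return all(results) if match_all else any(results)


item = {"title": "Liferea 1.5 released", "description": "New search folder options"}
rules = [("title", "liferea"), ("description", "podcast")]

print(folder_matches(rules, item, match_all=False))  # one rule matching is enough
print(folder_matches(rules, item, match_all=True))   # every rule has to match
```

Note that with an empty rule list this sketch matches everything in "all" mode and nothing in "any" mode, which is simply how Python's `all()` and `any()` behave.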

26 February 2008

Release Schedule Calendar

For everyone who needs to know (approximately) when the next Liferea version will be released, I created an online calendar. It is embedded at the bottom of the blog main page and can be subscribed to in iCal and Atom format.

12 February 2008

Better Handling Plain Text Content

Current Liferea releases do not handle plain text RSS item content gracefully. If the item content is not HTML markup escaped by the feed generator, all the text of such an item ends up on one line without its line breaks being rendered. This doesn't look very good and makes lists or formatted plain text unreadable.

For 1.5.x the plan is to solve the problem by auto-detecting the text type of the item description. If it contains no markup, then it is treated as plain text and all ASCII line breaks are converted to HTML line breaks for correct rendering. The critical point here is the plain text/HTML detection. The test implementation in SVN trunk currently only checks for actual HTML tags like <i>, <b> or <a href="">, which indicate HTML markup. The risk of this approach is that additional line breaks are added to valid HTML content that is not correctly recognized.

If you try 1.5.x/SVN trunk and experience formatting problems, such as twice as many line breaks or missing line breaks in pure plain text, please give some feedback!
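A rough Python sketch of such a detection step might look like this. It is not the code in SVN trunk: the regular expression, the function names and the extra step of escaping `<`, `>` and `&` in plain text are assumptions made for this example.

```python
import html
import re

# Assumption: anything that looks like an opening or closing tag,
# e.g. <i>, </b>, <br/> or <a href="">, counts as HTML markup.
TAG_PATTERN = re.compile(r"</?[a-zA-Z][a-zA-Z0-9]*(\s[^<>]*)?/?>")


def looks_like_html(text):
    return TAG_PATTERN.search(text) is not None


def prepare_description(text):
    if looks_like_html(text):
        # Treated as HTML: leave it alone so no extra line breaks are added.
        return text
    # Treated as plain text: escape special characters (an assumption of
    # this sketch), then turn ASCII line breaks into HTML line breaks.
    escaped = html.escape(text, quote=False)
    normalized = escaped.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", "<br/>")


print(prepare_description("First line\nSecond line"))
print(prepare_description("<b>Bold</b>\nalready HTML"))
```

The weak spot is the same as described above: HTML content without any recognizable tag is taken for plain text and gets additional line breaks, while plain text that happens to contain something tag-like keeps its line breaks unrendered.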

Handling Redundancy in Content

Nowadays many feed sources provide content using Atom or RSS and augment it with application-specific namespaces that supply their own tags, which often duplicate the content of the container format. For example, an iTunes podcast can have an item <description> in the Atom/RSS <item> tag along with an <itunes:summary> description of different quality.

Up until 1.4.x Liferea had a simple implementation that primarily used the Atom/RSS description. The exception was the <content:encoded> tag from the Content namespace, which, depending on tag order, would overrule the default description. Only if there was no default item description were additional namespace infos (atom:summary, dc:description...) used as a content source.

This was an unsatisfactory solution for several reasons:

  • More detailed infos in application-specific namespaces were invisible.
  • Ordering problems with <description> and <content:encoded> sometimes hid better content.
  • The Dublin Core description (while rarely encountered) never won.
  • When the summary was better than the description, the short description still always won.

As a simple solution Liferea 1.5.x now selects the "best content" by a simple length comparison. The assumption is that the format of the content (plain text, HTML, XHTML...) doesn't matter, or more precisely that the additional length of (X)HTML encoding indicates better content.
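The length comparison can be sketched in Python as follows. This is a simplified illustration, not Liferea's implementation: the function name `best_content`, the dictionary of candidate tags and the rule that the first candidate wins a tie are assumptions of this example.

```python
def best_content(candidates):
    """Return the longest non-empty candidate text, or None if there is none."""
    best = None
    for text in candidates:
        # Skip missing or empty candidates; on equal length the earlier one stays.
        if text and (best is None or len(text) > len(best)):
            best = text
    return best


# Hypothetical item with content from several tags.
item = {
    "description": "Short teaser.",
    "content:encoded": "<p>The full article text with <b>markup</b>.</p>",
    "itunes:summary": "",
}
print(best_content(item.values()))
```

Because only the length counts, a verbose but poor summary would beat a short, well-written description, which is the trade-off of keeping the selection this simple.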

As a result you might see additional content in namespace-rich feeds (e.g. iTunes podcast feeds).