Reading RSS in Perl like a DB

Warning - Old Content

This post is quite old, and it might not apply anymore, or maybe there's a better way to do the same thing nowadays. Take it with a big grain of salt.

My website currently uses RSS feeds as data sources, and I’m trying to replace some custom parsing code with standard CPAN modules. The DBD::AnyData module seems almost perfect, except that I just spent a few hours banging my head against it trying to get it to read RSS properly. No matter which combination of options I tried, it kept reading the channel title instead of the item titles.

As it turns out, the secret is to set the ‘root_tag’ flag, which gets passed all the way down to XML::Twig as its “twig_roots” option. Setting the root to ‘item’ strips off the channel-level tags, leaving only the per-item tags that you actually want.

Another huge gotcha is that DBD::AnyData doesn’t clean up after itself very well, so make sure the $flags hashref you pass to $dbh->func is NOT reused between calls. Use the Clone module to give each call a fresh copy of the flags.

Here’s a complete test script.

#!/usr/bin/perl

use warnings;
use strict;

use Data::Dumper;
use DBI;

# Requires DBD::AnyData (the libdbd-anydata-perl package on Debian/Ubuntu).
require DBD::AnyData;

my $rss_url = "testfeed.rss";

# set up a DBI connection for AnyData
my $dbh = DBI->connect('dbi:AnyData(RaiseError=>1):');

my ( $table, $format ) = ( 'testfeed', 'XML' );

my $flags = {
    'root_tag' => 'item', # this gets rid of the top level channel fields and lets the column map work.
    'col_map' => [
        { "guid" => 'guid' },
        { "title" => 'title' },
        { "pubDate" => 'pubdate' },
        { "source" => 'source' },
        { "link" => 'url' },
        { "description" => 'description' },
    ],
};

# load the data
$dbh->func( $table, $format, $rss_url, $flags, 'ad_catalog' );

# run a regular SQL query to get all of the rows
my $items_sth = $dbh->prepare( "SELECT * FROM testfeed" );
$items_sth->execute();

# output the rows to make sure they have the right info
while ( my $row = $items_sth->fetchrow_hashref() )
{
    print "\n=row:\n";
    print Dumper $row;
}

exit 0;

__END__
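To see why ‘root_tag’ fixes the channel-title problem, here is a minimal sketch that uses XML::Twig’s “twig_roots” option directly, with DBD::AnyData out of the way. It assumes XML::Twig is installed (libxml-twig-perl on Debian), and the inline feed and its titles are made up for illustration. This is not what DBD::AnyData does internally, just the same option in isolation: only the <item> subtrees are built and handed to the handler, so the channel’s own <title> never shows up.

```perl
#!/usr/bin/perl

use warnings;
use strict;

use XML::Twig;

# Made-up inline feed: the channel has its own <title>, just like a real RSS file.
my $rss = <<'RSS';
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Channel title (should be skipped)</title>
    <item><title>First item</title></item>
    <item><title>Second item</title></item>
  </channel>
</rss>
RSS

my @titles;

# twig_roots => { item => ... } makes XML::Twig build only the <item> subtrees;
# anything outside them, including the channel <title>, is never passed to us.
my $twig = XML::Twig->new(
    twig_roots => {
        item => sub {
            my ( $t, $item ) = @_;
            push @titles, $item->first_child_text('title');
            $t->purge;    # release the memory used by the item we just handled
        },
    },
);
$twig->parse($rss);

print "$_\n" for @titles;
```

Run as-is, it prints “First item” and “Second item” but not the channel title.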
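For the flags gotcha, one way to follow the advice is to keep a single pristine template and hand each ad_catalog call a deep copy of it. The sketch below assumes the CPAN Clone module (libclone-perl on Debian). fake_ad_catalog is a hypothetical stand-in that writes into the hash it receives, imitating the leftover state described above, so the script runs without a feed or DBD::AnyData; in real code you would pass the copy to $dbh->func instead, as the commented line shows.

```perl
#!/usr/bin/perl

use warnings;
use strict;

use Clone 'clone';    # CPAN module (libclone-perl on Debian)

# One pristine template that is never handed to ad_catalog directly.
my %flags_template = (
    root_tag => 'item',
    col_map  => [
        { title => 'title' },
        { link  => 'url' },
    ],
);

# Hypothetical stand-in for $dbh->func(..., $flags, 'ad_catalog'): it scribbles
# on the hash it is given, imitating state left behind between calls.
sub fake_ad_catalog {
    my ($flags) = @_;
    $flags->{leftover_state} = 1;
    push @{ $flags->{col_map} }, { extra => 'extra' };
}

for my $feed (qw(first.rss second.rss)) {
    # Deep copy: the nested col_map array is copied too, not just the top-level hash.
    my $flags = clone( \%flags_template );
    fake_ad_catalog($flags);
    # real code: $dbh->func( 'testfeed', 'XML', $feed, $flags, 'ad_catalog' );
}

print exists $flags_template{leftover_state} ? "template clean? no\n" : "template clean\n";
print scalar @{ $flags_template{col_map} }, " col_map entries\n";
```

Because clone() copies nested structures as well, the col_map array inside the template stays untouched; a shallow copy such as { %flags_template } would still share that array between calls.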