World Wide Web Server edited this page Jul 4, 2012 · 31 revisions

[h3]RSSParser[/h3]

I created this page to document the RSSParser library I put together. I wanted to add RSS items and couldn't find anything that did this, so I wrote this library for anyone else who would like it.

The RSS parsing was adapted closely from [url=http://www.techbytes.co.in/blogs/2006/01/15/consuming-rss-with-php-the-simple-way/]this article[/url].

Because the class loads an externally hosted file, performance can vary wildly, so I have added some simple file-based caching to the class.
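To illustrate the caching idea in isolation, here is a minimal standalone sketch. It assumes a cache lifetime expressed in minutes and a cache file named after the md5 of the feed URL, as the class does. The helper names `cache_fetch()` and `cache_store()`, the temp-directory location and the example URL are hypothetical and are not part of RSSParser.

```php
<?php
// Sketch only: cache_fetch() and cache_store() are hypothetical helpers,
// not methods of the RSSParser class.

// Return the cached array, or false if there is no cache or it has expired.
function cache_fetch($file, $life_minutes)
{
    if (!is_file($file)) {
        return false;                        // nothing cached yet
    }
    clearstatcache(true, $file);             // make sure filemtime() is current
    $age = time() - filemtime($file);        // age of the cache in seconds
    if ($age >= $life_minutes * 60) {
        return false;                        // cache has expired
    }
    return unserialize(file_get_contents($file));
}

// Serialize the array to disk; LOCK_EX guards against interleaved writes.
function cache_store($file, array $data)
{
    return file_put_contents($file, serialize($data), LOCK_EX) !== false;
}

// Example URL is a placeholder used only to derive a file name.
$file = sys_get_temp_dir() . '/rss_cache_demo_' . md5('http://example.com/feed');

cache_store($file, array(array('title' => 'Hello')));
$fresh = cache_fetch($file, 5);   // written a moment ago, so still valid
$stale = cache_fetch($file, 0);   // a zero lifetime is always treated as expired
unlink($file);
```

One caveat with this approach: `unserialize()` also returns false on corrupt data, so a damaged cache file is simply treated as a miss and the feed is fetched again.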

If anyone has any suggestions or comments, I'd be happy to hear them.

Cheers

/Matt

[code] <?php if (!defined('BASEPATH')) exit('No direct script access allowed');

class RSSParser {

// =================== //
// Instance vars       //
// =================== //

/* Feed URI */
var $feed_uri;

/*  The idea is to push and pop tag names onto a stack depending upon whether */
/*  we encounter an open tag or a close tag respectively */
var $tag_stack;

/* We are parsing items one at a time, and at any time, this array will */
/* contain elements for the current item being parsed.  */
/*  Just prior to parsing the next item in the RSS, the data in this array */
/*  is emptied and pushed on to the $data associative array defined below */
/*  and the process is repeated till there are no further items in the RSS file */
var $current_feed;

/* Store character data between the current open and close tags being parsed */
var $character_data;

/* Associative array containing all the feed items */
var $data;

/* Store RSS Channel Data in an array */
var $channel_data;

/*  Boolean variable which indicates whether an RSS feed was unavailable */
var $feed_unavailable;

/* Cache lifetime */
var $cache_life;

/* Flag to write to cache - defaulted to false*/
var $write_cache_flag = false;

/* CodeIgniter cache directory */
var $cache_dir;

// ================ //
// Constructor      //
// ================ //
function __construct($params) {
     $this->CI = &get_instance();
     $this->cache_dir = ($this->CI->config->item('cache_path') == '') ? BASEPATH.'cache/' : $this->CI->config->item('cache_path');

     //$this->cache_dir = '/system/cache';
     $this->cache_life = $params['life'];

     $this->feed_uri = $params['url'];

     $this->tag_stack = array();
     $this->character_data = '';
     $this->current_feed["title"] = '';
     $this->current_feed["description"] = '';
     $this->current_feed["link"] = '';
     $this->data = array();
     $this->channel_data = array();

     //Attempt to parse the feed
     $this->parse();
}

// =============== //
// Methods         //
// =============== //
function parse() {
     //Are we caching?
     if ($this->cache_life != 0)
     {

         $filename = $this->cache_dir.'rss_Parse_'.md5($this->feed_uri);

         //is there a cache file ?
         if (file_exists($filename))
         {
              //Has it expired?
              $timedif = (time() - filemtime($filename));
              if ($timedif < ( $this->cache_life * 60))
              {
                     // Still fresh, so skip the parsing and load the cached array instead
                     $this->data = unserialize(implode('', file($filename)));
                     return true;
               }
 
          // The cache has expired, so raise the flag to rewrite it
          $this->write_cache_flag = true;

          } else {
                //Raise the flag to write the cache
                $this->write_cache_flag = true;
          }
      }

            /*  instantiate the in-built parser */
            $parser = xml_parser_create();

    /*  Bind this class object to the parser. This tells the parser to call this very class
            object's methods, such as event handlers, instead of any global functions. This is because
            in this example we make the event handler methods a part of this very class.
            Alternatively, what we could have done was create a separate class for these event
            handlers, in which case, instead of passing $this to xml_set_object(), we would have
            passed that other class's object instance variable */

            xml_set_object($parser, $this);

   /* Next we configure the parser to not automatically convert all tag names to uppercase. */
            xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

   /*  Okay, now we inform the parser as to which functions will be responsible for handling
            events triggered by open and close xml tags. Note that further down, we will be
            implementing these functions */

            xml_set_element_handler($parser, 'handleOpenTag', 'handleCloseTag');

    /* Finally, we register the callback function for handling events whenever character data
            is encountered by the parser */

            xml_set_character_data_handler($parser, 'handleCharacterData');

    /* Let's try opening the RSS feed file. This file can be a local one or can be in a remote
            location specified by a URI */

            if(!($fp = @fopen($this->feed_uri, 'r'))) {
                    xml_parser_free($parser);
                    $this->feed_unavailable = true;
                    return false;
            }

    /* Let's start reading the file 4096 bytes at a time */
            while($line = fread($fp, 4096)) {

                    /* We now invoke the in-built Expat parser, and pass it the parser object that
                    we created, and the data that we've read so far. Note that once control is passed to the
                    Expat parser, it will automatically invoke callback functions that we registered above
                    whenever events are triggered. Shortly, we will implement these callback functions */

                    xml_parse($parser, $line, feof($fp));

            }

            /* All done. Time to free up all the resources. */
            fclose($fp);
            xml_parser_free($parser);

            //Do we need to write the cache file?
            if ($this->write_cache_flag)
            {
                    if ( ! $fp = @fopen($filename, 'wb'))
                    {
                            echo "ERROR";
                            log_message('error', "Unable to write cache file: ".$filename);
                            return;
                    }
                    flock($fp, LOCK_EX);
                    fwrite($fp, serialize($this->data));
                    flock($fp, LOCK_UN);
                    fclose($fp);
            }

            return true;
    }


    /* The open tag event handler  */
function handleOpenTag($parser, $tag_name, $tag_attributes) {
            switch($tag_name) {

/* Most RSS feed formats make use of the <rss> tag to specify that we're looking at an RSS file.
   Some formats, like RSS 1.0, make use of the <rdf:RDF> tag instead, so we merge both cases
   and treat them as the same thing. */

                    case 'rss':
                    case 'rdf:RDF':

                            /* Push it onto the stack. Later we'll pop it off, when we encounter the close tag version */
                            array_push($this->tag_stack, $tag_name);

                    break;

                    case 'item':
                    case 'title':
                    case 'description':
                    case 'link':
                    case 'pubDate':

                    /* Before pushing these onto the stack, just check if the rss or the rdf:RDF tags are in
                    the stack. We don't strictly need to check this; it is just to prevent wrongly formed
                    feeds from messing up our code */

                            if(in_array('rss', $this->tag_stack) || in_array('rdf:RDF', $this->tag_stack))
                                    array_push($this->tag_stack, $tag_name);

                            break;

                    default:
            }
    }


/* The character data handler function */
function handleCharacterData($parser, $cdata) {

            if(in_array('rss', $this->tag_stack) || in_array('rdf:RDF', $this->tag_stack)) {
                    $stack_top_index = count($this->tag_stack) - 1;

                    /* if the last tag that was pushed onto the stack is either one of title,
                    description or link, then let's store the character data. This character data is
                    recorded in the class variable $character_data. Note carefully the .= assignment
                    below. It could easily have been simply an = assignment. But, very often multiple
                    events are triggered for the same data in between an open and a close tag. so we
                    "join" all this data together and store it in the variable $character_data  */

                    if(in_array($this->tag_stack[$stack_top_index], array('title', 'description', 'link', 'pubDate')))
                            $this->character_data .= $cdata;
            }

}


/* The close tag handler function */
function handleCloseTag($parser, $tag_name) {
            switch($tag_name) {
                    case 'rss':
                    case 'rdf:RDF':

                            array_pop($this->tag_stack);
                            break;

                    case 'title':
                    case 'description':
                    case 'link':
                    case 'pubDate':

            /* There are two possible locations where the <title>, <description> and
                            <link> can be found: either nested inside item tags, or outside it. Depending upon this,
                            they either correspond to individual feed items or channel data for the entire feed itself.
                            This is what we check now. Also note below that the $character_data class variable holds
                            data in between open and close tags corresponding to the element that has just triggered
                            this handler function.   */

                            if(in_array('item', $this->tag_stack)) {
                                    $this->current_feed["$tag_name"] = $this->character_data;
                            } else {
                                    $this->channel_data["$tag_name"] = $this->character_data;
                            }

                            array_pop($this->tag_stack);
                            $this->character_data = '';

                            break;


                    case 'item':

        /* Okay, we know that the item tags encapsulate the title, description
                    and the link tags. So it's pretty obvious that when you encounter </item> all
                    data for the current item has already been fetched and placed in the $current_feed
                    class variable. So, we now take this data for the current feed and push it onto the $data
                    array variable and then re-initialize the $current_feed variable in preparation for the
                    next feed item. The $data class variable will, in the end, store all the feed items in an
                    array format */

                    array_pop($this->tag_stack);
                    array_push($this->data, $this->current_feed);
                    $this->current_feed = array();
                    break;

            default:
    }

}



/* Return up to $num items from the parsed feed
     * @param $num Number of items to return from the feed
     * @return Array of items, each an associative array
    */
function getFeed($num) {
            $c = 0;
            $return = array();
            foreach($this->data AS $item)
            {
                    $return[] = $item;
                    $c++;
                    if($c == $num) break;
            }
            return $return;
}

/* Return channel data for the feed */
function & getChannelData() {
            $flag = false;
            if(!empty($this->channel_data)) {
                    return $this->channel_data;
            } else {
                    return $flag;
            }
}

    /* Were we unable to retrieve the feed? */
function errorInResponse() {
            return $this->feed_unavailable;
}

} [/code]
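
The tag-stack technique used by the handlers above can also be tried outside CodeIgniter. The following is a minimal standalone sketch, not the library itself: it parses an inline sample feed with PHP's Expat functions and closures instead of class methods. The function name `parse_rss_items()` and the `$sampleFeed` string are hypothetical, and unlike the class it trims whitespace and ignores channel data.

```php
<?php
// Sketch only: parse_rss_items() and $sampleFeed are illustrative names,
// not part of the RSSParser class.

$sampleFeed = <<<XML
<rss version="2.0">
  <channel>
    <title>Example channel</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>
XML;

function parse_rss_items($xml)
{
    $stack   = array();   // open tags we track, innermost last
    $text    = '';        // character data collected for the current tag
    $current = array();   // fields of the item being parsed
    $items   = array();   // completed items
    $fields  = array('title', 'description', 'link', 'pubDate');

    $parser = xml_parser_create();
    // Keep tag names in their original case, as the class does.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

    xml_set_element_handler(
        $parser,
        // Open tag: push the tags we care about onto the stack.
        function ($p, $name, $attrs) use (&$stack, $fields) {
            if ($name === 'item' || in_array($name, $fields, true)) {
                $stack[] = $name;
            }
        },
        // Close tag: pop the stack and store what was collected.
        function ($p, $name) use (&$stack, &$text, &$current, &$items, $fields) {
            if (in_array($name, $fields, true)) {
                array_pop($stack);
                if (in_array('item', $stack, true)) {
                    $current[$name] = trim($text);   // field belongs to an item
                }
                $text = '';
            } elseif ($name === 'item') {
                array_pop($stack);
                $items[] = $current;
                $current = array();
            }
        }
    );

    // Character data can arrive in several chunks, so append rather than assign.
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$stack, &$text, $fields) {
            if ($stack && in_array(end($stack), $fields, true)) {
                $text .= $data;
            }
        }
    );

    // The whole document is passed at once, so the final flag is true.
    // On PHP versions before 8.0 you may also want to call xml_parser_free().
    $ok = xml_parse($parser, $xml, true);

    return $ok ? $items : false;
}

foreach (parse_rss_items($sampleFeed) as $item) {
    echo $item['title'], ' (', $item['link'], ")\n";
}
```

For the library itself, loading it in a CodeIgniter controller would presumably look something like `$this->load->library('rssparser', array('url' => $feed_url, 'life' => 2));` followed by `$this->rssparser->getFeed(5)`, where `$feed_url` is a placeholder and `life` is the cache lifetime in minutes (0 disables caching).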
