World Wide Web Server edited this page Jul 4, 2012 · 31 revisions

I have created this page to document the RSSParser library I have put together. I wanted to add RSS items and couldn't find anything that did this, so I have put this library together for anyone else who would like it.

The RSS parsing was adapted closely from [url=http://www.techbytes.co.in/blogs/2006/01/15/consuming-rss-with-php-the-simple-way/]this article[/url].

Because the class loads an externally hosted file, performance can vary wildly, so I have added some simple caching to the class.
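The caching rule used in `parse()` is simple: one cache file per feed URL, named `rss_Parse_` followed by the MD5 hash of the URL, which counts as fresh while it is younger than the `life` parameter (in minutes). As a minimal sketch of just that check, here is a hypothetical standalone helper, `rss_cache_is_fresh()`, which is not part of the library; the optional `$now` argument exists only so the check can be tested with fixed timestamps.

```php
<?php
// Hypothetical helper, NOT part of RSSParser: mirrors the freshness check
// in RSSParser::parse(). $life_minutes plays the role of the 'life'
// parameter; 0 means caching is disabled.
function rss_cache_is_fresh($cache_dir, $feed_uri, $life_minutes, $now = null)
{
    if ($life_minutes == 0) {
        return false; // caching disabled: always fetch and parse the feed
    }
    $now = ($now === null) ? time() : $now;

    // Same naming scheme as parse(): one cache file per feed URI
    $filename = $cache_dir . 'rss_Parse_' . md5($feed_uri);
    if (!file_exists($filename)) {
        return false;
    }

    // Fresh while the file is younger than $life_minutes * 60 seconds
    return ($now - filemtime($filename)) < ($life_minutes * 60);
}
```

When the check fails, the library parses the feed again and rewrites the cache file.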

If anyone has any suggestions or comments, I'd be happy to hear them.

Cheers

/Matt

[h3]RSSParser class[/h3]

[code]<?php if (!defined('BASEPATH')) exit('No direct script access allowed');

class RSSParser {

// =================== //
// Instance vars       //
// =================== //

/* Feed URI */
var $feed_uri;

/*  The idea is to push and pop tag names onto a stack depending upon whether */
/*  we encounter an open tag or a close tag respectively */
var $tag_stack;

/* We are parsing items one at a time, and at any time this array
contains the elements of the item currently being parsed.
Once an item has been fully parsed, this array is pushed onto the $data
array defined below and then emptied, and the process is repeated until
there are no further items in the RSS file */
var $current_feed;

/* Store character data between the current open and close tags being parsed */
var $character_data;

/* Array containing all the feed items, each one an associative array */
var $data;

/* Store RSS Channel Data in an array */
var $channel_data;

/* Boolean flag which indicates whether the RSS feed was unavailable */
var $feed_unavailable = false;

/* Cache lifetime in minutes (0 disables caching) */
var $cache_life;

/* Flag to write to the cache, defaults to false */
var $write_cache_flag = false;

/* CodeIgniter cache directory */
var $cache_dir;

/* Reference to the CodeIgniter super object, set in the constructor */
var $CI;

// ================ //
// Constructor      //
// ================ //
function __construct($params) {
    $this->CI =& get_instance();
    $dir = $this->CI->config->item('cache_path');
    $this->cache_dir = ($dir == '') ? BASEPATH.'cache/' : $dir;

    // Cache lifetime in minutes; 0 (also used when 'life' is omitted) disables caching
    $this->cache_life = isset($params['life']) ? $params['life'] : 0;

    $this->feed_uri = $params['url'];
    $this->tag_stack = array();
    $this->character_data = '';
    $this->current_feed = array('title' => '', 'description' => '', 'link' => '', 'pubDate' => '');
    $this->data = array();
    $this->channel_data = array();
    $this->feed_unavailable = false;

    // Attempt to parse the feed
    $this->parse();
}

// =============== //
// Methods         //
// =============== //
function parse() {
    $filename = '';

    // Are we caching? (cache_life is in minutes; 0 disables the cache)
    if ($this->cache_life != 0)
    {
        $filename = $this->cache_dir.'rss_Parse_'.md5($this->feed_uri);

        // Is there a cache file, and has it not expired yet?
        if (file_exists($filename) && (time() - filemtime($filename)) < ($this->cache_life * 60))
        {
            // It's OK, so we can skip all the parsing and restore the
            // cached items and channel data here
            $cached = @unserialize(file_get_contents($filename));
            if (is_array($cached) && isset($cached['items'], $cached['channel']))
            {
                $this->data = $cached['items'];
                $this->channel_data = $cached['channel'];
                return true;
            }
        }

        // No usable cache file, so raise the flag to (re)write it after parsing
        $this->write_cache_flag = true;
    }

    /* Instantiate the built-in parser */
    $parser = xml_parser_create();

    /* Bind this class object to the parser. This tells the parser to call
    this very class object's methods, such as event handlers, instead of
    any global functions. This is because in this example we make the event
    handler methods a part of this very class. Alternatively, we could have
    created a separate class for these event handlers, in which case we
    would have passed that other class's object instance to
    xml_set_object() instead of $this */
    xml_set_object($parser, $this);

    /* Next we configure the parser not to convert all tag names to uppercase automatically */
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

    /* Now we tell the parser which methods will handle the events
    triggered by open and close XML tags. These methods are
    implemented further down */
    xml_set_element_handler($parser, 'handleOpenTag', 'handleCloseTag');

    /* Finally, we register the callback for handling events
    whenever the parser encounters character data */
    xml_set_character_data_handler($parser, 'handleCharacterData');

    /* Let's try opening the RSS feed file... This can be a local file or a
    remote one specified by a URI (remote URIs need allow_url_fopen) */
    if (!($fp = @fopen($this->feed_uri, 'r'))) {
        xml_parser_free($parser);
        $this->feed_unavailable = true;
        return false;
    }

    /* Let's start reading the file 4096 bytes at a time */
    $ok = true;
    while ($ok && ($chunk = fread($fp, 4096)) !== false && $chunk !== '') {
        /* We now invoke the built-in Expat parser and pass it the parser
        object we created and the data we've read so far. Once control is
        passed to Expat, it automatically invokes the callbacks registered
        above whenever events are triggered */
        $ok = (bool) xml_parse($parser, $chunk, false);
    }

    /* Tell Expat there is no more input, so a truncated feed is reported as an error */
    if ($ok) {
        $ok = (bool) xml_parse($parser, '', true);
    }

    /* All done. Time to free up all the resources */
    fclose($fp);
    if (!$ok) {
        // Log the Expat error and don't cache a broken or half-parsed feed
        log_message('error', 'RSSParser: XML error in '.$this->feed_uri.': '.xml_error_string(xml_get_error_code($parser)));
        xml_parser_free($parser);
        $this->feed_unavailable = true;
        return false;
    }
    xml_parser_free($parser);

    // Do we need to write the cache file?
    if ($this->write_cache_flag)
    {
        if (!($fp = @fopen($filename, 'wb')))
        {
            // The feed itself was parsed, so just log the caching problem
            log_message('error', 'Unable to write cache file: '.$filename);
            return true;
        }
        flock($fp, LOCK_EX);
        // Cache the channel data along with the items so that
        // getChannelData() also works when the feed comes from the cache
        fwrite($fp, serialize(array('items' => $this->data, 'channel' => $this->channel_data)));
        flock($fp, LOCK_UN);
        fclose($fp);
    }

    return true;
}


/* The open tag event handler */
function handleOpenTag($parser, $tag_name, $tag_attributes) {
    switch ($tag_name) {
        /* Most RSS feed formats use the <rss> tag to specify that we're
        looking at an RSS file. Some formats, like RSS 1.0, use the
        <rdf:RDF> tag instead, so we merge both cases and treat them as
        the same thing. */
        case 'rss':
        case 'rdf:RDF':
            /* Push it onto the stack. Later we'll pop it off, when we
            encounter the matching close tag */
            array_push($this->tag_stack, $tag_name);
            break;

        case 'item':
        case 'title':
        case 'description':
        case 'link':
        case 'pubDate':
            /* Before pushing these onto the stack, check that the rss or
            rdf:RDF tag is already on the stack. Strictly we don't need
            this check... it just prevents badly formed feeds from
            messing up our code */
            if (in_array('rss', $this->tag_stack) || in_array('rdf:RDF', $this->tag_stack)) {
                array_push($this->tag_stack, $tag_name);
            }
            break;

        default:
            // Ignore every other tag
            break;
    }
}


/* The character data handler function */
function handleCharacterData($parser, $cdata) {
    if (in_array('rss', $this->tag_stack) || in_array('rdf:RDF', $this->tag_stack)) {
        $stack_top_index = count($this->tag_stack) - 1;

        /* If the last tag pushed onto the stack is title, description,
        link or pubDate, store the character data in the class variable
        $character_data. Note the .= assignment below: it could have been
        a plain = assignment, but the parser often triggers several
        character data events for the text between one open and close tag
        (for example when the text spans two of the 4096-byte chunks read
        in parse()), so we join all of it together. */
        if (in_array($this->tag_stack[$stack_top_index], array('title', 'description', 'link', 'pubDate'))) {
            $this->character_data .= $cdata;
        }
    }
}


/* The close tag handler function */
function handleCloseTag($parser, $tag_name) {
    switch ($tag_name) {
        case 'rss':
        case 'rdf:RDF':
            array_pop($this->tag_stack);
            break;

        case 'title':
        case 'description':
        case 'link':
        case 'pubDate':
            /* There are two possible locations for <title>, <description>,
            <link> and <pubDate>: either nested inside item tags, or outside
            them. Depending on this, they belong either to an individual feed
            item or to the channel data for the entire feed. That is what we
            check now. Note that the $character_data class variable holds the
            data between the open and close tags of the element that has just
            triggered this handler. */
            if (in_array('item', $this->tag_stack)) {
                $this->current_feed[$tag_name] = $this->character_data;
            } else {
                $this->channel_data[$tag_name] = $this->character_data;
            }

            array_pop($this->tag_stack);
            $this->character_data = '';
            break;

        case 'item':
            /* The item tag encapsulates the title, description, link and
            pubDate tags, so when we encounter </item> all data for the
            current item has already been placed in the $current_feed class
            variable. We push this data onto the $data array and then
            re-initialize $current_feed in preparation for the next feed
            item. In the end, $data holds all the feed items. */
            array_pop($this->tag_stack);
            array_push($this->data, $this->current_feed);
            $this->current_feed = array('title' => '', 'description' => '', 'link' => '', 'pubDate' => '');
            break;

        default:
            // Ignore every other tag
            break;
    }
}



/* Return the first $num feed items
 * @param int $num Number of items to return (0 or less returns every item)
 * @return array Array of items, each one an associative array
*/
function getFeed($num) {
    if ($num <= 0) {
        return $this->data;
    }
    return array_slice($this->data, 0, $num);
}

/* Return the channel data for the feed, or false if there is none */
function getChannelData() {
    return empty($this->channel_data) ? false : $this->channel_data;
}

/* Were we unable to retrieve the feed? */
function errorInResponse() {
    return (bool) $this->feed_unavailable;
}

}
[/code]
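To see the tag-stack technique on its own, here is a minimal standalone sketch (not the library itself). `parse_rss_string()` is a hypothetical function that parses an RSS string already in memory, with no fetching or caching, and it uses closures (PHP 5.3+) as the Expat callbacks instead of `xml_set_object()` plus method names. It returns the channel fields and an array of items, or false if Expat reports an error.

```php
<?php
// Sketch only: parse_rss_string() is a hypothetical helper, NOT part of
// RSSParser. It applies the same tag-stack idea to an in-memory string and
// returns array('channel' => array(...), 'items' => array(...)), or false
// if Expat reports a parse error.
function parse_rss_string($xml)
{
    $fields = array('title', 'description', 'link', 'pubDate');
    $stack  = array();  // open tags we track, innermost last
    $text   = '';       // character data of the field being read
    $item   = array();  // fields of the item being parsed
    $result = array('channel' => array(), 'items' => array());

    $parser = xml_parser_create();
    // Keep tag names as written ("pubDate", not "PUBDATE")
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

    $open = function ($p, $name, $attrs) use (&$stack, $fields) {
        $in_feed = in_array('rss', $stack) || in_array('rdf:RDF', $stack);
        if ($name === 'rss' || $name === 'rdf:RDF'
            || ($in_feed && ($name === 'item' || in_array($name, $fields)))) {
            $stack[] = $name;
        }
    };

    $close = function ($p, $name) use (&$stack, &$text, &$item, &$result, $fields) {
        if (end($stack) !== $name) {
            return; // a tag we never pushed, e.g. <channel>
        }
        array_pop($stack);
        if (in_array($name, $fields)) {
            // A field inside <item> belongs to the item, otherwise to the channel
            if (in_array('item', $stack)) {
                $item[$name] = $text;
            } else {
                $result['channel'][$name] = $text;
            }
            $text = '';
        } elseif ($name === 'item') {
            $result['items'][] = $item;
            $item = array();
        }
    };

    $on_text = function ($p, $data) use (&$stack, &$text, $fields) {
        // Expat may deliver one text node in several pieces, so append
        if (in_array(end($stack), $fields, true)) {
            $text .= $data;
        }
    };

    xml_set_element_handler($parser, $open, $close);
    xml_set_character_data_handler($parser, $on_text);
    $ok = xml_parse($parser, $xml, true);
    xml_parser_free($parser);

    return $ok ? $result : false;
}
```

Inside a CodeIgniter application, the class itself would presumably be loaded through the standard loader, for example `$this->load->library('rssparser', array('url' => $feed_url, 'life' => 30))`, and then used via `$this->rssparser->getFeed(5)`, `getChannelData()` and `errorInResponse()`. This assumes the file is saved in the application's libraries folder, and `$feed_url` is a placeholder.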
