Hi (again)
This is just a suggestion for improvement
I am writing some scrapers to get data from several websites, and I'm concerned
about possible changes in the structure of the sites that I'm scraping.
My scrapers do systematic (and unattended) work, so I always need to check
that all the tags I expect are present in the web page, and log the result
for later analysis.
With this in mind, I can never chain several operations (select,
getPlainText, etc.) because if any of the selects returns null, the script
crashes with the error:
Fatal error: Call to a member function getPlainText() on a non-object in ...
Sometimes I call select just to test whether a node is present (for example,
to test whether the div with id "LastMinuteOffer" is present).
In this case I don't chain calls, I just do:
$t1=$html->select('div#LastMinuteOffer',0);
if ($t1){
//There is a last minute offer...
}
But sometimes I just want to get the text of a specific node, so in some
cases I chain several calls into one, something like this:
$MovieTitle=$html->select('h3.title a.title',0)->getPlainText();
In this case, if the select fails it returns null, so the getPlainText() call
triggers the error:
Fatal error: Call to a member function getPlainText() on a non-object in ...
and the script fails.
This circumstance forces me to chain nothing and test everything,
with nasty code like this:
$t1=$html->select('h3.title a.title',0);
if (!$t1) {$TheError='Fail in Movie Title'; return false; }
$MovieTitle=$t1->getPlainText();
I have written a new function to improve my code; perhaps someone else will
find it useful:
select_imperative
With this function I can chain as many calls as I want without risking a
fatal error, and I can catch the exception if any of the selects fails.
I can do something like:
try {
$MovieTitle=$html->select_imperative('h3.title a.title',0)->getPlainText();
} catch(Exception $e) {
$TheError='Fail in Movie Title: '.$e->getMessage()."\n";
return false; //Return with error
}
return true; //Return All ok
Or I can group all the errors in a single catch:
try {
$MovieTitle=$html->select_imperative('h3.title a.title',0)->getPlainText();
$Author=$html->select_imperative('span.author',0)->getPlainText();
$Date=$html->select_imperative('span.date',0)->getPlainText();
$Format=$html->select_imperative('span.format',0)->getPlainText();
} catch(Exception $e) {
$TheError='Error scrapping Movie: '.$e->getMessage();
return false; //Return with error
}
return true; //Return All ok
With this I reduce my code... a lot.
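Since the goal is to log every missing tag for later analysis, a grouped try/catch has one drawback: it stops at the first selector that fails. A possible variation, sketched below, catches per field and collects all failures. The helper name scrape_fields and the $select callable are hypothetical and not part of the library; the callable stands in for something like $html->select_imperative($query, 0)->getPlainText().

```php
<?php
// Hypothetical helper (not part of the library): runs every selector and
// collects both the extracted values and the failures, instead of stopping
// at the first missing node.
// $select is any callable that returns the text for a selector or throws.
function scrape_fields(callable $select, array $selectors) {
    $values = array();
    $errors = array();
    foreach ($selectors as $field => $query) {
        try {
            $values[$field] = $select($query);
        } catch (Exception $e) {
            // Keep the field name next to the message so the log says what broke.
            $errors[$field] = $e->getMessage();
        }
    }
    return array($values, $errors);
}

// Stand-in page data so the sketch runs without the real parser.
$fakePage = array('h3.title a.title' => 'Some Movie', 'span.date' => '2012');

// Stand-in for $html->select_imperative($query, 0)->getPlainText().
$select = function ($query) use ($fakePage) {
    if (!isset($fakePage[$query])) {
        throw new Exception('Null query in select: ' . $query);
    }
    return $fakePage[$query];
};

list($values, $errors) = scrape_fields($select, array(
    'MovieTitle' => 'h3.title a.title',
    'Author'     => 'span.author',
    'Date'       => 'span.date',
));
```

Here $errors ends up holding one entry, keyed by 'Author', which can be written to the log while the other fields are still used.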
In the class HTML_Node:
function select_imperative($query = '*', $index = false, $recursive = true, $check_self = false) {
if ( ($rv=$this->select($query,$index,$recursive, $check_self)) == null){
throw new Exception('Null query in select: '.$query);
} else return $rv;
}
and, in the class HTML_Parser:
function select_imperative($query = '*', $index = false, $recursive = true, $check_self = false) {
return $this->root->select_imperative($query, $index, $recursive, $check_self);
}
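To try the throw-on-null pattern without the real parser, here is a minimal self-contained sketch. StubNode is a hypothetical stand-in for the library's node class, and its select() is deliberately simplified to an exact-key lookup; only the select_imperative logic mirrors the functions above.

```php
<?php
// Hypothetical stand-in for the library's node class, only so this runs alone.
class StubNode {
    private $text;
    private $children;

    public function __construct($text = '', array $children = array()) {
        $this->text = $text;
        $this->children = $children;
    }

    // Simplified select(): exact-key lookup, returns null when nothing matches.
    public function select($query = '*', $index = false) {
        return isset($this->children[$query]) ? $this->children[$query] : null;
    }

    // Same idea as select_imperative: throw instead of returning null,
    // so chained calls never run on a non-object.
    public function select_imperative($query = '*', $index = false) {
        $rv = $this->select($query, $index);
        if ($rv === null) {
            throw new Exception('Null query in select: ' . $query);
        }
        return $rv;
    }

    public function getPlainText() {
        return $this->text;
    }
}

$html = new StubNode('', array('h3.title a.title' => new StubNode('Some Movie')));

// Node present: the chained call works as usual.
$MovieTitle = $html->select_imperative('h3.title a.title', 0)->getPlainText();

// Node missing: an exception is raised instead of a fatal error.
$TheError = null;
try {
    $Author = $html->select_imperative('span.author', 0)->getPlainText();
} catch (Exception $e) {
    $TheError = 'Error scraping Movie: ' . $e->getMessage();
}
```

The script keeps running after the missing span.author, and $TheError carries the selector that failed.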
Regards!
Original issue reported on code.google.com by [email protected] on 21 Sep 2012 at 6:26