Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

memory exhausted errors #20

Closed
whiteatom opened this issue Apr 27, 2014 · 57 comments
Closed

memory exhausted errors #20

whiteatom opened this issue Apr 27, 2014 · 57 comments
Assignees
Labels
Milestone

Comments

@whiteatom
Copy link

I have been a long time users of Browscap v1.0, however, I decided to upgrade to the newest version. Now I am getting constant out of memory errors in the Browscap script.

Fatal error: Allowed memory size of 201326592 bytes exhausted (tried to allocate 32 bytes) in /www/tracker/dev/incs/Browscap.php on line 683

I have had to increase the memory limit 3 times (up to 256MB) to get it to work.

Why is this new script so inefficient on memory? is there a fixed planned as it is not sustainable to run a production server with a single php script requiring that much memory.

@mimmi20
Copy link
Member

mimmi20 commented Apr 27, 2014

Which version of the browscap file do you use (5026 full)?

@whiteatom
Copy link
Author

the latest version of the Browscap.php file uses the full one.. I was hoping to switch to the light, but there is a comment above that says I can't.

(note: light version doesn't work, a fix is on its way)

I would assume that the php script is pulling the latest file 5027 I believe.

@mikesitguys
Copy link

I too have this same issue, i have been using Browscap (the latest version and the full version), for 2 months, and this problem just occurred yesterday! Even increasing the memory limit to 256MB, doesn't work, or switching to the light version of the ini file.

@asgrim
Copy link
Member

asgrim commented Apr 28, 2014

Hi there - we are aware of the increased file size causing memory issues in some cases. The problem is caused by the sizes of the INI file dramatically increasing since the project has become more active, and is simply as a result of many more user agents being defined in the file, thus the file has grown quite large.

There are a couple of potential solutions I see to this:

  • Generate split files to reduce memory usage browscap#197 - using a "split file" system where the files are split into importance, e.g. most common in first file, then others in order of frequency. This would not be supported by browscap/browscap-php however
  • Improve parsing on browscap/browscap-php component - I haven't personally looked at it in much detail, but perhaps there is something inefficient in the INI parsing/caching that is going on that we could avoid such high memory usage.

It is also worth noting that our official recommendation is:

  • Set doAutoUpdate value to false to avoid automatically checking on web requests
  • Create a background script which runs once a day or once a week that checks for updates

For example:

// When parsing the UA on the site
$browscap = new phpbrowscap\Browscap($cacheDir);
$browscap->doAutoUpdate = false;
$information = $browscap->getBrowser($userAgent);

and then in a background processed script (e.g. on a cron job) that gets run once a day/week:

$browscap = new phpbrowscap\Browscap($cacheDir);
$browscap->updateCache();

This way, it is only the background script that would consume large amounts of memory, not your actual website.

That said, we are aware of the issue and it is something we definitely want to fix. Pull requests welcome, as always ;)

@whiteatom
Copy link
Author

Thanks for the reply.... but.....
What your saying is that there is a bug that pretty much makes the entire project unusable and you're just going to close the ticket? I might suggest you leave this open so that every other person who tries to use your software and can't might have some idea why.

This issue you have referenced is obviously related.. but to a newbie trying out your project for the first time.. it might appear that it just doesn't work - and an unsupported operand bug doesn't really help.

@asgrim
Copy link
Member

asgrim commented Apr 29, 2014

I did not say I was going to close this ticket, I acknowledged that this is a problem and we are considering solutions. Just because the "unsupported operand" issue is closed it does not mean it won't appear in search results, and if someone has the same issue, they'll end up here after reading that issue.

The project is not unusable you just need to increase the memory limit. Additionally, I have provided you a workaround which means your production servers are minimally affected until we are able to fix this issue in a decent way.

Please don't make assumptions based on no evidence - this issue is staying firmly open until we fix it, rest assured :)

@asgrim asgrim added the bug label Apr 29, 2014
@asgrim
Copy link
Member

asgrim commented Apr 30, 2014

@whiteatom one of the proposals we are making is to re-arrange what is included in each file - any feedback you have would be appreciated over at browscap/browscap#248 :)

@cziegenberg
Copy link
Contributor

I'm currently working on it. After a few first changes I could reduce the memory usage by 8% percent (156M -> 143M).

Question: Which is the minimum PHP version to support?

@asgrim
Copy link
Member

asgrim commented May 8, 2014

@ziege what method are you using?

@asgrim
Copy link
Member

asgrim commented May 8, 2014

@ziege and in answer to your question, currently >= 5.3, we haven't discussed increasing this to 5.5 yet.

@cziegenberg
Copy link
Contributor

First change: unset all variables as early as possible in updateCache(). For the next changes I first have to look a bit deeper into the code...

@cziegenberg
Copy link
Contributor

After the second change: 25% reduction (156M -> 117M). More coming soon.

@cziegenberg
Copy link
Contributor

Good news, I could reduce the memory peak usage by nearly 60%.

I'm currently testing on another system (with PHP 5.6.0beta1) and could optimize the usage there from 101.92M to 41.45M. It will be hard to optimize it more, because PHP requires 39.12M for parsing the ini file into the $browsers array, which can only be changed by changing the ini file.
Tomorrow I will test it once again with the previous system (PHP 5.5.x).

What did I do?

  • unset all variables as soon as possible
  • split loop into multiple parts and change their order to be able to unset unnecessary data between those parts
  • replace foreach loops with for loops
  • use SplFixedArray where possible
  • do not save data in object properties, but in temporary files, to reduce memory peak while preparing cache data

I also fixed a caching bug, see pull request.

@cziegenberg
Copy link
Contributor

Test results with PHP 5.5.11:
Old version: Memory peak 167.87M
New version: 75.43M
Reduction: 54,82%

In PHP 5.5.11 the parsing of the INI file requires much more memory than in PHP 5.6 (61M compared to 39M), which explains the huge difference.

I will prepare a pull request now.

@asgrim
Copy link
Member

asgrim commented May 9, 2014

That's fantastic, and welcome news :)

@xavierhb
Copy link

xavierhb commented May 9, 2014

@ziege nice work! Share it soon! 👍

@cziegenberg
Copy link
Contributor

@xavierhb See pull request #26

@whiteatom
Copy link
Author

Is there a beta build I can try out? I am now getting memory exhausted errors with PHP set to 256MB.

@whiteatom
Copy link
Author

@asgrim In response to your message above, I am not trying to be insulting.. but anyone on shared hosting won't be able to increase their memory allocation - and cron jobs are probably above many people using this project. It is a fantastic effort, I was just pretty put off to have a major but just closed on me as soon as I reported it.

@DaAwesomeP
Copy link
Contributor

I think another fix would be to implement some sort of segmented downloading. It would download the first 4mb, save it, then the next 4mb etc. When it finishes it would combine them all into a single file.

The idea of a cron job would be perfect, but not for most people. A simple require or include runs synchronously and waits for a result before moving on, but maybe a user accessing a webpage could trigger an update, but PHP would not wait for the update to finish. The idea is that the updater runs in the background (seperated from the server), but is triggered by a user. This way newbies who just want to get up and running won'y have to mess with cron. The same currently implemented timer system will be used, but will trigger a different event.

There is an answer on StackOverflow to solve this problem (Is there a way to use shell_exec without waiting for the command to complete?):

How about adding.

"> /dev/null 2>/dev/null &"
shell_exec('php measurePerformance.php 47 844 [email protected] > /dev/null 2>/dev/null &');

Note this also gets rid of the stdio and stderr.

@asgrim
Copy link
Member

asgrim commented May 15, 2014

@whiteatom you could try checking out PR #26 and see if that works for you, @ziege has done s load of work to improve it, but I have not had a chance to look over it yet - any feedback is gratefully received :)

@DaAwesomeP I believe PR #26 includes this sort of staged parsing to reduce memory usage, I just need to check it out. Memory improvements are on the way! :)

@asgrim
Copy link
Member

asgrim commented May 15, 2014

@whiteatom also, I'm really not sure where you get the idea that I closed this ticket, if you look through I have never had the intention of closing this issue at all; I acknowledged it was a problem that we need to deal with and I provided a possible workaround for the interim while we fix the issue.

@cziegenberg
Copy link
Contributor

@DaAwesomeP The INI file is only about 6-7MB and that wasn't a problem in my tests. Parsing it with parse_ini_file was a problem (caused a memory peak of about 40MB). I found a way to reduce it by about 25% (by splitting the file and using parse_ini_string instead), but this was unnecessarily complicated and didn't reduce the memory peak caused by the following steps.

@cziegenberg
Copy link
Contributor

After my changes the remaining problem is the large array that results from parsing the INI file (and the additional arrays created from it to optimize the array search operations).

One way to optimize it for the future would be to process the INI file in multiple steps. I tried it by splitting it into the sections marked by included comments (+ the default settings, which should be available for all sections), but noticed that some "Parent" references link to other sections, so that this didn't work. I think it will be difficult to realize this.

Another way would be to download a prepared "cache file" directly instead of the INI file. I'm currently testing it, because some changes are required - but if it works and such a file could be provided by the browscap server, we could reduce the memory peak for the update to about 4MB...

@cziegenberg
Copy link
Contributor

This is my new idea to optimize memory usage:

  • Do not download the INI file but prepared "cache files" (which aren't cache files anymore, but new data file). As it wouldn't be secure to load a PHP file from an external server (file inclusion/code injection, e.g. if the browscap servers are hacked) I tested other formats. The INI format doesn't work due to special characters (at least without bigger changes in the structure), but JSON works great.
  • I splitted the JSON data in multiple files to reduce the overhead created by json_decode(), because parsing the JSON string requires more memory than just including a PHP file with a variable.
  • The total JSON file size is 3.55M, so much smaller than the INI file (6.19M), and the memory peak for loading the files from the server should't be much bigger than the biggest file (2,52M), at least I think so. This would mean a total memory usage reduction of 97,5% for updating/preparing the data. :)
  • The biggest part now is the already mentioned parsing of the JSON files. I could reduce the memory peak usage to 13.69M (compared to 15.80M in the current version, and 13.76M after my last optimization in pull request Memory optimizations when creating cache file #26)

This solution should work for much more browscap data and only requires some prepared browscap JSON files, structured very similar like the current cache files (also with the double encoded values, to optize performance and memory usage - this can be confusing if used with other clients).

Possible improvements:

  • When downloading the data, we have multiple files now that can be downloaded in parallel by using the curl_multi_* functions.
  • It should be easy to create clients in other languages that use the same JSON files, so it's not only a PHP solution.

My questions:

  • Can we go this way? The JSON files need to be prepared on the browscap servers (I created my test files by adjusting the current cache methods, so the files are still generated from the INI file, but this doesn't have to be done on the client anymore.)
  • Should be split the class into a client and a server class (because many parts are no more necessary)? Should it be possible to run an own "browscap server" that enabled the user to control the generation of the JSON files (based on an INI from the browscap server)?

@asgrim asgrim modified the milestones: 3.0.0, 2.1.0 Jun 23, 2014
@ammont
Copy link

ammont commented Jun 26, 2014

I don't want to use another package, what is going to happen to browscap-php?!!

@asgrim
Copy link
Member

asgrim commented Jun 26, 2014

We are still looking into this. As I've said before, this is an open source project that people contribute to in their free time. If you have the inclination, perhaps you could look into this yourself @ammont ;)

@ammont
Copy link

ammont commented Jun 26, 2014

@asgrim thanks, I will as soon as I get a little free time.

@getriebesand
Copy link

I've switched now to the class from ziege and I like it (@ziege thank you for that class)

@HOCD
Copy link

HOCD commented Jul 10, 2014

Hi.

I solved my probleme.
I made a PHP class that creates new Browscap.ini with only information about browser (ie, firefox, chrome...) and I give the new file to object Browscap.

You must make this:
$parser = new ParserBrowsCap(new Browscap(constant("DOCUMENT_ROOT") .'Core/PHP/tmp')); //create object
$browscapObject = $parser->getBrowscapObject();
$info_current_browser = get_object_vars($browscapObject->getBrowser());
$parser->deleteFile(); //delete file

I tried with 3 browsers and if you have the time, could you try with your application and tell me the result?

Thank

////////////////////////////////////////////////////////////////

browscapObject = $browscap; $this->nameFileTempo = session_id(); //tempo file will be called with session id $this->pathBrowscapIni = $browscap->cacheDir . $browscap->iniFilename; $this->pathNewFile = $browscap->cacheDir . "tmp_" . $this->nameFileTempo . ".ini"; $this->updateCach(); if ($this->createNewIniFile()) { $this->browscapObject->iniFilename = basename($this->pathNewFile); return true; } else { return false; } } else { return false; } } private function updateCach() { $interval = time() - filemtime($this->pathBrowscapIni); if ($interval <= $this->browscapObject->updateInterval) {//befor end of limit update, use local file $this->browscapObject->localFile = $this->pathNewFile; } } private function createNewIniFile() { $fileIni = fopen($this->pathBrowscapIni, 'r'); if ($fileIni) { $previousBuffer = ""; //last line $data = ""; //data to write in new file $endFindParameters = 0; //indicate if we found all generals parameters $inData = false; //indicate if we are in parameter data $inBrowserData = false; //indicate if we are in browser data $valueToSearch = array("[GJK_Browscap_Version]", "[DefaultProperties]"); while (($bufferRaw = fgets($fileIni)) !== false) { $buffer = rtrim($bufferRaw); //delete spaces //recover default parameter if ((!$inData) && ((stripos("[GJK_Browscap_Version]", $buffer) !== false) || (stripos("[DefaultProperties]", $buffer) !== false))) {//we find zone parameters $data = $this->addData($data, $buffer); $endFindParameters++; //indicate that we have find one parameters $inData = true; continue; } if ($inData) {//we are in parameter $data = $this->addData($data, $buffer); //add data in result if (preg_match("/[._]/", $buffer)) {//we fine new zone $inData = false; //stop recover parameter } if ((stripos("[GJK_Browscap_Version]", $buffer) !== false) || (stripos("[DefaultProperties]", $buffer) !== false)) { $inData = true; //we are in other parameters $endFindParameters++; } continue; } if (($endFindParameters == 2) && (!$inData) && (!$inBrowserData)) {//when we search pattern for browser if ((preg_match('/Parent="DefaultProperties"/', $buffer)) && (preg_match("/[._]/", $previousBuffer))) {//we find new zone for browser $pattern1 = str_replace(" ", "/", $previousBuffer); //replace " " by / $pattern = str_replace(".", ".", $pattern1); //replace "." by . if (preg_match($pattern, $_SERVER['HTTP_USER_AGENT'])) {//check if it's user agent $data = $this->addData($data, $previousBuffer); $data = $this->addData($data, $buffer); $inBrowserData = true; continue; } } } if (($endFindParameters == 2) && (!$inData) && ($inBrowserData)) {//find browser, recover all informations $data = $this->addData($data, $buffer); if ((preg_match("/[.*]/", $previousBuffer)) && (preg_match('/Parent="DefaultProperties"/', $buffer))) {//we find other zone for other browser $inBrowserData = false; break; } } $previousBuffer = $buffer; //store last line } fclose($fileIni); file_put_contents($this->pathNewFile, $data); return true; } else { return false; } } private function addData($oldData, $data) { $result = $oldData . $data . "\n"; return $result; } public function getBrowscapObject() { return $this->browscapObject; } public function deleteFile() { $result1 = unlink($this->pathNewFile); //delete file $result2 = unlink($this->browscapObject->cacheDir . $this->browscapObject->cacheFilename); return $result1 && $result2; } }

@frederikbosch
Copy link
Contributor

Also had memory problems while updating cache (even with cronjob at the server). Decide to start using https://github.com/yzalis/UAParser.

@ve3
Copy link

ve3 commented Sep 4, 2015

I have same memory problem.

@frederikbosch
Copy link
Contributor

@ve3 start using yzalis/ua-parser

@asgrim
Copy link
Member

asgrim commented Sep 4, 2015

@frederikbosch @ve3 or, feel free to submit a PR to make it better :) thanks!

@asgrim
Copy link
Member

asgrim commented Sep 4, 2015

In fact, by the way the 3.x branch should be much more memory efficient, so I recommend you use this instead :) https://github.com/browscap/browscap-php/tree/3.x

@asgrim asgrim closed this as completed Sep 4, 2015
@asgrim asgrim self-assigned this Sep 4, 2015
@jaydiablo
Copy link
Contributor

The crossjoin browscap parser https://github.com/crossjoin/Browscap is A LOT more memory efficient, and actually quite a bit faster than browscap-php, as outlined in this ticket: https://github.com/crossjoin/Browscap/issues/12

And it still uses browscap data, which is more complete than what ua_parser offers, IMO.

@asgrim
Copy link
Member

asgrim commented Sep 5, 2015

@jaydiablo is that vs the 3.x branch or existing stable (2.0.5)?

@jaydiablo
Copy link
Contributor

@asgrim The benchmarks in that ticket are just against the latest 2.x version (2.0.5 at the time of that writing), however I have run the browscap tests against the 3.x branch as well, just never posted the results.

Here are some results that I just ran on my mac, all are fresh checkouts of the respective master branch.

browscap-php dev-master (2.x):

Time: 11.35 minutes, Memory: 715.25Mb

OK, but incomplete, skipped, or risky tests!
Tests: 9210, Assertions: 178223, Skipped: 2413.

browscap-php 3.x-dev:

Time: 18.14 minutes, Memory: 194.75Mb

OK, but incomplete, skipped, or risky tests!
Tests: 9210, Assertions: 178223, Skipped: 2413.

crossjoin dev-master

Time: 2.47 minutes, Memory: 163.00Mb

OK, but incomplete, skipped, or risky tests!
Tests: 9210, Assertions: 178223, Skipped: 2413.

browscap-php 3 does use substantially less memory than browscap 2 in these tests, which is good to see, but it's actually worse in speed. It's not just during the cache generation either, as you can watch the tests run and see that it takes a while on individual tests after the cache building steps.

I've also modified the useragent benchmark project a little bit to add browscap 3.x, here are the results (note that the times and memory shown here are to process the user agents in the file. The cache building is done as a separate step prior to running the benchmark). These are run on a linux virtual machine, I only note this because my mac has an SSD drive, the linux machine has magnetic.

Running Benchmark using agents in top-200-last-24.txt
Benchmarking browscap-php ... 26.142086secs 90MB
Benchmarking browscap-php-3 ... 61.493058secs 26.25MB
Benchmarking crossjoin-browscap ... 2.1639847secs 1.75MB

Running Benchmark using agents in top-1000-last-7days.txt
Benchmarking browscap-php ... 130.32173secs 35.75MB
Benchmarking browscap-php-3 ... 273.94758secs 26.5MB
Benchmarking crossjoin-browscap ... 10.996641secs 4.5MB

Running Benchmark using agents in ua-list-all.txt
Benchmarking browscap-php ... 73.313462secs 35.25MB
Benchmarking browscap-php-3 ... 88.502528secs 26.25MB
Benchmarking crossjoin-browscap ... 3.2833108secs 2.5MB

Running Benchmark using agents in ua-list-100-sample01.txt
Benchmarking browscap-php ... 14.938440secs 35.25MB
Benchmarking browscap-php-3 ... 22.302146secs 26.25MB
Benchmarking crossjoin-browscap ... 0.5970990secs 1.25MB

One of the modifications that I made to the benchmark utility was to time each useragent lookup individually and log to a CSV file (I mention this in the crossjoin ticket). This allows me to build a comparison CSV to see which useragents each library is faster or slower at. Here's that CSV for the "top-200-last-24.txt" run above: https://docs.google.com/spreadsheets/d/1mtISknvOyC7KurRVeAeNFtq7ZZpuremkjITjHDlCF7I/pub?gid=1776088907&single=true&output=csv

Certainly the memory use of Browscap 3 is looking better than 2, but at a cost for performance (as it is in its present state). The crossjoin parser excels at memory use, and speed.

I have branches of browscap/browscap to run the tests, a simple clone of these should allow you to replicate the tests in your environment:

https://github.com/jaydiablo/browscap
https://github.com/jaydiablo/browscap/tree/crossjoin
https://github.com/jaydiablo/browscap/tree/browscap-php-3

The browscap 3.x one is mostly a copy of @mimmi20 's refactor branch, but I changed the formatter that it uses. If you see something incorrect with the setup/config of browscap 3 in that branch, let me know. As far as I can tell it generates the cache files, so I assume it's using them for the useragent parsing.

I'd like to contribute back some of the changes I've made to the useragent benchmark utility, but I have to clean it up a little bit. It requires a bit of hoop jumping to get browscap-php and browscap-php 3 running at the same time (composer related) which I don't think can be committed back upstream, but a branch to replace browscap-php with browscap-php 3 could be made.

@jaydiablo
Copy link
Contributor

I looked around a bit at the 3.x code, and it does look like a lot of it is based on crossjoin, particularly the storage of the patterns/iniparts in separated cache files.

I ran a profile of 3.x with blackfire against the slowest useragent it processed in that csv I linked to above: https://blackfire.io/profiles/bdb371a4-11cb-428f-8281-66c9079ff64d/graph

Like I was seeing with crossjoin when I first started testing with it, the majority of the time is spent in preg_match, and preg_match is called a lot of times (nearly 800 in this case). I suspect you'll see the most gains in speed by looking at reducing the number of patterns that are stored in the cache files, mainly be "compressing" them (as it's called in browscap-php 2.x's source), i.e., removing the digits from the patterns and replacing with regex \d, then removing any patterns that are the same. This was the biggest win for me when I was trying to optimize crossjoin: https://github.com/crossjoin/Browscap/commit/2ee9663bd4b971c658d87f33fe780858580f7bd4 (the digit replacement in that commit is a bit overzealous, and causes tests to fail, which was fixed later https://github.com/crossjoin/Browscap/commit/afd9f8eaec3fe506ed864a93a536e2370b570922).

I'd love to help, but have spent too much time on browscap related tasks lately. We're VERY happy with crossjoin, but do dislike the idea of there being two (php) parsers out there. I'd love to see browscap 3 be the one true parser, but in order for that to happen I think the performance has to get better (at least for us).

@asgrim
Copy link
Member

asgrim commented Sep 6, 2015

@jaydiablo thanks very much for your feedback. I've created a new issue in #101 to investigate the performance of the 3.x branch.

FWIW, I don't mind there being multiple parsers, it promotes a healthy ecosystem imo :) however, we'll only "officially" support stuff on github.com/browscap (although we'll do our best for other stuff too!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests