Regex rules do not appear to match Adblock Rules #11

Open
essandess opened this issue Jun 3, 2017 · 11 comments

Comments

@essandess

Is this correct?

Using the example &ad_box_ from the EasyList block list, Privoxy shows no matches:

http://config.privoxy.org/show-url-info?url=http%3A%2F%2Ftest.com%2F%26ad_box_

The exact string works, but I believe the rule should also match as a substring.
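One plausible cause of the mismatch is that Privoxy's documentation describes the path part of a URL pattern as left-anchored at the leading "/", whereas an Adblock rule such as &ad_box_ with no | anchor matches anywhere in the URL. The Python sketch below illustrates one way to bridge that gap. adblock_to_privoxy_path and privoxy_path_matches are hypothetical helpers, not code from this repo, and Python's re.match (which anchors at the start of the string) stands in for Privoxy's own matcher. Only the * wildcard and the ^ separator are translated; |, || and $options are assumed to be handled elsewhere.

```python
import re


def adblock_to_privoxy_path(rule: str) -> str:
    """Translate a plain Adblock URL substring rule into a left-anchored path regex.

    Hypothetical helper for illustration: handles only '*' (wildcard) and
    '^' (separator); '|' / '||' anchors and '$' options are not supported.
    """
    parts = []
    for ch in rule:
        if ch == "*":
            parts.append(".*")
        elif ch == "^":
            # Adblock separator: any char except letters, digits, _ - . %, or end of URL.
            parts.append(r"(?:[^A-Za-z0-9_.%-]|$)")
        else:
            parts.append(re.escape(ch))
    # The path pattern is matched from the leading '/', so '/.*' lets the rule
    # match anywhere in the path, like an unanchored Adblock rule.
    return "/.*" + "".join(parts)


def privoxy_path_matches(pattern: str, path: str) -> bool:
    # re.match anchors at the start of the string, mimicking left anchoring.
    return re.match(pattern, path) is not None


pattern = adblock_to_privoxy_path("&ad_box_")
print(privoxy_path_matches(pattern, "/page?x=1&ad_box_top"))
print(privoxy_path_matches("&ad_box_", "/page?x=1&ad_box_top"))
```

The /.* prefix is what turns a left-anchored match back into a substring match; without it, the pattern only matches paths that begin with the rule text.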

@faxotherapy

Refer to this website for more info.

@essandess
Author

Thanks! Does your fork https://github.com/faxotherapy/privoxy-adblock have the fixes described in your post?

@faxotherapy

faxotherapy commented Jun 4, 2017

No, it doesn't. Just retrieve the archive that sits at the bottom of this post and you'll be good.

@essandess
Author

Thanks again. Those look like useful mods. Do you mind if I ask a few follow-up questions?

  1. Do you know which AdBlock formats are and are not handled by the regexes in this repo and your archive?
  2. Have you tackled corrections to the Privoxy .filter files created by privacy-adblock.sh? For example, the entry corresponding to &ad_box_ doesn't appear correct:
     s@<([a-zA-Z0-9]+)\s+.*id=.?ad_box.*>.*</\1>@@g
  3. I'm interested in translating the EasyList blocks into an efficient(ish) JavaScript proxy.pac filtering file, which would allow HTTPS blocking without setting up Privoxy as an SSL-intercepting proxy. Are you aware of existing code for this?

Using a proxy.pac in conjunction with Privoxy would cover all cases: applications that use TLS as well as plain HTTP, and applications that honor automatic proxy configuration as well as those that don't.
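To make the concern in question 2 concrete, the Python snippet below runs the quoted filter (with Privoxy's s@...@@g wrapper removed) over a small HTML fragment. Because .* is greedy, a match can run from the ad_box element to the last closing tag of the same name, deleting unrelated markup in between. TIGHTER is an illustrative assumption, not something produced by the conversion script: it confines the attribute scan to the opening tag and makes the element body lazy. It is still a regex over HTML and will misbehave on nested elements with the same tag name. Note also that &ad_box_ in EasyList is a URL-blocking rule rather than an element-hiding (##) rule, so an HTML filter may not be the right translation for it at all.

```python
import re

# Filter pattern quoted in question 2, without Privoxy's s@...@@g wrapper.
ORIGINAL = r"<([a-zA-Z0-9]+)\s+.*id=.?ad_box.*>.*</\1>"

# Illustrative tighter variant (assumption): attributes are scanned only up to
# the first '>' of the opening tag, and the element body is matched lazily.
TIGHTER = r"<([a-zA-Z0-9]+)\s[^>]*\bid=[\"']?ad_box[^>]*>.*?</\1>"


def strip_matches(pattern: str, html: str) -> str:
    # re.sub replaces every non-overlapping match, like the trailing 'g' flag.
    return re.sub(pattern, "", html)


html = '<div id="ad_box">ad</div><p>keep me</p><div>more</div>'
print(repr(strip_matches(ORIGINAL, html)))  # greedy: the whole fragment is removed
print(repr(strip_matches(TIGHTER, html)))   # only the ad_box element is removed
```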

@faxotherapy

  1. I've never looked at the page you mentioned, but I found it interesting. Thanks for bringing this to my attention. Apparently, the path part in the original conversion script is correct “in the middle”, not at the beginning of a path pattern to block. Just as you mentioned above for &ad_box_, the add-on I offer simply turns it into /.*&ad_id. With regard to domain blocking, the add-on removes the backslashes.
  2. I did not tackle the filter part. So, I guess I can leave it to you… 😀 Correcting the syntax for both the host side and the path side exhausted me.
  3. No, I'm not aware of any. I'm not sure whether it's possible to use an auto-config file at the same time as Privoxy. Isn't it either one or the other?

Because Privoxy doesn't see the path of an HTTPS request (it can only block the host part), I use ProxHTTPSProxy to let Privoxy see it.
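The limitation follows from what a non-intercepting proxy actually receives. In this minimal Python sketch (request_line_seen_by_proxy is a hypothetical helper, and a GET request is assumed for plain HTTP), a plain-HTTP client sends the full URL to the proxy, while an HTTPS client only sends CONNECT host:port and then tunnels encrypted traffic, so the path never reaches Privoxy.

```python
from urllib.parse import urlsplit


def request_line_seen_by_proxy(url: str) -> str:
    """Return the request line a client sends to a plain (non-intercepting) HTTP proxy."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        # HTTPS: the client asks the proxy for a tunnel; path and query travel
        # inside the TLS session and are invisible to the proxy.
        return f"CONNECT {parts.hostname}:{parts.port or 443} HTTP/1.1"
    # Plain HTTP (GET assumed): the proxy receives the absolute URL, path included.
    return f"GET {url} HTTP/1.1"


print(request_line_seen_by_proxy("http://test.com/&ad_box_"))   # GET http://test.com/&ad_box_ HTTP/1.1
print(request_line_seen_by_proxy("https://test.com/&ad_box_"))  # CONNECT test.com:443 HTTP/1.1
```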

@essandess
Author

Yes, you can use proxy.pac as a front-end filter before requests are passed to the proxy; see osxfortress for an example.

This works very well, and avoids the pitfalls of an SSL intercepting proxy.

It is necessary to write the regex rules efficiently as a DFA in JavaScript.
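The general shape of such a front-end filter can be sketched in Python by mirroring the PAC entry point FindProxyForURL(url, host). Everything here is illustrative: the rule sets, the find_proxy_for_url and host_is_blocked names, and the blackhole address 127.0.0.1:3421 are assumptions, while 127.0.0.1:8118 is Privoxy's default listen address. Two efficiency ideas are shown: host rules become a set lookup over the host and its parent domains, and URL substrings are merged into one precompiled alternation instead of being tested one by one. Python's re is a backtracking engine, so this illustrates the structure rather than a true DFA.

```python
import re

# Hypothetical rule sets; a real PAC file would be generated from EasyList.
BLOCKED_HOSTS = {"ads.example.com", "tracker.example.net"}
URL_PATTERNS = ["&ad_box_", "/banners/"]

# One alternation compiled once, rather than thousands of separate regexes per request.
URL_RE = re.compile("|".join(re.escape(p) for p in URL_PATTERNS))

BLACKHOLE = "PROXY 127.0.0.1:3421"  # hypothetical non-responding proxy used to block
PRIVOXY = "PROXY 127.0.0.1:8118"    # Privoxy's default listen address


def host_is_blocked(host: str) -> bool:
    # Check the host and each parent domain: cdn.ads.example.com, ads.example.com, ...
    labels = host.lower().split(".")
    return any(".".join(labels[i:]) in BLOCKED_HOSTS for i in range(len(labels)))


def find_proxy_for_url(url: str, host: str) -> str:
    """Python mirror of a PAC FindProxyForURL(url, host) function (sketch only)."""
    if host_is_blocked(host) or URL_RE.search(url):
        return BLACKHOLE
    return PRIVOXY  # everything else goes on to Privoxy


print(find_proxy_for_url("http://test.com/&ad_box_", "test.com"))
print(find_proxy_for_url("http://test.com/index.html", "test.com"))
```

Some browsers pass only the scheme and host of https:// URLs to the PAC script, so in practice the URL patterns may only take effect for plain HTTP, while host rules apply to both.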

All these various AdBlock translation issues look like something that bison should cover if anyone ever wrote out the AdBlock grammar. Are you aware of this?

That would be a lot easier than writing a zillion one-off regexes for every scenario.
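No such grammar is referenced in this thread, but the rough structure it would formalize can be shown with a small hand-written parser. This Python sketch is an illustration under stated assumptions (parse_rule and Rule are hypothetical names): it recognizes comments and headers, @@ exceptions, the || and | anchors, $options, and domain##selector / domain#@#selector element-hiding rules. Regex rules written as /.../ and the extended cosmetic syntaxes receive no special treatment.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Rule:
    kind: str                     # "url" or "element_hiding"
    exception: bool = False       # '@@' prefix (URL) or '#@#' (element hiding)
    domain_anchor: bool = False   # leading '||'
    start_anchor: bool = False    # leading '|'
    end_anchor: bool = False      # trailing '|'
    pattern: str = ""             # URL pattern or CSS selector
    options: List[str] = field(default_factory=list)  # text after '$'
    domains: List[str] = field(default_factory=list)  # text before '##'


def parse_rule(line: str) -> Optional[Rule]:
    line = line.strip()
    if not line or line.startswith("!") or line.startswith("["):
        return None  # blank line, comment, or [Adblock ...] header
    for sep, is_exception in (("#@#", True), ("##", False)):
        if sep in line:
            doms, selector = line.split(sep, 1)
            return Rule("element_hiding", exception=is_exception, pattern=selector,
                        domains=[d for d in doms.split(",") if d])
    rule = Rule("url")
    if line.startswith("@@"):
        rule.exception, line = True, line[2:]
    if "$" in line:
        line, opts = line.rsplit("$", 1)
        rule.options = opts.split(",")
    if line.startswith("||"):
        rule.domain_anchor, line = True, line[2:]
    elif line.startswith("|"):
        rule.start_anchor, line = True, line[1:]
    if line.endswith("|"):
        rule.end_anchor, line = True, line[:-1]
    rule.pattern = line
    return rule


print(parse_rule("@@||example.com^$third-party,script"))
```

A Bison grammar would make the same distinctions declaratively; each field of Rule then maps onto a separate, small translation step instead of one monolithic regex rewrite.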

@faxotherapy

> All these various AdBlock translation issues look like something that bison should cover if anyone ever wrote out the AdBlock grammar. Are you aware of this?

No, I'm not.

Based on what I learnt from your input, there's definitely another approach, much simpler than the overcomplicated translation in the add-on.

@essandess
Author

I've created a filtering proxy.pac file from EasyList in the repo easylist-pac-privoxy.

It's quite efficient and runs on mobile devices. Comments welcome.

@essandess
Author

essandess commented Jul 19, 2017

The forked repo adblock2privoxy is a full-featured implementation of EasyList rules in Privoxy, complete with element hiding.

This is a clever, efficient, and effective implementation: Privoxy handles the domain and path blocking rules using efficient and correct regular expressions, and Privoxy inserts targeted EasyList-based CSS files to handle element hiding.

In addition to Privoxy, this approach requires a simple (nginx) web server for the CSS files. I've posted an example nginx .conf configuration file that hosts these on the LAN over port 8119.
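The element-hiding half of this design amounts to turning domain##selector rules into per-site stylesheets that the web server can serve. The Python sketch below shows only that idea; build_hiding_css is a hypothetical helper and its output layout is an assumption, not the file format adblock2privoxy actually produces. Generic rules are collected under the empty-string key, while exceptions (#@#) and rules with negated ~domains are skipped rather than handled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_hiding_css(rules: Iterable[str]) -> Dict[str, str]:
    """Group 'domains##selector' rules into one CSS stylesheet per domain (sketch)."""
    selectors: Dict[str, List[str]] = defaultdict(list)
    for rule in rules:
        if "##" not in rule:
            continue  # URL rules and '#@#' exceptions are not element-hiding rules here
        domains, selector = rule.split("##", 1)
        domain_list = [d for d in domains.split(",") if d]
        if any(d.startswith("~") for d in domain_list):
            continue  # negated domains need per-site exclusions; not handled here
        for domain in domain_list or [""]:  # '' collects generic rules
            selectors[domain].append(selector)
    return {
        domain: ",\n".join(sels) + " { display: none !important; }\n"
        for domain, sels in selectors.items()
    }


css = build_hiding_css(["##.ad_box", "example.com##div#sponsor", "example.com,example.org##.banner"])
print(css["example.com"])
```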

I'd encourage anyone interested in achieving network-layer EasyList tracker and ad blocking to take a look at this approach.

@Redback812

It would be great if this could run on a router with DD-WRT, which is itself a *nix. Since DD-WRT already runs Privoxy, this work would need only a slight modification. If the router's internal flash is too small, a USB 3 stick could do the job.

@essandess
Author

It would be straightforward to get adblock2privoxy on DD-WRT.

You'd simply need to copy over the Privoxy configuration and CSS files, and either install nginx on DD-WRT or use its native (lighttpd?) web server to serve the CSS files used for element blocking.

Regular updates could be rsync'd to the router.

Here's an updating daemon example for macOS: com.github.essandess.adblock2privoxy.plist.
