This repository has been archived by the owner on Jun 9, 2022. It is now read-only.

Remove boilerplate text and meta data from scraped sources #19

Open
ricjhill opened this issue Dec 22, 2020 · 5 comments


@ricjhill
Collaborator

ricjhill commented Dec 22, 2020

The scraped text contains some parts that are not useful for analysing patterns of language. Should these be removed? For example:

135 of 313
Labels
Share this...FacebookTwitter

Read the complete document: www.carbon-sense.com/wp-content/uploads/2008/05/alexander-2008.pdf [PDF, 266KB].

  	__ATA.cmd.push(function() {
  		__ATA.initDynamicSlot({
  			id: 'atatags-1460517861-5fc7e51d7aed8',
  			location: 120,
  			formFactor: '001',
  			label: {
  				text: 'Advertisements',
  			},
  			creative: {
  				reportAd: {
  					text: 'Report this ad',
  				},
  				privacySettings: {
  					text: 'Privacy settings',
  				}
  			}
  		});
  	});
  Share this:PrintEmailTwitterFacebookPinterestLinkedInRedditLike this:Like Loading...
@ricjhill
Collaborator Author

ricjhill commented Dec 22, 2020

> <!--
> google_ad_client = "ca-pub-3545577860068042";
> /* neu test */
> google_ad_slot = "6412247007";
> google_ad_width = 200;
> google_ad_height = 200;
> //-->

@ricjhill
Collaborator Author

  jQuery(document).ready(function(){
  	jQuery('#dd_43fe30d37a49f1713b8a3a44662e0bc2').on('change', function() {
  	  jQuery('#amount_43fe30d37a49f1713b8a3a44662e0bc2').val(this.value);
  	});
  });

@ricjhill
Collaborator Author

__ATA.cmd.push(function() {
__ATA.initDynamicSlot({
id: 'atatags-1460517861-5fc7ea6b2c1e4',
location: 120,
formFactor: '001',
label: {
text: 'Advertisements',
},
creative: {
reportAd: {
text: 'Report this ad',
},
privacySettings: {
text: 'Privacy settings',
}
}
});
});
Share this:PrintEmailTwitterFacebookPinterestLinkedInRedditLike this:Like Loading...

@ricjhill
Collaborator Author

__ATA.cmd.push(function() {
__ATA.initDynamicSlot({
id: 'atatags-1460517861-5fc7ea4e14bff',
location: 120,
formFactor: '001',
label: {
text: 'Advertisements',
},
creative: {
reportAd: {
text: 'Report this ad',
},
privacySettings: {
text: 'Privacy settings',
}
}
});
});
Share this:PrintEmailTwitterFacebookPinterestLinkedInRedditLike this:Like Loading...

@ricjhill
Collaborator Author

  jQuery(document).ready(function(){
  	jQuery('#dd_16efd3924a8804ec558ac63db78e3d5e').on('change', function() {
  	  jQuery('#amount_16efd3924a8804ec558ac63db78e3d5e').val(this.value);
  	});
  });

Donate - choose an amount5101520501002505001000
Share this...FacebookTwitter
