parse-tokenize-html-js

A parser for HTML files, written in JavaScript. It tokenizes the text into terms without stemming (conversion to lower case is left commented out). It applies Heaps' law to describe how the number of new, distinct terms grows as more text is read, and computes the Zipf's law constant. It keeps a count of each term in the vocabulary, prints the vocabulary, and reports the frequency of every term.
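The processing steps described above can be sketched as follows. This is a minimal illustration, not the code in parser.js. The names htmlToText, tokenize, buildVocabulary, fitHeaps and zipfConstant are hypothetical, and the regex-based tag stripping is a simplifying assumption. Heaps' law is taken in its usual form V(n) = K·n^β and fitted by least squares on log V against log n. The Zipf constant is estimated as the mean of rank × relative frequency, which is only one possible estimator and may differ from what the original computes.

```javascript
'use strict';

// Hypothetical helper: crude regex-based tag stripping (an assumption; a real
// HTML parser is more robust). Drops <script>/<style> bodies and entities.
function htmlToText(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, ' ')
    .replace(/<style[\s\S]*?<\/style>/gi, ' ')
    .replace(/<[^>]*>/g, ' ')
    .replace(/&[a-z0-9#]+;/gi, ' ');
}

// Split text into terms (runs of Unicode letters/digits). No stemming.
// Lowercasing is opt-in, echoing the note that it is commented out.
function tokenize(text, lowerCase = false) {
  const source = lowerCase ? text.toLowerCase() : text;
  return source.match(/[\p{L}\p{N}]+/gu) || [];
}

// Count each term and record the vocabulary size after every token:
// growth[i] is the number of distinct terms in the first i + 1 tokens.
function buildVocabulary(tokens) {
  const counts = new Map();
  const growth = [];
  for (const term of tokens) {
    counts.set(term, (counts.get(term) || 0) + 1);
    growth.push(counts.size);
  }
  return { counts, growth };
}

// Heaps' law V(n) = K * n^beta, fitted by least squares on
// log V = log K + beta * log n. Returns null with fewer than two points.
function fitHeaps(growth) {
  const m = growth.length;
  if (m < 2) return null;
  let sx = 0, sy = 0, sxx = 0, sxy = 0;
  for (let i = 0; i < m; i++) {
    const x = Math.log(i + 1);
    const y = Math.log(growth[i]);
    sx += x; sy += y; sxx += x * x; sxy += x * y;
  }
  const beta = (m * sxy - sx * sy) / (m * sxx - sx * sx);
  const K = Math.exp((sy - beta * sx) / m);
  return { K, beta };
}

// Zipf's law: relative frequency p(r) ~ c / r, so r * p(r) ~ c.
// Estimate c as the mean of r * p(r) over all ranks (one simple estimator).
function zipfConstant(counts, totalTokens) {
  const freqs = [...counts.values()].sort((a, b) => b - a);
  if (freqs.length === 0) return 0;
  let sum = 0;
  freqs.forEach((f, i) => { sum += (i + 1) * (f / totalTokens); });
  return sum / freqs.length;
}

// Usage: tokenize a small page and print the vocabulary with frequencies.
const html = '<p>The cat saw the dog.</p><p>The dog ran.</p>';
const tokens = tokenize(htmlToText(html), true);
const { counts, growth } = buildVocabulary(tokens);
for (const [term, freq] of counts) console.log(term, freq);
console.log('Heaps fit:', fitHeaps(growth));
console.log('Zipf constant:', zipfConstant(counts, tokens.length));
```

Recording the vocabulary size after every token is what makes the Heaps' law fit possible: each entry of growth is one (n, V) point, and the counts map doubles as the per-term frequency table.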

The function "walkSync" reads all files from a directory.
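A minimal sketch of what a synchronous directory walker in the spirit of "walkSync" might look like, using only Node's built-in fs and path modules. Recursing into subdirectories and the helper readHtmlFiles (which keeps only .html files) are assumptions for illustration, not taken from parser.js.

```javascript
'use strict';
const fs = require('fs');
const path = require('path');

// Hypothetical sketch: collect the paths of all files under `dir`,
// recursing into subdirectories (recursion is an assumption).
function walkSync(dir, fileList = []) {
  for (const name of fs.readdirSync(dir)) {
    const fullPath = path.join(dir, name);
    if (fs.statSync(fullPath).isDirectory()) {
      walkSync(fullPath, fileList);
    } else {
      fileList.push(fullPath);
    }
  }
  return fileList;
}

// Hypothetical helper: read the contents of every .html file under `dir`.
function readHtmlFiles(dir) {
  return walkSync(dir)
    .filter((file) => path.extname(file).toLowerCase() === '.html')
    .map((file) => fs.readFileSync(file, 'utf8'));
}
```

For example, readHtmlFiles('./pages') would return the text of every .html file under a hypothetical ./pages directory. Synchronous calls keep the control flow simple for a one-shot batch script, at the cost of blocking while the files are read.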

To run the parser: iojs parser.js