A Python script that converts the DMOZ content.rdf.u8.gz dump into a CSV file. The output CSV file generated by the script is also included.
Each line of the CSV file has the following structure:
"URL","Category 1","Category 2",...
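Because the number of category columns varies from line to line, the file is easier to handle with a CSV parser than with a fixed-column loader. The following is a minimal sketch of writing and reading rows in this layout, not the script's actual code. The helper names `write_rows` and `read_rows`, the sample URLs, and the way category names are spelled are all assumptions made for illustration.

```python
import csv
import io


def write_rows(pages, out):
    """Write one row per URL: the URL first, then each of its categories."""
    # QUOTE_ALL wraps every field in double quotes, matching "URL","Category 1",...
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for url, categories in pages.items():
        writer.writerow([url, *categories])


def read_rows(src):
    """Yield (url, categories) pairs; rows may have different lengths."""
    for row in csv.reader(src):
        if row:  # skip blank lines
            yield row[0], row[1:]


# Hypothetical sample data, not taken from the real output file.
sample = {
    "http://example.com/": ["Business: Food and Related Products: Beverages: Coffee"],
    "http://example.org/": ["Category A", "Category B"],
}

buffer = io.StringIO()
write_rows(sample, buffer)
csv_text = buffer.getvalue()

# Read the text back to recover each URL with its list of categories.
rows = list(read_rows(io.StringIO(csv_text)))
```

To read the real file the same way, an open file handle (for example `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV is stored) can be passed to `read_rows` in place of the in-memory buffer.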
Example:
is in
DMOZ Categories (1-4 of 4)
Business: Food and Related Products: Beverages: Coffee (1)
Regional: Europe: Italy: Regions: Friuli-Venezia Giulia: Localities: Trieste: Business and Economy (1)
World: Italiano: Affari: Alimentazione e Prodotti Correlati: Bevande: Caffè (1)
World: Italiano: Regionale: Europa: Italia: Friuli-Venezia Giulia: Provincia di Trieste: Località: Trieste: Affari e Economia (1)
The corresponding line in the CSV file is generated as: