Default options when not using bin/dumpster.js #95

Open
shreyasminocha opened this issue Jul 15, 2021 · 1 comment

@shreyasminocha

shreyasminocha commented Jul 15, 2021

  • disambiguation pages / redirects --skip_disambig, --skip_redirects
    by default, dumpster skips entries in the dump that aren't full-on articles, you can
let obj = {
  file: './path/enwiki-latest-pages-articles.xml.bz2',
  db: 'enwiki',
  skip_redirects: false,
  skip_disambig: false
};
dumpster(obj, () => console.log('done!'));

I'm not sure if this is unintentional or if the docs are misleading, but the default options are applied only when invoking the dumpster bin script, and not when dumpster is imported and called from a script, as in the example above. So the snippet I quoted is identical to:

let obj = {
  file: './path/enwiki-latest-pages-articles.xml.bz2',
  db: 'enwiki'
};
dumpster(obj, () => console.log('done!'));

…and skipping redirects and disambiguation pages requires an explicit skip_redirects: true, skip_disambig: true.

I'm guessing this is also true of the other default options.
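
Until that is sorted out, one possible workaround is to merge the defaults in yourself before calling dumpster. This is only a sketch: `assumedDefaults` and `withDefaults` are hypothetical names, not part of the library, and the list covers just the two options discussed here (the bin script may apply others).

```javascript
// Assumed defaults, based on what the docs describe for the CLI.
// Hypothetical: the real bin script may set more options than these two.
const assumedDefaults = {
  skip_redirects: true,
  skip_disambig: true
};

// Hypothetical helper: caller-supplied options override the assumed defaults.
function withDefaults(options) {
  return Object.assign({}, assumedDefaults, options);
}

const obj = withDefaults({
  file: './path/enwiki-latest-pages-articles.xml.bz2',
  db: 'enwiki'
});

// obj now carries skip_redirects: true and skip_disambig: true,
// and can be passed on as before:
// dumpster(obj, () => console.log('done!'));
```

Passing `skip_redirects: false` explicitly still wins, since the caller's options are merged last.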

@spencermountain
Owner

spencermountain commented Jul 15, 2021

thanks @shreyasminocha yeah, you're right - the argv stuff is a mess and should be cleaned up.
don't have a free afternoon now, but will mark it as a bug. prs welcome
cheers
