forked from cupslab/neural_network_cracking
Paper
-----
Fast, Lean, and Accurate: Modeling Password Guessability Using Neural Networks.
W. Melicher, Blase Ur, Sean M. Segreti, Saranga Komanduri, Lujo Bauer, Nicolas Christin, Lorrie Faith Cranor. USENIX Security 2016.
https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/melicher
TODO
----
- Update code to newer versions of keras and Theano.
- Refactor code to split into separate files.
- Change to YAML for configuration files to support comments and reduce the
number of configuration writing errors.
- Remove support for things that are no longer used or keras no longer
supports (e.g., bidirectional models, JZS1).
- Make live demo of JavaScript guesser.
- Improve testing on the JavaScript guesser.
- Improve documentation about versions and check compatibility with previous
versions of keras and Theano.
- Improve state of saving data in the intermediate_sqlite file. It's easy to end
up with data in the intermediate files that doesn't match the training data
and leads to obscure and sometimes silent errors.
- Improve performance for enumerating guesses.
Bugs
----
This is software used and maintained by students for a research project and
likely will have many bugs and issues.
Setup using Docker
------------------
Make sure you have installed the NVIDIA driver (https://github.com/NVIDIA/nvidia-docker/wiki/Frequently-Asked-Questions#how-do-i-install-the-nvidia-driver) and Docker (https://docs.docker.com/install). For GPU support, additionally install nvidia-docker (https://github.com/NVIDIA/nvidia-docker).
Build a CPU-only container and start an interactive bash session within it:
./deploy.py build-cpu
./deploy.py run-cpu
Build a GPU-supported container and start an interactive bash session within it:
./deploy.py build-gpu
./deploy.py run-gpu
Note: You may need to specify python3 when executing python scripts within the Docker container, e.g. `python3 pwd_guess_unit.py`.
Setup (Manual)
--------------
Requirements:
+ python - Version 3.4.2 was used during development. Should work with any
version of python3
+ python packages:
- theano - Theano requires the version from github instead of the version on
pip.
https://github.com/Theano/Theano. To setup the GPU, make sure that you
read the documentation. Make the .theanorc file in your home directory
with this:
[cuda]
root = /usr/local/cuda
[global]
device = gpu # change this to be gpu# if necessary
floatX = float32
warn_float64 = ignore
Using the GPU will require that you have nvidia drivers installed and
CUDA.
Make sure that gcc is compatible with nvcc. At the time of writing, gcc
version 4.9 is required. You can check this by executing:
`which gcc` --version
Theano 0.7.0-0.8.2 was used during development.
If using 0.8.2, you may need to add the following lines to your .theanorc
due to https://github.com/Theano/Theano/pull/4369. If you don't, you might
get errors like "WARNING (theano.sandbox.cuda): CUDA is installed, but
device gpu is not available (error: cuda unavailable)":
[nvcc]
flags = -D_FORCE_INLINES
- keras - at time of writing, the version on pip is not current and will
cause model saving to fail. Use the version from github instead
(https://github.com/fchollet/keras). Version 0.2.0 was the main keras
version during development. However, during development, keras changed to
version 0.3.1. Some commits support one version or the other. It is
currently a todo item to improve the state of keras support. On my
machine the current commit's tests pass with Keras commit
1e58b895236f6a80f5e07de74af25f16d9cc4625.
- scikit-learn - pip install scikit-learn
- sqlitedict - pip install sqlitedict
- numpy
- cython
Compiling:
python3 setup.py build_ext --inplace
Set up:
- Cuda must be in path and library path. Add these two lines to your .bashrc
file:
export PATH="$PATH":/usr/local/cuda-7.5/bin
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH":/usr/local/cuda-7.5/lib64
Tests
-----
Run automated tests by:
python pwd_guess_unit.py
Running all tests takes roughly 15 minutes on my machine. It may take longer
depending on the GPU you are using.
To run only specific tests:
python -m unittest pwd_guess_unit.<specific unit test>
Help
----
python3 pwd_guess.py --help
usage: pwd_guess.py [-h] [--pwd-file PWD_FILE [PWD_FILE ...]]
[--arch-file ARCH_FILE] [--weight-file WEIGHT_FILE]
[--pwd-format {trie,tsv,list,im_trie} [{trie,tsv,list,im_trie} ...]]
[--enumerate-ofile ENUMERATE_OFILE] [--retrain]
[--config CONFIG] [--args ARGS] [--profile PROFILE]
[--log-file LOG_FILE]
[--log-level {debug,info,warning,error}] [--version]
[--pre-processing-only] [--stats-only]
[--config-args CONFIG_ARGS]
[--forked {guesser,random_walker}]
[--calc-probability-only] [--train-secondary-only]
Neural Network with passwords. This program uses a neural network to guess
passwords. This happens in two phases, training and enumeration. Either --pwd-
file or --enumerate-ofile are required. --pwd-file will give a password file
as training data. --enumerate-ofile will guess passwords based on an existing
model. Version <version number>
optional arguments:
-h, --help show this help message and exit
--pwd-file PWD_FILE [PWD_FILE ...]
Input file name.
--arch-file ARCH_FILE
Output file for the model architecture.
--weight-file WEIGHT_FILE
Output file for the weights of the model.
--pwd-format {trie,tsv,list,im_trie} [{trie,tsv,list,im_trie} ...]
Format of pwd-file input. "list" format is onepassword
per line. "tsv" format is tab separated values: first
column is the password, second is the frequency in
floating hex. "trie" is a custom binary format created
by another step of this tool.
--enumerate-ofile ENUMERATE_OFILE
Enumerate guesses output file
--retrain Instead of training a new model, begin training the
model in the weight-file and arch-file arguments.
--config CONFIG Config file in json.
--args ARGS Argument file in json.
--profile PROFILE Profile execution and save to the given file.
--log-file LOG_FILE
--log-level {debug,info,warning,error}
--version Print version number and exit
--pre-processing-only
Only perform the preprocessing step.
--stats-only Quit after reading in passwords and saving stats.
--config-args CONFIG_ARGS
File with both configuration and arguments.
--forked {guesser,random_walker}
Internal use only.
--calc-probability-only
Only output password probabilities
--train-secondary-only
Only train on secondary data.
Pretrained Network Usage
------------------------
Enumerating passwords
Edit guess_len8_config.json to replace "g1_len8.tsv" in the "enumerate_ofile"
key with the output file you would like.
If you want to guess more passwords, you should change the value of
"lower_probability_threshold" to something lower, e.g. 1e-8.
The output is not sorted, so if you want guesses in order, sort the
output file by descending probability:
sort -gr -k2 -t$'\t' [OUTPUT_FILE] -o [SORTED_OUTPUT_FILE]
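If a Python equivalent of the sort command is more convenient, a minimal
sketch could look like the following. It assumes the two-field TSV layout
described under "Output format" (password, then probability). The function
name sort_guesses is illustrative and not part of this repository, and unlike
sort(1) it holds every line in memory.

```python
def sort_guesses(lines):
    """Return TSV lines ordered by descending probability (last field)."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the last tab so a password containing a tab stays intact.
        _password, prob = line.rsplit("\t", 1)
        rows.append((float(prob), line))
    # Sort on the numeric probability only; the original text is kept as-is.
    rows.sort(key=lambda row: row[0], reverse=True)
    return [line for _prob, line in rows]
```

Usage: pass it the lines of the enumerate output file and write the returned
lines to a new file.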
Monte Carlo Simulation
Edit guess_len8_config.json to replace "g1_len8.tsv" in the "enumerate_ofile"
key with the output file you would like. Edit "<input_file>" in the
"password_test_fname" key to set the password input file. This file should
point to a line-delimited password file where each line is one password.
Command:
python3 <path_to_root>/pwd_guess.py --config-args <config_file.json>
e.g.:
python3 ../pwd_guess.py --config-args guess_len8_config.json
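The config edits described above can also be scripted. This is a sketch only:
it assumes guess_len8_config.json uses the same "args"/"config" layout as the
example configuration files in this README, with "enumerate_ofile" under
"args" and "lower_probability_threshold" under "config". The helper name
retarget_config is hypothetical.

```python
import json


def retarget_config(config_args, output_file, threshold=None):
    """Return a copy of a combined config with a new output file/threshold."""
    updated = json.loads(json.dumps(config_args))  # simple deep copy
    updated.setdefault("args", {})["enumerate_ofile"] = output_file
    if threshold is not None:
        config = updated.setdefault("config", {})
        config["lower_probability_threshold"] = threshold
    return updated


def rewrite_config_file(in_path, out_path, output_file, threshold=None):
    """Load a config-args JSON file, retarget it, and save it elsewhere."""
    with open(in_path) as handle:
        config_args = json.load(handle)
    with open(out_path, "w") as handle:
        json.dump(retarget_config(config_args, output_file, threshold),
                  handle, indent=2)
```

The rewritten file can then be passed to pwd_guess.py with --config-args.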
Version
-------
python pwd_guess.py --version
Output format
-------------
delamico_random_walk - This output format performs a Monte Carlo estimation of
the guess number, i.e., the strength of a password. The output file is a TSV where
each line has 7 fields: the password, the probability of that password, the
estimated output guess number (the strength of the password), the standard
deviation of the randomized trial for this password (in units of number of
guesses), the number of measurements for this password, the estimated confidence
interval for the guess number (in units of number of guesses).
human - This output format enumerates guesses and stores the list of passwords
guessed to the output file. The guesses are not in order of probability. The
output file is a TSV with each line having two fields: the password, and the
probability. You can sort the passwords by probability using the unix sort
command.
calculator - This output format calculates the exact number of guesses for a
test set of passwords by enumerating guesses. The output file is a TSV with 3
fields: the password, the probability for that password, and the guess number.
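As an illustration of consuming the calculator output, the sketch below
computes the fraction of test passwords guessed within a given number of
guesses. It assumes every row carries a numeric guess number in the third
field; how the tool marks passwords that were never guessed is not covered
here. The name fraction_guessed is made up for this example.

```python
def fraction_guessed(lines, max_guesses):
    """Fraction of test passwords whose guess number is <= max_guesses."""
    total = 0
    guessed = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Fields: password, probability, guess number. Splitting from the
        # right keeps a password that contains a tab in one piece.
        _password, _prob, guess_number = line.rsplit("\t", 2)
        total += 1
        if float(guess_number) <= max_guesses:
            guessed += 1
    return guessed / total if total else 0.0
```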
generate_random - This output format generates random passwords and stores them to
disk. The output is a TSV with 2 fields: the random password and its probability.
Config files
------------
Configuration information for guessing and training. Can be read from a file
in json format.
# Files Configuration Options:
intermediate_fname - File name to store intermediate information about
processing relative to the current directory. A value of ':memory:' will
store all values in memory. Default is ':memory:'. Setting a file name is
necessary if enumeration and training happen at different times.
# Neural Network Model Configuration Options:
char_bag - alphabet of characters over which to guess. By default this includes
all keyboard keys (e.g., alphanumeric characters and some symbols).
model_type - type of model. Should be LSTM or GRU or JZS{1,2,3} (JZS1,2,3 are
only supported in earlier versions of the Keras library).
hidden_size - Size of each hidden recurrent layer.
dense_layers - Number of additional dense layers.
dense_hidden_size - Size of dense layer.
layers - Number of hidden layers.
max_len - Maximum length of any password in training data. This can be
larger than all passwords in the data and the network may output guesses
that are this many characters long.
min_len - Minimum length of any password that will be guessed.
model_optimizer - Model optimizer. Default is 'adam'. Read about optimizer
values in the Keras documentation: http://keras.io/optimizers/.
context_length - Number of context characters to use. Lower values mean less
training time; higher values could increase accuracy.
generations - Number of training generations. More generations take longer
but can be more accurate. Default is 20.
dropouts - Use neural network drop out weights. If true, can prevent
overfitting.
dropout_ratio - Ratio of dropouts.
train_backwards - If true, train on passwords backwards: e.g., guessing d from
'rowssap' instead of guessing d from 'passwor'.
bidirectional_rnn - Only supported for some versions of Keras. If true, then
use a Bidirectional version of the neural network model.
deep_model - If true, then train a deeper NN model. Set this to true if you
use more than one layer in the 'layers' argument.
padding_character - If true, then use a padding character. This should
generally be false, but is included for backward compatibility. Models trained
before version 275 include a padding character.
# Training Configuration Options:
freq_format - can be 'hex' or 'decimal'. This defines the format of frequency
integers in the training sets. Only applicable when using TSV format for
input.
secondary_training - If true, use a secondary training set after the primary
training set.
secondary_train_sets - Json dictionary in this format:
"secondary_train_sets" : {
"pwd_file" : [
"<pwd_file>"
],
"pwd_format" : [
"list"
]
}
pwd_file is a list of files. pwd_format is a list of formats corresponding
to each file. Accepts the same options as the --pwd-format argument.
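A small sanity check for this dictionary could look like the sketch below. It
assumes that pwd_file and pwd_format must have the same length (the text says
each format corresponds to a file) and takes the valid format names from the
--pwd-format choices in the help output. pair_train_sets is a hypothetical
helper, not a function in pwd_guess.py.

```python
VALID_FORMATS = {"trie", "tsv", "list", "im_trie"}  # --pwd-format choices


def pair_train_sets(train_sets):
    """Pair each password file with its format, validating both lists."""
    files = train_sets["pwd_file"]
    formats = train_sets["pwd_format"]
    if len(files) != len(formats):
        raise ValueError("pwd_file and pwd_format must have the same length")
    for fmt in formats:
        if fmt not in VALID_FORMATS:
            raise ValueError("unknown pwd_format: %r" % fmt)
    return list(zip(files, formats))
```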
freeze_feature_layers_during_secondary_training - If true, then during
secondary training, the feature layers will be frozen. This is useful for
avoiding overfitting to the secondary training set, especially if the
secondary training set is significantly smaller than the primary set.
secondary_training_save_freqs - If true, then use the secondary training set
for post-processing frequencies instead of the primary set.
training_chunk - A smaller training chunk consumes less memory on the GPU; a
larger training chunk consumes more. Ideally, this value would be as large as
possible without running out of memory on the GPU. Large values could in
principle lower the quality of training, but I have not observed this in
practice.
chunk_print_interval - Interval over which to print info to the log.
train_test_ratio - Ratio of training data to holdout testing data. A value of
20 means using one out of every 20 passwords for holdout testing. These
passwords are only used to print accuracy statistics in the log data and for
early-quit statistics. The logged accuracy statistics are only for diagnostic
and debugging purposes and should not be used in a real test. To perform a
real test, you should not give any test-passwords during training.
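To make the ratio concrete, here is a minimal sketch of a one-in-N holdout
split. It assumes the holdout passwords are simply every Nth entry in input
order; the tool's actual selection may differ, and split_holdout is an
illustrative name only.

```python
def split_holdout(passwords, train_test_ratio=20):
    """Send one out of every `train_test_ratio` passwords to the holdout set."""
    train, holdout = [], []
    for index, password in enumerate(passwords):
        # The last password of each group of N goes to the holdout set.
        if index % train_test_ratio == train_test_ratio - 1:
            holdout.append(password)
        else:
            train.append(password)
    return train, holdout
```

With 40 passwords and a ratio of 20, this yields 38 training passwords and 2
holdout passwords.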
training_accuracy_threshold - If the accuracy is not improving by this
amount each generation, then quit. Set to -1 to never quit early.
rare_character_optimization - Default false. If you specify a list of
characters to treat as rare, then it will model those characters with a
rare character. This will increase performance at the expense of accuracy.
rare_character_lowest_threshold - Default 20. The characters with the lowest
frequency in the training data will be modeled as special characters. This
number indicates how many to drop. A value of 20 means treating the 20 least
frequent characters in the training set as rare characters.
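Picking the least frequent characters can be sketched as follows. This is
only an illustration of the idea; the tie-breaking rule (by character) is an
assumption made here for determinism, and least_frequent_chars is not a
function from this repository.

```python
from collections import Counter


def least_frequent_chars(passwords, count=20):
    """Return the `count` least frequent characters in the passwords."""
    freqs = Counter()
    for password in passwords:
        freqs.update(password)  # counts each character occurrence
    # Ascending frequency; ties are broken by the character itself.
    ranked = sorted(freqs.items(), key=lambda item: (item[1], item[0]))
    return [char for char, _freq in ranked[:count]]
```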
uppercase_character_optimization - Default false. If true, uppercase
characters will be treated the same as lower case characters. Uppercase
characters will be predicted via post-processing output according to the
frequency of uppercase characters in the training data.
no_end_word_cache - When rare_character_optimization or
uppercase_character_optimization is used, post-processing uses different
percentages for the first and last character. If no_end_word_cache is true, then
only the first character has different post-processing values. The intuition
for this is that uppercase characters are likely more probable as the first
character and special characters more likely as the last character.
simulated_frequency_optimization - Default false. Only for TSV files. If set
to true, then multiple instances of the same password are simulated. This
can improve performance at the expense of accuracy.
save_always - Boolean. Default true. If false, then only the networks which
perform best on verification data will be saved to disk.
save_model_versioned - Boolean. When saving the model, save each generation of
the model using a different file name. You can use this to measure the effect
of more generations on models. The first generation is saved as
<model_file>.1, the second generation is saved in the file <model_file>.2,
where <model_file> is the model file name given in the arguments.
randomize_training_order - If true, will randomize the passwords training
order.
compute_stats - Compute pre-processing step and exit without training a neural
network.
tokenize_words - If true, create a tokenized model.
most_common_token_count - If tokenize_words is true, then this is the number of
tokens to simulate. E.g., 2000 will simulate the most common 2000 tokens in
the training set.
# Guessing Configuration Options:
lower_probability_threshold - This controls how many passwords to output
during generation. Lower threshold means more passwords. A value of 1e-7 will
output all passwords with probability above 1e-7.
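The effect of the threshold can be illustrated with a toy enumeration. The
sketch below is not the code in pwd_guess.py: toy_model is a hypothetical
stand-in for the neural network, END is an end-of-password symbol chosen for
this example, and min_len is ignored. It walks prefixes depth first and prunes
any prefix whose probability is already at or below the threshold, since every
extension of it can only be less likely.

```python
END = "\n"  # end-of-password symbol used by this toy model


def enumerate_above(next_probs, threshold, max_len, prefix="", prob=1.0):
    """Yield (password, probability) for every password above `threshold`."""
    for char, p in next_probs(prefix).items():
        new_prob = prob * p
        if new_prob <= threshold:
            continue  # extensions are even less likely, so prune here
        if char == END:
            yield prefix, new_prob
        elif len(prefix) < max_len:
            yield from enumerate_above(next_probs, threshold, max_len,
                                       prefix + char, new_prob)


def toy_model(prefix):
    """Hypothetical next-character distribution with fixed odds."""
    if not prefix:
        return {"a": 0.6, "b": 0.4}  # passwords are never empty here
    return {"a": 0.3, "b": 0.2, END: 0.5}
```

Lowering the threshold from 0.1 to 0.05 in this toy setup grows the output
from two passwords to five. Note that the results come out in traversal
order, not in order of probability.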
relevel_not_matching_passwords - If true, then passwords that do not match the
filter policy will have their probability equal to zero and that probability
will be redistributed to other passwords. Recommended true.
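One way to picture this releveling is a proportional renormalization, sketched
below. This is an assumption about how the probability mass is redistributed;
the tool may do it differently (for example, during generation rather than
afterwards). The names relevel and matches_policy are illustrative.

```python
def relevel(probabilities, matches_policy):
    """Drop non-matching passwords and rescale the rest to sum to 1."""
    kept = {pwd: p for pwd, p in probabilities.items() if matches_policy(pwd)}
    total = sum(kept.values())
    if total == 0:
        return {}  # nothing matches the policy
    return {pwd: p / total for pwd, p in kept.items()}
```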
guess_serialization_method - Default is 'human' which enumerates all passwords
above the lower_probability_threshold cutoff. 'delamico_random_walk' means
calculate password guess numbers using Monte Carlo simulations.
'generate_random' means generate random passwords. 'calculator' enumerates
all passwords, but does not save the enumerated passwords to disk; instead it
calculates the guess number of the test set of passwords.
parallel_guessing - Boolean. If true, then use multiple cores to generate
passwords.
fork_length - The prefix length to fork on when parallel_guessing is true. If
this value is 2, then prefixes of length 2 will be assigned to different
cores. For example, one core will generate passwords that start with 'aa',
another with 'ab', etc.
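The prefix split can be sketched like this. The round-robin assignment of
prefixes to workers is an assumption made for the example, as is the helper
name assign_prefixes; only the idea of forking on fixed-length prefixes comes
from the description above.

```python
from itertools import product


def assign_prefixes(char_bag, fork_length, cpu_limit):
    """Spread every prefix of length `fork_length` across `cpu_limit` workers."""
    buckets = [[] for _ in range(cpu_limit)]
    for index, chars in enumerate(product(char_bag, repeat=fork_length)):
        # Round-robin: prefix i goes to worker i modulo the worker count.
        buckets[index % cpu_limit].append("".join(chars))
    return buckets
```

The number of prefixes grows as len(char_bag) ** fork_length, so small values
of fork_length already produce many work items.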
guesser_intermediate_directory - Directory to store intermediate files used in
parallel guessing.
cleanup_guesser_files - If true, then delete files in the
guesser_intermediate_directory after completion.
password_test_fname - File name containing test passwords. Each password should
be on one line.
chunk_size_guesser - Number of passwords to send to the GPU in one chunk. Larger
values increase performance but could run out of GPU memory.
max_gpu_prediction_size - Maximum number of password fragments to send to the
GPU in one chunk. Larger values increase performance but could run out of GPU
memory.
gpu_fork_bias - Ratio to decrease the chunk size when using multiple processes.
Parallel guessing takes up more fixed memory on the GPU so can lead to
running out of GPU memory more easily. This value controls how much to
decrease memory by when forking.
cpu_limit - Number of processes to fork when using parallel guessing.
tokenize_guessing - If true, and if tokenize_words is true, then perform
tokenization during guessing.
probability_striation - If non-zero, then instead of enumerating probabilities
for specific passwords, enumerate the guess numbers at certain
probability cutoffs. This is useful for exporting a pre-computation of
probability to guess number mapping.
prob_striation_step - If probability_striation is non-zero, then it will
calculate guess numbers for 10^(-j * prob_striation_step) for j in
1..probability_striation. So for example, for prob_striation_step = 1 and
probability_striation = 10, it would calculate the guess number at the
following probabilities: 1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 1e-8,
1e-9, 1e-10.
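The cutoffs can be computed in a couple of lines. The sketch assumes a
negative exponent, which is what the listed example probabilities (1e-1
through 1e-10) imply; striation_cutoffs is a name invented for this example.

```python
def striation_cutoffs(probability_striation, prob_striation_step):
    """Probability cutoffs 10^(-j * step) for j = 1..probability_striation."""
    return [10.0 ** (-j * prob_striation_step)
            for j in range(1, probability_striation + 1)]
```

For example, a step of 0.5 with 4 striations gives cutoffs at roughly
0.316, 0.1, 0.0316, and 0.01.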
enforced_policy - Will not generate guesses that do not match the policy.
Currently supported policies are:
'complex' - requires 8 characters and 4 classes.
'basic' - no requirements
'1class8' - requires 8 characters
'basic_long' - requires 16 characters
'complex_lowercase' - requires 8 characters and 3 character classes
insensitive to case.
'complex_long' - requires 16 characters and 3 character classes
'complex_long_lowercase' - requires 16 characters and 2 character classes
insensitive to case.
'semi_complex' - requires 12 characters and 3 character classes
'semi_complex_lowercase' - requires 12 characters and 2 character classes
insensitive to case.
'3class12' - Same as semi_complex
'2class12_all_lowercase' - Same as semi_complex_lowercase
'one_uppercase' - Requires at least one uppercase character
*_lowercase policies mean that they are insensitive to case and case is
ignored. These are useful when preparing a train set using the
policyfilterer.py utility, but not useful for training or guessing with a
neural network.
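A rough sketch of how such policies can be checked is given below. It covers
only the case-sensitive policies and uses the length and class counts from the
list above. The four character classes (lowercase, uppercase, digit, anything
else) are an assumption, and the real checks in this repository may include
rules not described here. POLICIES, count_classes, and meets_policy are names
made up for the example.

```python
import string

# (minimum length, minimum number of character classes) per policy name.
POLICIES = {
    "basic": (0, 0),
    "1class8": (8, 1),
    "basic_long": (16, 1),
    "complex": (8, 4),
    "complex_long": (16, 3),
    "semi_complex": (12, 3),
    "3class12": (12, 3),
}


def count_classes(password):
    """Count how many of lowercase/uppercase/digit/other appear."""
    return sum([
        any(c in string.ascii_lowercase for c in password),
        any(c in string.ascii_uppercase for c in password),
        any(c in string.digits for c in password),
        any(c not in string.ascii_letters + string.digits for c in password),
    ])


def meets_policy(password, policy):
    """True if `password` satisfies the named policy."""
    if policy == "one_uppercase":
        return any(c in string.ascii_uppercase for c in password)
    min_len, min_classes = POLICIES[policy]
    return len(password) >= min_len and count_classes(password) >= min_classes
```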
# Monte Carlo Methods Configuration Options:
random_walk_seed_num - Number of passwords to keep in main memory in one chunk.
Larger values increase memory requirements.
random_walk_confidence_bound_z_value - Confidence bound coefficient. This
should correspond to the coefficient for a confidence interval. E.g., 95%
means a value of 1.96, 99% means a value of 2.58
[https://en.wikipedia.org/wiki/Confidence_interval]. Default is 1.96.
random_walk_confidence_percent - Confidence percent for the random_walk
guesser. A value of 5 will mean that the simulation will continue until all
passwords have confidence interval less than 5% of the estimated guess
number.
random_walk_upper_bound - Upper bound on the number of rounds to continue
simulation.
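To show how the z value and the confidence percent interact, here is a sketch
of a Monte Carlo guess-number estimate in the style of Dell'Amico and
Filippone, which the name delamico_random_walk suggests. It is not taken from
pwd_guess.py: the function names are hypothetical, and treating the confidence
interval as a half-width is an assumption. Each sampled password that is more
likely than the target contributes 1/p to the estimate.

```python
import math


def estimate_guess_number(target_prob, sample_probs, z_value=1.96):
    """Estimate the guess number of a password with probability `target_prob`.

    `sample_probs` holds the model probabilities of passwords sampled from
    the model itself. Returns (estimate, confidence interval half-width).
    """
    n = len(sample_probs)
    # Samples more likely than the target contribute 1 / p; the rest add 0.
    terms = [1.0 / p if p > target_prob else 0.0 for p in sample_probs]
    estimate = sum(terms) / n
    variance = sum((t - estimate) ** 2 for t in terms) / n
    half_width = z_value * math.sqrt(variance / n)
    return estimate, half_width


def converged(estimate, half_width, confidence_percent=5):
    """True once the interval is below `confidence_percent` of the estimate."""
    return half_width < estimate * confidence_percent / 100.0
```

In a loop, one would keep drawing samples until converged() returns true or an
upper bound on the number of rounds is reached.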
pwd_list_weights - Weighting to give different training sets. This should be a
json dictionary mapping file names to a ratio:
"pwd_list_weights" : {
"file1" : 1,
"file2" : 2
}
This will weight passwords in file1 as being twice as important as file2.
# Deprecated Configuration Options related to Trie preprocessing. Don't use these:
trie_serializer_encoding - default is 'utf8'.
trie_serializer_type - 'reg' or 'fuzzy'.
trie_implementation - Trie implementation. 'trie' for custom
implementation. None for no trie optimization.
trie_fname - File name for storing trie.
trie_intermediate_storage - File for storing intermediate trie.
preprocess_trie_on_disk
preprocess_trie_on_disk_buff_size
toc_chunk_size
use_mmap
fuzzy_training_smoothing
scheduled_sampling
final_schedule_ratio
Example Configuration File
--------------------------
You can also see the pre_built_networks/ directory for examples of
configuration files. Here are some starting configuration files that you should
modify to suit your needs.
Combined arguments and configuration file for generic training.
{
"args" : {
"arch_file" : "arch.json",
"weight_file" : "weight.h5",
"log_file" : "train_log.txt",
"pwd_file" : [
"[INPUT_FILE]"
],
"pwd_format" : [
"list"
]
},
"config" : {
"training_chunk" : 1000,
"training_main_memory_chunk": 10000000,
"min_len" : 8,
"max_len" : 30,
"context_length" : 10,
"chunk_print_interval" : 100,
"layers" : 2,
"hidden_size" : 1000,
"generations" : 5,
"training_accuracy_threshold" : -1,
"train_test_ratio" : 20,
"model_type" : "LSTM",
"train_backwards" : true,
"dense_layers" : 1,
"dense_hidden_size" : 512,
"secondary_training" : true,
"secondary_train_sets" : {
"pwd_file" : [
"[SECONDARY_INPUT_OPTIONAL]"
],
"pwd_format" : [
"list"
]
},
"simulated_frequency_optimization" : false,
"randomize_training_order" : true,
"uppercase_character_optimization" : true,
"rare_character_optimization" : true,
"rare_character_optimization_guessing" : true,
"parallel_guessing" : false,
"chunk_size_guesser" : 40000,
"random_walk_seed_num" : 100000,
"max_gpu_prediction_size" : 10000,
"random_walk_seed_iterations" : 1,
"no_end_word_cache" : true,
"intermediate_fname" : "intermediate_data.sqlite",
"save_model_versioned" : true
}
}
Example config of enumerating passwords:
{
"args" : {
"arch_file" : "arch.json",
"weight_file" : "nn_len8.h5",
"log_file" : "guess_log.txt",
"enumerate_ofile" : "g1_enumerate.tsv"
},
"config" : {
"training_chunk" : 10000,
"min_len" : 8,
"max_len" : 30,
"context_length" : 10,
"chunk_print_interval" : 100,
"layers" : 2,
"hidden_size" : 1000,
"model_type" : "JZS2",
"simulated_frequency_optimization" : true,
"intermediate_fname" : "intermediate_data.sqlite",
"randomize_training_order" : true,
"uppercase_character_optimization" : true,
"rare_character_optimization" : true,
"rare_character_optimization_guessing" : true,
"parallel_guessing" : false,
"lower_probability_threshold" : 1e-6,
"padding_character" : true,
"chunk_size_guesser" : 20000,
"guess_serialization_method" : "human",
"random_walk_seed_num" : 100000,
"max_gpu_prediction_size" : 20000,
"random_walk_seed_iterations" : 1,
"no_end_word_cache" : true
}
}
Combined arguments and configuration file for guessing using Monte Carlo
simulations:
{
"args" : {
"arch_file" : "arch.json",
"weight_file" : "all_trained.h5.3",
"log_file" : "guess_log.txt",
"enumerate_ofile": "g3_long.tsv"
},
"config" : {
"training_chunk" : 1000,
"training_main_memory_chunk": 10000000,
"min_len" : 16,
"max_len" : 30,
"context_length" : 10,
"chunk_print_interval" : 100,
"layers" : 2,
"hidden_size" : 1000,
"generations" : 3,
"training_accuracy_threshold" : -1,
"train_test_ratio" : 20,
"model_type" : "JZS2",
"tokenize_words" : false,
"most_common_token_count" : 2000,
"bidirectional_rnn" : false,
"train_backwards" : true,
"dense_layers" : 1,
"dense_hidden_size" : 512,
"secondary_training" : true,
"secondary_train_sets" : {
"pwd_file" : [
"../leaks/all_combined_long_v2.txt"
],
"pwd_format" : [
"list"
]
},
"simulated_frequency_optimization" : false,
"randomize_training_order" : true,
"uppercase_character_optimization" : true,
"rare_character_optimization" : true,
"rare_character_optimization_guessing" : true,
"parallel_guessing" : false,
"lower_probability_threshold" : 1e-7,
"chunk_size_guesser" : 40000,
"guess_serialization_method" : "delamico_random_walk",
"password_test_fname" : "../leaks/basic16.txt",
"random_walk_seed_num" : 100000,
"max_gpu_prediction_size" : 10000,
"random_walk_seed_iterations" : 50,
"no_end_word_cache" : true,
"intermediate_fname" : "intermediate_data.sqlite",
"save_model_versioned" : true
}
}
Example guessing configuration for a complex policy.
{
"args" : {
"arch_file" : "arch.json",
"weight_file" : "all_trained_cmplx.h5.3",
"log_file" : "guess_log.txt",
"enumerate_ofile": "g1_complex.tsv"
},
"config" : {
"training_chunk" : 1000,
"training_main_memory_chunk": 10000000,
"min_len" : 8,
"max_len" : 30,
"context_length" : 10,
"chunk_print_interval" : 100,
"layers" : 2,
"hidden_size" : 1000,
"generations" : 3,
"training_accuracy_threshold" : -1,
"train_test_ratio" : 20,
"model_type" : "JZS2",
"tokenize_words" : false,
"most_common_token_count" : 2000,
"enforced_policy" : "complex",
"bidirectional_rnn" : false,
"train_backwards" : true,
"dense_layers" : 1,
"dense_hidden_size" : 512,
"secondary_training" : true,
"secondary_train_sets" : {
"pwd_file" : [
"../leaks/all_combined_long_v2.txt"
],
"pwd_format" : [
"list"
]
},
"simulated_frequency_optimization" : false,
"randomize_training_order" : true,
"uppercase_character_optimization" : true,
"rare_character_optimization" : true,
"rare_character_optimization_guessing" : true,
"parallel_guessing" : false,
"lower_probability_threshold" : 1e-7,
"chunk_size_guesser" : 40000,
"guess_serialization_method" : "delamico_random_walk",
"password_test_fname" : "../leaks/complex/andrew8.txt",
"random_walk_seed_num" : 100000,
"max_gpu_prediction_size" : 10000,
"random_walk_seed_iterations" : 1,
"no_end_word_cache" : true,
"intermediate_fname" : "intermediate_data.sqlite",
"save_model_versioned" : true
}
}