Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address performance #42

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open

Address performance #42

wants to merge 6 commits into from

Conversation

BinaryMuse
Copy link
Owner

@BinaryMuse BinaryMuse commented Sep 1, 2017

As per #39, the library is fairly slow relative to similar libraries, especially in certain cases (e.g. the example showed in #39 is much slower than example.toml from the repo). This PR investigates root causes and works to address performance concerns.


Parsing or compiling

A comment in #39 asserts that compile() is the likely culprit; I agreed with this assumption, but decided to get some data anyway. I added a new _trace method that exposes timing data (and allows tracing — more on that later) and updated benchmark.js, and the results surprised me:

$ node benchmark.js
1000 iterations in 795ms
Time spent parsing:   749.8927939999987
Time spent compiling: 40.83443699999995

The most time is spent in peg.js, not in the compile function. The next step is to figure out why.


I added a tracer that would allow us to count how many times we enter each rule, how many times they match and fail, and the total amount of time spent trying the rules. Enabling tracing of course makes the parser slower; for comparison, here are the overall numbers:

1000 iterations in 2328ms
Time spent parsing:   2280.3740720000014
Time spent compiling: 45.199218999999964

And here's the raw trace data for this run, in CSV form (note times are in nanoseconds):

rule,type,count,total_time,avg_time
S,fail,152000,61339832,403.5515263157895
NL,fail,243000,98460839,405.1886378600823
EOF,fail,199000,64893925,326.1001256281407
NL,match,38000,15620130,411.05605263157895
comment,match,6000,334324468,55720.744666666666
expression,match,25000,1875612377,75024.49508
comment,fail,67000,25905795,386.6536567164179
line,match,25000,2221123845,88844.9538
path,fail,18000,10091163,560.6201666666667
tablearray,fail,18000,8149281,452.73783333333336
ASCII_BASIC,match,151000,101708749,673.5678741721854
ASCII_BASIC,fail,26000,19563772,752.4527692307693
key,match,31000,242246425,7814.400806451613
S,match,61000,32693500,535.9590163934427
double_quoted_multiline_string,fail,26000,14505472,557.9027692307692
ESCAPED_UNICODE,fail,161000,96547988,599.6769440993788
ESCAPED,fail,161000,332875110,2067.5472670807453
string_char,match,155000,455304619,2937.4491548387095
string_char,fail,11000,27703518,2518.5016363636364
double_quoted_single_line_string,match,11000,593870126,53988.19327272727
string,match,11000,619686221,56335.111
value,match,34000,1558000666,45823.549
assignment,match,17000,1421121382,83595.3754117647
double_quoted_single_line_string,fail,22000,10579619,480.8917727272727
single_quoted_single_line_string,fail,22000,11374817,517.0371363636364
quoted_key,fail,7000,24112349,3444.621285714286
dot_ended_table_key_part,fail,6000,123840781,20640.130166666666
table_key_part,match,6000,15072855,2512.1425
table_key,match,6000,186158327,31026.387833333334
path,match,6000,207829408,34638.23466666666
ESCAPED,match,5000,4630821,926.1642
single_quoted_multiline_string,fail,15000,7461219,497.4146
string,fail,15000,78617350,5241.156666666667
DIGIT_OR_UNDER,match,79000,28021804,354.70637974683547
date,match,1000,10764087,10764.087
secfragment,fail,1000,1571971,1571.971
time,match,1000,12218111,12218.111
datetime,match,1000,32743222,32743.222
DIGIT_OR_UNDER,fail,41000,14395885,351.1191463414634
date,fail,28000,42063856,1502.2805714285714
datetime,fail,14000,62805718,4486.122714285714
DIGITS,fail,7000,8625475,1232.2107142857142
float_text,fail,24000,53581331,2232.5554583333333
integer_text,fail,10000,8108193,810.8193
float,fail,12000,115447341,9620.61175
integer,fail,5000,7127397,1425.4794
boolean,fail,4000,3279191,819.79775
array_sep,match,15000,14248918,949.9278666666667
array_sep,fail,43000,57201133,1330.2589069767441
DIGITS,match,11000,39934717,3630.4288181818183
integer_text,match,14000,27292622,1949.473
integer,match,7000,13897515,1985.3592857142858
array_value,match,8000,211874580,26484.3225
array_value_list,match,10000,72709609,7270.9609
array_value_list,fail,8000,195018327,24377.290875
array,match,4000,529926373,132481.59325
float_text,match,4000,20738874,5184.7185
float,match,2000,26062533,13031.2665
boolean,match,1000,1816956,1816.956
dot_ended_table_key_part,match,2000,25373939,12686.9695
key,fail,1000,2220697,2220.697
assignment,fail,1000,10090976,10090.976
expression,fail,1000,14375103,14375.103
line,fail,1000,17899968,17899.968
start,match,1000,2259406061,2259406.061

I imagine analyzing this data will yield some opportunities for performance enhancements.

@felix9
Copy link
Contributor

felix9 commented Sep 1, 2017

I was looking at the specific case mentioned in #39, which is a toml file with 1000 [[tablearray]] clauses. For that, most of the time is in compile, specifically in addTableArray. It looks to me like compile does a number of things that don't scale well when the input gets large.

My instinct is the parse part is not any different from toml-j0.4, but I haven't measured that yet.

@BinaryMuse
Copy link
Owner Author

Good point; it's likely compile is responsible for the extreme cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants