tcl-clock rewritten in c #2
…out `-gmt 1`, because the return value would be -3600): % clock scan "01.01.1970" -locale current -format %x time value too large/small to represent
…" the execution is limited by a fixed time (in milliseconds) instead of a repetition count (more precise results; to prevent a very long execution time it is no longer necessary to estimate the repetition count)
… to 5000 now), because otherwise it sporadically stutters on some platforms on very fast iterations (<= 1µs/#)
performance tests included;
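The idea of limiting a measurement by time rather than by repetition count can be sketched as follows. This is a minimal illustration in C, not the code from this PR: the names `now_usec`, `bench_result`, `run_for` and `empty_body` are hypothetical, and a POSIX monotonic clock is assumed.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Current monotonic time in microseconds (assumes POSIX clock_gettime). */
static int64_t now_usec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Hypothetical result record: how often the body ran and how long it took. */
typedef struct {
    int64_t iterations;
    int64_t elapsed_usec;
} bench_result;

/* Example body that does nothing, useful for estimating loop overhead. */
static void empty_body(void)
{
}

/* Run body repeatedly until limit_ms milliseconds have passed.
 * No repetition count has to be estimated in advance. */
static bench_result run_for(void (*body)(void), int64_t limit_ms)
{
    bench_result r = {0, 0};
    int64_t start = now_usec();
    int64_t deadline = start + limit_ms * 1000;
    int64_t now;

    do {
        body();
        r.iterations++;
        now = now_usec();
    } while (now < deadline);

    r.elapsed_usec = now - start;
    return r;
}
```

Calling `run_for(empty_body, 20)` runs the empty body for roughly 20 ms; dividing `elapsed_usec` by `iterations` gives the per-iteration time. Note that this sketch reads the clock on every iteration, which itself adds overhead for very fast bodies.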
Current performance increase:
Additionally, it is less CPU- and memory-hungry.
Because many rules are shared system-wide (across interpreters), it also spares resources, e.g. it does not wash out the CPU cache.
Known small incompatibilities:
% # FreeScan : relative date with ordinal month (I said January)
% clock scan "5 years 18 months 385 days next 1 January" -base 0 -gmt 1
-Fri Jul 21 02:00:00 CEST 1978
+Sat Jan 21 01:00:00 CET 1978
% # FreeScan : relative date with ordinal month and relative weekday (I said Fri in January)
% clock scan "5 years 18 months 385 days next January Fri" -base 0 -gmt 1
-Sat Jul 22 02:00:00 CEST 1978
+Fri Jan 27 01:00:00 CET 1978
Side effects:
…ance counters updated in the calibration thread in UpdateTimeEachSecond; as a result, sporadic time drifts or jump-like time shifts occurred, which for example produced very confusing results during time measurement. [unix] wrong cast fixed in TclpGetWideClicks: multiplication by 1000000 in long int may cause overflow
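The overflow class mentioned in the [unix] note can be illustrated with a small sketch. It is not the actual TclpGetWideClicks code; `to_wide_usec` is a hypothetical helper, and the point is only that the operand must be widened before the multiplication, not after it.

```c
#include <stdint.h>

/* Hypothetical helper: convert seconds plus microseconds into a 64-bit
 * microsecond count.
 *
 * Problematic form: (int64_t)(sec * 1000000)
 *   The product is computed in long first; where long is 32 bits wide,
 *   it overflows for sec > 2147, before the cast is ever applied.
 *
 * Safe form (used here): (int64_t)sec * 1000000
 *   The cast widens sec first, so the product is computed in 64 bits. */
static int64_t to_wide_usec(long sec, long usec)
{
    return (int64_t)sec * 1000000 + usec;
}
```

For example, `to_wide_usec(3000, 5)` needs a product of 3,000,000,000, which does not fit into a signed 32-bit integer but is represented correctly once the operand is widened.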
…00000; more precise threshold handling after NativeGetTime fix.
While testing, I found that we had a bug in the time routines (sporadic time drifts in the performance counters). It is fixed now, and the results of the performance test-cases above are updated (because they were distorted by the error).
I've made a (threaded) build of the newest binaries for Windows (if anyone needs them, please find both links enclosed). tcl8.7 win(x86) clock speedup - trunk-rewrite-clock-in-c.zip After unzipping the archive, just start
FYI, building on linux doesn't seem to work:
@aidanhs Thx for the testing!
I've fixed it now and added many new test cases to cover these "regexp"-based things (see the new test cases clock-6.22.11 - clock-6.22.20). Although such "artificial" and very irrational "date-time" formats are theoretically imaginable, I don't think they make sense in practice (or only to be 100% backwards compatible?).
So which would be correct? For the explanation, it is better to use almost the same output format.
Because of the greedy regex with a mandatory space in between (and an optional one before each token), the result is extremely NFA (or even DFA) engine dependent (I've already seen both variants). So what. I have it ready now. It costs 0.04µs to 0.07µs more per iteration in the performance test cases. And one sleepless night. :) I could additionally implement a fallback using regular expressions (or a small pre-compiled NFA/DFA, or even more complex grammar rules in bison or yacc). But I think ATM that would be breaking a butterfly on a wheel.
…-6.22.12), involving space count in look ahead and end distance calculation (because spaces are optional in date-time string as well as in scanning format).
…re fixed (see test cases clock-6.22.11 - clock-6.22.20), additionally involving look ahead token of known type into pre-search process.
…used vars, functions etc; types normalization;
5674e9c to d91a0cd
Building on *nix is fixed and should work now (tested on Debian jessie).
[UPD] This comment is obsolete since the timerate was calibrated (newer test-script version) after the merge with the new timerate from #3. When comparing, please note Tcl's own overhead, which consumes 0.1µs for executing the byte code, i.e. the measurement overhead, explained below (does nothing, just "executes" an empty code scope):
Thus the time of 0.1µs/# is almost negligible in comparison to 6.14µs/#, but not in comparison to 0.68µs/#. So all performance test-cases above that have a running time of less than 1µs per iteration are in reality at least 10% faster.
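The arithmetic behind this remark can be written down as a short sketch. The helper names `overhead_share` and `net_usec` are hypothetical and only restate the calculation; they are not part of the test script.

```c
/* Fraction of a measured per-iteration time that is pure measurement
 * overhead (both values in microseconds per iteration). */
static double overhead_share(double measured_usec, double overhead_usec)
{
    return overhead_usec / measured_usec;
}

/* Net per-iteration time once the empty-scope overhead is subtracted. */
static double net_usec(double measured_usec, double overhead_usec)
{
    return measured_usec - overhead_usec;
}
```

With the figures quoted above, an overhead of 0.1µs is about 1.6% of 6.14µs, but about 14.7% of 0.68µs, so the net time of the fast case is closer to 0.58µs per iteration.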
…rite-clock-in-c; + minor fixes after merge.
…ce of the parasitical execution-overhead by extremely fast execution times
e79a484 to 4def0a2
Because of several fixes and improvements to the measurement procedure, I've updated the measurement results (see above, at the head of the PR) and the win-binaries (see #2 (comment)).
- static used in non-static inline function;
- x64 int cast on pointer [-Wpointer-to-int-cast];
- (obscure) may be used uninitialized in this function [-Wmaybe-uninitialized];
- TclEnvEpoch initialized and declared extern;
…rate' into sb/trunk-rewrite-clock-in-c
…n time (overhead considered)
242ab9e to 1b69a0d
…" (and freescan)... test-performance.tcl: test cases extended to cover "clock add"
I've added a new commit with "clock add" rewritten in C. Below you'll find a new snippet of the test-cases covering the "clock add" functionality...
Rebased to the fossil repository / RFE ticket - http://core.tcl.tk/tcl/info/ddc948cff9781daa. Further development will be done in the TCT fossil repository.
Regarding flightaware/Tcl-bounties#21 (comment) (ensemble overhead), the newest clock command becomes faster still.
This is an artificial PR that corresponds to FlightAware's Tcl-bounties program, issue flightaware/Tcl-bounties#4
Current performance increase:
- clock format - 10-15 times faster;
- clock scan -format - 30-60 times (some previously extremely slow scans up to 100 times faster);
- clock scan (freescan) - 15-20 times;
- clock add - 30-90 times;
My current results of the performance test-cases are attached below (running threaded, warmed up, 4 x threads working, 4 core i5-2400 @ 3.10 GHz).
normal system-load (without parasitic load):
clock-test-perf-new.txt
clock-test-perf-org.txt
Diff: clock-test-perf.diff.txt
with parasitic load:
clock-test-perf-mth4x-new.txt
clock-test-perf-mth4x-org.txt
Diff: clock-test-perf-mth4x.diff.txt
Here is an excerpt from the diff of the total-blocks (normal system-load):
I've made a (threaded) build of the newest binaries for Windows; if anyone needs it, please see #2 (comment).