-
-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Large overhead for drawing plots with lots of data points #94
Comments
I started looking at this because I was worried it was Typed Racket-induced contract overhead, but that doesn't seem to be the case. One thing I did see is that |
I think you are correct. This problem might indeed be limited just to the lines renderer itself. If using the #lang typed/racket
(require plot)
(: p (Listof (List Real Real)))
(define p (build-list 1000000 (lambda ([x : Index]) (list x (random)))))
(time
(parameterize ([plot-decorations? #f])
(plot-bitmap (points p #:color 6)))) |
Free association, based on history, probably worthless.
When I read your table of performance numbers, I am reminded of an experiment we ran in the late 90s. For quite some time we though, an algorithm looked “practically linear” even though the algorithm was supposed to be in O(n^3). 10, 100, 1000 lines — it scaled nicely. Eventually we discovered that the constant on n^3 was small but not small enough to make it go away. The cubic element kicked in at over 3000 or 4000 lines (my memory isn’t a 100% here).
So you might have that or Sam’s O(n) suggestions might “multiply” at some point.
|
I spend some time investigating this issue and found the following:
It looks to me that, for large number of points, calling For reference, this is the instrumentation I did: plot-lib/plot/private/common/draw.rkt | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/plot-lib/plot/private/common/draw.rkt b/plot-lib/plot/private/common/draw.rkt
index a072a5b..a13cf72 100644
--- a/plot-lib/plot/private/common/draw.rkt
+++ b/plot-lib/plot/private/common/draw.rkt
@@ -429,12 +429,19 @@
(: draw-lines/pen-style (-> (Instance DC<%>) (Listof (Pair Real Real)) Plot-Pen-Style-Sym Void))
(define (draw-lines/pen-style dc vs style-sym)
+ (printf "*** ~a style-sym: ~a~%" (current-inexact-milliseconds) style-sym)
(cond [(or (empty? vs) (eq? style-sym 'transparent)) (void)]
[else
- (let ([vs (map cons-fl vs)])
- (cond [(eq? style-sym 'solid) (send dc draw-lines vs)]
+ (let ([vs1 (map cons-fl vs)])
+ (printf "*** ~a cons-fl done~%" (current-inexact-milliseconds))
+ (cond [(eq? style-sym 'solid)
+ (define start (current-inexact-milliseconds))
+ (send dc draw-lines vs)
+ (define end (current-inexact-milliseconds))
+ (printf "*** ~a draw-lines done took ~a ms~%"
+ end (- end start))]
[else
(define pen (send dc get-pen))
(define scale (max 1.0 (fl (send pen get-width))))
(define sty (scale-pen-style (symbol->style style-sym) scale))
- (draw-lines* dc vs sty)]))]))
+ (draw-lines* dc vs1 sty)]))])) And this is the program I ran: #lang typed/racket
(require plot)
(: points (Listof (List Real Real)))
(define points (build-list 100000 (lambda ([x : Index]) (list x (random)))))
(time
(parameterize ([plot-decorations? #f])
(plot-bitmap (lines points #:color 6)))) |
I also converted to Typed Racket the program that uses |
I believe that Typed Racket program has no contract boundaries that aren't there always in plot -- ie, I don't think there's anything Typed Racket about your example. To see if Typed Racket is the problem, I recommend the contract profiler -- if it shows a non-trivial percentage of time in contracts, particularly contracts that look to be types, then that's likely a problem. I think when I tried this before for my earlier comment that didn't happen. |
Hi @samth , I am confused about
The lines starting with "***" are the printfs I added to the plot code. The contract profiler now shows that 95% of the time is spent in contracts, but the contract it flags is at the "entry" into the plot package (the "lines" renderer), which does not make much sense to me.... |
Looking at the implementation of
|
My guess is that the contract cost is in the result contract, not shown in the print out, which is Renderer2d. That's an object which will impose cost on every method access. |
One other thing, @alex-hhh, I think you meant to use |
I'm now much more confused. I tested with the following program:
This should have no Typed Racket-related contract boundaries -- all of the listed plot modules as written in Typed Racket. It has the same slow performance that @alex-hhh reports. However, when I put a Furthermore, |
Removing the contract inside racket/draw doesn't remove the wrapper-object so I'm still confused. |
This is from one of my earlier attempts to determine the step that takes a long time -- the last transformation, |
Further investigation: I put some printouts inside
The wrapper objects are created with the following blame objects:
However, I can't seem to find the relevant contracts. I also tried simply removing the use of Conclusions so far:
|
Further investigation: the mysterious contract is applied as part of this line: https://github.com/racket/plot/blob/master/plot-lib/plot/private/no-gui/plot2d.rkt#L108 |
At @rfindler's suggestion, I investigated @alex-hhh's hypothesis that calling Here's the timings I get, for slightly modified versions of Alex's program in the original report. They're almost identical between typed and untyped used of
The code I ran is at https://gist.github.com/samth/385182bcd0c1bc26795aa7e58a81f80b. |
Hi @samth , I mentioned in a previous comment (#94 (comment)) that using draw-lines from TR does not have a performance problem and I was also puzzled by that.... |
One additional note -- I tried using |
One more piece of (not useful yet) investigation -- this code, which tries to be more like what
|
I tried For what is worth, I also tried using |
@alex-hhh I believe that the pict construction is just saving a closure that then runs the expensive code, which then happens when the bitmap is draw. If the problem is indeed a contract, then it's happening when either the bitmap or the bitmap-dc is created in A contracted version of that object is created when constructing the new My current conclusions are:
One path forward is to try to construct a test program that is slow in the way plot is, by removing parts of the plot library until we get something smaller. |
Here's another possibility. Below are the traces (from Plot
direct
|
If I'm reading correctly that smoothing is one of the differences, that could be significant. |
Smoothing seems to be about a 4x slowdown on many individual |
To answer the question (that I now see) about why |
Having tested this in plot, if you remove the line I'm not sure whether smoothing is needed for plot, but that seems to be the source of the overhead that this issue discusses. |
Hi @samth and @mflatt , thanks for finding out the problem. I suspect smoothed drawing is slower because it involves reading back from the drawing buffer. When I did my testing, I only looked at the output for record dc for plot, and smoothing didn't jump out to me as a possible problem... I will think about how to address this problem, as I don't think that simply disabling smoothing is a good idea -- there is a visible quality difference between a smoothed and non-smoothed plots (although not in the example showed in this issue). |
This issue has been mentioned on Racket Discussions. There might be relevant details there: https://racket.discourse.group/t/why-is-plot-slower-with-unicode-symbols-in-ticks/1611/10 |
A private email from a person who noticed poor performance of the plot package when plotting millions of data points, prompted me to run some tests to see what the overhead of the actual
plot
package is on top ofracket/draw
(which is used for all the drawing). The result is that it is a lot.There are two programs below (the second one has two variants):
lines
do-manual-plot
will draw the plot using thedraw-lines
dc<%>
method, and do all plotting calculations itselfdo-manual-plot/individual-lines
will draw the plot using onedraw-line
call for each line segment, also doing all plotting calculations itself.Here is the performance of the programs
For large number of points, the time of the
plot
version becomes too large very quickly, much more than the time increase fordraw-lines
anddraw-line
versions. Note thatplot
ultimately uses the same draw package, so the plot package overhead is probably the plot time minus the draw-lines time.I also found that, for large number of points, a single
draw-lines
call is much slower than multipledraw-line
calls, one for each line -- this is somewhat counterintuitive, but I could not find an explanation for this...The text was updated successfully, but these errors were encountered: