
Parallelization of ConstProp compilation #3042

Open · wants to merge 14 commits into base: main
Conversation

@imaihal (Collaborator) commented Jan 14, 2025

To accelerate compilation time, this PR parallelizes the compilation of ConstProp using the parallelFor() API in MLIR. This mainly improves constant propagation for reduction computations.

@imaihal marked this pull request as ready for review January 17, 2025 06:29
@sorenlassen (Member) left a comment


LGTM

src/Dialect/ONNX/ElementsAttr/ElementsAttrBuilder.hpp (outdated, resolved)
src/Dialect/ONNX/ElementsAttr/ElementsAttrBuilder.hpp (outdated, resolved)
src/Dialect/ONNX/ElementsAttr/ElementsAttrBuilder.cpp (outdated, resolved)
}
}
};
parallelFor(ctx, 0, ctx->getNumThreads(), work);
});
}
A collaborator left a comment:


I assume the work above performs batch.size() independent reductions, all run in parallel.

Since quantization includes "whole tensor" quantization, there are cases with only 1 reduction. That can also be parallelized: say you have 1000 elements and 10 threads. Each thread processes its own 100 numbers and saves its result in its own slot in an array of 10 partial sums. Then, after the parallel region, just reduce those 10 values sequentially. You will still get a near 10x speedup.

Also, should we run sequentially when batch.size() is small? That would probably help when we have a few very small tensors. You can easily print the sizes to stderr for a few benchmarks and see whether such cases occur.

src/Dialect/ONNX/ElementsAttr/ElementsAttrBuilder.hpp (outdated, resolved)
for (WideNum &n : batch)
n = fun(n);
};
parallelFor(ctx, 0, ctx->getNumThreads(), work);
A collaborator left a comment:


As mentioned before, please check that there is enough work to justify going parallel. I suspect that if the reduction is very small, we really want to do it sequentially, and that will be faster.

@AlexandreEichenberger (Collaborator) commented:

@imaihal please ping me when you have implemented the changes, and I will review it again. Thanks for working on accelerating the compiler; much appreciated.

If you know of other opportunities that are not exploited yet, maybe you can add a "todo" in the code or in the description of this PR so that we don't lose such opportunities.

Signed-off-by: Haruki Imai <[email protected]>
@AlexandreEichenberger (Collaborator) left a comment


All the changes look good to me now.

Do you want to test whether there are cases with very little work where the code should run sequentially? Maybe report here if you have seen this in benchmarks?

I approved the PR, but it would be good to know whether we should do some of the work sequentially. This PR could help 90% of the cases but hurt performance for small ones, so while it may look good in general, we would still leave performance on the table by not handling the small case sequentially.

@imaihal (Collaborator, Author) commented Jan 28, 2025

@AlexandreEichenberger

Do you want to test if there are cases where there is very little work and the code would go in sequential mode? Maybe report here if you have seen this in benchmarks?

Yes. I will test and add a threshold (e.g., if the loop length is small, it should run in sequential mode).
Also, I will add the test code you suggested in another PR.

@AlexandreEichenberger (Collaborator) left a comment


LGTM, thanks for adding the new algorithm for selecting the batch bounds.

Will you add a lit test to make sure the parallel version of the algorithm works?

@imaihal (Collaborator, Author) commented Feb 4, 2025

@AlexandreEichenberger

Will you add a lit test to make sure parallel version of the algo works?

I added a threshold to avoid parallelization for small inputs. I also added lit tests. Thanks!
