-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SP compute() const, thread safe #560
base: master
Are you sure you want to change the base?
Conversation
and obtrusive for const compute()
still not 100% thread safe, but semantic const
replaced by Cereal, probably forgotten
and SP calculateOverlap_()
remove SP::calculateOverlaps_() which is replaced by this return value.
@marty1885 FYI, this is my take on c++17 TS parallel. Do you have any advice on atomic, thread safe data-structures? (Ive worked a bit with facebook/folly) to avoid need of locks? |
@breznak The main reason I made all Some guidelines:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please review if you are interested.
- don't take too seriously the SP.compute() const
- but the other changes I'd like to implement, as this is nicer, safer, faster.
src/htm/algorithms/Connections.hpp
Outdated
|
||
void computeActivity(std::vector<SynapseIdx> &numActiveConnectedSynapsesForSegment, | ||
const std::vector<CellIdx> &activePresynapticCells); | ||
const std::vector<CellIdx> &activePresynapticCells) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Connections::computeActivity can be made const. With the "fake" (mutable) that for timeseries_
the curent/previousUpdates are mutable & not locked (I may use mutex, or just let it be for now)
@@ -176,18 +176,6 @@ void SpatialPooler::setBoostStrength(Real boostStrength) { | |||
boostStrength_ = boostStrength; | |||
} | |||
|
|||
UInt SpatialPooler::getIterationNum() const { return iterationNum_; } | |||
|
|||
void SpatialPooler::setIterationNum(UInt iterationNum) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SP removed iteration
stuff. not really needed. And connections will have the iteration from #537
@@ -462,16 +441,18 @@ void SpatialPooler::initialize( | |||
} | |||
|
|||
|
|||
void SpatialPooler::compute(const SDR &input, const bool learn, SDR &active) { | |||
vector<SynapseIdx> SpatialPooler::compute(const SDR &input, const bool learn, SDR &active) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WIP compute() is const.
But that will not be feasible. (adaptSegment -> conn.updateSynapsePermanence)
@@ -934,7 +901,7 @@ void SpatialPooler::inhibitColumnsLocal_(const vector<Real> &overlaps, | |||
|
|||
|
|||
bool SpatialPooler::isUpdateRound_() const { | |||
return (iterationNum_ % updatePeriod_) == 0; | |||
return (rng_.getReal64() < 1.0/updatePeriod_); //approx every updatePeriod steps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is one place where iterationNum_
was needed, I have replaced with probability, which I think is nicer approach.
// calls to RNG | ||
std::mt19937 gen; //Standard mersenne_twister_engine 64bit seeded with seed_ | ||
mutable std::mt19937 gen; //Standard mersenne_twister_engine 64bit seeded with seed_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all Random
methods can be made const (thread safe), with the exception in mutable here. Result, will work OK, but for effective parallel algorithm, we likely won't have deterministic builds. So when finished, I imagine we run parallel by default, but when stuff breaks, we switch to single thread and deterministic operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marty1885 thank you for your advice. What do you think of the mutable in Random here? Shouldn't make too much damage if some collision messed the state.
but it results in the non-deterministic builds. (Currently we support set seed and deterministic operation for the whole HTM, which is cool for some usages).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll say that using a shared RNG across threads is not a good idea. Besides making things nondeterministic. It also leads to false sharing and a tremendous amount of cache invalidation. (Even though I do share a RNG across threads in Etaler)
One solution is to have a master RNG, and then you use the master RNG to seed light weight RNGs local to each thread. This will make your results deterministic as long as you have the same # of thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: you can have a master RNG, then use the master RNG to seed a secondary RNG local to each function. Then use the secondary to seed thread-local RNGs. This way you can have deterministic results no matter the amount of threads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In PRs that prepare for this functionality, like #552 , I'm removing dependency on local variables / state. That is a good thing overall. With the suggested approach as above, that is "passing all that is needed as parameters" I need to balance convenience and usability of the public API (for single threaded use), I don't want users who want to call SDR result = SP.compute(input)
to need to pass 10s parameters (Random, ...).
So I'll need to figure some compromise. Maybe function overloads for user,parallel? compute(simple, params)
x compute(many,needed,for,parallel)
Also whole SP.compute is quite bad idea to parallelize, rather some inner methods: inhibition, overlaps,...
will keep in mind
this makes the code more complex, I'm doing it only for the speed. |
Follow up,
Thread-count managementHow do you deal with some heuristic on when to parallelize?
In real world it'll be something inbetween, say newest AMD Zen CPUs with 32threads. If I have HTM with SP-TM-SP-TM. I want to give 32/4 threads to each object.
|
Regarding to managing the amount of threads. And the problem of to too much threads. My solution in Etaler (not implemented) is to have a NUMA aware backend. For example. A 32 thread ThreadRipper is made of 4 NUMA nodes. And the InfinityFabric link is the main bottleneck. So just make your backend run on one of the NUMA modes. And spawn multiple of them. |
Thank you for this link! I assume Parallel TS is TBB, so I can use this. |
No, Parallel TS is not TBB. They do look similar, but they are not the same thing. |
oh, I'm aiming for the c++17 standard. Seems the whole |
OT It looks like 3+ years, but I like where c++ is heading with concurrency, parallelism and heterogenous computing : It looks like eventually we'll be able to run on GPUs and even FPGAs. |
My experiences with FPGA tells me to not expect that to work well. FPGA is difficult to optimize and takes forever to compile (> 6Hr compilation time for a mid-range FPGA and a 10K compiler license fee) The GPU part sounds interesting! |
This PR will benefit from boosting cleanup #544 , is part of parallelization effort #214 and will make PR parallel MNIST usable #559 .
The concept is to make SP thread-safe, suitable for parallel execution. This is done by making
SP.compute() const
. Eventually that is not feasible, asSP.adaptSegments()
callsConn.updateSynapsePermanence()
which cannot beconst
. But it serves as proof of concepts and shows what can be const, and what needs to be synced.Note that parallelizing SP.compute() like this won't be useful, only in MNIST #559 , in the end we want parallelized SP.compute() - the inside of it. Likely
inhibition
.Takeaways:
const
and pass needed arguments. Less/none state of the object (this.numColumns_). Connections work like this.SP.adaptSegment
will not be const -> needs lockatomic
, thread-safe data structures. Eg.Folly
?