Leader Election/Consensus Protocol #7
Good question, Travell. I haven't studied the area of consensus algorithms, so I'll have to read up on the ones you suggested (and others) before I can give you a definitive answer. However, I'll describe quickly how it works, and maybe you can give me your feedback on which scheme you think it is most like.

By default, each node-discover instance that is started is given a random float "weight" value between 0 and 1. The instance broadcasts its existence to the network with a hello packet that includes its weight. Each other node on the network sees the new node and adds it to its local table of nodes. The other nodes broadcast their own hello packets on their set intervals, so the new node learns about all the other nodes on the network. At this point, every node on the connected network knows the weight of every other node.

If there is no master (or not enough masters), each node looks through its own list of known nodes and determines whether it is itself the highest weighted node. If so, it promotes itself to master and sends a hello packet with the isMaster flag set, so all other nodes know that it is now the master. All of the other nodes, not being the highest weighted, do not promote themselves and become aware of the new master by receiving its hello packet.

This system does not prevent split brain situations, and that is really what I intended. I wanted a system that was kinda loose and would just guarantee that a master would become available and that any nodes still within broadcast range would become aware of it.
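To make that election rule concrete, here is a rough sketch in JavaScript of the decision each node makes. This is illustrative only, not the actual node-discover source; the `self` object and `nodes` table are hypothetical stand-ins for each instance's local state.

```js
// Sketch of the election rule described above (illustrative, not library code).
// `self` and each entry of `nodes` are assumed to look like { id, weight, isMaster }.
function shouldPromote(self, nodes) {
    var peers = Object.keys(nodes).map(function (id) { return nodes[id]; });

    // If we are already the master, or any known peer is, there is nothing to do.
    var masterExists = self.isMaster || peers.some(function (n) { return n.isMaster; });
    if (masterExists) {
        return false;
    }

    // Promote only if no known peer outweighs us.
    return peers.every(function (n) { return n.weight < self.weight; });
}
```

A node for which this returns true would promote itself and set the isMaster flag in its next hello packet; every other node simply records the new master when that packet arrives.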
I'm reading through your code now, as I'm playing around with distributed computing. What would happen in larger systems if Math.random() coincidentally returned the same value twice? Would they both promote themselves to master, or do you resolve this situation? Also, how might you integrate this into a cluster?
It is not handled explicitly, and I have not tested it, but I think this is what would happen in the worst case if two nodes have the same weight: both would consider themselves the highest weighted node, both would promote themselves to master, and they would then flap between promotion and demotion.

In reality, I would guess that flapping may occur, but there is a possibility that, due to timeouts, when the processes were started, and network delays, one of the nodes with the same weight would promote itself before the other and announce that it is now master. The other node with the same weight would then see that there is a master and have no need to promote itself. This could easily be tested by specifying the same weight when starting multiple instances:

```js
var d = require('node-discover')({ weight : 1 });

d.on('promotion', function () {
    console.log('PID %s: I was promoted', process.pid);
});
```

In the current version, the default random weight function is actually https://github.com/wankdanker/node-discover/blob/master/lib/discover.js#L57:

```js
Discover.weight = function () {
    //default to negative, decimal now value
    return -(Date.now() / Math.pow(10, String(Date.now()).length));
};
```

It is exported by the module, and you can overwrite it if you need a more unique weight.

Hope this helps,
Dan
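If identical weights are a concern, one option is to overwrite that exported weight function before creating instances. The following is only a sketch of the idea, not an official recipe; the PID-based tie-breaking scheme is my own illustration.

```js
var Discover = require('node-discover');

// Illustrative override, not the library default: keep the negative, time-based
// ordering but fold the low digits of the PID in as a tie-breaker, so two
// processes started in the same millisecond on one host still get distinct weights.
Discover.weight = function () {
    var combined = Date.now() * 1000 + (process.pid % 1000);     // stays well under 2^53
    return -(combined / Math.pow(10, String(combined).length));  // normalize into (-1, 0)
};

var d = Discover({ /* options */ }); // instances created after this use the override
```

Passing an explicit weight per instance, as in the `{ weight : 1 }` example above, also works and is simpler when you control each node's configuration.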
I just tested this. Using the code above, I created d.js and then ran a bunch of instances:

```sh
$ for i in `seq 1 10`; do node d.js& done;
[1] 27278
[2] 27279
[3] 27280
[4] 27281
[5] 27282
[6] 27283
[7] 27284
[8] 27285
[9] 27286
[10] 27288
PID 27282: I was promoted
PID 27284: I was promoted
PID 27284: I was demoted
PID 27282: I was demoted
PID 27278: I was promoted
PID 27288: I was promoted
PID 27288: I was demoted
PID 27278: I was demoted
PID 27283: I was promoted
```

Then killed the master:

```sh
$ kill 27283
PID 27281: I was promoted
PID 27285: I was promoted
PID 27285: I was demoted
PID 27286: I was promoted
PID 27281: I was demoted
PID 27286: I was demoted
PID 27282: I was promoted
PID 27284: I was promoted
PID 27284: I was demoted
PID 27282: I was demoted
PID 27288: I was promoted
PID 27278: I was promoted
PID 27278: I was demoted
PID 27288: I was demoted
PID 27280: I was promoted
PID 27279: I was promoted
PID 27279: I was demoted
PID 27280: I was demoted
PID 27281: I was promoted
```

Then killed the new master:

```sh
$ kill 27281
PID 19228: I was promoted
```

Then killed that master:

```sh
$ kill 19228
PID 27284: I was promoted
```

So, on localhost with the same weights, they definitely do promote themselves, but they seem to ultimately end up demoting and leaving one master. The unfortunate part is that we have promotion and demotion events firing in rapid succession for the same node.
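If that rapid promote/demote churn is a problem for whatever you build on top, one defensive pattern is to debounce the role change and only act once the role has been stable for a short settle period. This is only a sketch layered on the events shown in this thread; the 2-second settle time is an arbitrary value chosen for illustration.

```js
var d = require('node-discover')({ weight : 1 });

var settleMs = 2000;    // arbitrary settle period for this sketch
var isMaster = false;
var timer = null;

function roleChanged(nowMaster) {
    // Ignore rapid promotion/demotion flapping; only react once the
    // role has held steady for `settleMs`.
    clearTimeout(timer);
    timer = setTimeout(function () {
        if (nowMaster !== isMaster) {
            isMaster = nowMaster;
            console.log('PID %s settled as %s', process.pid, isMaster ? 'master' : 'member');
            // start or stop master-only work here
        }
    }, settleMs);
}

d.on('promotion', function () { roleChanged(true); });
d.on('demotion', function () { roleChanged(false); });
```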
Oh, to answer your other question about how to integrate this into a cluster... I have used it to create a single head node which all nodes in the cluster connect to in order to establish communications. So either every node should have the logic to be a master, or, if only specific nodes contain the master logic, you should give those nodes weights > 0.

Master logic:

```js
var d = require('node-discover')({ /* whatever */ });

d.on('promotion', function () {
    //create some server and listen on port 12345
    d.advertise({ servicePort : 12345 });
});
```

Member logic:

```js
var d = require('node-discover')({ /* whatever */ });
var master = null;

d.on('master', function (masterNode) {
    //close any existing master connections
    if (master) {
        master.close();
    }
    //connect to masterNode.address on port masterNode.advertisement.servicePort
    master = createConnectionToMaster(masterNode.address, masterNode.advertisement.servicePort);
});
```

Or, if you don't care about the whole master thing and just want to create a mesh of all nodes in the network, you could simply listen to the 'added' event:

```js
var d = require('node-discover')({ /* whatever */ });
var net = require('net');

var server = net.createServer(function (c) {
    //accept connections from other nodes
});
server.listen(8124);

d.on('added', function (node) {
    var client = net.connect({ port : 8124, host : node.address });
});
```

In this case, every node listens on port 8124 and every node connects to every other node on the same port. Not sure what you are looking to do. Hope this helps.

Dan
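createConnectionToMaster above is just a placeholder. One possible implementation, assuming the master simply exposes a TCP service on the advertised port, might look like the following; the wrapper exists only so the member logic's master.close() call has something to call.

```js
var net = require('net');

// Hypothetical implementation of the createConnectionToMaster placeholder above.
// Returns a small wrapper with close(), matching how the member logic tears
// down the previous master connection when a new master is announced.
function createConnectionToMaster(address, port) {
    var socket = net.connect({ host : address, port : port });

    socket.on('error', function (err) {
        console.error('master connection error:', err.message);
    });

    return {
        socket : socket,
        close  : function () { socket.end(); }
    };
}
```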
Hey Dan, thanks for the great write-up on everything. It helped me understand a lot, and I want to try your code before I give a lengthier reply. What I'm trying to do is mesh all my nodes together as a sort of high-availability computer cluster that moves data around quickly. This may just be my overall lack of understanding of how Node handles sockets under the hood, but if I had 4 cores, then regardless of how I spin up 4 workers (4 separate command-line boots or the cluster module), won't I have a problem binding to the port? Or is that what your last bit of code does to circumvent listening on it directly, by using the built-in master/HTTP server integration? I tried something a bit more primitive, but couldn't get it to work.
Hey,

If you use Node's built-in cluster module, multiple child processes of a single master process can bind to the same port. If you execute multiple individual processes from the command line, they will not be able to bind to the same port. The example I gave above would not work with multiple processes on a single host; it would only work with one process per machine, because it is statically set to use port 8124.

I just put together a module called automesh; reading its code might help establish a picture of how to create a mesh of TCP connections, or you can just use the module. I'm not entirely sure how useful it is, but it automatically connects to and accepts connections from other node instances, whether on the same computer or on remote computers. It automatically listens on a random port and advertises that port to the other nodes, then connects to every node that advertises a port.

Dan
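For reference, here is a rough sketch of that random-port pattern using only the node-discover pieces already shown in this thread; it is not the automesh code itself, just an illustration of the idea it describes.

```js
var net = require('net');
var Discover = require('node-discover');

var d = Discover();

// Listen on an OS-assigned port (0 means "pick any free port"), then advertise
// it, so several processes on one machine no longer fight over a fixed port
// like 8124.
var server = net.createServer(function (socket) {
    // handle inbound connections from peers here
});

server.listen(0, function () {
    d.advertise({ servicePort : server.address().port });
});

// Connect out to every peer that advertises a service port. A fuller version
// would also handle peers whose advertisement shows up later.
d.on('added', function (node) {
    if (node.advertisement && node.advertisement.servicePort) {
        net.connect({ host : node.address, port : node.advertisement.servicePort });
    }
});
```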
Which consensus protocol do you actually use under the covers for node-discover? Which published scheme is it closest to? Paxos vs. Raft, for example.
Thanks,
Travell