-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data pipeline failed without retry #154
Comments
Yeah the write path is the least resilient part of the library right now. I need to take a deeper look to understand exactly how the Java writer handles lost data nodes, and then figure out how I could write a test for something like that. If you're not already, I'd currently recommend retrying the whole write from scratch if you still have access to all the data you are trying to write. |
Based on a quick look, it seems like the logic is generally:
So for example with replication of 3, a new DataNode would only get added if two DataNodes fail during writing that block. The first one is simpler and I could look into trying to add that, though still not exactly sure how I would test it. That should make things more resilient than the current behavior which is simply failing on the first DataNode error. The second is a bit more complex and requires copying the already written data to a new replica before continuing, and probably wouldn't be something I could tackle anytime soon. |
Thanks for your quick reply, I found another error status when appending file. Please see this: And I think the first one solution is good for my append case. @Kimahriman |
Yes, it looks unstable especially in a busy hdfs cluster. If you have any improvement patch, i’m happy to test this. Almostly, i could reproduce it by my cases. |
This would be when a data node the block is replicating to fails. The connection drop would be when the one you are talking to dies, so I think they're effectively the same issue |
After simply looking this article https://blog.cloudera.com/understanding-hdfs-recovery-processes-part-2/, I think the bad data node replacement will not trigger the original data resend. Once client found the bad data node, it will build the new pipeline through the namenode, and then resend the data start from last acked packet. @Kimahriman |
Very helpful article! I hoped that might be the case for node replacement but then found https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1550 while tracing through the code. The data is re-replicated by the client on node replacement. It can only keep going from where it left off if the node isn't replaced. |
Thanks for your code reference. Let me take a deep look. |
From digging the DataStreamer.java, the replicated transfer is from the healthy datanode -> replacement datanodes. The client only trigger the recovery pipeline creating to back fill the data from healthy node to replacement node. BTW, if ignore the bad nodes and do nothing, the missing blocks will exist. So If we want to have a resilient writing process, the datanode replacement may need to be supported |
Yeah looks like it just makes an RPC call to trigger the replication, just a little bit of extra complexity on there. I think that would still be follow on work to just getting the write to continue with fewer DataNodes on failures.
My understanding is even if a single node (or multiple but not all) in a pipeline are lost, if you successfully write the block to at least one DataNode, HDFS will re-replicate it after the fact as it will see it is under replicated. I think the main reason for replicating as part of the pipeline recovery is just to make that a little more resilient. i.e. if in the end all but one DataNodes fail in your pipeline, and you "successfully" finish your write, but then that one DataNode immediately dies, your write was successful but now you are missing that block. |
Got it. If so, the simply ignore bad data node in the pipeline is acceptable. We only just need to recognize the bad node from the ack response’s reply and flag and then ignore this. |
Hmm that's very odd, might be unrelated to writing data. What version of HDFS are you running? Also if you can replicate those having some debug logs would be helpful to see what's going on. |
HDFS with 3.2.2 version.
It's OK, I will attach it if I having more detailed logs from the namenode or datanodes. |
I think I have an initial version of at least handling the failures if you want to test it out in your environment: #164 |
Yes, I will test this in the later 3 days. Thanks for fixing this. |
When using the hdfs-native crate, I encountered the data pipeline failed. If this happened by the problem datanode, do we need to request a new block location?
Could you help look this problem? @Kimahriman
The text was updated successfully, but these errors were encountered: