-
Notifications
You must be signed in to change notification settings - Fork 543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make leela2onnx Conv-nodes compatible with onnx2pytorch #1924
Conversation
src/neural/onnx/builder.cc
Outdated
@@ -135,12 +135,15 @@ std::string PopulateStdNodeFields(pblczero::NodeProto* node, | |||
std::string OnnxBuilder::Conv(const std::string& name, | |||
const std::string& input_name, | |||
const OnnxConst& kernel_weights, | |||
const OnnxConst& bias_weights, int pads) { | |||
const OnnxConst& bias_weights, | |||
int filters, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just do a 'kernel_weights.GetDimensions().back()` instead of passing the number of filters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, that's better
You can find an optional LayerNormalization replacement in borg323@bc18bec |
Thanks for that. I confirmed that all the nets I tested can now be loaded correctly by onnx2pytorch using your alternative LayerNorm-definition. Is it worth considering making this the default implementation of LayerNorm for the ONNX converter? It introduces more complexity and makes the ONNX graph more complex to look at I guess, but since it's just the same operation defined more explicitly it's nice that it just works without any extra hassle. But that might again come down to how much one wants to accomodate projects like onnx2pytorch. (a somewhat dead project, but still the first thing that most people run into when they want to convert leela-onnx nets to pytorch) I pushed some changes making that implementation the default for LayerNorm to this branch, but that can be ignored if it shouldn't be used. |
The last commit went too far, removing the |
Yeah I understand, I've reverted that and put it behind a switch instead. I've made the switch for the user explicitly for onnx2pytorch, but internally I think it's nicer that that option toggles whatever needs to be different for the converter to make networks that are compatible. (I.e. if the user sets onnx2pytorch, it internally enables alt_ln or whatever relevant behaviour that's needed. The main thing is that there shouldn't be an internal option just for onnx2pytorch/compatibility I mean) Regarding other relevant things, I'm not completely sure, but I don't think so. I was thinking if it was possible to put behaviour solving #1735 behind a similar switch, but in that case it doesn't look viable anyway. It might be nice to add explicit switches to enable the alternate implementations of mish and layer-norm since they already have options ready to go internally, but I don't know if it's it's that relevant for most people. They could maybe be added as hidden to avoid cluttering --help. |
Do you think options to change the onnx opset or the ability to generate a fp16 modes (with either fp16 or fp32 input/outputs) would be useful? |
I think it's already quite easy to convert fp32 leela-models to fp16/mixed precision in general. The few times I have needed to, the converter I used worked for every net I tried at least. (I used https://onnxruntime.ai/docs/performance/model-optimizations/float16.html) It's also supposedly quite good with how it converts certain nodes if they aren't supported in fp16 for example, meaning that the leela-onnx-converter doesn't have to maintain workarounds for it in the meantime. And it's still something that might have to be maintained in the future if it was an explicit feature. I haven't run into any cases where I've been explicitly limited by the opset of my model (i.e. "I couldn't run my model because my device/software/whatever doesn't support this opset"), but I've just been using them on up-to-date systems where I can install whatever I need, so I'm not a good reference point on that. It also seems like a burden to have to support it. But there may be some use-cases I'm not seeing, and if it's already something that has been committed to being maintained I guess it could be useful to expose that as an option. So I guess for both of them I would say that they would be nice to have, but that I think that they might not be worth maintaining as explicit features. (solely based on my own usage though) |
src/neural/onnx/converter.h
Outdated
@@ -44,6 +44,7 @@ struct WeightsToOnnxConverterOptions { | |||
int batch_size = -1; | |||
int opset = 17; | |||
bool alt_mish = false; | |||
bool alt_ln = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename it so something slightly less cryptic? (alternative_layer_normalization
or alt_layer_norm
or at the very least add a comment). And I'd say the same with the command line option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comes from my original code that was only meant for tests...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, fixed (should have considered that after using it)
…ro#1924) * Accomodating onnx2pytorch for conv-blocks * alternate onnx layernorm implementation with explicit fp32 casting * Switch for supporting onnx2pytorch Co-authored-by: borg323 <[email protected]> (cherry picked from commit ed4e669)
Adds
kernel_shape
as a property to all generated Conv-nodes for generated ONNX-networks, to partially address #1736 . This was specifically done for the onnx2pytorch-package requiring. The ONNX-standard states that this field is optional, and if not provided it should be inferred just looking at the dimensionality of the weights. These changes make some networks generated byleela2onnx
readable byonnx2pytorch
.I verified that it works with the network that comes bundled with the latest lc0-release (791556) at least. However, many of the networks fail since
onnx2pytorch
hasn't implemented layer-normalisation, (as was the case with many of the networks I tried (mostly from https://lczero.org/play/networks/bestnets/). I also considered making a PR over there implementing that, but it seems pretty dead.I'm not sure if it is correct that this project should add stuff like this to accomodate such requirements in other projects. It is practical that it works with other software that people are likely to use, especially in this case since it's quite a small change (and follows an optional part of the standard). I'm a big advocate for making the barrier as low as possible for other people to tinker with lc0-networks (and for that it's nice that conversion between pytorch and ONNX works as well as it could), but probably not at the cost of more bloat. I haven't done any large contributions, so that would be up to someone else to judge, but I wouldn't lose any sleep if this wasn't merged.