-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Yogev/enhance debugging #1
base: main
Are you sure you want to change the base?
Conversation
yogev-lb
commented
May 2, 2022
- Added some minor changes to the package in order to be able to invoke additional API calls that were not supported.
- Added a String() method that makes it easier to debug the IF_SEND/IF_RECV calls (not clean but eases the debug)
- Added IsGlobal API for exposing the isGlobal Range property.
- Fix polling mechanism since some of the devices operations timeout on the default tmo set.
pkg/core/method.go
Outdated
if len(resp) > 0 { | ||
cont = false | ||
} | ||
case <-timeoutC: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yogev-lb just to understand -
the old code did have timeout (100 iterations X 10ms interval = 1s timeout). what you are saying is he default session timeout wasn't taken into account?
can you explain the purpose of this timeout (better add such explanation to the commit message)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adir-lb that is exectily what im saying
some operations - like revert for intel drives took around 5s
this caused a timeout.
the spec talks about a default timeout of 30s i think so i set it to this tmo value by default.
pkg/core/session.go
Outdated
@@ -27,6 +27,7 @@ var ( | |||
ErrPropertiesCallFailed = errors.New("the properties call returned non-zero") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yogev-lb can you elaborate in commit message why you had to move the definitions?
pkg/core/method.go
Outdated
@@ -209,6 +419,7 @@ func (m *MethodCall) Execute(c CommunicationIntf, proto drive.SecurityProtocol, | |||
return nil, ErrReceivedUnexpectedResponse | |||
} | |||
|
|||
// log.Printf("exec cmd: %s", m.String(b)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's remove this comment.
pkg/core/method.go
Outdated
} | ||
_, ok4 := param.([]byte) | ||
if ok4 { | ||
//log.Printf("got []byte on params[%d]", i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't forget to remove the code in comment.
0048f9a
to
802df72
Compare
after looking today at upstream repo i see that there are many tasks that are started to be implemeneted that we implemented in our library: for example they do a big refactor in this PR: Also they add get-msid and set-sid to the library which i implemented in the wrapping sed library here: I may wait with the upstream PR after the open-source-firmware#88 is merged |
Hello there, I wrapped up the refactor PR in upstream repo. TakeOwnership will take more time and rework of sessions as well. A review would be helpful to get the changes in asap, so you can continue with your PR. (I also would be happy to see a PR of these changes in the upstream repo ;) |
Hello, me again. This implements take ownership and SetMBREnable. It would be awesome if you can provide me with some feedback. :) |
b680502
to
5eb192f
Compare
pkg/core/session.go
Outdated
@@ -237,6 +237,7 @@ func NewControlSession(d drive.DriveIntf, d0 *Level0Discovery, opts ...ControlSe | |||
// Technically we should be able to advertise 0 here and the disk should pick | |||
// for us, but that results in small values being picked in practice. | |||
rhp.MaxComPacketSize = 1024 * 1024 // 1 MiB for good measure | |||
//rhp.MaxComPacketSize = 1024 * 32 // LBM1-20784 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yogev-lb : er... that line looks thoroughly commented out. is that by design, given the commit message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that required? Until now the only usecase for MaxComPacketSize is when transmitting PBA image to MBR, but that doesnt effect recieving capabilities, which is handled by HostProperties ( at least that is the way I understand it)
Also check out "takeOwnership" branch in upstream repo.
Maybe it is benefitial for all of us if we can work on upstream together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ChriMarMe : please see the commit message that introduces this change. we're getting sporadic errors on at least one SED model when using such huge buffers. so far, dropping it to, IIRC, pretty much anything below 1MB seemed to fix the issue for that particular model, but we're not taking any chances.
it might be argued that the problem we're seeing is peculiar to a certain server model (PCIe topology with switches) and a certain SED model, and so this might be addressed with a model-specific "quirk", but i personally am not very fond of those. we might also be able to do a bit better than hard-coding a value (any value, really, 32kB is no more or less magical than 1MB) while keeping it generic. this is our current brute-force hack. "fix it twice", and all that... ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry. I overlooked the commit message.
Do you think it could be benefitial if we create an optional command parameter which forces certain properties, e.g. MaxComPacketSize to a given value? Thinking of something like -comsize or -compacksize
edit:
-hcomsize/hcompacksize (for HostProperties)
-tcomsize/tcompacksize (for TPerProperties)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ChriMarMe : manually passed opts seem sub-optimal at this point. they certainly should work, but their usability is questionable: we probably shouldn't assume that the caller of the method will know the model of the device at all times ahead of time, and have the full list of the affected models - this is essentially the codifying of "quirks" i referred to above, but pushed up the stack.
what i had in mind is something a little more automatic, like, perhaps, detecting that the TPer returns apparently bogus negotiated properties
(the returned host half), and only in these cases doing on the host side what the TPer should have done and explicitly re-negotiating properties
with values that would work for both sides.
so far we've been able to observe what looks like buggy properties
negotiation on just one SED model from just one vendor, and possibly affected by the PCIe connectivity topology at that. we'll want to have a word with the vendor first and sleep on it for a bit. then we can revisit this and propose a more sensible fix worth upstreaming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read it the way that the TPer uses the Values supplied in HostProperties parameter. So responsibility to get the values correct is on the hosts side.
i don't think this is a viable interpretation. TPers (SEDs) have limited HW resources and plenty of features are optional, so i don't think the host can dictate to TPer session params outside the scope of TPer's capabilities. besides, if this was so - why even bother with reporting back the TPer properties separately?
What I understood from your former quote was: TPer has some fixed values for HostProperties which needs to be reported to the host.
i don't think that's it either. my interpretation is that the TPer has a set of capabilities (inc. max buffer sizes) that it supports. it reports those in the response SMUID.Properties.Properties
(see 5.2.2.1.2.1). they are "fixed" insofar as they're limited by the SED's HW & FW, and they don't change depending on the host that's sitting on the other side of the wire or the values that host might have specified, but they are not HostProperties
.
the response SMUID.Properties.HostProperties
(5.2.2.1.2.2), on the other hand, is what the TPer proclaims it'll use for a given set of sessions (q.v ComID). these are not fixed, and host SW passing in different HostProperties
in the request might have different corresponding values returned, at TPer's discretion (within the bounds of the capabilities of both sides, i would imagine). this corresponds to the following two passages that me and you quoted, respectively:
this portion of the method result SHALL include the communications limitations and capabilities that the TPer SHALL use for messages sent from the TPer to the Host.
the TPer SHALL respond with ALL host properties it supports, with their current values, in the HostProperties parameter of its method response
in the meantime we were able to reproduce the issue we're seeing with the SED model in question with plenty of other SEDs of the same model, on several different server chassis of different models by different vendors with a few different Linux kernel versions. so it seems to be a vendor specific issue or a SED model specific one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, now I got it. Thank you for taking the time to explain it to me.
Great work! 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do keep in mind that i'm not one of the spec co-authors, so it's just my interpretation. empirically, it does appear to match the interpretation of most of the NVMe SED vendors we were able to lay our hands on so far - but not this one particular vendor, it seems. :)
i'll keep this thread posted once we hear back from their team.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ChriMarMe : after the 1st phase of root-causing the issue the vendor's engineering team seems to agree with my take on the situation (see above) and apparently a fix is planned for a future FW release. we'll see how that one pans out.
unfortunately, that will take a while and won't address drives already deployed [with legacy FW]. therefore i think i'd like to take you up on the idea of adding an option to cap MaxComPacketSize
. we're only interested in the library side of things, not the binaries. at the very least we'll need to add an option to core.NewControlSession()
(ControlSessionOpt
), something like WithMaxComPacketSize(uint)
, but i also suspect an InitializeOpt
will be required for locking.Initialize()
. AFAICT this should be backwards compatible with the existing library users, and should allow us to address the immediate problem - albeit in a somewhat heavy-handed fashion.
would such a change be acceptable for you? if so - we can whip up a corresponding separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds reasonable to introduce that. There is a real world problem and you need it fixed. Also your proposed solution follows a pattern already used and it is optional for any user.
3b22df3
to
48a8600
Compare
48a8600
to
7ba869f
Compare
5a4840d
to
c62b435
Compare
add36fb
to
79d5bc2
Compare
9aee3b2
to
ccd8c31
Compare
ccd8c31
to
14a2c6e
Compare
f810132
to
883c0bd
Compare
883c0bd
to
46f9344
Compare