Feature request: Add enclosure support #211

Open · BlaineEXE opened this issue Sep 21, 2016 · 7 comments
@BlaineEXE commented Sep 21, 2016

Some feedback we've gotten from one of our test labs is that we are not able to monitor the status of JBODs with lsm. If we use an lsm-only approach to management, the only way to detect JBOD health issues is to view the health LED on the physical chassis.

I've already completed about 35% of the work needed to enable this feature, but I haven't been able to work on it in a while. It would be good to develop some requirements around what details need to be included in an enclosure moving forward.

Desired properties (a rough data-type sketch follows the list):

  • Enclosure name
  • Enclosure serial number
  • Enclosure link speed - Addendum: HPSSACLI does not give us this; does SES?
    • 6G_SAS
    • 12G_SAS
    • OTHER
    • UNKNOWN
  • Enclosure status
    • OK (healthy)
    • UNKNOWN
    • FAIL (failed)
    • OTHER
  • Enclosure location
    • INTERNAL (this enclosure is inside the node, e.g., Apollo internal enclosure)
    • EXTERNAL (this enclosure is outside of the node, e.g., JBOD)
    • UNKNOWN
    • OTHER
  • SEPs exposed in the chassis
    • Many JBODs are physically/logically separated into multiple different SEPs. In HPE's case, the SEPs are indicative of a drawer in the JBOD or a backplane segment in an Apollo unit. We already have some scripts built around HPSSACLI that we would like to port to libstoragemgmt, but we need enclosure support including these SEPs to do so.
    • We also need to be able to determine which SEP a local disk belongs to in order to make the association.
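As a rough illustration of the above, here is a minimal Python sketch of such a data type, in the style of the existing lsm data classes. The Enclosure class and its STATUS_*/LOCATION_* constants are hypothetical and not part of any existing lsm API:

    class Enclosure(object):
        """Hypothetical enclosure record; all field names are illustrative."""
        # Status constants, mirroring the values listed above.
        STATUS_UNKNOWN = 0
        STATUS_OK = 1
        STATUS_FAIL = 2
        STATUS_OTHER = 3

        # Location constants, mirroring the values listed above.
        LOCATION_UNKNOWN = 0
        LOCATION_INTERNAL = 1  # e.g., Apollo internal enclosure
        LOCATION_EXTERNAL = 2  # e.g., JBOD
        LOCATION_OTHER = 3

        def __init__(self, name, serial_num, link_speed, status, location,
                     sep_wwids):
            self.name = name              # Enclosure name
            self.serial_num = serial_num  # Enclosure serial number
            self.link_speed = link_speed  # e.g., "6G_SAS", "12G_SAS"
            self.status = status          # One of the STATUS_* constants
            self.location = location      # One of the LOCATION_* constants
            self.sep_wwids = sep_wwids    # List of SEP WWIDs in the chassis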
@cathay4t (Contributor) commented:

@BlaineEXE I hope you mean adding more library-level local APIs instead of creating a plugin named SES or expanding the hpsa plugin. From my point of view, the features you are requesting could be done via the SES standard or at least the SCSI T10 standards.

May I have an early look at your 35% work? I once created an SES plugin but dropped it.

@BlaineEXE (Author) commented:

The work I had done focused on using the HPSA plugin to get the information. Joe and I have discussed the SES method you mentioned, and it has merit, but I think a combination of a vendor-specific plugin and SES methods will be best. The vendor-specific plugin would be needed if one were to put a JBOD behind a RAID controller, for example, while the SES method would be preferable for a customer who knows that all controllers will be in HBA mode. I think this can be broken into two parts that can be developed simultaneously.

Part 1) Create the Enclosure data type/structure. The HPSA plugin adds data to Enclosure structures. SES commands can be called from the HPSA plugin when appropriate (i.e., for enclosures behind HBAs), and the plugin can also give information about enclosures behind RAID-mode controllers. One of the properties HPSSACLI returns about an enclosure is a SEP WWID, which we can use to determine a drive bay location with more precision.

Part 2) The SES functionality you have mentioned is still appropriate but does not add data into an Enclosure structure. For plugins that do not return an enclosure's SEP WWID, SES functionality could deliver that information in the plugin.

There is still a lot of discussion and planning involved to make sure this is successful. It would be good to sketch out how everything should look and work.

My 35% work: https://github.com/BlaineEXE/libstoragemgmt/commits/add-enclosure-support
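For illustration, the Part 1 flow inside a plugin might look roughly like this Python sketch. The Enclosure fields follow the hypothetical data type sketched earlier; HpsaPlugin and _hpssacli_enclosure_list() are invented placeholders, not existing code:

    # Hypothetical plugin-level API sketch: the HPSA plugin fills
    # Enclosure records (see the data-type sketch above) from its own
    # management tool, so it works even behind RAID-mode controllers.
    class HpsaPlugin(object):
        def _hpssacli_enclosure_list(self):
            """Invented placeholder: parse HPSSACLI output into dicts."""
            return []

        def enclosures(self, flags=0):
            """Return a list of Enclosure objects known to this plugin."""
            encls = []
            for raw in self._hpssacli_enclosure_list():
                encls.append(Enclosure(
                    name=raw["name"],
                    serial_num=raw["serial_num"],
                    link_speed=raw.get("link_speed", "UNKNOWN"),
                    status=raw["status"],
                    location=raw["location"],
                    sep_wwids=raw["sep_wwids"]))
            return encls

The Part 2 SES helpers would then be library-level functions that a plugin (or a user whose controllers are all in HBA mode) could call directly, without going through a vendor tool.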

@cathay4t (Contributor) commented:

Thank you.
Just as I expected:

  • Create LSM library-level functions for SES-standardized functionality that all plugins can use.
  • Plugins add their vendor-specific variations.

I will take a look at your code and we can work out a plan then, but it has to wait a while; I am taking PTO next week (Oct 1-7).

@cathay4t (Contributor) commented Oct 14, 2016

@BlaineEXE I took a quick look at your code; it seems you intend to create a plugin API like lsm.Client.enclosures(). All the properties you are asking for are covered by existing standards (SES and SPL), so the feature you are requesting should be library-level APIs like lsm.LocalDisk.enclosure_name_get('/dev/sda') or lsm.LocalDisk.enclosure_sn_get('/dev/sda'), instead of a plugin-level API.

  • Enclosure name

    PRODUCT IDENTIFICATION should do.

  • Enclosure serial number

    ENCLOSURE LOGICAL IDENTIFIER should do.

  • Enclosure link speed

    It's in the SPL-4 standard: use SCSI MODE SENSE page 0x19 (Protocol Specific Port), subpage 01h (Phy Control And Discover); the NEGOTIATED LOGICAL LINK RATE field is the link speed (a decoding sketch follows this list).

  • Enclosure status

    SES lets us check temperature sensors, fan speeds, and power supply status. We could expose these and also summarize them into a simple health status. In addition, each port/device slot has several BYPASS status bits, from which we could determine each port's health.

  • Enclosure location

The CONNECTOR TYPE field of the SAS Connector status element would do.

  • SEPs exposed in the chassis

If you mean the /dev/sgX enclosure service devices: IMHO, we should simplify all actions to operate against /dev/sdX and do all the dirty/heavy lookup of the SEP /dev/sgX internally (currently we use the SAS address).
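For the link-speed item above, a minimal decoding sketch: NEGOTIATED LOGICAL LINK RATE is a 4-bit code, and SPL assigns 8h/9h/Ah/Bh to 1.5/3/6/12 Gbps. Extracting the nibble from the raw mode page (e.g., as dumped by sg_modes --page=0x19,0x01 /dev/sda from sg3_utils) is left to the caller; this Python fragment only maps the code to a speed string:

    # Map the 4-bit NEGOTIATED LOGICAL LINK RATE code from the Phy Control
    # And Discover subpage (MODE SENSE page 0x19, subpage 01h) to a speed
    # string.  Codes below 8h mean no negotiated rate (unknown, disabled,
    # phy reset problem, etc.) and fall through to "UNKNOWN".
    _SAS_LINK_RATE = {
        0x8: "1.5G_SAS",
        0x9: "3G_SAS",
        0xA: "6G_SAS",
        0xB: "12G_SAS",
    }

    def decode_link_rate(code):
        """Return a speed string for a NEGOTIATED LOGICAL LINK RATE code."""
        return _SAS_LINK_RATE.get(code & 0xF, "UNKNOWN")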

Even though playing with SES and SPL in C is fun for me, the properties above (especially enclosure status and port link speed) apparently require a lot of work. Is there a good use case I could use to persuade my boss?

@cathay4t (Contributor) commented:

@BlaineEXE To simplify things, are you seeking implementations of these APIs (taking the Python API as an example):

 * lsm.LocalDisk.enclosure_name_get("/dev/sda")
 * lsm.LocalDisk.enclosure_sn_get("/dev/sda")
 * lsm.LocalDisk.link_speed_get("/dev/sda")
    # Return ["6G", "6G"] if dual port, else ["6G"]
 * lsm.LocalDisk.sensors_get("/dev/sda")
    # Return [("<sensor_name_location>", "80C", STATUS_TEMP_WARN), ...]
 * lsm.LocalDisk.fans_get("/dev/sda")
    # Return [("<fan_name_location>",
    #           <cur_speed>, <max_speed>, STATUS_OK), ...]
 * lsm.LocalDisk.power_supplies_get("/dev/sda")
    # Return [("<power_supply_name_location>",
    #           STATUS_OK/STATUS_ERROR), ...]
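Assuming exactly those proposed signatures and return shapes (none of which exist in lsm yet), a caller-side health summary might look like this Python sketch:

    import lsm  # assumes the proposed LocalDisk methods above exist

    def enclosure_health_report(disk_path):
        """Print a short report on the enclosure components serving disk_path."""
        print("Enclosure: %s (S/N %s)" % (
            lsm.LocalDisk.enclosure_name_get(disk_path),
            lsm.LocalDisk.enclosure_sn_get(disk_path)))
        print("Link speed per port: %s" % ", ".join(
            lsm.LocalDisk.link_speed_get(disk_path)))
        for name, reading, status in lsm.LocalDisk.sensors_get(disk_path):
            print("Sensor %s: %s (status %s)" % (name, reading, status))
        for name, cur, top, status in lsm.LocalDisk.fans_get(disk_path):
            print("Fan %s: %s/%s (status %s)" % (name, cur, top, status))
        for name, status in lsm.LocalDisk.power_supplies_get(disk_path):
            print("Power supply %s: status %s" % (name, status))

    enclosure_health_report("/dev/sda")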

@joehandzik (Member) commented:

@cathay4t Thanks for taking a look at Blaine's code. I agree that we should be able to get most of this information via standard methods as well as via hpssacli; the main reason we erred on the side of starting with the hpssacli route is that it would work in RAID mode too. So, keep that in mind as we go forward with this work. I agree with you that we need LocalDisk methods on top of whatever we might put in the standard plugin interface.

As far as use cases are concerned...we work heavily in HPC right now, and JBODs are used to minimize per-node licensing costs and maximize capacity. So, many of our configurations going forward will involve multiple daisy-chained high-capacity JBODs attached to x86 servers. I think libstoragemgmt has a shot at providing a significantly better user experience than the toolchains that exist today, particularly for this HPC market (though as I've discussed before, libstoragemgmt is useful for other scale-out solutions like Ceph as well). That's how I've justified my involvement here with my bosses. :)

@cathay4t (Contributor) commented:

@joehandzik Thanks for the inspiring use cases. I agree that both the plugin and library APIs should be able to query link speed, fans, sensors, power supplies, etc.
I will try to allocate time for the library APIs mentioned above.

@joehandzik @BlaineEXE If either of you starts working on the plugin or library APIs mentioned above, please create an issue and assign it to yourself with brief notes in order to eliminate overlap. And we should discuss the API design at an early stage instead of during PR review when all the patches are done.
