Module hidden with SitePackage.lua is unfindable #735

Closed
wpoely86 opened this issue Nov 21, 2024 · 20 comments

@wpoely86
Contributor

With the Lmod version in EPEL (Lmod-8.7.53-1.el8.x86_64), we are having issues with modules that are hidden by our SitePackage.lua. The logic can be found here: https://github.com/vub-hpc/Lmod-config/blob/main/SitePackage.lua#L239-L246

This worked well until we updated to 8.7.53. Now the module is not visible at all: neither ml --show_hidden spider nor ml --show_hidden av shows it, and modules that load it as a dependency fail to load. Removing the relevant lines from our SitePackage.lua fixes the issue (but then the module is no longer hidden).
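One way a SitePackage.lua can hide modules is to register an isVisibleHook that sets modT.isVisible = false; this is the same hook that MRC.lua consults in the code quoted later in this thread. The following is a minimal, self-contained sketch of that approach, assuming the linked logic works this way. The small hook table is a hypothetical stand-in for Lmod's own hook module, and the hiddenT list and the Python module name are hypothetical examples.

```lua
-- Hypothetical stand-in for Lmod's hook module; inside a real
-- SitePackage.lua, Lmod provides `hook` itself.
local hook = { registry = {} }

function hook.register(name, fn)
   hook.registry[name] = fn
end

function hook.exists(name)
   return hook.registry[name] ~= nil
end

function hook.apply(name, ...)
   local fn = hook.registry[name]
   if fn then
      return fn(...)
   end
end

-- Hypothetical set of full module names the site wants to hide.
local hiddenT = {
   ["JupyterHub/4.1.5-GCCcore-12.3.0"] = true,
}

-- isVisibleHook: receives a table describing the module (including
-- fullName and isVisible); setting isVisible = false hides it.
local function visible_hook(modT)
   if hiddenT[modT.fullName] then
      modT.isVisible = false
   end
end

hook.register("isVisibleHook", visible_hook)

-- Usage: mimic how MRC.lua resets isVisible and applies the hook.
local jupyterT = { fullName = "JupyterHub/4.1.5-GCCcore-12.3.0" }
local pythonT  = { fullName = "Python/3.11.3-GCCcore-12.3.0" }

for _, modT in ipairs({ jupyterT, pythonT }) do
   if hook.exists("isVisibleHook") then
      modT.isVisible = true
      hook.apply("isVisibleHook", modT)
   end
   print(modT.fullName, modT.isVisible)
end
```

Hiding is meant to affect only the default listings: a hidden module should still appear with --show_hidden and load when requested by its full name, which is exactly what stopped working in 8.7.53.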

I've attached two files:

  • output-broken.txt.gz: the output of ml -D --ignore-cache --show_hidden spider JupyterHub.
  • output-unhidden.txt.gz: the same output, but with the JupyterHub line removed from SitePackage.lua.
@rtmclay
Member

rtmclay commented Nov 22, 2024

I do not have a fix yet but I can reproduce this issue. I'll keep you updated on progress on this issue.

@rtmclay
Member

rtmclay commented Nov 22, 2024

I am unclear on what you mean by:

Module that load it as a dependency fail to load.

How are you loading this dependency? Is it with the version or not?

@wpoely86
Contributor Author

How are you loading this dependency? Is it with the version or not?

As a version-locked dependency (these are all EasyBuild-generated module files):

depends_on("JupyterHub/4.1.5-GCCcore-12.3.0")

rtmclay pushed a commit that referenced this issue Nov 25, 2024
@rtmclay
Member

rtmclay commented Nov 25, 2024

I was able to reproduce both issues (ml -A av and the depends_on()). This has been fixed for me on the IS690-hide branch. Please test this branch to see if it works for you.

@wpoely86
Contributor Author

It gives me:

/usr/bin/lua: /usr/share/lmod/lmod/libexec/MRC.lua:535: attempt to index a boolean value (local 'resultT')
stack traceback:
	/usr/share/lmod/lmod/libexec/MRC.lua:535: in upvalue 'l_findHiddenState'
	/usr/share/lmod/lmod/libexec/MRC.lua:716: in function 'MRC.isVisible'
	/usr/share/lmod/lmod/libexec/Spider.lua:635: in local 'l_buildDbT_helper'
	/usr/share/lmod/lmod/libexec/Spider.lua:669: in function 'Spider.buildDbT'
	/usr/share/lmod/lmod/libexec/Cache.lua:631: in function 'Cache.build'
	/usr/share/lmod/lmod/libexec/cmdfuncs.lua:1082: in function 'SpiderCmd'
	/usr/share/lmod/lmod/libexec/lmod:523: in function 'main'
	/usr/share/lmod/lmod/libexec/lmod:594: in main chunk
	[C]: in ?

@rtmclay
Member

rtmclay commented Nov 25, 2024

Thanks for running this. Can you give me an example module tree that reproduces this?

@wpoely86
Contributor Author

An empty module tree is enough to trigger it:

$ mkdir /tmp/a
$ export MODULEPATH=/tmp/a
$ ml av
/usr/bin/lua: /usr/share/lmod/lmod/libexec/MRC.lua:535: attempt to index a boolean value (local 'resultT')
stack traceback:
	/usr/share/lmod/lmod/libexec/MRC.lua:535: in upvalue 'l_findHiddenState'
	/usr/share/lmod/lmod/libexec/MRC.lua:716: in function 'MRC.isVisible'
	/usr/share/lmod/lmod/libexec/Spider.lua:635: in local 'l_buildDbT_helper'
	/usr/share/lmod/lmod/libexec/Spider.lua:669: in function 'Spider.buildDbT'
	/usr/share/lmod/lmod/libexec/Cache.lua:631: in function 'Cache.build'
	/usr/share/lmod/lmod/libexec/ModuleA.lua:677: in function 'ModuleA.singleton'
	/usr/share/lmod/lmod/libexec/Hub.lua:1253: in function 'Hub.avail'
	/usr/share/lmod/lmod/libexec/cmdfuncs.lua:145: in function 'Avail'
	/usr/share/lmod/lmod/libexec/lmod:523: in function 'main'
	/usr/share/lmod/lmod/libexec/lmod:594: in main chunk
	[C]: in ?

All of our configuration can be found in https://github.com/vub-hpc/Lmod-config

@rtmclay
Member

rtmclay commented Nov 25, 2024

This is what I get:

% mkdir /tmp/a
% clearMT
% export MODULEPATH=/tmp/a
% ml av       
No module(s) or extension(s) found!
...

I didn't see anything in your Lmod-config variable settings that would make a difference. What happens if you run:

% ml -I av

Also, please attach the output of ml --config as a file.

@wpoely86
Contributor Author

Hmm, using ml --ignore-cache av indeed works. It's something in the cache that breaks it.

@wpoely86
Contributor Author

It's a cache thing. If I add --ignore-cache, the error is gone.

@rtmclay
Copy link
Member

rtmclay commented Nov 26, 2024

Please attach your cache file from your earlier version of Lmod. My attempts to reproduce your issue have not worked. Once I know what is happening, I can create my own cache file to add to the test suite.

@wpoely86
Contributor Author

The issue is a cache built by Lmod 8.7.53, the version currently in EPEL. If I rebuild the cache with Lmod from the IS690-hide branch, everything is fine again and works as intended. The original issue is gone.

I'm attaching the two cache files:

@rtmclay
Copy link
Member

rtmclay commented Nov 27, 2024

I would like to figure out why you got the original error. However, using your spiderT-broken.lua file as my cache, I am unable to reproduce it. The bug is still out there, but someone else will have to report it. Thanks for your help.

@wpoely86
Contributor Author

Does it help if I give you a -D -D output of the error? Or a full copy of the module tree?

@rtmclay
Copy link
Member

rtmclay commented Nov 27, 2024

Yes, I'd like to try both. Thanks!

@wpoely86
Contributor Author

If I start it in an empty Rocky Linux 8 container, I can't reproduce it either, even with a full copy of the module tree and cache. This is getting weird.

@rtmclay
Member

rtmclay commented Nov 27, 2024

I think that this bug has to do with hidden modules. It may also have to do with different LMOD variables, but I don't know why.

If you have made sure that you have the same LMOD variables and hidden modules, then I don't know what the issue is.

@rtmclay
Member

rtmclay commented Nov 28, 2024

I went back to your stack trace and found that the failure is in this part of l_findHiddenState (MRC.lua:535):

   local resultT = l_find_resultT(self, "hiddenT", {kind = "hidden"}, modT.mpath, wantedA)

   -- Apply isVisibleHook, convert false isVisible flag to resultT.
   if (hook.exists("isVisibleHook")) then
      modT.isVisible = true
      hook.apply("isVisibleHook", modT)
      if (not modT.isVisible) then
         resultT        = resultT or {}
         resultT.kind   = "hidden"   --> this line fails in your case.
         resultT.name   = modT.fullName
      end
   end

This means that resultT is a true value but not a table, which in turn means that l_find_resultT returned a value that is not false (had it returned false, resultT or {} would have produced a table).
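The reasoning rests on Lua's truthiness rules: only nil and false are falsy, so resultT or {} replaces false with an empty table but leaves true untouched. The standalone snippet below (plain Lua, no Lmod code) reproduces that failure mode; the value true is an assumed stand-in for the non-table value the lookup returned, consistent with the "boolean value" in the stack trace.

```lua
-- Only nil and false are falsy in Lua, so `x or {}` keeps any other value.
local fromFalse = false or {}   -- a fresh table
local fromTrue  = true  or {}   -- still the boolean true

-- Mirror the failing line: indexing a boolean raises an error.
-- pcall catches it so the script can report the message.
local ok, err = pcall(function()
   local resultT = fromTrue
   resultT.kind  = "hidden"
end)

print(type(fromFalse), type(fromTrue))
print(ok, err)
```

The exact wording of the error message depends on the Lua version, but in every version it reports an attempt to index a boolean value.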

local function l_find_resultT(self, tbl_kind, replaceT, mpath, wantedA)
   dbg.start{"MRC:l_find_resultT( tbl_kind, replaceT, mpath, wantedA)"}
   local resultT = false
   local Tkind   = "__" .. tbl_kind
   local tt      = {}
   local ttt     = self[Tkind] or {}
   
   if (self.__mpathT[mpath] and self.__mpathT[mpath][tbl_kind]) then
      tt  = self.__mpathT[mpath][tbl_kind]
   end

   local mpathA = {mpath}
   for i = 1,#wantedA do
      local wanted = wantedA[i]
      local key    = self:resolve(mpathA, wanted)
      local ans    = ttt[key] or tt[key]
      if (ans ) then
         resultT = ans
         dbg.fini("MRC:l_find_resultT")
         return resultT         
      end
   end
   dbg.fini("MRC:l_find_resultT (false)")
   return resultT
end   

This implies that ans is not a table, because ttt[key] or tt[key] holds a non-table value. So the check in l_find_resultT now becomes:

      if (ans and type(ans) == "table") then

This should prevent resultT from ever being set to something other than a table. Once everyone has a spiderT.lua built with Lmod 8.8+, this won't be a problem.
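A simplified model of the guarded lookup may make the effect of the type(ans) == "table" check concrete. This is a hypothetical sketch, not the actual MRC.lua code: it drops the self:resolve() key resolution and the dbg tracing, and the cache contents (including the Foo/1.0 entry) are invented. The boolean true entry models a non-table value left over from an older spiderT.lua.

```lua
-- Simplified, hypothetical model of l_find_resultT: keys are used as-is
-- instead of being resolved via self:resolve(mpathA, wanted).
local function find_resultT(ttt, tt, wantedA)
   for i = 1, #wantedA do
      local key = wantedA[i]
      local ans = ttt[key] or tt[key]
      -- Guard from the fix: accept only tables, so a stray non-table
      -- value (such as true) is skipped instead of returned.
      if (ans and type(ans) == "table") then
         return ans
      end
   end
   return false
end

-- Hypothetical per-modulepath data: one old-style boolean entry and one
-- table entry.
local tt = {
   ["JupyterHub/4.1.5-GCCcore-12.3.0"] = true,
   ["Foo/1.0"] = { kind = "hidden", name = "Foo/1.0" },
}

local oldEntry = find_resultT({}, tt, { "JupyterHub/4.1.5-GCCcore-12.3.0" })
local newEntry = find_resultT({}, tt, { "Foo/1.0" })

-- With the guard, the caller's `resultT or {}` fallback always yields
-- a table, so setting fields no longer fails.
local resultT = oldEntry or {}
resultT.kind  = "hidden"
resultT.name  = "JupyterHub/4.1.5-GCCcore-12.3.0"

print(oldEntry, newEntry.kind, resultT.kind)
```

Returning false for the malformed entry lets the existing isVisibleHook branch build a fresh table, so no separate error path is needed in the caller.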

I'm willing to close this issue unless you see a reason to keep it open.

@rtmclay
Member

rtmclay commented Dec 1, 2024

I believe that I have found what the problem was, and I have added code to deal with the old spiderT.lua format. That old format is why you originally got that stack trace error.

Please test the updated IS690-hide branch to see if it works for you.

@wpoely86
Contributor Author

wpoely86 commented Dec 1, 2024

All seems well with the updated IS690-hide branch :)

The original issue is gone and everything else seems to work.

@rtmclay rtmclay closed this as completed Jan 27, 2025
2 participants