
Multiple lines in Parameter descriptions #7

Open
SvenMantowsky opened this issue Mar 11, 2024 · 1 comment

Comments

@SvenMantowsky

First of all, hats off to you, jwlodek. This is nearly the only package I found that works almost perfectly.
I'm not sure whether this package is still maintained, but I thought I'd give it a shot:

I converted several files containing numpy-style docstrings and discovered a problem: parameter descriptions that span multiple lines get split into several separate rows in the generated markdown.

Example doc:

```
Parameters
----------
model_scores_cali: np.ndarray[float]
    2D-Array containing model outputs in form of a specific score (e.g. softmax).
    The rows correspond to different data-points and the columns correspond to the
    classes of the classification task. Note that the calibration data should not
    have been used for model training.
cali_label: np.ndarray[str | int]
    Contains integer or string ground-truth labels. The i-th entry corresponds to
    the i-th row of model_scores_cali.
model_scores_val: np.ndarray[float]
    Contains 'validation' data with same structure as model_scores_cali. Prediction
    sets will be formed for this data.
```

Result:

Parameters

| Parameter | Type | Doc |
|-----------|------|-----|
| model_scores_cali | np.ndarray[float] | 2D-Array containing model outputs in form of a specific score (e.g. softmax). |
| Unknown | The rows correspond to different data-points and the columns correspond to the | classes of the classification task. Note that the calibration data should not |
| Unknown | have been used for model training. | cali_label: np.ndarray[str \| int] |
| Unknown | Contains integer or string ground-truth labels. The i-th entry corresponds to | the i-th row of model_scores_cali. |
| model_scores_val | np.ndarray[float] | Contains 'validation' data with same structure as model_scores_cali. Prediction |
| Unknown | sets will be formed for this data. | val_label: None |
| Unknown | If the ground-truth labels of model_scores_val are known, they can be used as input here in order to compute the empirical coverage of correct predictions. | |

If you have an easy fix for this, I would appreciate it very much. But if you no longer maintain this repo, maybe you can point me in the direction of where to look, so I can save some time and create a PR that you could approve.

THX and have a nice week.


jwlodek commented Mar 13, 2024

Hi, glad you are finding this useful! I basically couldn't find a tool for doing this other than ones that generate full Sphinx docs, which is complete overkill for smaller projects; that's why I wrote this script over a weekend.

It's been a while since I worked on this, but I think the issue is that I basically assumed single-line descriptions for parameters.

The code that actually parses each docstring into the data structure that npdoc2md uses internally is here:

npdoc2md/npdoc2md.py

Lines 339 to 380 in 7138063

```python
def add_docstring_to_instance(instance: ItemInstance, doc_string: List[str]) -> None:
    """Function that parses docstring to data structures and adds to instance

    Parameters
    ----------
    instance : ItemInstance
        current instance
    doc_string : list of str
        Current instance's docstring as list of lines
    """
    current_descriptor = None
    i = 0
    while i < len(doc_string):
        left_stripped = doc_string[i].lstrip()
        stripped = doc_string[i].strip()
        if i == 0:
            instance.set_simple_description(stripped.replace('"""', ''))
        elif stripped not in docstring_descriptors.keys() and current_descriptor is None:
            instance.add_to_detailed_description(left_stripped)
        elif stripped in docstring_descriptors.keys():
            current_descriptor = stripped
        elif current_descriptor is not None and not stripped.startswith('---') and len(stripped) > 0:
            descriptor_elem = []
            if len(docstring_descriptors[current_descriptor]) == 3:
                name_type = stripped.split(':')
                if len(name_type) == 1:
                    name_type.insert(0, 'Unknown')
                descriptor_elem = descriptor_elem + name_type
            else:
                descriptor_elem.append(stripped.split('(')[0])
            i = i + 1
            try:
                descriptor_elem.append(doc_string[i].strip())
            except IndexError:
                descriptor_elem.append('Unknown')
            if current_descriptor not in instance.descriptors.keys():
                instance.descriptors[current_descriptor] = [descriptor_elem]
            else:
                instance.descriptors[current_descriptor].append(descriptor_elem)
        i = i + 1
```

It essentially takes as input the docstring as a list of lines, and the reference to the internal ItemInstance object that represents this docstring.

Then, it loops over all the lines in the docstring and, based on what it expects to see, parses them into the format that ItemInstance expects. So, for example, when it sees one of the numpy descriptors (i.e. Parameters, Returns, etc.), it adds a descriptor to the ItemInstance, and then for each subsequent line after the `---*` underline it reads a name/type combo from one line, followed by the description from the next.
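The `Unknown` rows in the table above follow directly from that logic: a continuation line of a multi-line description contains no colon, so `split(':')` returns a single element and the parser mistakes the line for a new parameter with an unknown name. A minimal standalone reproduction (not part of npdoc2md itself):

```python
# A continuation line from the example docstring: it has no "name: type"
# colon, so the parser invents an 'Unknown' parameter name for it.
line = "The rows correspond to different data-points and the columns correspond to the"

name_type = line.split(':')
if len(name_type) == 1:
    name_type.insert(0, 'Unknown')

print(name_type[0])  # -> Unknown
```

Every continuation line therefore becomes its own bogus table row, which is exactly the pattern in the reported output.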

I think what we'd need to do is figure out a different way to determine whether there is a new entry in the docstring (maybe by using the indentation level?) and, until we see a new entry, keep appending to the current description.
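A sketch of that indentation-based idea (a hypothetical helper, not part of the current npdoc2md code): treat any line at the section's base indentation as a new `name: type` header, and append every more-indented line to the current entry's description.

```python
from typing import Dict, List


def split_parameter_entries(lines: List[str]) -> Dict[str, List[str]]:
    """Group the body of a numpydoc Parameters section into one entry per
    parameter, keeping multi-line descriptions together.

    Hypothetical sketch: assumes headers sit at the section's minimum
    indentation and description lines are indented deeper.
    """
    entries: Dict[str, List[str]] = {}
    non_empty = [l for l in lines if l.strip()]
    base_indent = min(len(l) - len(l.lstrip()) for l in non_empty)
    current = None
    for line in non_empty:
        indent = len(line) - len(line.lstrip())
        if indent == base_indent:
            # Base-indented line: a new "name: type" header starts an entry.
            current = line.strip()
            entries[current] = []
        elif current is not None:
            # Deeper indentation: continuation of the current description.
            entries[current].append(line.strip())
    return entries


body = [
    "model_scores_cali: np.ndarray[float]",
    "    2D-Array containing model outputs in form of a specific score (e.g. softmax).",
    "    The rows correspond to different data-points and the columns correspond to the",
    "    classes of the classification task. Note that the calibration data should not",
    "    have been used for model training.",
    "cali_label: np.ndarray[str | int]",
    "    Contains integer or string ground-truth labels. The i-th entry corresponds to",
    "    the i-th row of model_scores_cali.",
]
entries = split_parameter_entries(body)
print(len(entries))  # -> 2
```

Joining each entry's description lines with a space would then yield a single Doc cell per parameter in the generated markdown table.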

I can take a look at this maybe over the weekend, or if you'd like to have a crack at it, I'd be happy with that.

There are probably some overall improvements that could be made here as well. I basically wrote this in one sitting and got it to a point where it works, but I never went back to clean things up and make it more "proper" or readable.
