Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Script to narrow down on changes (methods) that's considered for API migration #963

Open
niccolopaganini opened this issue Aug 31, 2023 · 17 comments

Comments

@niccolopaganini
Copy link

Referencing from previous issue #934. New issue created not to overwhelm the aforementioned issue. To highlight problems:

  • I focused on just our plugins, leading to code repetition.
  • Due to the above (and to keep it sort of clean) I overcomplicated it by given each directory as input developed by @shankari rather than one single directory
  • In the process of automating things, I relied on spreadsheet(s) to keep track of packages/ classes/ methods rather than obtaining one final list of methods required for update.

Current Process/ changes

  • Retain the code to obtain CSV ✅
  • Strip down "unnecessary information" to where just the name of the packages are noted ✅
    For example: android.view.accessibility.AccessibilityNodeInfo ➡️ AccessibilityNodeInfo
  • Compile a unique list from the above ✅
  • Use that CSV (list of classes) to traverse over the entire plugins directory and compile them in a list ✅
  • Filter to create a unique list ✅
  • Use that list and scrape for matching methods.
  • Create a unique list from the above
  • Use that CSV (list of methods) to traverse over the entire plugins directory and print out the name of the method and location of file
    An example for the above (that I got as an output from classes):
    Code:
print(f"Found {classes} in {java_file}") #note: rather than printing for classes, I will be compile them in a CSV 

Output:

Found json in /Users/nvsr/Downloads/e-mission-phone-nvsr-iter1-API_Migration_aid_script_patch-1/plugins/cordova-plugin-file/src/android/ContentFilesystem.java
@shankari
Copy link
Contributor

shankari commented Sep 1, 2023

@niccolopaganini where is this script? Can you please submit a draft PR?

@shankari shankari moved this to Issues being worked on in OpenPATH Tasks Overview Sep 1, 2023
@niccolopaganini
Copy link
Author

I am trying to omit the process of scraping it twice and rather scrape the code in one go. The problem I am currently facing is that it is picking up weird artifacts. For example:

  • #B
  • #W
  • #topheader
    etc. I am trying to polish the code to avoid picking up these artifacts. Once this is done, I can follow it with this code chunk
  • Use that CSV (list of classes) to traverse over the entire plugins directory and compile them in a list ✅

and then we will have our list of methods.

@niccolopaganini
Copy link
Author

Update: I am now able to access everything from the URL. However, few caveats I noticed

  • Super unorganized

Screenshot 2023-09-01 at 7 24 57 PM

  • That code chunk considerably takes more time to run than others (It is yet to be organized in a script)
  • Unable to get rid of the name of the packages being repeated

The final step that is to be done is to compare this CSV and the one we compiled to get matching packages. With that, we can infer:

  1. Changed Methods, Constructors & Fields
  2. Added Methods, Constructors & Fields
  3. Deprecated items

@niccolopaganini
Copy link
Author

To reiterate, the updated method is as follows:

  1. Compile just the list of packages. ✅
  2. Use that to compare against the plugins directory ✅
  3. Obtain a unique list ✅
  4. Compile another set of CSV with the changes -> currently working on this code chunk - I'm trying to organize this list so that it is easier for one to read
  5. compare the above list with the unique list and list the changes

@niccolopaganini
Copy link
Author

Update 2:

I am able to now sort it for each type. However, there is repetition in the scraping process and the title of the table is missing (crucial to know whether it is an added method, changed method, field, etc.)

Attached screenshot below shows repetition and the lack of a title at the start of the cell which is crucial for identification

Screenshot 2023-09-01 at 9 12 42 PM

@shankari
Copy link
Contributor

shankari commented Sep 4, 2023

Attached screenshot below shows repetition and the lack of a title at the start of the cell which is crucial for identification

In general, it is better to use text rather than screenshots to illustrate errors, since screenshots are not searchable.

The problem I am currently facing is that it is picking up weird artifacts. For example:
#B
#W
#topheader
etc. I am trying to polish the code to avoid picking up these artifacts. Once this is done, I can follow it with this code chunk

It would be helpful to see the code as you make these changes. I don't see any draft PR or updates to the existing PR...

I am able to now sort it for each type. However, there is repetition in the scraping process and the title of the table is missing (crucial to know whether it is an added method, changed method, field, etc.)

It seems like you should be able to address the repetition by simply comparing the strings and getting a unique set. Is there a reason that this doesn't work?

@niccolopaganini
Copy link
Author

I don't see any draft PR or updates to the existing PR...

I closed the existing PR (e-mission/e-mission-phone#1010) and opened a new one (e-mission/e-mission-phone#1025).

It seems like you should be able to address the repetition by simply comparing the strings and getting a unique set. Is there a reason that this doesn't work?

I had to polish that a bit as it did not apply to everything. Fixed this problem.

@niccolopaganini
Copy link
Author

Pending task is to compare and match. WIll update the status on that in a couple of hours.

@niccolopaganini
Copy link
Author

I have been experiencing network issues for the past 48 hours. Will update the issue as soon as it is up and running

@niccolopaganini
Copy link
Author

why is environment.yml in an_archive? It is still valid and needs to be activated for the script to run, correct?
Please make sure to pin the dependencies for all packages!
- common_imports.py will contain all necessary imports. In addition, the imports are distilled to remove redundant packages

classes.py renamed to classes_2.py consist(s) of the following changes:

goes from taking a "hardcoded" URL to taking it as an input from the user

url = "https://developer.android.com/sdk/api_diff/33/changes/alldiffs_index_changes"
...
if len(sys.argv) != 2:
        print("Usage: python classes.py <URL>")
        sys.exit(1)

    url = sys.argv[1]
...

Similarly, links.py is changed to links_2.py with the aforementioned change.

Input is achieved using

read -p "Enter the URL which contains the API changes: " url

@niccolopaganini
Copy link
Author

Changing from absolute paths to relative in output.py

    match_csv_java("/Users/nseptank/e-mission-phone/plugins", "/Users/nseptank/e-mission-phone/bin/API_migration_scripts/simplify_classes.csv")

working on updating this script currently (will be renamed to output_2.py) but failing somewhere logically when trying to walk back on directories.

This is the change that I made but I am working on it currently:

dirr = "../plugins"
csvf = "../bin/API_Migration_scripts_readme/API_Migration_Automation/API_migration_scripts/simplify_classes.csv"

print("Directory:", dirr)
print("CSV File:", csvf)

match_csv_java(dirr, csvf)

@niccolopaganini
Copy link
Author

Common imports will have the following:

#common_imports.py
import csv
import requests
import bs4
import pandas as pd

And all the scripts will be updated

@shankari
Copy link
Contributor

working on updating this script currently (will be renamed to output_2.py) but failing somewhere logically when trying to walk back on directories.
Similarly, links.py is changed to links_2.py with the aforementioned change.

Please do not make a copy of the file. Having multiple copies that are subtly different just makes it more confusing for users.
Just edit the file! That's why we use source control!

And all the scripts will be updated

I don't understand the point of "common imports" given that, in my review, I suggested combining all the files.

@niccolopaganini
Copy link
Author

Please do not make a copy of the file. Having multiple copies that are subtly different just makes it more confusing for users.
Just edit the file! That's why we use source control!

Oh I was making it so that I can differentiate it locally. My apologies fort not making it clear.

I don't understand the point of "common imports" given that, in my review, I suggested combining all the files.
I wanted to update the PR and then follow up with a reply to provide support as of why I chose this route. However, if this structure is uncommon and fragmented, I will combine everything.

@shankari
Copy link
Contributor

I wanted to update the PR and then follow up with a reply to provide support as of why I chose this route. However, if this structure is uncommon and fragmented, I will combine everything.

For efficiency, you might want to discuss pros and cons and make the design decision before implementation. Otherwise, you can waste time changing the code in the PR in a way that you will have to change again anyway.
If this was exploratory work, it would be different, but I think we have a fairly clear idea of what the current structure looks like

@niccolopaganini
Copy link
Author

niccolopaganini commented Sep 22, 2023

.py Description
classes scrapes packages/ classes
modified_csv removes unnecessary info
unique_classes compiles a unique list from the above
simplify_classes removes unnecessary from the above
output compares the above csv with all the plugins from the plugins directory
unique_packages removes the path column and compiles a unique list
links scrapes similar to classes but stores them as URLs
simplified_links removes unnecessary info
changed_scraped gets the methods with the URLs and desc.
changed_scraped_without_column filters to the methods
matching_entries gets the matching packages
merged_output gets the name of the packages, file location, and method name (as a whole)

I have taken the above and condensed them to the following:

.py Description
c1_m2_u3_s4 Scrapes packages on a high level and compiles a list of just classes
o5_up6 Takes the above list and compares against the plugins directory and then compiles a list. This list is then distilled to hold just a unique pair
l7_sl8_cs9_cswc10 Scrapes the URL again and saves them as links. With the list, I use it to access each link and compile ONLY the changes. Then I filter it to just unique methods
me11_mo12 Takes the above and the unique_packages.csv and then compares to get a list in merged_output.csv which contains the final list of methods requiring some form of change

While I can't just the first table as it supports a form of extreme modularity, I have compiled them to these 4 scripts (obviously could use better names for script) where someone other than me can use/ modify the packages accordingly. What do you think of this @shankari? This also supports modularity and is not fragmented enough and each python script serves a particular purpose.

@niccolopaganini
Copy link
Author

I was looking at setup.py for this as you had mentioned in the PR and I stumbled upon this link

https://docs.python.org/3/distutils/setupscript.html

(for example)

from distutils.core import setup

setup(name='Distutils',
      version='1.0',
      description='Python Distribution Utilities',
      author='Greg Ward',
      author_email='[email protected]',
      url='https://www.python.org/sigs/distutils-sig/',
      packages=['distutils', 'distutils.command'],
     )

Did you mean something like this?

@shankari shankari moved this from Issues being worked on to Pending for Nitish in OpenPATH Tasks Overview Oct 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Pending for Nitish
Development

No branches or pull requests

2 participants