fix: make ingest URL field case insensitive in parse_query #115

Merged

filipchristiansen
Collaborator

@filipchristiansen filipchristiansen commented Jan 8, 2025

This pull request fixes the ingest URL field, which was case-sensitive, as described in #110.

Changes:

  • Added `source = source.lower()` in the `parse_query` function to handle mixed-case URLs.
  • Implemented a new test, `test_parse_query_mixed_case`, to ensure that URLs with uppercase letters are processed correctly.

These changes ensure that URLs with different casing are correctly recognized, resolving the reported issue.
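
The change described above can be sketched as follows. This is a minimal illustration, not the project's actual `parse_query`: the function name `parse_source_sketch`, the returned dictionary, and the example URLs are assumptions made for this sketch. Only the `source = source.lower()` step comes from the PR description.

```python
def parse_source_sketch(source: str) -> dict[str, str]:
    """Hypothetical stand-in for the URL-handling part of parse_query."""
    # The step added in this PR: lowercase the whole input so that
    # mixed-case URLs are treated the same as lowercase ones.
    source = source.lower()

    # Illustrative parsing only: take what follows the host and read
    # the first two path segments as user and repository.
    # Assumes the input contains at least "user/repo".
    path = source.split("github.com/", 1)[-1]
    user, repo = path.strip("/").split("/")[:2]
    return {"user": user, "repo": repo, "url": f"https://github.com/{user}/{repo}"}


mixed = parse_source_sketch("HTTPS://GitHub.COM/Example-User/Example-Repo")
lower = parse_source_sketch("https://github.com/example-user/example-repo")
print(mixed == lower)
```

Note that lowercasing the entire string also lowercases anything after the repository name, such as branch or file names.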

Resolves #110

- Convert `source` to lowercase in `parse_query`
- Add `test_parse_query_mixed_case` to verify functionality

Resolves cyclotruc#110
Owner

@cyclotruc cyclotruc left a comment

Good catch, thank you! Merging.

@cyclotruc cyclotruc merged commit 551d09a into cyclotruc:main Jan 8, 2025
8 checks passed
@filipchristiansen filipchristiansen deleted the fix/injest-url-case-insensitive branch January 8, 2025 22:33
filipchristiansen added a commit that referenced this pull request Jan 10, 2025
- added list of known domains/Git hosts in `query_parser.py`
- fixed bug from [#115](#115): corrected case handling for URL components—scheme, domain, username, and repository are case-insensitive, but paths beyond (e.g., file names, branches) are case-sensitive
- implemented `try_domains_for_user_and_repo` in `query_parser.py` to iteratively guess the correct domain until success or supported hosts are exhausted
- added helper functions `_get_user_and_repo_from_path`, `_validate_host`, and `_validate_scheme` in `query_parser.py`
- extended `_parse_repo_source` in `query_parser.py` to be Git host agnostic by using `try_domains_for_user_and_repo`
- added tests `test_parse_url_unsupported_host` and `test_parse_query_with_branch` in `test_query_parser.py`
- created new file `test_git_host_agnostic.py` to verify domain/Git host agnostic behavior
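
The case-handling rule in the commit message above (scheme, domain, username, and repository are case-insensitive, while the rest of the path is case-sensitive) could be sketched like this. The helper name `normalize_repo_url` and the example URL are hypothetical and are not taken from `query_parser.py`; the sketch assumes the user and repository are the first two path segments.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_repo_url(url: str) -> str:
    """Lowercase scheme, host, user, and repo; keep the rest of the path as-is."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()

    # For a path like "/User/Repo/blob/Main/README.md", segments[0] is "",
    # segments[1] is the user, and segments[2] is the repository.
    segments = parts.path.split("/")
    for i in (1, 2):
        if i < len(segments):
            segments[i] = segments[i].lower()

    # Later segments (branch, file names) are left untouched.
    return urlunsplit((scheme, netloc, "/".join(segments), parts.query, parts.fragment))


print(normalize_repo_url("HTTPS://GitHub.com/User/Repo/blob/Main/README.md"))
# → https://github.com/user/repo/blob/Main/README.md
```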
cyclotruc pushed a commit that referenced this pull request Jan 13, 2025
* feat: make parser domain-agnostic to support multiple Git hosts

- added list of known domains/Git hosts in `query_parser.py`
- fixed bug from [#115](#115): corrected case handling for URL components—scheme, domain, username, and repository are case-insensitive, but paths beyond (e.g., file names, branches) are case-sensitive
- implemented `try_domains_for_user_and_repo` in `query_parser.py` to iteratively guess the correct domain until success or supported hosts are exhausted
- added helper functions `_get_user_and_repo_from_path`, `_validate_host`, and `_validate_scheme` in `query_parser.py`
- extended `_parse_repo_source` in `query_parser.py` to be Git host agnostic by using `try_domains_for_user_and_repo`
- added tests `test_parse_url_unsupported_host` and `test_parse_query_with_branch` in `test_query_parser.py`
- created new file `test_git_host_agnostic.py` to verify domain/Git host agnostic behavior
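
The "iteratively guess the correct domain" idea behind `try_domains_for_user_and_repo` might look roughly like the sketch below. The real function's signature and host list are not shown in this thread, so everything here is an assumption: the name `guess_repo_url`, the `KNOWN_HOSTS` tuple, and the injected `repo_exists` callback, which stands in for whatever check the project performs.

```python
from typing import Callable, Sequence

# Illustrative host list; the actual list in query_parser.py may differ.
KNOWN_HOSTS: Sequence[str] = ("github.com", "gitlab.com", "bitbucket.org")


def guess_repo_url(
    user: str,
    repo: str,
    repo_exists: Callable[[str], bool],
    hosts: Sequence[str] = KNOWN_HOSTS,
) -> str:
    """Try each known host in order and return the first URL that checks out."""
    for host in hosts:
        candidate = f"https://{host}/{user}/{repo}"
        if repo_exists(candidate):
            return candidate
    # All supported hosts were tried without success.
    raise ValueError(f"Could not find {user}/{repo} on any of: {', '.join(hosts)}")


# Usage with a fake existence check, so the sketch runs without network access.
def only_on_gitlab(url: str) -> bool:
    return url.startswith("https://gitlab.com/")


print(guess_repo_url("user", "repo", only_on_gitlab))
# → https://gitlab.com/user/repo
```

Passing the existence check in as a callback keeps the loop testable offline.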
Successfully merging this pull request may close these issues.

injest url field isn't case insensitive.
2 participants