[FEATURE REQ] Coach Headshots #82

Open
1 task done
alecglen opened this issue Aug 7, 2024 · 1 comment
Labels
good first issue Good for newcomers

Comments

@alecglen
Member

alecglen commented Aug 7, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Is your feature request related to a problem? Please describe.

No response

Describe the solution you'd like

Make coach headshots available, e.g. for use in nflplotR.

Here are the relevant links as of 2024-08-06:
nfl_coaches.csv

Here's the script to pull them; it could be extended to also grab coordinators, etc.

```python
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup  # the "lxml" parser used below also requires the lxml package

coaches = []

# The nfl.com teams page links to each club's own site via a "View Full Site" button.
all_teams_page = requests.get("https://www.nfl.com/teams/", timeout=30)
all_teams_page.raise_for_status()

all_teams_soup = BeautifulSoup(all_teams_page.text, "lxml")
main_content = all_teams_soup.find("main", {"id": "main-content"})

for linkbutton in main_content.find_all("a", string=re.compile(r"View Full Site")):
    team: str = linkbutton.find_previous("p").text.strip()
    site: str = linkbutton["href"]
    print(f"{team}: {site}")

    # Try the /team/coaches-roster/ path first, falling back to /team/coaches/.
    try:
        team_coaches_url = site.rstrip("/") + "/team/coaches-roster/"
        team_coaches_page = requests.get(team_coaches_url, timeout=30)
        team_coaches_page.raise_for_status()
    except requests.HTTPError:
        team_coaches_url = site.rstrip("/") + "/team/coaches/"
        team_coaches_page = requests.get(team_coaches_url, timeout=30)
        team_coaches_page.raise_for_status()

    team_coaches_soup = BeautifulSoup(team_coaches_page.text, "lxml")
    coaches_main = team_coaches_soup.find("main", {"id": "main-content"})

    # The "Head Coach" title is an <h5>; the name is the <h3> before it, or else after it.
    hc_text = coaches_main.find("h5", string=re.compile(r"Head Coach"))

    try:
        hc_name = hc_text.find_previous("h3").text.strip()
        assert len(hc_name.split()) == 2  # sanity check: expect "First Last"
    except (AttributeError, AssertionError):
        hc_name = hc_text.find_next("h3").text.strip()
        assert len(hc_name.split()) == 2

    # Headshots may be lazy-loaded, so prefer data-src over src.
    hc_headshot = hc_text.find_previous("img")
    headshot_url = hc_headshot.get("data-src") or hc_headshot["src"]
    assert headshot_url.startswith("https://static.clubs.nfl.com/image/")

    # Strip "t_<name>/" path segments (presumably size/crop presets) from the URL.
    headshot_url = re.sub(r"t_[a-z_]*/", "", headshot_url)

    coaches.append({
        "team": team,
        "team_site_source": team_coaches_url,
        "head_coach": hc_name,
        "headshot_url": headshot_url,
    })

    print(f"{hc_name} {headshot_url}")
    print()

pd.DataFrame(coaches).to_csv("nfl_coaches.csv", index=False)
```
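As a rough sketch of extending this beyond head coaches, the parser below pairs every staff title with the name and headshot that precede it, keeping only titles that match a role pattern. It assumes each coach card has the headshot `<img>`, then the name in an `<h3>`, then the title in an `<h5>`, which is what the script's `find_previous` lookups imply; it does not implement the `find_next("h3")` fallback. To stay dependency-free it uses the standard library's `html.parser` instead of BeautifulSoup. `ROLE_PATTERN`, `CoachCardParser`, and `extract_coaches` are hypothetical names, and the role regex is only a guess at how coordinator titles are worded.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser

# Hypothetical role filter; note "Assistant Head Coach" would also match.
ROLE_PATTERN = re.compile(r"Head Coach|Coordinator")
# Same "t_<name>/" stripping regex the script uses on headshot URLs.
TRANSFORM_SEGMENT = re.compile(r"t_[a-z_]*/")


class CoachCardParser(HTMLParser):
    """Pairs each matching <h5> title with the most recent <h3> name and <img> before it."""

    def __init__(self) -> None:
        super().__init__()
        self.coaches: list[dict[str, str]] = []
        self._last_name = ""
        self._last_img = ""
        self._current_tag = None  # "h3" or "h5" while collecting heading text
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr_map = dict(attrs)
            # Prefer data-src over src, as the script does.
            self._last_img = attr_map.get("data-src") or attr_map.get("src") or ""
        elif tag in ("h3", "h5"):
            self._current_tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current_tag:
            return  # ignore closing tags of nested or unrelated elements
        text = "".join(self._buffer).strip()
        self._current_tag = None
        if tag == "h3":
            self._last_name = text
        elif ROLE_PATTERN.search(text):
            self.coaches.append({
                "title": text,
                "name": self._last_name,
                "headshot_url": TRANSFORM_SEGMENT.sub("", self._last_img),
            })


def extract_coaches(html: str) -> list[dict[str, str]]:
    """Return one dict per staff member whose title matches ROLE_PATTERN."""
    parser = CoachCardParser()
    parser.feed(html)
    parser.close()
    return parser.coaches
```

Inside the team loop, `extract_coaches(team_coaches_page.text)` could stand in for the head-coach-only lookup, producing one row per matching staff member instead of one per team. Pages where the name follows the title would need the script's `find_next` fallback added back.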

Describe alternatives you've considered

No response

Additional context

Per Discord discussion https://discord.com/channels/789805604076126219/924673653961003098/1270566142586523649

john-b-edwards added the good first issue label on Sep 20, 2024
@john-b-edwards
Contributor

I suspect there's a /coaches/ endpoint hanging out somewhere around here that we might be able to hit more cleanly, but this is a great start. I'll look for that endpoint, or otherwise work this info in, when I get the chance.
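If such an endpoint turns up, one way to slot it in is a small helper that tries candidate paths in order and returns the first one that responds, generalizing the script's try/except between `/team/coaches-roster/` and `/team/coaches/`. This is a sketch: `fetch_first_available`, `urllib_fetch`, and `CANDIDATE_PATHS` are hypothetical names, any path beyond the two the script already uses is hypothetical, and it uses the standard library's `urllib` (assuming UTF-8 pages) rather than `requests` so it runs without dependencies. The fetcher is injectable, so a `requests`-based one would work the same way.

```python
from __future__ import annotations

import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional

# Paths the original script tries, in order. Any further candidate (such as a
# dedicated coaches endpoint) is hypothetical and would be appended here.
CANDIDATE_PATHS = ["team/coaches-roster", "team/coaches"]


def urllib_fetch(url: str) -> Optional[str]:
    """Return the decoded body, or None if the server answers with an HTTP error (e.g. 404)."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError:
        return None


def fetch_first_available(
    base_url: str,
    paths: Iterable[str],
    fetch: Callable[[str], Optional[str]] = urllib_fetch,
) -> tuple[str, str]:
    """Try each path under base_url in order; return (url, body) for the first hit."""
    base = base_url.rstrip("/")
    tried = []
    for path in paths:
        url = f"{base}/{path.strip('/')}/"
        body = fetch(url)
        if body is not None:
            return url, body
        tried.append(url)
    raise LookupError(f"no candidate path responded: {tried}")
```

In the team loop, `team_coaches_url, html = fetch_first_available(site, CANDIDATE_PATHS)` would replace the try/except block, and a newly found endpoint becomes one more entry in the list rather than another nested fallback.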

Projects
None yet
Development

No branches or pull requests

2 participants