Collecting Premier League Data with Python requests
Final Exam, DANL 210-01, Spring 2025
Directions
This is a paper exam, so minor coding errors are expected. My main focus is on your approach to each question: the logic, algorithms, and syntax you use. Nearly perfect code will be rewarded with bonus credit.
Data Collection with APIs (Points: 12)
The code below fetches the Premier League's top-10 scorers for the 2024–25 season (compSeasons = 719) from the Premier League API (which powers the Stats Center at https://www.premierleague.com/stats/top/players/goals?se=719):
import requests
import pandas as pd
import time
import random
# Custom headers for browser information
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:137.0) Gecko/20100101 Firefox/137.0',
    'Accept': '*/*',
    'Accept-Language': 'en-US,en;q=0.5',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Origin': 'https://www.premierleague.com',
    'Connection': 'keep-alive',
    'Referer': 'https://www.premierleague.com/',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'cross-site',
    'Priority': 'u=0'
}
# Query parameters to paginate and filter the player-goals ranking endpoint
params = {
    'page': '0',                         # which page of results to fetch
    'pageSize': '10',                    # how many records per page
    'compSeasons': '719',                # season identifier (e.g., 2024-25)
    'comps': '1',                        # competition ID (Premier League)
    'compCodeForActivePlayer': 'EN_PR',  # competition code for active players
    'altIds': 'true',                    # include alternative player IDs
}
# Send GET request to the Premier League stats API
response = requests.get(
    'https://footballapi.pulselive.com/football/stats/ranked/players/goals',
    params=params,
    headers=headers
)
# Parse the JSON response into a Python dict
content = response.json()
# Extract the stats dictionary from the content dictionary
stats = content['stats']
# Extract the list of player records from the stats dictionary
goals = stats['content']
# Convert the list of dicts, goals, into a pandas DataFrame for analysis
df_content = pd.json_normalize(goals)
# Sleep for a random 1-2 second interval.
time.sleep(random.uniform(1, 2))
# At this point, df_content contains one row per player-goal record,
# with columns for player identifiers, goal counts, team info, etc.
This is the content dictionary:

This is the stats dictionary:

This is the goals list of dictionaries:

This is the df_content DataFrame:
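The flattening step can be illustrated offline. The sketch below applies pd.json_normalize to two made-up records; the field names (rank, value, owner, name.display, currentTeam) are assumptions chosen for illustration and are not confirmed output of the API.

```python
import pandas as pd

# Hypothetical records shaped like a list of nested dicts.
# Field names and values are illustrative assumptions, not real API output.
sample_goals = [
    {"rank": 1, "value": 20.0,
     "owner": {"id": 101, "name": {"display": "Player A"},
               "currentTeam": {"name": "Team X"}}},
    {"rank": 2, "value": 18.0,
     "owner": {"id": 102, "name": {"display": "Player B"},
               "currentTeam": {"name": "Team Y"}}},
]

# json_normalize flattens nested dicts into dot-separated column names,
# e.g. owner -> name -> display becomes the column "owner.name.display"
df_sample = pd.json_normalize(sample_goals)

print(df_sample.columns.tolist())
print(df_sample[["rank", "owner.name.display", "value"]])
```

Each nested key path becomes its own column, which is why the normalized DataFrame is typically much wider than the top-level keys of a single record suggest.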


The first page (page = 0 in the params) of the Premier League Player Goals website shows players 1–10:

The second page (page = 1 in the params) of the Premier League Player Goals website shows players 11–20:
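The relationship between the zero-indexed page parameter and the ranks shown can be written as simple arithmetic. The helper below, rank_range, is a hypothetical name introduced only for this illustration, and it assumes every page is full.

```python
def rank_range(page, page_size=10):
    """Return the (first, last) 1-based ranks shown on a zero-indexed page."""
    first = page * page_size + 1
    last = (page + 1) * page_size
    return first, last

# Pages 0 and 1 with pageSize = 10 together cover the top 20 scorers
for page in [0, 1]:
    print(page, rank_range(page))
```

This is why looping over page in [0, 1] with pageSize set to 10 is enough to collect the top 20 scorers of a season.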

Task:
- Use a nested for-loop over:
  - each compSeasons value in lst_compSeasons = [719, 578, 489]
    - 719, 578, and 489 correspond to the 2024–25, 2023–24, and 2022–23 seasons, respectively.
  - parameter page in [0, 1]
- In each iteration:
  - Update the params dict with the current page and compSeasons.
  - Send the GET request and parse the JSON, as provided in the given code.
  - Normalize the content → stats → goals list into a DataFrame, as provided in the given code.
  - After each request, pause execution for a random 5–8 second interval.
  - Add a column named "season" using the matching entry from lst_compSeasons_txt = ['2024-25', '2023-24', '2022-23'].
  - Concatenate the df_content DataFrame to one comprehensive DataFrame, df_all.
At the end, you should have a single DataFrame with the top 20 scorers for each of the three seasons, including a "season" column.
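The loop-and-accumulate pattern described above can be sketched offline as follows. This is one possible shape, not the required answer: fake_fetch is a hypothetical stand-in for the real requests.get(...).json() call and returns made-up records, and the sleep is left as a comment so the sketch runs instantly. It also collects the per-page DataFrames in a list and concatenates once at the end, which is a common alternative to growing df_all inside the loop.

```python
import pandas as pd

lst_compSeasons = [719, 578, 489]
lst_compSeasons_txt = ['2024-25', '2023-24', '2022-23']

def fake_fetch(params):
    """Hypothetical stand-in for requests.get(...).json(); returns made-up records."""
    page_size = int(params['pageSize'])
    start = int(params['page']) * page_size
    goals = [{'rank': start + i + 1, 'value': 0} for i in range(page_size)]
    return {'stats': {'content': goals}}

params = {'page': '0', 'pageSize': '10', 'compSeasons': '719'}
frames = []  # one DataFrame per (season, page) request

for season_id, season_txt in zip(lst_compSeasons, lst_compSeasons_txt):
    for page in [0, 1]:
        # Update the query parameters for this iteration
        params['compSeasons'] = str(season_id)
        params['page'] = str(page)

        content = fake_fetch(params)  # real script: requests.get(...).json()
        df_content = pd.json_normalize(content['stats']['content'])
        df_content['season'] = season_txt
        frames.append(df_content)
        # real script: time.sleep(random.uniform(5, 8)) goes here

# Stack all pages and seasons into one DataFrame
df_all = pd.concat(frames, ignore_index=True)
print(df_all.shape)
```

With 3 seasons, 2 pages per season, and 10 records per page, the stacked result has 3 × 2 × 10 = 60 rows.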
This is the df_all DataFrame with 60 observations:

Your Task: Complete the Script Below
# %%
# =============================================================================
# Setting up
# =============================================================================
import requests
import pandas as pd
import time
import random
# Custom headers for browser information
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:137.0) Gecko/20100101 Firefox/137.0',
    'Accept': '*/*',
    'Accept-Language': 'en-US,en;q=0.5',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Origin': 'https://www.premierleague.com',
    'Connection': 'keep-alive',
    'Referer': 'https://www.premierleague.com/',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'cross-site',
    'Priority': 'u=0'
}
# Query parameters to paginate and filter the player-goals ranking endpoint
lst_compSeasons = [719, 578, 489] # in the order of the 2024-25, 2023-24, 2022-23 seasons
lst_compSeasons_txt = ['2024-25', '2023-24', '2022-23']
_______________PROVIDE_YOUR_CODE_HERE_______________
Answer: