Lecture 5

Data Collection III: Collecting Data with requests and APIs

Byeong-Hak Choe

SUNY Geneseo

March 4, 2026

⚙️ APIs

💻↔︎️🗄️️ Clients and Servers

  • Devices on the web act as clients and servers.
  • Your browser is a client; websites and data live on servers.
    • Client: your computer/phone + a browser (Chrome/Firefox/Safari).
    • Server: a computer that stores webpages/data and sends them when requested.
  • When you load a page, your browser sends a request and the server sends back a response (the page content).

What is an API?

  • API stands for Application Programming Interface.
    • It enables data transmission between a client and a server.
    • Think of it like a waiter in a restaurant:
      • Customer (client) → places an order → Waiter (API) → relays it → Kitchen (server) → returns the food (data)

API: Key Concepts

Methods: The “verbs” that define what action the client wants the server to perform.

Method Purpose
GET Retrieve data from a server ✅ (main one we use)
POST Send new data to a server
PUT Update existing data on a server
DELETE Remove data from a server
  • Request: What the client sends to the server — specifying the method, endpoint URL, and optional parameters (e.g., search terms, API keys).

  • Response: What the server sends back, consisting of three parts:

    • Status Code — indicates success or failure (e.g., 200 OK, 404 Not Found, 401 Unauthorized)
    • Content — the actual data of interest in JSON or XML format
    • Header — meta-information about the response

🔒🛡️ Hypertext Transfer Protocol Secure (HTTPS)

  • HTTP is how clients and servers communicate.
  • HTTPS is encrypted HTTP (safer).

When we type a URL starting with https://:

  1. Browser finds the server.
  2. Browser and server establish a secure connection.
  3. Browser sends a request for content.
  4. Server responds (e.g., 200 OK) and sends data.
  5. Browser decrypts and displays the page.

🚦🔢 HTTP Status Codes

# library for making HTTPS requests in Python
import requests
p = 'https://bcdanl.github.io/210'
response = requests.get(p)
print(response.status_code)
print(response.reason)
  • 200 OK → success; content returned.
p = 'https://bcdanl.github.io/2100'
response = requests.get(p)
print(response.status_code)
print(response.reason)
  • 404 Not Found → URL/page doesn’t exist (typo, removed page, broken link).

Understanding the requests Methods

  • What is requests?
    • A Python library used to make HTTP requests to APIs as well as websites.
    • Supports methods like GET, POST, PUT, DELETE, etc.
  • Using requests.get():
    • Sends a GET request to retrieve data from a specified URL.
    • Commonly used when fetching data from APIs.
import requests
url = 'https://www.example.com/.....'
param_dicts = {.....}  # Optional, but we often need this
response = requests.get(url, params=param_dicts)
  • url: API endpoint where data is requested.
  • param_dicts: Dictionary of query parameters sent with the request.

API Endpoints

  • In the case of web APIs, we can access information directly from the API database if we specify the correct URL(s).
    • These URLs are known as API endpoints.
  • Navigating your browser to an API endpoint will display raw text in one of these formats:
    • JSON (JavaScript Object Notation), or
    • XML (Extensible Markup Language).
  • Install JSONView to render JSON output nicely in Chrome.

Inspecting the Response Object

  • Key attributes and methods:
    • response.status_code: Returns HTTP status code (e.g., 200 for success).
    • response.json(): Converts JSON format data into a Python dictionary.
    • response.text: Decoded text version of the response.content
  • Best practices:
    • Always check status_code before processing the response.
    • Use response.json() for easier handling of JSON data.

🗽 NYC Open Data APIs

NYC Open Data

NYC’s Payroll Data

Example Code

import requests
import pandas as pd

endpoint = 'https://data.cityofnewyork.us/resource/ic3t-wcy2.json'  ## API endpoint
response = requests.get(endpoint)
content = response.json() # to convert JSON response data to a dictionary
df = pd.DataFrame(content)
  • The requests.get() method sends a GET request to the specified URL.
  • response.json() automatically converts JSON data into a dictionary or a list of dictionaries.
    • JSON objects have the same format as Python dictionaries.

🏦 Federal Reserve Economic Data (FRED) APIs

Federal Reserve Economic Data (FRED)

FRED API Key

Example Code

import requests  # to handle API requests
import json      # to parse JSON response data
import pandas as pd
param_dicts = {
  'api_key': 'YOUR_FRED_API_KEY', ## Change to your own key
  'file_type': 'json',
  'series_id': 'GDPC1'    ## ID for US real GDP
}
url = "https://api.stlouisfed.org/"
endpoint = "series/observations"
api_endpoint = url + "fred/" + endpoint   # sum of strings
response = requests.get(api_endpoint, params = param_dicts)
  • We’re going to go through the requests, json, and pandas libraries.
    • requests comes with a variety of features that allow us to interact more flexibly and securely with web APIs.

Example Code: response.json()

# Convert JSON response to Python dictionary.
content = response.json() 

# Extract the "observations" list element.
df = pd.DataFrame( content['observations'] )
  • json(): Converts JSON into a Python dictionary object (or a list of dictionaries).

  • By default, all columns in the DataFrame from content are string-type.

    • So, we need to convert their types properly.
    • The DataFrame may also contain columns we do not need.

Data Collection with APIs

Let’s do Classwork 9!

📰 New York Times APIs

Using the New York Times API

Sign Up and Sign In

Register Apps

  1. Select My Apps from the user drop-down at the navigation bar.
  2. Click + New App to create a new app.
  3. Enter a name and description for the app in the New App dialog.
  4. Click Create.
  5. Click the APIs tab.
  6. Click the access toggle to enable or disable access to an API product from the app.

Access the API keys

  1. Select My Apps from the user drop-down.
  2. Click the app in the list.
  3. View the API key on the App Details tab.
  4. Confirm that the status of the API key is Approved.

Collect Data with pynytimes

pip install pynytimes
  • While NYTimes Developer Portal APIs provides API documentation, it is time-consuming to go through the documentation.

  • There is an unofficial Python library called pynytimes that provides a more user-friendly interface for working with the NYTimes API.

  • To get started, check out Introduction to pynytimes.

Example Code: pynytimes

# Settings
from pynytimes import NYTAPI

# Initialize API with your key
nyt = NYTAPI("YOUR_NYTIMES_API_KEY", parse_dates=True)
# Top Stories
top_stories = nyt.top_stories()

# Get all the top stories from a specific category
top_climate_stories = nyt.top_stories(section = "climate")
# Most Viewed
most_viewed = nyt.most_viewed()

# Get most viewed articles of last 7 or 30 days
most_viewed = nyt.most_viewed(days = 7)
most_viewed = nyt.most_viewed(days = 30)

Example Code: Article Search

from datetime import datetime

# Define the date range
start = datetime(2025, 1, 1)
end = datetime(2026, 2, 28)

# Search articles related to climate within a date range
# This returns only up to 10 articles.
articles = nyt.article_search(
    query="climate",
    dates={"begin": start, "end": end},
)

🕵️ Hidden APIs

Hidden API

  • Most industry-scale websites display data from their database servers.

    • Many of them do NOT provide official APIs to retrieve data.
  • Sometimes, it is possible to find their hidden APIs to retrieve data!

  • Examples:

Hidden API

How can we find a hidden API?

  • On a web-browser (FireFox is recommended)
    1. F12 -> Network -> XHR (XMLHttpRequest)
    2. Refresh the webpage that loads data
    3. Find json type response that seems to have data
    4. Right-click that response -> Copy as cURL (Client URL)
    • cURL lets you talk to a server by specifying the location of the data you want.
  • On an API tool such as Python curlconverter,
    1. Paste the cURL to the curl command textbox, and then click “Copy to clipboard” right below the code cell.
    2. Paste it on your Python script, and run it.

References: Details on requests Methods

FYI on Web-Data Collection Approaches

  • Use API services if available; otherwise, use web scraping tools.

Web-scrapping Decision