🧾📌 Capstone Project Guide

Research Kick-off Report (DANL 410)

Author: Byeong-Hak Choe
Affiliation: SUNY Geneseo
Published: January 28, 2026

Overview

In DANL 410: Data Analytics Capstone, you will complete an end-to-end analytics project, starting from a focused question, moving through data work and analysis, and ending with a polished report and presentation.

This course is designed to help you practice what data analysts actually do:

  • Translate a real-world problem into an analytics question
  • Find and evaluate data (or create a clean dataset)
  • Clean, transform, and explore the data
  • Apply an appropriate machine learning or statistical method (e.g., regression, classification, clustering)
  • Communicate findings clearly with visuals and writing
  • Publish your work on your Quarto website (GitHub Pages)
Note

You do not need the “perfect” topic at the beginning. Your goal early in the semester is to build a feasible plan with a clear question, a credible dataset, and a reasonable method.

👥 Team Formation

  • Each team may have one or two members.
  • Every team member is expected to contribute actively and understand the entire project.
  • Teams should be formed by Wednesday, Feb 11, 11:59 P.M.
  • If your team has two members, one representative should email Prof. Choe and cc the other team member.

⏰ Project Timeline (Spring 2026)

| Component | Description | Due Date |
|---|---|---|
| 🧾 Research Kick-off Report | Define question, motivation, data, and plan (2–3 pages) | Feb 25 (11:59 P.M.) |
| 🗣️ Midterm Presentation | Short progress presentation (what you have, what you learned, next steps) | Mar 11 (class time) |
| 🧩 Progress & Insights Report | What you did + what you learned + obstacles + updated plan | Mar 25 |
| 🧠 Research Synthesis Report | Stronger draft narrative + results + interpretation | Apr 15 |
| 🖥️ GREAT Day PowerPoint | Slide deck submission | Apr 22 |
| 📄 Final Capstone Report | Final written report (published on website) | May 11 (11:59 P.M.) |

🧾 Research Kick-off Report: Structure

Your Research Kick-off Report (2–3 pages) is a concise proposal that explains:

  • what you plan to study,
  • why it matters (business, social, or policy relevance),
  • how you will study it using data.

Include these sections:

  1. Working Title & Topic
    • A clear, descriptive title.
    • 2–4 sentences describing the topic and setting.
  2. Research Question
    • State one main research question in a single sentence.
    • Add 1–3 sub-questions if helpful.
  3. Motivation / Value
    • Why would someone care?
    • Who is the “stakeholder” (a business, customers, local government, nonprofit, campus office, etc.)?
    • What decision, insight, or understanding might the analysis support?
  4. Data Plan
    • What dataset(s) will you use?
    • Unit of analysis (customer? county? day? product? student? transaction?)
    • Key variables you expect to use (inputs and outcomes)
    • Data limitations you already anticipate (missingness, small sample, measurement, bias)
  5. Method Plan (Required: include one ML/statistical component)
    • What approach will you use and why?
    • Examples:
      • Regression (prediction or cause-and-effect)
      • Classification (e.g., churn / pass-fail / fraud / sentiment categories)
      • Clustering (e.g., segmentation)
      • Time-series forecasting (if appropriate)
    • What is your evaluation plan? (train/test split, accuracy/RMSE, cross-validation, or a clear validation idea)
  6. References
    • Use a consistent citation style.
    • For online sources: include URL and date of access.
Note

💡 Think of the Kick-off Report as your project blueprint. You are not expected to have final results yet, but your plan should be credible and feasible.
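
The evaluation plan asked for in the Method Plan section can be as simple as a holdout split with an error metric and a baseline. The sketch below is one possible version, not a required template: it uses only the Python standard library, synthetic data invented for illustration, and hypothetical helper names (`train_test_split`, `fit_line`, `rmse`). In a real project you would more likely rely on a library such as scikit-learn.

```python
import math
import random


def train_test_split(rows, test_share=0.2, seed=410):
    """Shuffle rows reproducibly and hold out a share of them for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_share))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(rows):
    """Ordinary least squares with one predictor: y = intercept + slope * x."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(rows, intercept, slope):
    """Root mean squared error of the line's predictions on the given rows."""
    squared = [(y - (intercept + slope * x)) ** 2 for x, y in rows]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic (x, y) pairs for illustration only: y = 3 + 2x plus random noise.
rng = random.Random(0)
data = [(x, 3 + 2 * x + rng.gauss(0, 1)) for x in range(50)]

train, test = train_test_split(data)
intercept, slope = fit_line(train)

# Baseline: always predict the training mean of y (a line with slope 0).
baseline = sum(y for _, y in train) / len(train)

model_rmse = rmse(test, intercept, slope)
baseline_rmse = rmse(test, baseline, 0.0)
print(f"model RMSE: {model_rmse:.2f}, baseline RMSE: {baseline_rmse:.2f}")
```

Reporting the model's test error next to a naive baseline makes the evaluation easier to interpret, because an RMSE on its own says little about whether the model learned anything useful.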


💡 Brainstorming Ideas (Capstone-Friendly)

| Area | Example Research Question | Data Type | Possible Methods |
|---|---|---|---|
| Business / Marketing | Which factors predict customer churn or repeat purchase? | transactions, CRM, reviews | classification, regression |
| Operations | What drives delays or failures in a process? | timestamps, logs | regression, clustering |
| Finance / Risk | Can we predict risk outcomes or identify risky segments? | firm/household indicators | classification, clustering |
| Sports / Performance | Which player/team features predict win probability? | game logs | regression, classification |
| Public Policy | Which communities face higher risk or lower access? | census + admin data | regression, mapping, clustering |
| Education | What predicts course performance or retention patterns? | LMS / grades / surveys | regression, classification |
| Climate Change | Which factors predict household solar adoption (or energy burden) across counties? | American Community Survey (ACS) + solar installs + weather/energy | regression, classification, mapping |

🧭 Research Idea Logic

A strong capstone topic is:

  • Specific enough to answer within one semester
  • Data-supported (you can actually obtain usable data)
  • Method-feasible (you can execute and interpret a reasonable model)
  • Meaningful (it has a real stakeholder, decision, or insight)

Think in “question types”:

| Type | Purpose | Example |
|---|---|---|
| Descriptive | Summarize what is happening | “How have housing prices changed across counties since 2018?” |
| Comparative | Compare groups or time periods | “Do outcomes differ by region, income level, or policy change?” |
| Predictive | Predict an outcome using features | “Can we predict churn using usage behavior?” |
| Segmentation | Identify clusters / groups | “Can we cluster customers into meaningful segments?” |
| Diagnostic | Identify drivers of an outcome | “Which factors are most associated with delays?” |
Warning

Avoid questions that are too broad:

  • “How does social media affect society?”

Instead, narrow:

  • “Which post features predict higher engagement for a specific account/category?”

💾 Using Data Effectively

You are encouraged to use real-world data and present evidence using:

  • Clean tables (summary statistics, grouped means, counts)
  • Clear visualizations (distributions, trends, comparisons, relationships)
  • A method you can explain and justify
  • Transparent code and reproducible workflow

Guidelines:

  • Start with a dataset you can actually obtain quickly (Weeks 2–4).
  • Prefer data that has:
    • Enough observations to learn patterns
    • Clear variable definitions
    • A usable target/outcome variable (for prediction/classification), or a clear grouping structure (for clustering)
  • Use modeling only when it adds value:
    • A “fancy” model with weak interpretation is worse than a simple model that is well explained.
  • Always cite data sources and document data cleaning steps.
Tip

You don’t need complex modeling to do a strong capstone, but you do need clear evidence + clear reasoning + clear communication.
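
As a rough illustration of the "clean tables" idea above, the sketch below counts missing values per column and computes grouped means. Everything in it is an assumption made for the example: the records, the column names (`region`, `spend`, `churned`), and the helper functions are hypothetical, and only the Python standard library is used. With a real dataset, a data-frame library would usually do the same work in fewer lines.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: one row per customer (the unit of analysis).
rows = [
    {"region": "East", "spend": 120.0, "churned": 0},
    {"region": "East", "spend": None, "churned": 1},
    {"region": "West", "spend": 80.0, "churned": 1},
    {"region": "West", "spend": 100.0, "churned": 0},
    {"region": "West", "spend": 60.0, "churned": 1},
]


def missing_counts(rows):
    """Count missing (None) values in each column."""
    counts = defaultdict(int)
    for row in rows:
        for column, value in row.items():
            counts[column] += value is None
    return dict(counts)


def grouped_mean(rows, group_key, value_key):
    """Mean of value_key within each group, skipping missing values."""
    groups = defaultdict(list)
    for row in rows:
        if row[value_key] is not None:
            groups[row[group_key]].append(row[value_key])
    return {group: mean(values) for group, values in groups.items()}


missing = missing_counts(rows)
spend_by_region = grouped_mean(rows, "region", "spend")
churn_by_region = grouped_mean(rows, "region", "churned")

print("missing values:", missing)
print("mean spend by region:", spend_by_region)
print("churn rate by region:", churn_by_region)
```

Checking missingness before computing group summaries documents a data limitation up front, and skipping missing values explicitly (rather than silently) is one of the cleaning steps you should report.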


✅ Checklist (Research Kick-off Report)

Before submitting your Research Kick-off Report:

  • Team confirmed (1–2 people)
  • Clear research question (one sentence)
  • Stakeholder / value explained (why it matters)
  • Data source identified and link provided
  • Unit of analysis + key variables listed
  • Method plan includes at least one ML/stat component (regression/classification/clustering/etc.)
  • Realistic timeline and deliverables described
  • References included (URLs + access date for online sources)
  • 2–3 pages, professional formatting
  • Submitted via Brightspace by Feb 25 (11:59 P.M.)

🌟 Closing Thought

A strong capstone project is not about doing everything;
it’s about doing a focused analysis well
and communicating results clearly enough that someone can act on them.
