Machine Learning Project - Guideline

DANL 320: Big Data Analytics

Author: Byeong-Hak Choe
Affiliation: SUNY Geneseo
Published: April 13, 2026

Overview

For the final project in DANL 320, you will complete a machine learning project.

  • You may work alone or with one other student; groups of more than two are not allowed.
  • You may use any dataset for the project.
  • You may also build on a project from another course such as DANL 310 or DANL 410, as long as you substantially extend it to meet the requirements of this course.
  • Your project must include:
    • thoughtful data preparation and transformation,
    • appropriate data visualization and descriptive analysis,
    • multiple supervised machine learning models, and
    • at least one unsupervised learning model.
  • A literature review is optional.

The main goal of this project is to show that you can take a dataset, prepare it carefully, apply and compare machine learning methods, interpret the results clearly, and communicate the value of your analysis in a coherent and professional way.

Presentation

Presentations will take place during class time on May 4 and May 6.

  • Each student will give a 12-minute presentation.
  • If you work in a two-person group, your group will have 24 minutes total.
    • Each member should speak for roughly 12 minutes.
  • The presentation order will be determined randomly.
  • If your project uses a machine learning method that was not covered in class, you should provide a brief and accessible explanation of that method.
  • If your topic involves technical, scientific, or domain-specific knowledge, you should also provide enough background so that your classmates can understand the context and importance of your work.

What to Submit for the Presentation

  • Please be prepared to present your slides during class on your assigned day.
  • You should use clear and professional visual materials.
  • Your slides may be prepared in PowerPoint, Google Slides, or another presentation format approved by the instructor.

Key Components in the Presentation

  1. Title
    • Choose a title that clearly reflects your project.
  2. Introduction
    • Background: Explain the topic and why it matters.
    • Project Motivation: Describe what interested you about the problem.
    • Research Question or Goal: State clearly what you are trying to predict, classify, cluster, or learn from the data.
  3. Data
    • Introduce the dataset and explain where it came from.
    • Describe the key variables used in your analysis.
    • Briefly explain how you cleaned, transformed, or prepared the data.
  4. Exploratory Analysis
    • Present descriptive statistics and visualizations that help the audience understand the data.
    • Highlight patterns that motivate your modeling choices.
  5. Machine Learning Analysis
    • You must include multiple supervised learning models.
    • You must also include at least one unsupervised learning model.
    • Clearly explain why you chose your models.
    • Present model performance and interpret the results in a meaningful way.
    • Focus on comparison, insight, and interpretation rather than simply reporting numbers.
  6. Significance of the Project
    • Explain why your findings matter.
    • Discuss possible business, policy, scientific, or practical implications.
  7. References
    • Cite all relevant sources consistently.

Structure of the Project Write-Up

Your write-up should be posted on your personal GitHub website by May 14, 2026, at 11:59 PM.

You may prepare the write-up in Quarto, which can be rendered and published as a webpage on your personal GitHub website.

1. Introduction

  • Provide background on your topic.
  • Explain why the topic is interesting or important.
  • Clearly state the main research question, modeling goal, or analytical objective.

2. Literature Review (Optional)

  • You may include a short literature review if it helps motivate your project.
  • This is optional, not required.

3. Data

  1. Source and Scope
    • Explain where the data came from.
    • Describe the time period, unit of observation, and scope of the data.
  2. Variables
    • Define the main variables used in the project.
  3. Cleaning and Preparation
    • Explain how you handled missing values, recoded variables, engineered features, normalized values, or otherwise prepared the data.
  4. Exploratory Data Analysis
    • Include descriptive statistics and visualizations.
    • Show patterns in the data that help motivate later modeling choices.
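As one illustration of the cleaning and preparation step described above, the following Python sketch imputes a missing value with the median, recodes a yes/no variable as 0/1, and standardizes the numeric variables. The records and column names (income, age, owns_home) are hypothetical, and the sketch uses only the standard library; in your own project you would more likely use a data-frame library.

```python
import statistics

# Hypothetical raw records; None marks a missing value.
rows = [
    {"income": 52000.0, "age": 34, "owns_home": "yes"},
    {"income": None, "age": 45, "owns_home": "no"},
    {"income": 61000.0, "age": 29, "owns_home": "yes"},
    {"income": 48000.0, "age": 52, "owns_home": "no"},
]

def impute_median(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]

def standardize(values):
    """Rescale to mean 0 and sample standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Handle the missing income first, then normalize both numeric variables.
income = standardize(impute_median([r["income"] for r in rows]))
age = standardize([float(r["age"]) for r in rows])

# Recode the yes/no variable as a 0/1 indicator.
owns_home = [1 if r["owns_home"] == "yes" else 0 for r in rows]

prepared = [
    {"income_z": i, "age_z": a, "owns_home": h}
    for i, a, h in zip(income, age, owns_home)
]
```

If you later split the data for modeling, compute the median, mean, and standard deviation on the training rows only and reuse them for the test rows, so that no information from the test set leaks into preparation.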

4. Supervised Machine Learning Analysis

  • Include multiple supervised machine learning models.
  • These may include, for example, linear regression, logistic regression, regularized regression, decision trees, random forests, gradient boosting, support vector machines, or other appropriate supervised methods.
  • Clearly explain:
    • the modeling goal,
    • the predictors and outcome,
    • how the data were split or validated,
    • the evaluation metric(s), and
    • how the models compare.
  • Interpret the results in clear language.
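The items listed above (a split, an evaluation metric, and a comparison) can be sketched in a few lines of Python. This is a minimal, standard-library-only example on synthetic data, which is an assumption for illustration: it compares a mean-only baseline with a one-predictor least-squares line using test-set RMSE. The helper names rmse and fit_line are hypothetical, and a real project would normally use a modeling library and more than two models.

```python
import math
import random
import statistics

rng = random.Random(320)  # fixed seed so the split is reproducible

# Synthetic data, assumed for illustration: y = 3x + 5 plus Gaussian noise.
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3.0 * x + 5.0 + rng.gauss(0, 1) for x in xs]

# 80/20 train/test split on shuffled row indices.
idx = list(range(len(xs)))
rng.shuffle(idx)
cut = len(idx) * 4 // 5
train, test = idx[:cut], idx[cut:]
x_train, y_train = [xs[i] for i in train], [ys[i] for i in train]
x_test, y_test = [xs[i] for i in test], [ys[i] for i in test]

def rmse(actual, predicted):
    """Root mean squared error: lower is better."""
    errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(statistics.fmean(errors))

def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

# Model 1: baseline that always predicts the training mean.
baseline = statistics.fmean(y_train)
# Model 2: least-squares line fit on the training rows only.
slope, intercept = fit_line(x_train, y_train)

# Evaluate both models on the same held-out rows.
results = {
    "mean baseline": rmse(y_test, [baseline] * len(y_test)),
    "linear regression": rmse(y_test, [slope * x + intercept for x in x_test]),
}
for name, score in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: test RMSE = {score:.3f}")
```

Including a simple baseline makes the comparison easier to interpret: a model is only useful if it clearly beats the baseline on data it did not see during fitting.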

5. Unsupervised Learning Analysis

  • Include at least one unsupervised learning model.
  • This may include, for example, clustering, principal component analysis, association rules, or another appropriate unsupervised method.
  • Explain why the method is useful for your dataset and what insights it provides.
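To make the clustering option above concrete, here is a minimal k-means sketch in plain Python. The two-dimensional points are synthetic and drawn around two hypothetical centers, and the function names closest and kmeans are illustrative assumptions, not part of any library. It is meant to show the assign-then-update loop, not to replace a library implementation.

```python
import math
import random

rng = random.Random(320)  # fixed seed for reproducibility

# Synthetic 2-D points around two hypothetical centers (illustration only).
points = (
    [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(50)]
    + [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(50)]
)

def closest(point, centroids):
    """Index of the centroid nearest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))

def kmeans(points, k, rng, iterations=20):
    """Basic k-means: alternate between assigning points and updating centroids."""
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        labels = [closest(p, centroids) for p in points]
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # New centroid is the coordinate-wise mean of its members.
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    # Final labels always match the returned centroids.
    labels = [closest(p, centroids) for p in points]
    return centroids, labels

centroids, labels = kmeans(points, k=2, rng=rng)
for j, c in enumerate(centroids):
    print(f"cluster {j}: center = ({c[0]:.2f}, {c[1]:.2f}), size = {labels.count(j)}")
```

In the write-up, the cluster centers and sizes are a starting point; the insight comes from describing what distinguishes the groups in terms of the original variables.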

6. Discussion / Implications

  • Discuss what your results mean.
  • Explain the practical significance of your findings.
  • Reflect on strengths and limitations of your analysis.

7. Conclusion

  • Summarize your main findings.
  • Briefly explain what you learned from the project.
  • Suggest possible extensions or next steps.

8. References

  • Use a consistent citation style.
  • Include all sources you cited.

General Requirements

  • Format: Your write-up should be presented in a clear, readable, and reproducible format.
  • Website Posting: The final write-up must be posted on your personal GitHub website.
  • Deadline: May 14, 2026, 11:59 PM.
  • Code and Output: Include code, results, tables, and figures as appropriate.
  • Organization: Use clear section headings and logical flow.
  • Clarity: Your write-up should explain what you did and why you did it, not just show code.

Suggested Project Workflow

  1. Choose a topic and dataset.
  2. Clean and prepare the data.
  3. Explore the data with tables and visualizations.
  4. Fit and compare multiple supervised models.
  5. Apply at least one unsupervised learning method.
  6. Interpret the results.
  7. Prepare presentation slides.
  8. Publish the final write-up on your personal GitHub website.

Rubric

Presentation

| Attribute | Very Deficient (1) | Somewhat Deficient (2) | Acceptable (3) | Very Good (4) | Outstanding (5) |
|---|---|---|---|---|---|
| 1. Quality of Data Preparation and Exploratory Analysis | Little or no preparation shown; major errors | Minimal preparation; several errors | Adequate preparation and exploratory analysis | Strong preparation and thoughtful exploratory analysis | Excellent and highly effective preparation and exploratory analysis |
| 2. Quality of Data Visualization | Missing, unclear, or misleading visuals | Basic visuals with limited clarity | Clear and appropriate visuals | Insightful and well-designed visuals | Exceptional, polished, and highly effective visuals |
| 3. Quality of Supervised Learning Analysis | Inappropriate or missing supervised models | Limited supervised modeling with weak explanation | Appropriate supervised models with adequate explanation | Strong supervised modeling with good comparison and interpretation | Excellent supervised modeling with strong justification, comparison, and insight |
| 4. Quality of Unsupervised Learning Analysis | Missing or inappropriate unsupervised method | Minimal unsupervised analysis with weak explanation | Appropriate unsupervised analysis with adequate explanation | Strong unsupervised analysis with useful interpretation | Excellent unsupervised analysis with deep and meaningful insight |
| 5. Effectiveness of Communication and Storytelling | No clear narrative or purpose | Weak narrative and limited clarity | Clear overall structure and message | Compelling and well-organized presentation | Exceptionally engaging, coherent, and polished presentation |
| 6. Quality of Presentation Delivery | Difficult to follow; poor delivery | Uneven delivery; limited preparedness | Clear and reasonably organized delivery | Professional and confident delivery | Highly polished, confident, and engaging delivery |

Write-Up

| Attribute | Very Deficient (1) | Somewhat Deficient (2) | Acceptable (3) | Very Good (4) | Outstanding (5) |
|---|---|---|---|---|---|
| 1. Quality of Project Question / Goal | Unclear or missing | Somewhat unclear | Clearly stated | Clear and well motivated | Exceptionally clear, interesting, and well motivated |
| 2. Quality of Data Preparation and Visualization | Poorly prepared and poorly visualized | Some preparation and visualization, but several weaknesses | Adequate preparation and visualization | Strong preparation and clear visualization | Excellent preparation and highly effective visualization |
| 3. Quality of Supervised Modeling Analysis | Missing or inappropriate | Basic and weakly explained | Appropriate and adequately explained | Strong and well interpreted | Excellent, thoughtful, and well justified |
| 4. Quality of Unsupervised Learning Analysis | Missing or inappropriate | Basic and weakly explained | Appropriate and adequately explained | Strong and well interpreted | Excellent, thoughtful, and insightful |
| 5. Quality of Interpretation and Discussion | Little or no interpretation | Limited interpretation | Adequate interpretation | Strong interpretation with meaningful implications | Deep, thoughtful, and compelling interpretation |
| 6. Quality of Writing and Organization | Very difficult to follow; many errors | Somewhat disorganized; several errors | Generally clear and organized | Well organized and easy to read | Exceptionally clear, polished, and professional |
| 7. Quality of Reproducible Computing / Website Presentation | Major issues with code, output, or website presentation | Several issues with reproducibility or presentation | Adequate reproducibility and website presentation | Strong reproducibility and clear website presentation | Excellent reproducibility, presentation, and technical polish |