Lecture 1

Syllabus, Course Outline, Python Basics

Byeong-Hak Choe

SUNY Geneseo

February 6, 2024

Instructor

Current Appointment & Education

  • Name: Byeong-Hak Choe.

  • Assistant Professor of Data Analytics and Economics, School of Business at SUNY Geneseo.

  • Ph.D. in Economics from University of Wyoming.

  • M.S. in Economics from Arizona State University.

  • M.A. in Economics from SUNY Stony Brook.

  • B.A. in Economics & B.S. in Applied Mathematics from Hanyang University at Ansan, South Korea.

    • Minor in Business Administration.
    • Concentration in Finance.

Economics, Data Science, and Climate Change

  • I consider myself an applied economist specializing in environmental economics, with a specific emphasis on climate change.

  • My methodological approach involves leveraging causal inference, econometrics, machine learning methods, and various data science tools for conducting empirical analyses.

  • Choe, B.H., 2021. “Social Media Campaigns, Lobbying and Legislation: Evidence from #climatechange/#globalwarming and Energy Lobbies.”

  • Choe, B.H. and Ore-Monago, T., 2024. “Governance and Climate Finance in the Developing World”

Syllabus

Email, Class & Office Hours

Course Description

  • This course aims to provide an overview of how one can process, clean, and crunch datasets, using practical case studies.

  • Key topics include:

    1. loading, slicing, filtering, transforming, reshaping, and merging data;
    2. summarizing and visualizing data;
    3. exploratory data analysis.
  • We will cover these topics to solve real-world data analysis problems with thorough, detailed examples.

Reference Materials

Course Requirements

  • Laptop or personal computer

    • Operating System: Mac or Windows.
    • Specification: 2+ core CPU, 4+ GB RAM, and 500+ GB disk storage.
  • Homework: There will be six homework assignments.

  • Exam: There will be one take-home exam.

  • Discussions: You are encouraged to participate in GitHub-based online discussions.

    • Check out the netiquette policy in the syllabus.

Course Schedule and Contents

Tentatively, there will be seven class sessions.

Assessments

\[ \begin{aligned} (\text{Total Percentage Grade}) =\quad\, &0.60\times(\text{Total Homework Score})\\ \,+\, &0.30\times(\text{Take-Home Exam Score})\\ \,+\, &0.10\times(\text{Total Discussion Score}) \end{aligned} \]

  • Each of the six homework assignments accounts for 10% of the total percentage grade.
  • The exam accounts for 30% of the total percentage grade.
  • Participation in discussions accounts for 10% of the total percentage grade.
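
As a worked illustration of the weighting above, the following sketch computes a total percentage grade. The three scores are made-up numbers (not real course data), and each is assumed to be on a 0-100 scale.

```python
# Hypothetical scores on a 0-100 scale (illustrative values only).
homework_score = 90      # total homework score
exam_score = 80          # take-home exam score
discussion_score = 100   # total discussion score

# Weights taken from the grading formula: 60% / 30% / 10%.
total_grade = (0.60 * homework_score
               + 0.30 * exam_score
               + 0.10 * discussion_score)

print(round(total_grade, 2))  # 88.0
```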

Prologue

Why Data Analytics?

  • Fill in the gaps left by traditional business and economics classes.
    • Practical skills that will benefit your future career.
    • Neglected skills like how to actually find datasets in the wild and clean them.
  • Data analytics skills are largely distinct from (and complementary to) the core quantitative coursework familiar to business undergrads.
    • Data visualization, cleaning and wrangling; databases; machine learning; etc.
  • In short, we will cover things that I wish someone had taught me when I was an undergraduate to prepare for my post-graduate career.

You, at the end of this course

Why Data Analytics?

  • Data analysts use analytical tools and techniques to extract meaningful insights from data.
    • Skills in data analytics are also useful for business analysts or market analysts.
  • The Bureau of Labor Statistics projects that employment in occupations related to data analytics will grow by 36% from 2021 to 2031.
    • The average growth rate for all occupations is 5%.

Why Python, R, and Databases?

Data Science and Big Data

  • Stack Overflow is the world’s most popular Q&A website for programmers and software developers.

  • See how programming languages have trended over time based on the use of their tags on Stack Overflow from 2008 to 2022.

The State of the Art

Generative AI and ChatGPT

Data Science and Big Data Trend

From 2008 to 2023

Programmers in 2024

The State of the Art

Generative AI and ChatGPT

  • Users around the world have explored how best to use generative pre-trained transformers (GPTs) for writing essays and programming code.
  • Is AI a threat to data analytics?
    • Fundamental understanding of the subject matter is still crucial for effectively utilizing AI’s capabilities.
  • If we use Generative AI such as ChatGPT, we should try to understand what Generative AI gives us.
    • Copying and pasting it without understanding harms our learning opportunity.

DANL Tools

What is Python?

  • Python is a simple, easy-to-read, and fast programming language, making it an excellent choice for beginners and experienced developers alike.
    • Versatility: Python’s extensive standard library and vast ecosystem of third-party modules make it suitable for a wide range of applications, from web development and data analysis to machine learning, AI, and scientific computing.

What is GitHub?

  • GitHub is a web-based hosting platform for Git repositories to store, manage, and share code.

  • GitHub is useful for many reasons, but the main one is how user-friendly it makes uploading and sharing code.

  • We will use a GitHub repository to store Python Notebooks.

  • Course contents will be posted not only in Brightspace but also in our GitHub repositories (“repos”) and websites.

What is Google Colab?

  • Google Colab is analogous to Google Drive, but specifically for writing and executing Python code in your web browser.

  • A key benefit of Colab is that it is entirely free to use and has many of the standard Python modules preinstalled.

    • It allows for CPU or GPU usage, even for free users, and stores files on Google’s servers, so you can access them from anywhere you can connect to the Internet.
  • Using Colab also means you can entirely avoid the process of installing Python and any dependencies onto your computer.

  • Colab notebooks don’t just contain Python code. They can contain text, images, and HTML via Markdown!

How do we use Google Colab with GitHub?

Python Basics

Variables Are Names, Not Places

  • A value is a datum (literal) such as a number or text.

  • There are different types of values:

    • 352.3 is known as a float or double;
    • 22 is an integer;
    • “Hello World!” is a string.
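
To check the type of any value ourselves, we can use Python's built-in type() function. A minimal sketch using the three example values above:

```python
# type() reports the type of the value passed to it.
print(type(352.3))           # <class 'float'>
print(type(22))              # <class 'int'>
print(type("Hello World!"))  # <class 'str'>
```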

Values, Variables, and Types

a = 10
print(a)

  • A variable is a name that refers to a value.
    • We can think of a variable as a box that has a value, or multiple values, packed inside it.
  • A variable is just a name!

Values, Variables, and Types

  • Sometimes you will hear variables referred to as objects.

  • Strictly speaking, every value in Python, including a literal such as 10, is an object; a variable is a name that refers to an object.

Assignment ( = )

# Here we assign the integer value 5 to the variable x.
x = 5   

# Now we can use the variable x in the next line.
y = x + 12  
y
  • In Python, we use = to assign a value to a variable.

  • In math, = means equality of both sides.

  • In programs, = means assignment: assign the value on the right side to the variable on the left side.

Code and comment style

  • The two main principles for coding and managing data are:
    • Make things easier for your future self.
    • Don’t trust your future self.
  • The # mark is Python’s comment character (it works the same way in Google Colab code cells).
    • The # character has many names: hash, sharp, pound, or octothorpe.
    • # indicates that the rest of the line is to be ignored.
    • Write comments before the line that you want the comment to apply to.
  • Consider adding more comments on code cells and their results using text cells.
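
A small sketch of the comment conventions described above. The variable names are hypothetical and chosen only for illustration.

```python
# A full-line comment describes the line of code that follows it.
price = 20

tax_rate = 0.08   # an inline comment may also follow code on the same line

# Python ignores everything after the # character, so the
# commented-out assignment below never runs.
# price = 0

# A # inside quotation marks is part of the string, not a comment.
hashtag = "#climatechange"

print(price)    # 20
print(hashtag)  # #climatechange
```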

Assignment

  • In programming code, everything on the right side needs to have a value.
    • The right side can be a literal value, or a variable that has already been assigned a value, or a combination.
  • When Python reads y = x + 12, it does the following:
    1. Sees the = in the middle.
    2. Knows that this is an assignment.
    3. Calculates the right side (gets the value of the object referred to by x and adds it to 12).
    4. Assigns the result to the left-side variable, y.
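
The steps above also explain why a line such as x = x + 1, which would make no sense as a mathematical equation, is perfectly valid code. A minimal sketch:

```python
x = 5
y = x + 12   # the right side is evaluated first (5 + 12), then named y

# Not an equation: Python computes 5 + 1 and then re-points x at the result.
x = x + 1

print(x)  # 6
print(y)  # 17 (y was computed earlier and does not change when x does)
```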

Variables Are Names, Not Places

list_example = [10, 1.23, "like this", True, None]
print(list_example)
type(list_example)
  • The most basic built-in data types that we’ll need to know about are:
    • integers 10
    • floats 1.23
    • strings "like this"
    • booleans True
    • nothing None
  • Python also has a built-in type of data container called a list (e.g., [10, 15, 20]) that can contain anything, even different types.
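
One way to see that variables are names rather than places is to give the same list two names. In the sketch below (the names first and second are hypothetical), a change made through one name is visible through the other, because both refer to a single list object.

```python
first = [10, 15, 20]
second = first           # no copy is made; both names refer to one list

second.append(25)        # modify the list through the name `second`

print(first)             # [10, 15, 20, 25]
print(first is second)   # True: the two names refer to the same object
```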

Types

  • The second column (Type) contains the Python name of that type.

  • The third column (Mutable?) indicates whether the value can be changed after creation.

Brackets

  • There are several kinds of brackets in Python, including [], {}, and ().
vector = ['a', 'b']
vector[0]
  • [] is used to denote a list or to signify accessing a position using an index.
{'a', 'b'}  # set
{'first_letter': 'a', 'second_letter': 'b'}  # dictionary
  • {} is used to denote a set or a dictionary (with key-value pairs).
num_tup = (1, 2, 3)
sum(num_tup)
  • () is used to denote
    • a tuple, or
    • the arguments to a function, e.g., function(x) where x is the input passed to the function.

Operators

string_one = "This is an example "
string_two = "of string concatenation"
string_full = string_one + string_two
print(string_full)
  • All of the basic operators we see in mathematics are available to use:
  • + for addition
  • - for subtraction
  • * for multiplication
  • ** for powers
  • / for division
  • // for integer division
  • These work as you’d expect on numbers.
  • These operators are sometimes defined for other built-in data types too.
    • We can ‘sum’ strings (which really concatenates them).
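
For numbers, the operators listed above behave as follows. This is a small sketch with arbitrary values; note that / always returns a float, while // drops the fractional part.

```python
print(7 + 2)    # 9
print(7 - 2)    # 5
print(7 * 2)    # 14
print(7 ** 2)   # 49
print(7 / 2)    # 3.5
print(7 // 2)   # 3

# Parentheses control the order of operations, as in mathematics.
print(2 + 3 * 4)     # 14
print((2 + 3) * 4)   # 20
```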

Operators

list_one = ["apples", "oranges"]
list_two = ["pears", "satsumas"]
list_full = list_one + list_two
print(list_full)
  • It works for lists too:
string = "apples, "
print(string * 3)
  • We can multiply strings!

Operators

Q. Classwork 1.1

Using Python operations only, calculate the following: \[\frac{2^5}{7 \cdot (4 - 2^3)}\]

Booleans and Conditions

10 == 20
10 == '10'
  • Boolean data have either a True or a False value.

Booleans and Conditions

  • Existing booleans can be combined with the keywords and, or, and not; the combination itself evaluates to a boolean.
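
A minimal sketch of combining booleans with Python's and, or, and not keywords. The variable names are hypothetical.

```python
is_student = True
has_laptop = False

print(is_student and has_laptop)  # False: `and` needs both to be True
print(is_student or has_laptop)   # True: `or` needs at least one to be True
print(not has_laptop)             # True: `not` flips the value
```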

Booleans and Conditions

Conditions are expressions that evaluate as booleans.
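
For instance, each comparison below is a condition that evaluates to True or False. The score and the passing threshold of 60 are hypothetical values used only for illustration.

```python
score = 85             # hypothetical score

print(score > 90)      # False
print(score <= 85)     # True
print(score != 100)    # True

# The result of a condition can be stored in a variable like any other value.
passed = score >= 60   # 60 is a made-up passing threshold
print(passed)          # True
```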

Booleans and Conditions

boolean_condition1 = 10 == 20
print(boolean_condition1)

boolean_condition2 = 10 == '10'
print(boolean_condition2)
  • The == is an operator that compares the objects on either side and returns True if they have the same values.

  • Q. What does not (not True) evaluate to?

  • Q. Classwork 1.2

Booleans and Conditions

name = "Geneseo"
score = 99

if name == "Geneseo" and score > 90:
    print("Geneseo, you achieved a high score.")

if name == "Geneseo" or score > 90:
    print("You could be called Geneseo or have a high score")

if name != "Geneseo" and score > 90:
    print("You are not called Geneseo and you have a high score")
  • The real power of conditions comes when we start to use them in more complex examples, such as if statements.

Booleans and Conditions

name_list = ["Lovelace", "Smith", "Hopper", "Babbage"]

print("Lovelace" in name_list)

print("Bob" in name_list)
  • One of the most useful conditional keywords is in.
    • This one must pop up ten times a day in most coders’ lives because it can pick out a variable or make sure something is where it’s supposed to be.
  • Q. Check if “a” is in the string “Sun Devil Arena” using in. Is “a” in “Anyone”?

Booleans and Conditions

score = 98

if score == 100:
    print("Top marks!")
elif score > 90 and score < 100:
    print("High score!")
elif score > 10 and score <= 90:
    pass
else:
    print("Better luck next time.")
  • One conditional construct we’re bound to use at some point is the if-elif-else chain.

Casting Variables

orig_number = 4.39898498
type(orig_number)
mod_number = int(orig_number)
mod_number
type(mod_number)
  • Sometimes we need to explicitly cast a value from one type to another.

    • We can do this using built-in functions like str(), int(), and float().
    • If we try these, Python will do its best to interpret the input and convert it to the type we’d like; if it can’t, the code will raise an error (for example, a ValueError).
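
The sketch below shows a few conversions that succeed and one that fails. The try/except construct used to catch the error is not covered on these slides; it is included here only so that the example keeps running after the failed conversion.

```python
print(int("20") + 1)   # 21: the string "20" becomes the integer 20
print(str(5) + "5")    # 55: the integer 5 becomes the string "5"
print(int(4.99))       # 4: int() drops the fractional part, it does not round

# A string that does not look like a number cannot be converted.
conversion_failed = False
try:
    int("xyz")
except ValueError as error:
    conversion_failed = True
    print("Conversion failed:", error)
```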

Tuples and (im)mutability

  • A tuple is an object that is defined by parentheses and entries that are separated by commas, for example (15, 20, 32). (They are of type tuple.)

  • Tuples are immutable, while lists are mutable.

  • Immutable objects, such as tuples and strings, can’t have their elements changed, appended, extended, or removed.

    • Mutable objects, such as lists, can do all of these things.
  • In everyday programming, we use lists and dictionaries more than tuples.
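
To make the difference concrete, the sketch below changes a list in place and then attempts the same change on a tuple. The try/except block (not covered on these slides) is there only to catch the resulting error so the example runs to the end.

```python
numbers_list = [15, 20, 32]
numbers_list[0] = 99       # lists are mutable: replacing an item works
numbers_list.append(40)    # ...and so does appending
print(numbers_list)        # [99, 20, 32, 40]

numbers_tuple = (15, 20, 32)
try:
    numbers_tuple[0] = 99  # tuples are immutable: this raises a TypeError
except TypeError as error:
    print("Tuples cannot be changed:", error)

print(numbers_tuple)       # (15, 20, 32)
```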

Indentation

  • We have seen that certain parts of the code examples are indented.

  • Code that is part of a function, a conditional clause, or loop is indented.

  • Indentation is what tells the Python interpreter that some code is to be executed as part of, say, a loop, and not executed after the loop is finished.

Indentation

x = 10

if x > 2:
    print("x is greater than 2")
  • Here’s a basic example of indentation as part of an if statement.

  • The standard practice for indentation is that each sub-statement should be indented by 4 spaces.
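
The same rule decides what belongs to a loop. In this sketch (the counter names are hypothetical), the indented line runs once per item, while the unindented line runs once, after the loop has finished.

```python
inside_count = 0
after_count = 0

for number in [1, 2, 3]:
    # Indented by 4 spaces: this line is part of the loop and runs 3 times.
    inside_count = inside_count + 1

# Not indented: this line runs once, after the loop has finished.
after_count = after_count + 1

print(inside_count)  # 3
print(after_count)   # 1
```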

for Loops

name_list = ["Ben", "Chris", "Kate", "Mary"]

for name in name_list:
    print(name)
  • A loop is a way of executing a similar piece of code over and over in a similar way.

    • The most useful loop is the for loop.
  • Any object that is an iterable can be used in a for loop in this way.

  • Lists, tuples, strings, and dictionaries are iterable.
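
A short sketch of looping over iterables other than lists. The dictionary temps is a hypothetical example; note that looping over a dictionary yields its keys.

```python
# A string yields its characters one at a time.
for letter in "SUNY":
    print(letter)

# A tuple yields its items in order.
for number in (1, 2, 3):
    print(number)

# A dictionary yields its keys; each value can be looked up with the key.
temps = {"Paris": 28, "London": 22}
for city in temps:
    print(city, temps[city])
```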

Dictionaries

cities_to_temps = {"Paris": 28, "London": 22, "Seville": 36, "Wellesley": 29}

cities_to_temps.keys()
cities_to_temps.values()
cities_to_temps.items()
  • Another built-in Python type that is enormously useful is the dictionary.
    • This provides a mapping from one set of values to another (either one-to-one or many-to-one).
    • If you need to create associations between objects, use a dictionary.
  • We can obtain keys, values, or key-value pairs from dictionaries.
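
Beyond listing keys and values, the most common dictionary operations are looking up a value by its key and adding a new key-value pair. A minimal sketch; the Geneseo temperature is a made-up number added for illustration.

```python
cities_to_temps = {"Paris": 28, "London": 22, "Seville": 36, "Wellesley": 29}

# Look up a value by its key using square brackets.
print(cities_to_temps["Seville"])   # 36

# Add a new key-value pair (hypothetical value).
cities_to_temps["Geneseo"] = 25

# Loop over key-value pairs with .items().
for city, temp in cities_to_temps.items():
    print(city, temp)
```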

Running on Empty

  • Being able to create empty containers is sometimes useful, especially when using loops.

  • The commands to create empty lists, tuples, dictionaries, and sets are lst = [], tup = (), dic = {}, and st = set(), respectively.

  • Q. What is the type of an empty list?
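
A typical use of an empty container is to start with nothing and fill it inside a loop. The sketch below does this for a list and for a dictionary; the names squares and name_lengths are hypothetical.

```python
# Start with an empty list and append one result per iteration.
squares = []
for number in [1, 2, 3, 4]:
    squares.append(number ** 2)
print(squares)        # [1, 4, 9, 16]

# Start with an empty dictionary and add one key-value pair per iteration.
name_lengths = {}
for name in ["Ben", "Chris"]:
    name_lengths[name] = len(name)
print(name_lengths)   # {'Ben': 3, 'Chris': 5}
```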

Slicing Methods

  • With slicing methods, we can get a subset of a data object.

  • Slicing methods can be applied to strings, lists, arrays, and DataFrames.

  • The above example describes indexing in Python

Strings

string = "cheesecake"
print( string[-4:] )
  • From strings, we can access the individual characters via slicing and indexing.
string = "cheesecake"
print("String has length:")
print( len(string) )
list_of_numbers = list(range(1, 20))  # the integers 1 through 19, as a list
print("List of numbers has length:")
print( len(list_of_numbers) )
  • Both lists and strings will allow us to use the len() command to get their length:

Strings and Slicing

  • We can extract a substring (a part of a string) from a string by using a slice.

  • We define a slice by using square brackets ([]), a start index, an end index, and an optional step count between them.

    • We can omit some of these.
  • The slice will include characters from index start to one before end.

Get a Substring with a Slice

letters = 'abcdefghij'
letters[:]
  • [:] extracts the entire sequence from start to end.
letters = 'abcdefghij'
letters[4:]
letters[2:]
letters[-3:]
letters[-50:]
  • [ start :] specifies from the start index to the end.
letters = 'abcdefghij'
letters[:3]
letters[:-3]
letters[:70]
  • [: end ] specifies from the beginning to the end index minus 1.
letters = 'abcdefghij'
letters[2:5]
letters[-26:-24]
letters[35:37]
  • [ start : end ] indicates from the start index to the end index minus 1.
letters = 'abcdefghij'
letters[2 : 6 : 2]   # From index 2 to 5, by steps of 2 characters
letters[ : : 3]     # From the start to the end, in steps of 3 characters
letters[ 6 : : 4 ]    # From index 6 to the end, by 4
letters[ : 7 : 5 ]    # From the start to index 6 by 5
letters[-1 : : -1 ]   # Starts at the end and ends at the start
letters[: : -1 ]
  • [ start : end : step ] extracts from the start index to the end index minus 1, skipping characters by step.
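
For reference, here are the results of a few representative slices of the same string. The last two lines illustrate that slice positions beyond the ends of the string do not raise an error; Python simply returns whatever characters fall inside the range.

```python
letters = 'abcdefghij'

print(letters[2:5])     # cde
print(letters[:3])      # abc
print(letters[-3:])     # hij
print(letters[::3])     # adgj
print(letters[::-1])    # jihgfedcba

# Out-of-range slice positions are allowed.
print(repr(letters[35:37]))  # '' (an empty string)
print(letters[-50:])         # abcdefghij
```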

Lists and Slicing

  • Python is
    • a zero-indexed language (things start counting from zero);
    • left inclusive;
    • right exclusive when we are specifying a range of values.

Lists and Slicing

list_example = ['one', 'two', 'three']
list_example[ 0 : 1 ]
list_example[ 1 : 3 ]

  • We can think of items in a list-like object as being fenced in.
    • The index represents the fence post.
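
A quick check of the fence-post picture, using the same list. Because slices are left inclusive and right exclusive, a slice from start to end has end minus start items, and two adjacent slices fit together without overlap. The names first_part and second_part are hypothetical.

```python
list_example = ['one', 'two', 'three']

first_part = list_example[0:1]    # between fence posts 0 and 1
second_part = list_example[1:3]   # between fence posts 1 and 3

print(first_part)    # ['one']
print(second_part)   # ['two', 'three']

# The number of items in a slice is end minus start.
print(len(second_part) == 3 - 1)                 # True

# Adjacent slices share a fence post, so they rebuild the original list.
print(first_part + second_part == list_example)  # True
```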

Lists and Slicing

Get an Item by [index]

suny = ['Geneseo', 'Brockport', 'Oswego', 'Binghamton', 
        'Stony Brook', 'New Paltz'] 
  • We can extract a single value from a list by specifying its index:
suny[0]
suny[1]
suny[2]
suny[7]     # IndexError: the list has only six items (indices 0 to 5)
suny[-1]
suny[-2]
suny[-3]
suny[-7]    # IndexError: negative indices only go back to -6 for this list

Get an Item with a Slice

  • We can extract a subsequence of a list by using a slice:
suny = ['Geneseo', 'Brockport', 'Oswego', 'Binghamton', 
        'Stony Brook', 'New Paltz'] 
suny[0:2]    # A slice of a list is also a list.
suny[ : : 2]
suny[ : : -2]
suny[ : : -1]
suny[4 : ]
suny[-6 : ]
suny[-6 : -2]
suny[-6 : -4]

Functions

int("20") 
float("14.3")
str(5)
int("xyz")  # raises ValueError: "xyz" cannot be interpreted as an integer
  • A function can take any number and type of input parameters and return any number and type of output results.

  • Python ships with more than 65 built-in functions.

  • Python also allows a user to define a new function.

  • We will mostly use built-in functions.
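
As a minimal sketch of a user-defined function, the example below uses the def keyword to define a function with one input parameter and one returned result. The function name celsius_to_fahrenheit is hypothetical and chosen only for illustration.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from Celsius to Fahrenheit."""
    return celsius * 9 / 5 + 32


# Call the function just like a built-in one.
print(celsius_to_fahrenheit(100))  # 212.0
print(celsius_to_fahrenheit(0))    # 32.0
```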

Functions, Arguments, and Parameters

print("Cherry", "Strawberry", "Key Lime")
print("Cherry", "Strawberry", "Key Lime", sep = "!")
print("Cherry", "Strawberry", "Key Lime", sep=" ")
  • We invoke a function by entering its name and a pair of opening and closing parentheses.

  • Much as a cooking recipe can accept ingredients, a function invocation can accept inputs called arguments.

  • We pass arguments sequentially inside the parentheses, separated by commas.

  • A parameter is a name given to an expected function argument.

  • A default argument is a fallback value that Python passes to a parameter if the function invocation does not explicitly provide one.
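
The sketch below defines a hypothetical function, greet, to show how parameters, arguments, and default arguments relate. Here name and greeting are the parameters, and "Hello" is the default argument used when the caller does not supply a greeting, much as print() falls back to a single space for sep.

```python
def greet(name, greeting="Hello"):
    """Build a greeting; `greeting` falls back to "Hello" if not provided."""
    return greeting + ", " + name + "!"


# Only `name` is passed, so the default argument is used for `greeting`.
print(greet("Geneseo"))                 # Hello, Geneseo!

# Passing `greeting` explicitly overrides the default.
print(greet("Geneseo", greeting="Hi"))  # Hi, Geneseo!
```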

Importing Modules, Packages, and Libraries

  • Python is a general-purpose programming language and is not specialized for numerical or statistical computation.

  • The core libraries that enable Python to store and analyze data efficiently are:

    • pandas
    • numpy
    • matplotlib and seaborn

Importing Modules, Packages, and Libraries

  • pandas provides Series and DataFrames, which are used to store data in an easy-to-use format.

  • numpy, numerical Python, provides the array block (np.array()) for doing fast and efficient computations.

  • matplotlib provides graphics. The most important submodule would be matplotlib.pyplot.
  • seaborn provides a general improvement in the default appearance of matplotlib-produced plots.

Importing Modules, Packages, and Libraries

  • A module is basically a collection of related code saved in a file with the extension .py.

  • A package is basically a directory containing a collection of modules.

  • A library is a collection of packages.

  • We refer to code in other modules/packages/libraries by using the Python import statement.

    • This makes the code and variables in the imported module available to our own code.
    • We can use the as keyword to import a module under a shorter alias, such as the conventional pd for pandas (import pandas as pd).
  • Q. Classwork 1.5
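
The import forms described above can be sketched with modules from Python's standard library, so the example runs without installing anything. The alias stats is an arbitrary choice for illustration; the course libraries are imported in the same way, for example with the widely used aliases pd for pandas and np for numpy.

```python
import math                  # import a whole module
import statistics as stats   # import a module under a shorter alias
from math import sqrt        # import a single name from a module

print(math.pi)                   # access a variable through the module name
print(stats.mean([1, 2, 3, 4]))  # 2.5
print(sqrt(16))                  # 4.0

# The same pattern applies to third-party libraries, for example:
# import pandas as pd
# import numpy as np
```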