Syllabus, Course Outline, Python Basics
February 6, 2024
Name: Byeong-Hak Choe.
Assistant Professor of Data Analytics and Economics, School of Business at SUNY Geneseo.
Ph.D. in Economics from University of Wyoming.
M.S. in Economics from Arizona State University.
M.A. in Economics from SUNY Stony Brook.
B.A. in Economics & B.S. in Applied Mathematics from Hanyang University at Ansan, South Korea.
I consider myself an applied economist specializing in environmental economics, with a specific emphasis on climate change.
My methodological approach involves leveraging causal inference, econometrics, machine learning methods, and various data science tools for conducting empirical analyses.
Choe, B.H., 2021. “Social Media Campaigns, Lobbying and Legislation: Evidence from #climatechange/#globalwarming and Energy Lobbies.”
Choe, B.H. and Ore-Monago, T., 2024. “Governance and Climate Finance in the Developing World”
Email: bchoe@geneseo.edu
Class Homepage:
Office: South Hall 301
Office Hours:
This course aims to provide an overview of how one can process, clean, and crunch datasets through practical case studies.
Key topics include:
We will cover these topics to solve real-world data analysis problems with thorough, detailed examples.
Python Programming for Data Science by Tomas Beuzen
Coding for Economists by Arthur Turrell
Python for Econometrics in Economics by Fabian H. C. Raters
QuantEcon DataScience - Python Fundamentals by Chase Coleman, Spencer Lyon, and Jesse Perla
QuantEcon DataScience - pandas by Chase Coleman, Spencer Lyon, and Jesse Perla
Laptop or personal computer
Homework: There will be six homework assignments.
Exam: There will be one take-home exam.
Discussions: You are encouraged to participate in GitHub-based online discussions.
There will tentatively be seven class sessions.
\[ \begin{align} (\text{Total Percentage Grade}) =\quad\, &0.60\times(\text{Total Homework Score})\notag\\ \,+\, &0.30\times(\text{Take-Home Exam Score})\notag\\ \,+\, &0.10\times(\text{Total Discussion Score})\notag \end{align} \]
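As a quick check of the formula above, here is a minimal sketch that computes a total percentage grade. The function name total_grade and the sample scores are hypothetical and chosen only for illustration; each component score is assumed to be on a 0-100 scale.

```python
def total_grade(homework, exam, discussion):
    """Weighted total percentage grade with the 60/30/10 weights from the formula."""
    return 0.60 * homework + 0.30 * exam + 0.10 * discussion


# Hypothetical component scores, each on a 0-100 scale.
grade = total_grade(homework=90, exam=80, discussion=100)
print(round(grade, 1))
```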
Stack Overflow is the world's most popular Q&A website for programmers and software developers.
See how programming languages have trended over time based on use of their tags in Stack Overflow from 2008 to 2022.
From 2008 to 2023
GitHub is a web-based hosting platform for Git repositories to store, manage, and share code.
GitHub is useful for many reasons, but the main one is how easy it makes uploading and sharing code.
We will use a GitHub repository to store Python Notebooks.
Course contents will be posted not only in Brightspace but also in our GitHub repositories (“repos”) and websites.
Google Colab is analogous to Google Drive, but specifically for writing and executing Python code in your web browser.
A key benefit of Colab is that it is entirely free to use and has many of the standard Python modules preinstalled.
Using Colab also means you can entirely avoid the process of installing Python and any dependencies onto your computer.
Colab notebooks don’t just contain Python code. They can contain text, images, and HTML via Markdown!
A value is a datum (literal), such as a number or text.
There are different types of values:
Sometimes you will hear variables referred to as objects.
Everything that is not a literal value, such as 10, is an object.
Assignment (=)
# Here we assign the integer value 5 to the variable x.
x = 5
# Now we can use the variable x in the next line.
y = x + 12
y
In Python, we use = to assign a value to a variable.
In math, = means equality of both sides.
In programs, = means assignment: assign the value on the right side to the variable on the left side.
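The difference between assignment and equality can be seen in a short sketch. The statement x = x + 1 makes no sense as a mathematical equation, but it is a perfectly valid assignment. The variable name and values are illustrative only.

```python
x = 5          # assign the value 5 to x
x = x + 1      # evaluate the right side first (5 + 1), then assign the result to x
print(x)       # 6

# Equality is tested with ==, which returns a boolean instead of assigning.
print(x == 6)  # True
```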
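The # mark is the comment character in Python, and therefore in Google Colab.
The # character has many names: hash, sharp, pound, or octothorpe.
# indicates that the rest of the line is to be ignored.
When Python runs y = x + 12, it does the following:
It sees the = in the middle.
It evaluates the right side (it gets the value of x and adds it to 12).
It assigns the result to the variable on the left side, y.
Examples of values of different types:
10 is an integer.
1.23 is a float.
"like this" is a string.
True is a boolean.
None is the null value.
[10, 15, 20] is a list (a collection) that can contain anything, even different types.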
The second column (Type) contains the Python name of that type.
The third column (Mutable?) indicates whether the value can be changed after creation.
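The type name and the mutability of a value can be checked directly. The sketch below uses the built-in type() function and a for loop (loops are covered later in these slides); the variable names are illustrative.

```python
values = [10, 1.23, "like this", True, None, [10, 15, 20]]
for value in values:
    # type(value).__name__ is the Python name of the value's type
    print(repr(value), "has type", type(value).__name__)

numbers = [10, 15, 20]
numbers[0] = 99            # lists are mutable: an element can be replaced in place
print(numbers)             # [99, 15, 20]

text = "like this"
shouted = text.upper()     # strings are immutable: upper() returns a new string
print(text)                # like this
print(shouted)             # LIKE THIS
```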
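Python uses three kinds of brackets: [], {}, and ().
[] is used to denote a list or to signify accessing a position using an index.
{} is used to denote a set or a dictionary (with key-value pairs).
() is used to denote a tuple or to pass arguments to a function.
string_one = "This is an example "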
string_two = "of string concatenation"
string_full = string_one + string_two
print(string_full)
+ for addition
- for subtraction
* for multiplication
** for powers
/ for division
// for integer division
Using Python operations only, calculate the expression below: \[\frac{2^5}{7 \cdot (4 - 2^3)}\]
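The operators listed above can be tried on small numbers first. The sketch below uses arbitrary values (not the exercise expression) to show what each operator returns.

```python
print(7 + 2)    # addition: 9
print(7 - 2)    # subtraction: 5
print(7 * 2)    # multiplication: 14
print(7 ** 2)   # power: 49
print(7 / 2)    # division always returns a float: 3.5
print(7 // 2)   # integer division drops the fractional part: 3

# Parentheses control the order of operations.
result = (1 + 2) ** 2 / 3
print(result)   # 3.0
```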
A boolean is either a True or a False value.
Conditions are expressions that evaluate as booleans.
boolean_condition1 = 10 == 20
print(boolean_condition1)
boolean_condition2 = 10 == '10'
print(boolean_condition2)
== is an operator that compares the objects on either side and returns True if they have the same value.
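A few more comparisons make the behavior of == concrete. This is a minimal sketch; the variable name is_big is illustrative.

```python
print(10 == 10.0)   # True: an int and a float with the same numeric value
print(10 == "10")   # False: a number is never equal to a string
print(10 != "10")   # True: != is the "not equal" operator

is_big = 10 > 5     # the result of a condition can be stored in a variable
print(is_big and 10 == 20)  # False: and requires both sides to be True
print(is_big or 10 == 20)   # True: or requires at least one side to be True
```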
Q. What does not (not True) evaluate to?
name = "Geneseo"
score = 99
if name == "Geneseo" and score > 90:
    print("Geneseo, you achieved a high score.")
if name == "Geneseo" or score > 90:
    print("You could be called Geneseo or have a high score")
if name != "Geneseo" and score > 90:
    print("You are not called Geneseo and you have a high score")
The conditions above are used in if statements.
name_list = ["Lovelace", "Smith", "Hopper", "Babbage"]
print("Lovelace" in name_list)
print("Bob" in name_list)
The keyword in tests whether a value is contained in a collection such as a list or a string.
Q. Is “a” in “Anyone”?
score = 98
if score == 100:
    print("Top marks!")
elif score > 90 and score < 100:
    print("High score!")
elif score > 10 and score <= 90:
    pass
else:
    print("Better luck next time.")
The example above is an if-elif-else chain: Python runs only the first branch whose condition is True.
Sometimes we need to explicitly cast a value from one type to another.
Python provides casting functions such as str(), int(), and float().
A tuple is an object that is defined by parentheses and entries that are separated by commas, for example (15, 20, 32). (They are of type tuple.)
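The casting functions named above can be sketched as follows. The last lines show that containers can be cast as well, here with the built-in tuple() function; the variable names are illustrative.

```python
number_as_text = "10"
print(int(number_as_text) + 5)   # 15: the string "10" becomes the integer 10
print(float(number_as_text))     # 10.0
print(str(99) + " points")       # 99 points: the integer 99 becomes the string "99"
print(int(3.9))                  # 3: int() truncates toward zero, it does not round

# Containers can be cast too: tuple() builds a tuple from a list.
entries = tuple([15, 20, 32])
print(entries, type(entries))
```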
Tuples are immutable, while lists are mutable.
Immutable objects, such as tuples and strings, can’t have their elements changed, appended, extended, or removed.
In everyday programming, we use lists and dictionaries more than tuples.
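The difference between a mutable list and an immutable tuple can be seen in a minimal sketch. The try/except block is used here only to show the error without stopping the program; the variable names are illustrative.

```python
temps_list = [15, 20, 32]
temps_list[0] = 10        # allowed: lists are mutable
temps_list.append(40)     # allowed: lists can grow
print(temps_list)         # [10, 20, 32, 40]

temps_tuple = (15, 20, 32)
try:
    temps_tuple[0] = 10   # not allowed: tuples are immutable
except TypeError as error:
    print("TypeError:", error)

# To "change" a tuple, build a new tuple instead.
longer_tuple = temps_tuple + (40,)
print(longer_tuple)       # (15, 20, 32, 40)
```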
We have seen that certain parts of the code examples are indented.
Code that is part of a function, a conditional clause, or loop is indented.
Indentation is what tells the Python interpreter that some code is to be executed as part of, say, a loop and not executed after the loop is finished.
Here’s a basic example of indentation as part of an if statement.
The standard practice for indentation is that each sub-statement should be indented by 4 spaces.
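A minimal sketch of indentation in an if statement follows. The variable temperature and its value are hypothetical; the point is which lines belong to the if block and which line runs regardless.

```python
temperature = 36  # hypothetical value

if temperature > 30:
    # These lines are indented by 4 spaces, so they run only when the condition is True.
    message = "It is hot."
    print(message)
else:
    # These lines run only when the condition is False.
    message = "It is not hot."
    print(message)

# This line is not indented, so it runs after the if statement in either case.
print("Done checking the temperature.")
```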
for Loops
A loop is a way of executing a similar piece of code over and over in a similar way.
As long as our object is an iterable, it can be used in this way in a for loop.
Lists, tuples, strings, and dictionaries are iterable.
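A for loop over two kinds of iterables, a list and a string, can be sketched as below. The list of names is reused from the earlier example; the vowel count is an illustrative task.

```python
name_list = ["Lovelace", "Smith", "Hopper", "Babbage"]
for name in name_list:
    # The indented line runs once for each element of the list.
    print(name)

vowel_count = 0
for letter in "Geneseo":
    # A string is iterable too: the loop visits one character at a time.
    if letter in "aeiou":
        vowel_count = vowel_count + 1
print(vowel_count)  # 4
```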
cities_to_temps = {"Paris": 28, "London": 22, "Seville": 36, "Wellesley": 29}
cities_to_temps.keys()
cities_to_temps.values()
cities_to_temps.items()
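The dictionary methods above are most useful inside a for loop. The sketch below loops over key-value pairs with items() and then over the keys alone; the variable hottest_city is illustrative.

```python
cities_to_temps = {"Paris": 28, "London": 22, "Seville": 36, "Wellesley": 29}

# items() yields (key, value) pairs, which can be unpacked into two loop variables.
for city, temp in cities_to_temps.items():
    print(city, "has temperature", temp)

# Looping over the dictionary itself yields its keys.
hottest_city = None
for city in cities_to_temps:
    if hottest_city is None or cities_to_temps[city] > cities_to_temps[hottest_city]:
        hottest_city = city
print(hottest_city)  # Seville
```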
Being able to create empty containers is sometimes useful, especially when using loops.
The commands to create empty lists, tuples, dictionaries, and sets are lst = [], tup = (), dic = {}, and st = set(), respectively.
Q. What is the type of an empty list?
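One common use of an empty container is to fill it inside a loop. The sketch below reuses the cities dictionary from earlier; the threshold of 25 and the variable names are hypothetical.

```python
cities_to_temps = {"Paris": 28, "London": 22, "Seville": 36, "Wellesley": 29}

hot_cities = []       # start with an empty list
name_lengths = {}     # start with an empty dictionary

for city, temp in cities_to_temps.items():
    if temp > 25:                    # hypothetical threshold
        hot_cities.append(city)      # add an element to the list
    name_lengths[city] = len(city)   # add a key-value pair to the dictionary

print(hot_cities)     # ['Paris', 'Seville', 'Wellesley']
print(name_lengths)
```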
With slicing methods, we can get a subset of a data object.
Slicing methods can be applied to strings, lists, arrays, and DataFrames.
The above example describes indexing in Python
We can use the len() command to get the length of a string or a list.
We can extract a substring (a part of a string) from a string by using a slice.
We define a slice by using square brackets ([]), a start index, an end index, and an optional step count between them.
The slice will include characters from index start to one before end:
[ start : ] specifies from the start index to the end.
[ : end ] specifies from the beginning to the end index minus 1.
[ start : end ] indicates from the start index to the end index minus 1.
letters = 'abcdefghij'
letters[2:6:2]   # From index 2 to 5, in steps of 2 characters
letters[::3]     # From the start to the end, in steps of 3 characters
letters[6::4]    # From index 6 to the end, in steps of 4
letters[:7:5]    # From the start to index 6, in steps of 5
letters[-1::-1]  # Starts at the end and ends at the start
letters[::-1]    # Same result: the whole string reversed
[ start : end : step ] extracts from the start index to the end index minus 1, skipping characters by step.
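The indexing and slicing rules above, together with len(), can be checked on the same string and on a list. The list scores is a hypothetical example.

```python
letters = 'abcdefghij'
print(len(letters))     # 10
print(letters[0])       # a: indexing starts at 0
print(letters[-1])      # j: negative indexes count from the end
print(letters[2:6])     # cdef: index 2 up to, but not including, index 6
print(letters[2:6:2])   # ce: every second character in that range
print(letters[::-1])    # jihgfedcba: the string reversed

# The same slice syntax works on lists.
scores = [10, 15, 20, 25, 30]
print(scores[1:4])      # [15, 20, 25]
print(scores[:2])       # [10, 15]
```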
[index]
A function can take any number and type of input parameters and return any number and type of output results.
Python ships with more than 65 built-in functions.
Python also allows a user to define a new function.
We will mostly use built-in functions.
print("Cherry", "Strawberry", "Key Lime")
print("Cherry", "Strawberry", "Key Lime", sep = "!")
print("Cherry", "Strawberry", "Key Lime", sep=" ")
We invoke a function by entering its name and a pair of opening and closing parentheses.
Much as a cooking recipe can accept ingredients, a function invocation can accept inputs called arguments.
We pass arguments sequentially inside the parentheses, separated by commas.
A parameter is a name given to an expected function argument.
A default argument is a fallback value that Python passes to a parameter if the function invocation does not explicitly provide one.
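Parameters, arguments, and default arguments can be seen together in a small user-defined function. The function describe_score and its threshold parameter are hypothetical, written only to illustrate the terms above.

```python
def describe_score(name, score, threshold=90):
    """Return a short message about a score.

    name and score are required parameters; threshold has a default argument of 90.
    """
    if score > threshold:
        return name + " achieved a high score."
    return name + " did not reach the threshold."


# threshold is not provided, so Python falls back to the default value 90.
print(describe_score("Geneseo", 99))
# The default is overridden by passing threshold explicitly.
print(describe_score("Geneseo", 99, threshold=100))
# Arguments passed by parameter name can appear in any order.
print(describe_score(score=85, name="Geneseo"))
```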
Python is a general-purpose programming language and is not specialized for numerical or statistical computation.
The core libraries that enable Python to store and analyze data efficiently are:
pandas
numpy
matplotlib
and seaborn
pandas provides Series and DataFrames, which are used to store data in an easy-to-use format.
numpy, numerical Python, provides the array block (np.array()) for doing fast and efficient computations.
matplotlib provides graphics. The most important submodule would be matplotlib.pyplot.
seaborn provides a general improvement in the default appearance of matplotlib-produced plots.
A module is basically a bunch of related code saved in a file with the extension .py.
A package is basically a directory of a collection of modules.
A library is a collection of packages.
We refer to code in other modules/packages/libraries by using the Python import statement.
We use the as keyword when importing modules under their canonical short names.
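The three common forms of the import statement can be sketched with modules from Python's standard library, so the example runs without installing anything. The alias stats is an arbitrary choice; the aliases in the final comments (pd, np, plt, sns) are the widely used conventions for the libraries listed above.

```python
import math                      # import a whole module; use it as math.<name>
import statistics as stats       # import a module under a shorter alias
from collections import Counter  # import a single name from a module

print(math.sqrt(16))                      # 4.0
print(stats.mean([28, 22, 36, 29]))       # 28.75
print(Counter("Geneseo").most_common(1))  # [('e', 3)]

# Third-party libraries follow the same pattern with their conventional aliases:
# import pandas as pd
# import numpy as np
# import matplotlib.pyplot as plt
# import seaborn as sns
```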