pandas Basics - Joining DataFrames; Concatenating DataFrames
February 26, 2025
Sometimes the data we need is spread across multiple DataFrames, for example one DataFrame for county-level data and another DataFrame for geographic information, such as longitude and latitude.
We can join DataFrames based on common data values in those DataFrames.
We can join DataFrames with the merge() method in pandas.
The common data values that connect the DataFrames are called keys.
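As a minimal sketch of joining on a key, the example below builds two small DataFrames by hand. The column name county_id and all values are made up for illustration; they are not from the course data.

```python
import pandas as pd

# Hypothetical county-level data (values are made up)
county = pd.DataFrame({
    "county_id": [1, 2, 3],
    "population": [100, 200, 300],
})

# Hypothetical geographic information for some of the same counties
geo = pd.DataFrame({
    "county_id": [1, 3, 4],
    "longitude": [-73.9, -75.8, -77.6],
    "latitude": [42.6, 42.1, 43.2],
})

# "county_id" is the key: rows are matched where the key values are equal.
# With no "how" argument, merge() performs an inner join,
# so only keys present in both DataFrames are kept.
joined = county.merge(geo, on="county_id")
print(joined)
```

Only the counties whose county_id appears in both DataFrames end up in the result.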
Joining DataFrames with merge()
A left join keeps all rows in x.
A right join keeps all rows in y.
An outer join keeps all rows in x and y.
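The join types can be compared side by side with the how parameter of merge(). This is a sketch that assumes x is the left DataFrame and y is the right one; the column names key, val_x, and val_y are made up for illustration.

```python
import pandas as pd

x = pd.DataFrame({"key": [1, 2, 3], "val_x": ["x1", "x2", "x3"]})
y = pd.DataFrame({"key": [1, 2, 4], "val_y": ["y1", "y2", "y4"]})

# Left join: every row of x is kept; key 3 has no match, so val_y is NaN there
left = x.merge(y, on="key", how="left")

# Right join: every row of y is kept; key 4 has no match, so val_x is NaN there
right = x.merge(y, on="key", how="right")

# Outer join: every row of x and y is kept
outer = x.merge(y, on="key", how="outer")

# Inner join (the default): only keys found in both x and y are kept
inner = x.merge(y, on="key", how="inner")

for name, df in [("left", left), ("right", right), ("outer", outer), ("inner", inner)]:
    print(name)
    print(df, end="\n\n")
```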
One DataFrame has duplicate keys (a one-to-many relationship).
Both DataFrames have duplicate keys (a many-to-many relationship).
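The effect of duplicate keys on the number of rows can be checked with a small sketch. The DataFrames and column names below are made up for illustration. The validate argument is an optional check offered by merge() and is shown here only as one way to guard against unexpected duplicates.

```python
import pandas as pd

# One-to-many: key 2 appears twice in x, once in y
x = pd.DataFrame({"key": [1, 2, 2], "val_x": ["x1", "x2", "x3"]})
y = pd.DataFrame({"key": [1, 2], "val_y": ["y1", "y2"]})

# The single y row for key 2 is repeated for each matching x row.
# validate="many_to_one" raises an error if the keys in y are not unique.
one_to_many = x.merge(y, on="key", how="left", validate="many_to_one")
print(one_to_many)

# Many-to-many: key 2 appears twice in both DataFrames
y_dup = pd.DataFrame({"key": [1, 2, 2], "val_y": ["y1", "y2", "y3"]})

# Every x row with key 2 is paired with every y_dup row with key 2,
# so key 2 contributes 2 * 2 = 4 rows to the result.
many_to_many = x.merge(y_dup, on="key", how="left")
print(many_to_many)
```

A many-to-many join can grow quickly, so it is worth checking the row count of the result.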
If the key columns have different names in the two DataFrames, we can use the left_on and right_on parameters instead of on.
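A short sketch of left_on and right_on, assuming the key column is called key_x in one DataFrame and key_y in the other (both names are made up for illustration):

```python
import pandas as pd

x = pd.DataFrame({"key_x": [1, 2, 3], "val_x": ["x1", "x2", "x3"]})
y = pd.DataFrame({"key_y": [1, 2, 4], "val_y": ["y1", "y2", "y4"]})

# left_on names the key column in x, right_on names the key column in y
joined = x.merge(y, left_on="key_x", right_on="key_y", how="left")
print(joined)
```

Both key columns are kept in the result, so one of them can be dropped afterwards if it is not needed.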
Let’s do Part 2 of Classwork 7!
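Concatenation combines DataFrames by adding rows or columns. This method is useful, for example, when one dataset is split across several files, as with the three CSV files below: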
import pandas as pd

df1 = pd.read_csv('https://bcdanl.github.io/data/concat_1.csv')
df2 = pd.read_csv('https://bcdanl.github.io/data/concat_2.csv')
df3 = pd.read_csv('https://bcdanl.github.io/data/concat_3.csv')
We will work with the .index and .columns attributes in this section.
Stacking DataFrames on top of each other uses the concat() function.
The DataFrames to be concatenated are passed in a list.
pd.concat( [DataFrame_1, DataFrame_2, ... , DataFrame_N] )
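As a sketch of row-wise concatenation that runs without downloading the CSV files, the example below uses two small hand-made DataFrames. The names part_1 and part_2 and their contents are made up stand-ins, not the course data.

```python
import pandas as pd

part_1 = pd.DataFrame({"A": ["a0", "a1"], "B": ["b0", "b1"]})
part_2 = pd.DataFrame({"A": ["a2", "a3"], "B": ["b2", "b3"]})

# The DataFrames are passed in a list and stacked on top of each other
stacked = pd.concat([part_1, part_2])
print(stacked)

# Each part keeps its original row labels, so the labels repeat
print(stacked.index.tolist())
print(stacked.columns.tolist())
```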
We can change the direction of concatenation with the axis parameter of the concat() function.
The default value of axis is 0 (or axis = "index"), so concat() concatenates data row-wise.
If we pass axis = 1 (or axis = "columns") to the function, it concatenates data column-wise.
pd.concat([df1, df2, df3], axis = "columns")
We can pass ignore_index=True to reset the column indices:
pd.concat([df1, df2, df3], axis = "columns", ignore_index = True)
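The difference between the two column-wise calls is easiest to see from the resulting column labels. The sketch below again uses small made-up DataFrames (part_1 and part_2 are hypothetical stand-ins for the CSV data).

```python
import pandas as pd

part_1 = pd.DataFrame({"A": ["a0", "a1"], "B": ["b0", "b1"]})
part_2 = pd.DataFrame({"A": ["a2", "a3"], "B": ["b2", "b3"]})

# Column-wise: rows are aligned on the index and columns are placed side by side,
# so the column names A and B each appear twice
side_by_side = pd.concat([part_1, part_2], axis="columns")
print(side_by_side.columns.tolist())

# ignore_index=True replaces the labels along the concatenation axis
# (here, the columns) with 0, 1, 2, ...
renumbered = pd.concat([part_1, part_2], axis="columns", ignore_index=True)
print(renumbered.columns.tolist())
```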
Let's create a Series and concatenate it with df1:
# create a new row of data
new_row_series = pd.Series(['n1', 'n2', 'n3', 'n4'])
# attempt to add the new row to a DataFrame
df = pd.concat([df1, new_row_series])
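Concatenating the Series this way does not append it as a row: its values land in a new column, and the remaining cells are filled with NaN.
To fix this, we can turn the Series into a DataFrame.
This DataFrame contains one row of data, and the column names are the ones the data will bind to.
pd.concat([df1, new_row_series], axis = 1)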
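A minimal sketch of the one-row DataFrame approach is below. The DataFrame base is a made-up stand-in for df1, and the name new_row_df is hypothetical.

```python
import pandas as pd

base = pd.DataFrame({
    "A": ["a0", "a1"],
    "B": ["b0", "b1"],
    "C": ["c0", "c1"],
    "D": ["d0", "d1"],
})
new_row_series = pd.Series(["n1", "n2", "n3", "n4"])

# Row-wise concat with a Series: the values go into a new column, not a new row
misaligned = pd.concat([base, new_row_series])
print(misaligned)

# Turn the values into a one-row DataFrame whose column names match base
new_row_df = pd.DataFrame([["n1", "n2", "n3", "n4"]], columns=base.columns)

# Now the row binds to columns A-D; ignore_index=True renumbers the rows
fixed = pd.concat([base, new_row_df], ignore_index=True)
print(fixed)
```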
Let’s do Part 3 of Classwork 7!