Python: Concat Multiple DataFrames in Pandas

By
Soumya Agarwal
I'm a BTech graduate from IIITM Gwalior. I have been actively working with large MNCs like ZS and Amazon. My development skills include Android and Python...

This tutorial demonstrates how to concat DataFrames in Pandas with different Python examples and use cases. If you usually work with data, merge datasets, or handle lots of info, learning the DataFrame concatenation technique in Pandas will be helpful. It makes your data analysis tasks a lot easier.

Prerequisites to Concat DataFrames in Pandas

Before we start, ensure that you have Pandas installed. If you don’t have it installed, you can use the following command:

pip install pandas

Now, let’s learn how to concatenate DataFrames using Pandas and look at the use cases that revolve around this topic.

Understanding Concatenation

Concatenation is the process of combining data frames along a particular axis. In Pandas, the concat function is used for this purpose. It allows you to stack DataFrames vertically or horizontally. The key parameter is axis, where axis=0 stacks DataFrames vertically (along rows), and axis=1 stacks them horizontally (along columns).
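
To make the effect of axis concrete, here is a minimal sketch that compares the shapes of the two results. The frames left and right and their values are made up for illustration.

```python
import pandas as pds

# Two small illustrative frames (hypothetical data)
left = pds.DataFrame({'A': [1, 2], 'B': [3, 4]})
right = pds.DataFrame({'A': [5, 6], 'B': [7, 8]})

# axis=0 stacks rows: 2 + 2 rows, still 2 columns
rows_stacked = pds.concat([left, right], axis=0)

# axis=1 places the frames side by side: 2 rows, 2 + 2 columns
cols_stacked = pds.concat([left, right], axis=1)

print(rows_stacked.shape)  # (4, 2)
print(cols_stacked.shape)  # (2, 4)
```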

Use Cases for Concatenating Multiple DataFrames

There are several ways to concatenate Pandas DataFrames. Let’s go through each of them in detail with the help of examples.

Concat DataFrames Vertically

Let’s create two simple DataFrames, df1 and df2, to demonstrate vertical concatenation.

import pandas as pds

# Create DataFrame 1
df1 = pds.DataFrame({
    'Name': ['Soumya', 'Versha'],
    'Age': [25, 30],
    'City': ['New York', 'Los Angeles']
})

# Create DataFrame 2
df2 = pds.DataFrame({
    'Name': ['Kavya', 'Sena'],
    'Age': [22, 28],
    'City': ['Chicago', 'Houston']
})

# Concatenate DataFrames vertically
res_vt = pds.concat([df1, df2], axis=0)

# Display the result
print("Concatenated DataFrame Vertically:")
print(res_vt)

In this example:

  • We create two DataFrames, df1 and df2, with similar column names and structures.
  • The pds.concat() function combines these DataFrames vertically, creating a new DataFrame named res_vt.
  • The result is then displayed.

Run this script to see the concatenated DataFrame:

Concatenated DataFrame Vertically:
     Name  Age         City
0  Soumya   25     New York
1  Versha   30  Los Angeles
0   Kavya   22      Chicago
1    Sena   28      Houston

Each DataFrame keeps its original index, so the labels 0 and 1 each appear twice in the result.
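
The repeated labels matter when you look rows up by index. The following sketch, which rebuilds trimmed-down copies of df1 and df2 so that it runs on its own, shows what a label lookup returns on such a result.

```python
import pandas as pds

# Trimmed-down copies of the tutorial's frames
df1 = pds.DataFrame({'Name': ['Soumya', 'Versha'], 'Age': [25, 30]})
df2 = pds.DataFrame({'Name': ['Kavya', 'Sena'], 'Age': [22, 28]})

res_vt = pds.concat([df1, df2], axis=0)

# Each input kept its own 0..1 labels, so the index is not unique
print(res_vt.index.tolist())   # [0, 1, 0, 1]
print(res_vt.index.is_unique)  # False

# A label lookup returns every row that carries that label
print(res_vt.loc[0, 'Name'].tolist())  # ['Soumya', 'Kavya']
```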

Concat DataFrames Horizontally

Now, let’s explore horizontal concatenation. We’ll modify the script to concat df1 and df2 DataFrames horizontally.

# Concatenate DataFrames horizontally
res_hr = pds.concat([df1, df2], axis=1)

# Display the result
print("Concatenated DataFrame Horizontally:")
print(res_hr)

Run this modified script to see the horizontally concatenated DataFrame:

Concatenated DataFrame Horizontally:
     Name  Age         City   Name  Age     City
0  Soumya   25     New York  Kavya   22  Chicago
1  Versha   30  Los Angeles   Sena   28  Houston

The updated data frame has columns from both df1 and df2 side by side.
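
Horizontal concatenation lines rows up by index label, not by position. The sketch below uses two hypothetical frames, people and jobs, whose indexes only partly overlap. It compares the default join='outer' with join='inner'.

```python
import pandas as pds

# Hypothetical frames whose index labels only partly overlap
people = pds.DataFrame({'Name': ['Soumya', 'Versha']}, index=[0, 1])
jobs = pds.DataFrame({'Job': ['Doctor', 'Engineer']}, index=[1, 2])

# Default join='outer': union of the index labels, gaps become NaN
outer = pds.concat([people, jobs], axis=1)

# join='inner': keep only labels present in every frame
inner = pds.concat([people, jobs], axis=1, join='inner')

print(outer.index.tolist())  # [0, 1, 2]
print(inner.index.tolist())  # [1]
print(inner)
```

If your frames should be paired row by row regardless of their labels, one option is to call reset_index(drop=True) on each before concatenating.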

Handling Index Reset

After concatenation, the resulting data frame may have duplicate index values. To address this, you can reset the index using the ignore_index parameter.

# Concatenate DataFrames vertically with index reset
res_set_index = pds.concat([df1, df2], axis=0, ignore_index=True)

# Display the result
print("Concatenated DataFrame with Reset Index:")
print(res_set_index)

In the script, ignore_index=True ensures that the resulting data frame has a new sequential index:

Concatenated DataFrame with Reset Index:
     Name  Age         City
0  Soumya   25     New York
1  Versha   30  Los Angeles
2   Kavya   22      Chicago
3    Sena   28      Houston

Now, the index values are reset, providing a cleaner structure.
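
As a rough sketch of two related options, the example below shows that calling reset_index(drop=True) after concatenation gives the same result as ignore_index=True, and that verify_integrity=True makes concat raise an error when labels overlap. The frames are shortened copies of df1 and df2.

```python
import pandas as pds

df1 = pds.DataFrame({'Name': ['Soumya', 'Versha'], 'Age': [25, 30]})
df2 = pds.DataFrame({'Name': ['Kavya', 'Sena'], 'Age': [22, 28]})

# Option 1: renumber while concatenating
via_param = pds.concat([df1, df2], ignore_index=True)

# Option 2: concatenate first, then drop the old labels
via_method = pds.concat([df1, df2]).reset_index(drop=True)

print(via_param.equals(via_method))  # True

# verify_integrity=True refuses to build a result with duplicate labels
try:
    pds.concat([df1, df2], verify_integrity=True)
except ValueError as exc:
    print("Duplicate labels detected:", exc)
```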

Concat DataFrames with Different Columns

What if your DataFrames have different columns? When you stack them vertically, the concat function keeps the union of all columns and fills the cells a DataFrame does not provide with NaN.

# Create a DataFrame whose columns differ from df1
df3 = pds.DataFrame({
    'Name': ['Dave', 'Tim'],
    'Job': ['Doctor', 'Engineer']
})

# Stack the data frames vertically; columns are aligned by name
result = pds.concat([df1, df3], axis=0)

# Display the result
print("Concatenated DataFrame with Different Columns:")
print(result)

The output will look like this:

Concatenated DataFrame with Different Columns:
     Name   Age         City       Job
0  Soumya  25.0     New York       NaN
1  Versha  30.0  Los Angeles       NaN
0    Dave   NaN          NaN    Doctor
1     Tim   NaN          NaN  Engineer

Cells for columns that a source DataFrame lacks are filled with NaN. Age is displayed as a float because the default integer dtype cannot represent NaN.
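
If you would rather drop the columns that are not shared than fill them with NaN, the join parameter of concat can help. This is a minimal sketch that rebuilds df1 and a two-column df3 so that it runs on its own.

```python
import pandas as pds

df1 = pds.DataFrame({
    'Name': ['Soumya', 'Versha'],
    'Age': [25, 30],
    'City': ['New York', 'Los Angeles']
})
df3 = pds.DataFrame({
    'Name': ['Dave', 'Tim'],
    'Job': ['Doctor', 'Engineer']
})

# Default join='outer': keep every column, fill the gaps with NaN
outer = pds.concat([df1, df3], ignore_index=True)

# join='inner': keep only the columns present in every frame
inner = pds.concat([df1, df3], join='inner', ignore_index=True)

print(outer.columns.tolist())  # ['Name', 'Age', 'City', 'Job']
print(inner.columns.tolist())  # ['Name']
print(outer['Age'].dtype)      # float64, because NaN was introduced
```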

Concat DataFrames with Common Columns

When DataFrames share the same columns, you may want to record which source each row came from after stacking them. The pds.concat function provides the keys parameter for this purpose.

# Concatenate data frames with common columns
result = pds.concat([df1, df2], axis=0, keys=['First', 'Second'])

# Display the result
print("Concatenated DataFrame with Common Columns:")
print(result)

Here, we use the keys parameter to create a hierarchical index:

Concatenated DataFrame with Common Columns:
            Name  Age         City
First  0  Soumya   25     New York
       1  Versha   30  Los Angeles
Second 0   Kavya   22      Chicago
       1    Sena   28      Houston

This hierarchical index allows you to distinguish between the original DataFrames.
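
Here is a short sketch of how the hierarchical index might be used afterwards. The level names Source and Row are assumptions chosen for this example; they are passed through the names parameter of concat.

```python
import pandas as pds

df1 = pds.DataFrame({'Name': ['Soumya', 'Versha'], 'Age': [25, 30]})
df2 = pds.DataFrame({'Name': ['Kavya', 'Sena'], 'Age': [22, 28]})

# Label each source frame and name the two index levels
result = pds.concat(
    [df1, df2],
    keys=['First', 'Second'],
    names=['Source', 'Row']
)

# Select all rows that came from the first frame
first_only = result.loc['First']
print(first_only['Name'].tolist())  # ['Soumya', 'Versha']

# Turn the outer level into an ordinary column
flat = result.reset_index(level='Source')
print(flat['Source'].tolist())  # ['First', 'First', 'Second', 'Second']
```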

Concat DataFrames with Duplicate Columns

In some cases, your DataFrames may have columns with identical names. Stacked vertically, such columns simply line up. Placed side by side with axis=1, they repeat under the same name. Note that pds.concat() has no suffixes parameter (that option belongs to merge()), so rename the columns before concatenating, for example with add_suffix().

# Create a DataFrame with the same column names as df1
df4 = pds.DataFrame({
    'Name': ['Shiv', 'Som'],
    'Age': [26, 35],
    'City': ['Miami', 'Seattle']
})

# Add a suffix to each frame's columns, then concatenate horizontally
result = pds.concat(
    [df1.add_suffix('_left'), df4.add_suffix('_right')],
    axis=1
)

# Display the result
print("Concatenated DataFrame with Duplicate Columns:")
print(result)

The output will look like this:

Concatenated DataFrame with Duplicate Columns:
   Name_left  Age_left    City_left  Name_right  Age_right  City_right
0     Soumya        25     New York        Shiv         26       Miami
1     Versha        30  Los Angeles         Som         35     Seattle

The suffixes _left and _right help distinguish between the duplicate columns.
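
One approach that pds.concat() supports directly is to pass keys together with axis=1, which groups each frame's columns under its own label. The sketch below rebuilds df1 and df4 so that it runs on its own; the labels left and right are arbitrary choices.

```python
import pandas as pds

df1 = pds.DataFrame({
    'Name': ['Soumya', 'Versha'],
    'Age': [25, 30],
    'City': ['New York', 'Los Angeles']
})
df4 = pds.DataFrame({
    'Name': ['Shiv', 'Som'],
    'Age': [26, 35],
    'City': ['Miami', 'Seattle']
})

# keys with axis=1 builds two-level column labels: (source, column)
side_by_side = pds.concat([df1, df4], axis=1, keys=['left', 'right'])

print(side_by_side.columns.tolist())

# Pick one column from one source without ambiguity
print(side_by_side[('right', 'Age')].tolist())  # [26, 35]

# Or pull out everything that came from one source
print(side_by_side['left']['Name'].tolist())    # ['Soumya', 'Versha']
```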

Frequently Asked Questions (FAQ)

Here are a few common questions about concatenating DataFrames in Pandas.

Q1: Can I concatenate DataFrames with different column names?

A: Yes, you can mix DataFrames with different column names. The result will have all columns, with empty spaces filled as NaN.

Q2: How do I deal with repeated column names when combining DataFrames?

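A: pds.concat() has no suffixes option. Rename the columns first, for example with add_suffix('_left') and add_suffix('_right'), or pass keys together with axis=1 so that each DataFrame's columns sit under their own label.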

Q3: What if my index values become jumbled after combining?

A: Set ignore_index=True in pds.concat() to give your DataFrame a fresh, organized index.

Q4: Can I combine DataFrames with varying column numbers?

A: Absolutely. Combining DataFrames with different column counts fills in gaps with NaN.

Q5: Is there a way to join DataFrames without doubling up common columns?

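A: Yes. When you stack vertically (axis=0), columns with the same name are aligned into a single column, so nothing is doubled. Pass keys to pds.concat() if you also want a nested index that records which DataFrame each row came from.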

Q6: Are there other ways to put DataFrames together in Pandas?

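A: Certainly! Use DataFrame.merge() to combine DataFrames on key columns and DataFrame.join() to combine them on the index. DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so use pds.concat() to add rows.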
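
As a brief illustration of how merging differs from concatenation, the sketch below matches rows by the values in a key column instead of stacking them. The frames people and jobs and their contents are hypothetical.

```python
import pandas as pds

# Hypothetical frames that share a 'Name' key column
people = pds.DataFrame({
    'Name': ['Soumya', 'Versha', 'Kavya'],
    'Age': [25, 30, 22]
})
jobs = pds.DataFrame({
    'Name': ['Versha', 'Soumya'],
    'Job': ['Engineer', 'Doctor']
})

# Left merge: keep every row of people and look up Job by Name
merged = people.merge(jobs, on='Name', how='left')

print(merged)
```

Rows are paired by the value in Name, so the order of jobs does not matter, and a name without a match (Kavya here) gets NaN in the Job column.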

Feel free to ask more questions that you might have via email. We want to ensure this tutorial helps you as much as possible.

Before You Leave

In this tutorial, we covered the essentials of concatenating DataFrames in Pandas. We discussed both vertical and horizontal concatenation. In addition, you got to see more cases like handling index reset, dealing with different and common columns, and managing duplicate columns.

As you continue working with Pandas, practicing techniques like concatenation will make your data analysis and manipulation tasks more efficient.

Lastly, our site needs your support to remain free. Share this post on social media (LinkedIn/Twitter) if you gained some knowledge from this tutorial.

Happy coding,
TechBeamers.
