Python – UNION like SQL in pandas
At work, I often need to pull data directly from a data lake and preprocess it, so there are many opportunities to process large amounts of data with pandas.
So, this time, I would like to record how to do a SQL-like UNION with Python and pandas, which I personally use fairly often.
1. Preprocessing
First, import pandas and create the data. We will create three DataFrames: df1, df2, and df3.
In [1]: import pandas as pd
In [2]: df1 = pd.DataFrame(
   ...:     {
   ...:         "A": ["A0", "A1", "A2", "A3"],
   ...:         "B": ["B0", "B1", "B2", "B3"],
   ...:         "C": ["C0", "C1", "C2", "C3"],
   ...:         "D": ["D0", "D1", "D2", "D3"],
   ...:     },
   ...:     index=[0, 1, 2, 3],
   ...: )
   ...: df1
Out[2]:
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
In [3]: df2 = pd.DataFrame(
   ...:     {
   ...:         "A": ["A4", "A5", "A6", "A7"],
   ...:         "B": ["B4", "B5", "B6", "B7"],
   ...:         "C": ["C4", "C5", "C6", "C7"],
   ...:         "D": ["D4", "D5", "D6", "D7"],
   ...:     },
   ...:     index=[4, 5, 6, 7],
   ...: )
   ...: df2
Out[3]:
A B C D
4 A4 B4 C4 D4
5 A5 B5 C5 D5
6 A6 B6 C6 D6
7 A7 B7 C7 D7
In [4]: df3 = pd.DataFrame(
   ...:     {
   ...:         "A": ["A8", "A9", "A10", "A11"],
   ...:         "B": ["B8", "B9", "B10", "B11"],
   ...:         "C": ["C8", "C9", "C10", "C11"],
   ...:         "D": ["D8", "D9", "D10", "D11"],
   ...:     },
   ...:     index=[8, 9, 10, 11],
   ...: )
   ...: df3
Out[4]:
A B C D
8 A8 B8 C8 D8
9 A9 B9 C9 D9
10 A10 B10 C10 D10
11 A11 B11 C11 D11
2. UNION like SQL using pandas concat
First, let’s look at what this looks like in SQL, assuming that df1, df2, and df3 are tables.
# SQL
SELECT
*
FROM
df1
UNION ALL
SELECT
*
FROM
df2
UNION ALL
SELECT
*
FROM
df3
;
Out:
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
4 A4 B4 C4 D4
5 A5 B5 C5 D5
6 A6 B6 C6 D6
7 A7 B7 C7 D7
8 A8 B8 C8 D8
9 A9 B9 C9 D9
10 A10 B10 C10 D10
11 A11 B11 C11 D11
OK, back to the Python code. To UNION df1, df2, and df3 as in SQL, use the pandas concat function. Like UNION ALL, concat keeps every row, including duplicates.
In [5]: pd.concat([df1, df2, df3])
Out[5]:
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
4 A4 B4 C4 D4
5 A5 B5 C5 D5
6 A6 B6 C6 D6
7 A7 B7 C7 D7
8 A8 B8 C8 D8
9 A9 B9 C9 D9
10 A10 B10 C10 D10
11 A11 B11 C11 D11
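The result above keeps each DataFrame's original index labels. When the inputs have overlapping indexes, which is common for tables loaded separately, passing ignore_index=True to concat renumbers the rows. A minimal sketch, where `left` and `right` are made-up frames used only for illustration:

```python
import pandas as pd

# Hypothetical frames that both use the default 0..1 index.
left = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
right = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]})

# Without ignore_index, the index labels repeat: 0, 1, 0, 1.
kept = pd.concat([left, right])

# With ignore_index=True, the rows are renumbered 0..3.
renumbered = pd.concat([left, right], ignore_index=True)

print(kept.index.tolist())        # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Since a SQL result set has no row labels, the renumbered form is often the closer match to what a query would return.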
The pandas documentation first builds a list of the DataFrames and then passes it to concat, but personally I always skip that step. I will include the documented approach as well, just in case. First, create the list.
In [6]: frame = [df1, df2, df3]
   ...: frame
Out[6]:
[ A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3,
A B C D
4 A4 B4 C4 D4
5 A5 B5 C5 D5
6 A6 B6 C6 D6
7 A7 B7 C7 D7,
A B C D
8 A8 B8 C8 D8
9 A9 B9 C9 D9
10 A10 B10 C10 D10
11 A11 B11 C11 D11]
Next, pass the list to the concat function to combine the DataFrames.
In [7]: pd.concat(frame)
Out[7]:
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
4 A4 B4 C4 D4
5 A5 B5 C5 D5
6 A6 B6 C6 D6
7 A7 B7 C7 D7
8 A8 B8 C8 D8
9 A9 B9 C9 D9
10 A10 B10 C10 D10
11 A11 B11 C11 D11
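The SQL above uses UNION ALL, which keeps duplicate rows, and concat behaves the same way. To get the behavior of a plain UNION, which removes duplicates, one option is to chain drop_duplicates after concat. A small sketch with made-up frames `t1` and `t2` that share one row:

```python
import pandas as pd

# Hypothetical frames; the row ("A1", "B1") appears in both.
t1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
t2 = pd.DataFrame({"A": ["A1", "A2"], "B": ["B1", "B2"]})

# UNION ALL: stack the rows and keep duplicates.
union_all = pd.concat([t1, t2], ignore_index=True)

# UNION: drop fully duplicated rows, keeping the first occurrence.
union = union_all.drop_duplicates().reset_index(drop=True)

print(len(union_all))  # 4
print(len(union))      # 3
```

Note that drop_duplicates compares all columns by default, which matches how UNION treats whole rows.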
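3. UNION like SQL using pandas append
Now, let’s use pandas append to UNION df1 and df2 like SQL. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the examples in this section only run on versions before 2.0.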
In [8]: df1.append(df2)
Out[8]:
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
4 A4 B4 C4 D4
5 A5 B5 C5 D5
6 A6 B6 C6 D6
7 A7 B7 C7 D7
To combine more than two DataFrames, pass a list to append as follows. Personally, I usually combine DataFrames in a loop, so I rarely need to combine several at once, but here is how to do it.
In [9]: df1.append([df2, df3])
Out[9]:
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
4 A4 B4 C4 D4
5 A5 B5 C5 D5
6 A6 B6 C6 D6
7 A7 B7 C7 D7
8 A8 B8 C8 D8
9 A9 B9 C9 D9
10 A10 B10 C10 D10
11 A11 B11 C11 D11
The concat function might be simpler in this case.
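For the loop case mentioned above, a common pattern is to collect the DataFrames in a plain Python list and call concat once at the end. This avoids DataFrame.append entirely, so it also runs on pandas 2.0 and later. A minimal sketch; the `make_chunk` helper is hypothetical and stands in for whatever produces each piece of data:

```python
import pandas as pd


def make_chunk(start):
    """Hypothetical helper: build a two-row frame starting at `start`."""
    rows = range(start, start + 2)
    return pd.DataFrame(
        {"A": [f"A{i}" for i in rows], "B": [f"B{i}" for i in rows]},
        index=list(rows),
    )


chunks = []
for start in (0, 2, 4):
    # This is list.append, not DataFrame.append.
    chunks.append(make_chunk(start))

# Combine everything in a single concat call after the loop.
result = pd.concat(chunks)

print(result.shape)  # (6, 2)
```

Concatenating once after the loop also avoids copying the accumulated data on every iteration, which matters when there are many pieces.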
4. Summary
So, this time, I recorded how to do a SQL-like UNION with pandas in Python, using concat and append.