Python – How to transform columns in Pandas: how to use map and apply

In this post, I describe how to use map and apply, two methods that are often used to transform columns in pandas. pandas is a data analysis library for Python, and it provides many functions for preprocessing and manipulating data. Here the focus is on converting columns with map and apply.

1. Basic usage

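map and apply both run a function you specify over the data in a Series or DataFrame. Series.map calls the function once for each element of the Series. apply is available on both Series and DataFrame; on a DataFrame, it calls the function once per column (or once per row with axis=1) and passes that column or row as a Series.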
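The difference in what the function receives can be sketched as follows. The small frame and the variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Series.map: the function receives one scalar value at a time.
squared = df['A'].map(lambda v: v ** 2)

# DataFrame.apply (default axis=0): the function receives each column as a Series.
received = df.apply(lambda col: type(col).__name__)

print(squared.tolist())   # [1, 4, 9]
print(received.tolist())  # ['Series', 'Series']
```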

1. Transforming Columns with map

First, let’s look at the basic usage of map. The example below doubles the values in column A.

In [1]: import pandas as pd
   ...:
   ...: df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
   ...: df['A'] = df['A'].map(lambda x: x * 2)
   ...: print(df)
   A  B
0  2  4
1  4  5
2  6  6

In this example, I take column A of the DataFrame and double each element with map. map takes a function as its argument; here I use a lambda expression that doubles the value it receives. The result is assigned back to df['A'], which is what actually changes the column.
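As a small sketch of the same step, the code below passes a named function instead of a lambda and shows that map returns a new Series rather than modifying the column in place. The function name double is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

def double(x):
    # Receives a single element of the Series.
    return x * 2

doubled = df['A'].map(double)

print(doubled.tolist())  # [2, 4, 6]
print(df['A'].tolist())  # [1, 2, 3]  (unchanged until the result is assigned back)
```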

2. Transforming Columns with apply

Next, let’s look at the basic usage of apply. The example below doubles every value in the DataFrame.

In [2]: df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
   ...: df = df.apply(lambda x: x * 2)
   ...: print(df)
   A   B
0  2   8
1  4  10
2  6  12

In this example, I use apply to double every value in the DataFrame. apply takes a function as its argument and, by default, calls it once per column, passing the column as a Series. The lambda expression multiplies that Series by 2, which doubles every element in it.
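Because the function works on a whole column at a time, two things follow, sketched below under the same example data: the doubling gives the same result as plain arithmetic on the DataFrame, and a function that reduces a column to one number produces one value per column. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Column-wise doubling through apply.
doubled = df.apply(lambda col: col * 2)

# The same result as multiplying the DataFrame directly.
same = doubled.equals(df * 2)

# A function that reduces each column yields one value per column.
ranges = df.apply(lambda col: col.max() - col.min())

print(same)             # True
print(ranges.tolist())  # [2, 2]
```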

2. Applied usage

Next, let’s look at the applied usage. For example, suppose you want to convert column A of DataFrame according to the following conditions.

  • If A is less than or equal to 1, convert to ‘low’.
  • If A is greater than or equal to 2 and less than or equal to 4, convert to ‘mid’.
  • If A is greater than or equal to 5, convert to ‘high’.

In such a case, you can pass a function to map as follows. First, create the data.

In [3]: df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6]})
   ...: df
Out[3]:
   A
0  1
1  2
2  3
3  4
4  5
5  6

1. map and lambda expressions

In [4]: df['A_label'] = df['A'].map(lambda x: 'low' if x <= 1 else 'mid' if x <= 4 else 'high')
   ...: df
Out[4]:
   A A_label
0  1     low
1  2     mid
2  3     mid
3  4     mid
4  5    high
5  6    high

In this example, map is used with a lambda expression that branches on the value and converts each element in column A to ‘low’, ‘mid’, or ‘high’. The branching is done by chaining conditional expressions (value if condition else ...). Note that the conditions are evaluated from left to right and the first one that matches wins, so they must be written in order, starting from the smallest threshold.
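The effect of the ordering can be seen in a short sketch. The second lambda is a deliberately wrong variant written for illustration: it checks x <= 4 first, so the value 1 never reaches the ‘low’ branch.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6]})

# Thresholds checked in ascending order: each value lands in the intended band.
correct = df['A'].map(lambda x: 'low' if x <= 1 else 'mid' if x <= 4 else 'high')

# Thresholds checked out of order: x <= 4 also matches 1, so 'low' is unreachable.
wrong = df['A'].map(lambda x: 'mid' if x <= 4 else 'low' if x <= 1 else 'high')

print(correct.tolist())  # ['low', 'mid', 'mid', 'mid', 'high', 'high']
print(wrong.tolist())    # ['mid', 'mid', 'mid', 'mid', 'high', 'high']
```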

2. apply and lambda expressions

Now I would like to label both columns of a DataFrame according to the following conditions:

  • If A is less than or equal to 1, convert to ‘low’.
  • If A is greater than or equal to 2 and less than or equal to 4, convert to ‘mid’.
  • If A is greater than or equal to 5, convert to ‘high’.
  • If B is less than 3, convert to ‘low’.
  • If B is greater than or equal to 3 and less than 6, convert to ‘mid’.
  • If B is 6 or greater, convert to ‘high’.

In this case, you can pass a function to apply as follows. First, create the data.

In [7]: df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6], 'B': [2, 4, 6, 1, 3, 5]})
   ...: df
Out[7]:
   A  B
0  1  2
1  2  4
2  3  6
3  4  1
4  5  3
5  6  5

Let’s create the label data using apply.

In [8]: df_label = df.apply(lambda x: pd.Series(['low' if x['A'] <= 1 else 'mid' if x['A'] <= 4 else 'high',
   ...:                                    'low' if x['B'] < 3 else 'mid' if x['B'] < 6 else 'high']), axis=1)
   ...: df_label.columns = ['A_label', 'B_label']
   ...: df_label
Out[8]:
  A_label B_label
0     low     low
1     mid     mid
2     mid    high
3     mid     low
4    high     mid
5    high     mid

In this example, apply is used with a lambda expression that branches on the values and converts each element in columns A and B to ‘low’, ‘mid’, or ‘high’. Because axis=1 is passed to apply, the function is called once per row.

First, the lambda expression receives each row as a Series, reads the values of columns A and B from it, and performs the conditional branching. pd.Series then gathers the two labels for that row into a Series, and apply combines those per-row Series into a new DataFrame. As before, the branching is done by chaining conditional expressions. Finally, the column names are set through the columns attribute.

Just in case, let’s concatenate the original DataFrame and the label DataFrame with pd.concat and check the result.

In [9]: pd.concat([df, df_label], axis=1)
Out[9]:
   A  B A_label B_label
0  1  2     low     low
1  2  4     mid     mid
2  3  6     mid    high
3  4  1     mid     low
4  5  3    high     mid
5  6  5    high     mid

The labels have been assigned correctly.
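As a variation on the steps above, the row function can return a Series whose index already carries the column names, so the separate assignment to the columns attribute is not needed. This is only a sketch; the helper name label_row is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6], 'B': [2, 4, 6, 1, 3, 5]})

def label_row(row):
    # row is one row of df as a Series, indexed by column name.
    a = 'low' if row['A'] <= 1 else 'mid' if row['A'] <= 4 else 'high'
    b = 'low' if row['B'] < 3 else 'mid' if row['B'] < 6 else 'high'
    # The keys of this Series become the column names of the result.
    return pd.Series({'A_label': a, 'B_label': b})

df_label = df.apply(label_row, axis=1)
result = pd.concat([df, df_label], axis=1)

print(list(result.columns))          # ['A', 'B', 'A_label', 'B_label']
print(df_label['B_label'].tolist())  # ['low', 'mid', 'high', 'low', 'mid', 'mid']
```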

3. Applied writing method (external function)

Personally, I mostly use this method when the logic gets complicated.

First, create the data.

In [10]: df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6]})

1. External function call with map

Here is how to define a function for use with map. The function you pass to map should accept one argument (a single element of the Series) and return one value. An example is shown below.

In [11]: def func(x):
    ...:     if x <= 1:
    ...:         return 'low'
    ...:     elif x <= 4:
    ...:         return 'mid'
    ...:     else:
    ...:         return 'high'
    ...:
    ...: df['A_label'] = df['A'].map(func)
    ...: df
Out[11]:
   A A_label
0  1     low
1  2     mid
2  3     mid
3  4     mid
4  5    high
5  6    high

This example defines a function that takes an argument x and returns ‘low’, ‘mid’, or ‘high’ depending on which branch matches. By passing this function to map, each element of column A is transformed.
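One practical benefit of a named function is that it can be checked on plain values, including the boundaries, before it is used on a DataFrame. A minimal sketch, with boundary values chosen for illustration:

```python
import pandas as pd

def func(x):
    if x <= 1:
        return 'low'
    elif x <= 4:
        return 'mid'
    else:
        return 'high'

# Values around the thresholds 1 and 4.
boundary_values = [0, 1, 2, 4, 5]

# Call the function directly, then through map, and compare.
direct = [func(v) for v in boundary_values]
mapped = pd.Series(boundary_values).map(func).tolist()

print(direct)            # ['low', 'low', 'mid', 'mid', 'high']
print(direct == mapped)  # True
```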

2. External function call with apply

Next, here is how to define a function for use with apply. The function you pass to apply also accepts one argument and returns one value. However, DataFrame.apply passes each column (or, with axis=1, each row) to the function as a Series, so the argument is a Series rather than a single element. An example is shown below.

In [14]: df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6]})  # recreate df so it holds only column A
    ...:
    ...: def func(x):
    ...:     if x['A'] <= 1:
    ...:         return 'low'
    ...:     elif x['A'] <= 4:
    ...:         return 'mid'
    ...:     else:
    ...:         return 'high'
    ...:
    ...: df['label'] = df.apply(func, axis=1)
    ...: df
Out[14]:
   A label
0  1   low
1  2   mid
2  3   mid
3  4   mid
4  5  high
5  6  high

In this example, the argument x receives one row of the DataFrame as a Series, so it holds the value of every column in that row. To access a value, specify the column name as x[‘column name’]. The function returns ‘low’, ‘mid’, or ‘high’ depending on which branch matches, and passing it to apply together with axis=1 labels each row of the DataFrame.
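Since the row arrives as a Series, a single function can combine several columns. The sketch below uses a made-up rule that is not part of the examples above: it labels each row by the sum of A and B, with thresholds picked only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6], 'B': [2, 4, 6, 1, 3, 5]})

def combined_label(row):
    # Hypothetical rule: label by the sum of columns A and B.
    total = row['A'] + row['B']
    if total <= 4:
        return 'low'
    elif total <= 8:
        return 'mid'
    else:
        return 'high'

# axis=1 passes each row to the function as a Series.
df['label'] = df.apply(combined_label, axis=1)

print(df['label'].tolist())  # ['low', 'mid', 'high', 'mid', 'mid', 'high']
```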

4. Conclusion

In this post, I covered how to convert columns in pandas with map and apply. Series.map applies a function to each element of a Series, while DataFrame.apply applies a function to each column or row of a DataFrame. We have seen examples ranging from basic usage to applied usage with lambda expressions and separately defined functions.
