python – How to apply a function to two columns of Pandas dataframe
Here's an example using apply on the dataframe, which I am calling with axis=1.
Note the difference: instead of trying to pass two values to the function f, rewrite the function to accept a pandas Series object, and then index the Series to get the values needed.
In [49]: df
Out[49]:
          0         1
0  1.000000  0.000000
1 -0.494375  0.570994
2  1.000000  0.000000
3  1.876360 -0.229738
4  1.000000  0.000000

In [50]: def f(x):
   ....:     return x[0] + x[1]
   ....:

In [51]: df.apply(f, axis=1)  # passes each row to f as a Series object
Out[51]:
0    1.000000
1    0.076619
2    1.000000
3    1.646622
4    1.000000
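A self-contained version of the session above, as a minimal sketch: the values are copied from the printed DataFrame, and the column labels are assumed to be the integers 0 and 1, so x[0] and x[1] are label lookups on each row Series.

```python
import pandas as pd

# Rebuild the DataFrame printed above (assumed integer column labels 0 and 1).
df = pd.DataFrame({0: [1.0, -0.494375, 1.0, 1.876360, 1.0],
                   1: [0.0, 0.570994, 0.0, -0.229738, 0.0]})

def f(x):
    # x is one row as a pandas Series; index it by column label
    return x[0] + x[1]

# axis=1 calls f once per row and collects the results in a Series
result = df.apply(f, axis=1)
print(result)
```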
Depending on your use case, it is sometimes helpful to create a pandas groupby object and then use apply on the groups.
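A minimal sketch of the groupby variant, using hypothetical column names key, a and b and a hypothetical function weighted_sum: each group arrives in the function as a sub-DataFrame, so both columns are available at once. Selecting the columns after groupby keeps the grouping column out of the frame passed to the function.

```python
import pandas as pd

# Hypothetical data: 'key' defines the groups, 'a' and 'b' are the two inputs.
df = pd.DataFrame({'key': ['x', 'x', 'y', 'y'],
                   'a': [1, 2, 3, 4],
                   'b': [10, 20, 30, 40]})

def weighted_sum(g):
    # g is the sub-DataFrame for one group, with columns 'a' and 'b'
    return (g['a'] * g['b']).sum()

# One result per group, returned as a Series indexed by the group key
per_group = df.groupby('key')[['a', 'b']].apply(weighted_sum)
print(per_group)
```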
There is a clean, one-line way of doing this in Pandas:
df['col_3'] = df.apply(lambda x: f(x.col_1, x.col_2), axis=1)
This allows f to be a user-defined function with multiple input values, and it accesses the columns by name, which keeps working if the columns are reordered, rather than by positional index.
Example with data (based on original question):
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})
mylist = ['a', 'b', 'c', 'd', 'e', 'f']

def get_sublist(sta, end):
    # inclusive slice: elements from index sta through index end
    return mylist[sta:end+1]

df['col_3'] = df.apply(lambda x: get_sublist(x.col_1, x.col_2), axis=1)
Output of print(df):
   ID  col_1  col_2      col_3
0   1      0      1     [a, b]
1   2      2      4  [c, d, e]
2   3      3      5  [d, e, f]
If your column names contain spaces or share a name with an existing Series attribute (such as name or index), index each row with square brackets instead of using attribute access:
df['col_3'] = df.apply(lambda x: f(x['col 1'], x['col 2']), axis=1)
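To make the bracket form concrete, here is a small sketch with hypothetical columns: 'col 1' contains a space, and 'name' collides with the Series attribute of the same name. The function combine is a hypothetical stand-in for f.

```python
import pandas as pd

# Hypothetical columns: one contains a space, one shadows the Series attribute .name
df = pd.DataFrame({'col 1': [1, 2, 3], 'name': ['p', 'q', 'r']})

def combine(a, b):
    # hypothetical stand-in for f: any function of two scalar arguments
    return f"{b}{a}"

# x['col 1'] works where x.col 1 would be a syntax error,
# and x['name'] reads the column rather than the Series attribute
df['col_3'] = df.apply(lambda x: combine(x['col 1'], x['name']), axis=1)

# For comparison, attribute access x.name returns the row label, not the 'name' column
row_labels = df.apply(lambda x: x.name, axis=1)
print(df)
print(row_labels)
```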
A simple solution is:
df['col_3'] = df[['col_1', 'col_2']].apply(lambda x: f(*x), axis=1)
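A minimal sketch showing that the column order in the double-bracket selection matters: *x unpacks each row's values positionally, in the order the columns were selected. The subtraction function f here is a hypothetical non-commutative example chosen so the argument order is visible.

```python
import pandas as pd

df = pd.DataFrame({'col_1': [10, 20, 30], 'col_2': [1, 2, 3]})

def f(a, b):
    # hypothetical non-commutative function, so argument order is visible
    return a - b

# *x unpacks each row's values in the selected order: f(col_1, col_2)
df['col_3'] = df[['col_1', 'col_2']].apply(lambda x: f(*x), axis=1)

# Reversing the selection reverses the arguments: f(col_2, col_1)
df['col_4'] = df[['col_2', 'col_1']].apply(lambda x: f(*x), axis=1)
print(df)
```

Because the unpacking is positional, this form is compact but silently changes meaning if the column list is reordered; the named-access form above avoids that.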