python – Calculate weighted average using a pandas/dataframe

python – Calculate weighted average using a pandas/dataframe

Lets first create the example pandas dataframe:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: index = pd.Index([01/01/2012,01/01/2012,01/01/2012,01/02/2012,01/02/2012], name=Date)

In [4]: df = pd.DataFrame({ID:[100,101,102,201,202],wt:[.5,.75,1,.5,1],value:[60,80,100,100,80]},index=index)

Then, the average of wt weighted by value and grouped by the index is obtained as:

In [5]: df.groupby(df.index).apply(lambda x: np.average(x.wt, weights=x.value))
Out[5]: 
Date
01/01/2012    0.791667
01/02/2012    0.722222
dtype: float64

Alternatively, one can also define a function:

In [5]: def grouped_weighted_avg(values, weights, by):
   ...:     return (values * weights).groupby(by).sum() / weights.groupby(by).sum()

In [6]: grouped_weighted_avg(values=df.wt, weights=df.value, by=df.index)
Out[6]: 
Date
01/01/2012    0.791667
01/02/2012    0.722222
dtype: float64

I think I would do this with two groupbys.

First to calculate the weighted average:

In [11]: g = df.groupby(Date)

In [12]: df.value / g.value.transform(sum) * df.wt
Out[12]:
0    0.125000
1    0.250000
2    0.416667
3    0.277778
4    0.444444
dtype: float64

If you set this as a column, you can groupby over it:

In [13]: df[wa] = df.value / g.value.transform(sum) * df.wt

Now the sum of this column is the desired:

In [14]: g.wa.sum()
Out[14]:
Date
01/01/2012    0.791667
01/02/2012    0.722222
Name: wa, dtype: float64

or potentially:

In [15]: g.wa.transform(sum)
Out[15]:
0    0.791667
1    0.791667
2    0.791667
3    0.722222
4    0.722222
Name: wa, dtype: float64

python – Calculate weighted average using a pandas/dataframe

I feel the following is an elegant solution to this problem from:(Pandas DataFrame aggregate function using multiple columns)

grouped = df.groupby(Date)

def wavg(group):
    d = group[value]
    w = group[wt]
    return (d * w).sum() / w.sum()

grouped.apply(wavg)

Leave a Reply

Your email address will not be published.