Pandas DataFrames: Reversing ‘one-hot’ encoding

I want to go from this data frame which is essentially one hot encoded .

 In [2]: pd.DataFrame({"monkey":[0,1,0],"rabbit":[1,0,0],"fox":[0,0,1]})

Out[2]:
fox  monkey  rabbit
0    0       0       1
1    0       1       0
2    1       0       0
3    0       0       0
4    0       0       0

    In [3]: pd.DataFrame({"animal":["monkey","rabbit","fox"]})
Out[3]:
animal
0  monkey
1  rabbit
2     fox


To this one it is encoded ‘ reverse’ one-hot.


df.idxmax(axis=1)


This chooses a column label for each row. It chooses the label with the maximum value.Since the data are 1s and 0s, it will pick .

In [40]: s = pd.Series(['dog', 'cat', 'dog', 'bird', 'fox', 'dog'])

In [41]: s
Out[41]:
0     dog
1     cat
2     dog
3    bird
4     fox
5     dog
dtype: object

In [42]: pd.get_dummies(s)
Out[42]:
bird  cat  dog  fox
0   0.0  0.0  1.0  0.0
1   0.0  1.0  0.0  0.0
2   0.0  0.0  1.0  0.0
3   1.0  0.0  0.0  0.0
4   0.0  0.0  0.0  1.0
5   0.0  0.0  1.0  0.0

In [43]: pd.get_dummies(s).idxmax(1)
Out[43]:
0     dog
1     cat
2     dog
3    bird
4     fox
5     dog
dtype: object