pandas reset_index after groupby.value_counts()

I am trying to group by one column and compute value counts on another column.

import pandas as pd 
dftest = pd.DataFrame({'A':[1,1,1,1,1,1,1,1,1,2,2,2,2,2], 
       'Amt':[20,20,20,30,30,30,30,40, 40,10, 10, 40,40,40]}) 

print(dftest)

The code above prints dftest.

Then I do the grouping:

grouper = dftest.groupby('A') 
df_grouped = grouper['Amt'].value_counts()

which gives

A  Amt
1  30     4
   20     3
   40     2
2  40     3
   10     2
Name: Amt, dtype: int64

What I want is to keep the top two rows of each group.

I was also puzzled by an error when I tried reset_index:

df_grouped.reset_index()

which raises the following error:

df_grouped.reset_index() ValueError: cannot insert Amt, already exists


2016-09-29 muon

You need the name parameter in reset_index, because the name of the Series is the same as the name of one of the levels of the MultiIndex:

df_grouped.reset_index(name='count')
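To see where the clash comes from, here is a minimal sketch that assumes the dftest frame from the question. It lists the MultiIndex level names and then passes name='count' so the values land in a column that cannot collide with them. The variable names counts, level_names and flat are only illustrative.

```python
import pandas as pd

# Same data as in the question.
dftest = pd.DataFrame({'A': [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
                       'Amt': [20, 20, 20, 30, 30, 30, 30, 40, 40, 10, 10, 40, 40, 40]})

counts = dftest.groupby('A')['Amt'].value_counts()

# The index levels are named after the grouping column and the counted column.
level_names = list(counts.index.names)
print(level_names)

# Naming the value column explicitly avoids reusing 'Amt' as a column label.
flat = counts.reset_index(name='count')
print(flat.columns.tolist())
```

Newer pandas releases may name the value_counts result differently, in which case a plain reset_index() might no longer fail. Passing name= explicitly should work either way.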

Another solution is to rename the Series:

print(df_grouped.rename('count').reset_index())

   A  Amt  count
0  1   30      4
1  1   20      3
2  1   40      2
3  2   40      3
4  2   10      2

A more common solution than value_counts is to aggregate with size:

df_grouped1 = dftest.groupby(['A','Amt']).size().rename('count').reset_index() 

print(df_grouped1)
   A  Amt  count
0  1   20      3
1  1   30      4
2  1   40      2
3  2   10      2
4  2   40      3
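The two outputs differ only in row order: value_counts sorts by count within each group, while size follows the group keys. A small sketch, assuming the same dftest data, that checks both approaches hold the same rows (the names via_value_counts, via_size and same_rows are illustrative):

```python
import pandas as pd

# Same data as in the question.
dftest = pd.DataFrame({'A': [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
                       'Amt': [20, 20, 20, 30, 30, 30, 30, 40, 40, 10, 10, 40, 40, 40]})

via_value_counts = dftest.groupby('A')['Amt'].value_counts().reset_index(name='count')
via_size = dftest.groupby(['A', 'Amt']).size().reset_index(name='count')

# Sort both frames by the same keys so that row order no longer matters.
key = ['A', 'Amt']
left = via_value_counts.sort_values(key).reset_index(drop=True)
right = via_size.sort_values(key).reset_index(drop=True)

# Compare the plain values row by row.
same_rows = left.values.tolist() == right.values.tolist()
print(same_rows)
```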


2016-09-29 19:42:16 jezrael

Perfect!! That solves the reset_index problem ... is there a better way to keep the top n rows per group, by count? So far, after trying a few things, the only way I can think of is to run groupby.value_counts first and then take a subset – muon

Maybe you need nlargest (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.nlargest.html): dftest.groupby(['A', 'Amt']).size().nlargest(3) – jezrael

That does not do it per group; it only gives the overall nlargest – muon
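For the per-group top n, one possible approach (a sketch, not something proposed in the thread) is to count with size, sort by count within each group, and then take the first n rows of each group with groupby(...).head(n). The names counts, top2 and n are illustrative, and ties are kept in whatever order the sort leaves them.

```python
import pandas as pd

# Same data as in the question.
dftest = pd.DataFrame({'A': [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
                       'Amt': [20, 20, 20, 30, 30, 30, 30, 40, 40, 10, 10, 40, 40, 40]})

n = 2  # how many rows to keep per group (assumed from the question)

# One row per (A, Amt) pair with its frequency.
counts = dftest.groupby(['A', 'Amt']).size().reset_index(name='count')

# Largest counts first inside each A, then keep the first n rows of every group.
top2 = (counts.sort_values(['A', 'count'], ascending=[True, False])
              .groupby('A')
              .head(n)
              .reset_index(drop=True))
print(top2)
```

Unlike the overall nlargest call, head(n) here is applied to each group of A separately.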
