[Pandas] groupby

pandas.DataFrame.groupby

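Groups a DataFrame using a mapper or by a Series of columns. A groupby operation involves some combination of splitting the object, applying a function, and combining the results. It can be used to group large amounts of data and run computations on each group.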

import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
                              'Parrot', 'Parrot'],
                   'Max Speed': [380., 370., 24., 26.]})
df

   Animal  Max Speed
0  Falcon      380.0
1  Falcon      370.0
2  Parrot       24.0
3  Parrot       26.0

df.groupby(['Animal']).mean()

        Max Speed
Animal
Falcon      375.0
Parrot       25.0
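
To make the split, apply, and combine steps described above concrete, here is a minimal sketch that performs them by hand on the same data and compares the result with the one-line groupby call. The variable names (pieces, means, manual, builtin) are made up for illustration; pandas does not actually build dictionaries like this internally.

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Max Speed': [380., 370., 24., 26.]})

# Split: iterating over a GroupBy yields (key, sub-DataFrame) pairs.
pieces = {key: group for key, group in df.groupby('Animal')}

# Apply: compute the mean of 'Max Speed' within each piece.
means = {key: group['Max Speed'].mean() for key, group in pieces.items()}

# Combine: assemble the per-group results into a single Series.
manual = pd.Series(means, name='Max Speed')
manual.index.name = 'Animal'

# The one-liner gives the same values without the manual bookkeeping.
builtin = df.groupby('Animal')['Max Speed'].mean()
print(manual)
print(builtin)
```

In practice the one-liner is preferable; the manual version is only meant to show what "split, apply, combine" refers to.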

Hierarchical Indexes

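With a hierarchical index (MultiIndex), you can group by a specific level using the level parameter. A level can be given either by its position (0, 1, ...) or by its name.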

arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'],
          ['Captive', 'Wild', 'Captive', 'Wild']]
index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type'))
df = pd.DataFrame({'Max Speed': [390., 350., 30., 20.]},
                  index=index)
df

                Max Speed
Animal Type
Falcon Captive      390.0
       Wild         350.0
Parrot Captive       30.0
       Wild          20.0

df.groupby(level=0).mean()

        Max Speed
Animal
Falcon      370.0
Parrot       25.0

df.groupby(level="Type").mean()

         Max Speed
Type
Captive      210.0
Wild         185.0
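
As a small check on the two calls above, the sketch below shows that a level can be addressed by position or by name with the same result. Using max() on the second level is just an arbitrary choice to vary the aggregation; the variable names are illustrative.

```python
import pandas as pd

arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'],
          ['Captive', 'Wild', 'Captive', 'Wild']]
index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type'))
df = pd.DataFrame({'Max Speed': [390., 350., 30., 20.]}, index=index)

# Level 0 is named 'Animal', so both calls select the same level.
by_position = df.groupby(level=0).mean()
by_name = df.groupby(level='Animal').mean()
print(by_position.equals(by_name))

# Level 1 is named 'Type'; any aggregation works, not only mean().
type_max = df.groupby(level=1).max()
print(type_max)
```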

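You can also choose whether NA values are kept in the group keys by setting the dropna parameter. The default is True, which means rows whose group key is NA are dropped from the result. Pass dropna=False to keep them as their own group.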

l = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]
df = pd.DataFrame(l, columns=["a", "b", "c"])
df.groupby(by=["b"]).sum()

df.groupby(by=["b"], dropna=False).sum()
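
The two calls above are shown without output, so here is a self-contained sketch that runs them and prints both results. It uses the same data; only the result names (dropped, kept) are added for illustration.

```python
import pandas as pd

l = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]
df = pd.DataFrame(l, columns=["a", "b", "c"])

# Default (dropna=True): the row whose key in 'b' is NaN is left out.
dropped = df.groupby(by=["b"]).sum()
print(dropped)

# dropna=False: NaN becomes its own group, listed after the other keys.
kept = df.groupby(by=["b"], dropna=False).sum()
print(kept)
```

The second result has one extra row, holding the sums for the single record whose b value is missing.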

l = [["a", 12, 12], [None, 12.3, 33.], ["b", 12.3, 123], ["a", 1, 1]]
df = pd.DataFrame(l, columns=["a", "b", "c"])
df.groupby(by="a").sum()

    b     c
a
a   13.0   13.0
b   12.3  123.0

df.groupby(by="a", dropna=False).sum()

    b     c
a
a   13.0   13.0
b   12.3  123.0
NaN 12.3   33.0

When using apply(), use group_keys to include or exclude the group keys in the result's index. The group_keys argument defaults to True (include).

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
                              'Parrot', 'Parrot'],
                   'Max Speed': [380., 370., 24., 26.]})
df.groupby("Animal", group_keys=True).apply(lambda x: x)

          Animal  Max Speed
Animal
Falcon 0  Falcon      380.0
       1  Falcon      370.0
Parrot 2  Parrot       24.0
       3  Parrot       26.0

df.groupby("Animal", group_keys=False).apply(lambda x: x)

   Animal  Max Speed
0  Falcon      380.0
1  Falcon      370.0
2  Parrot       24.0
3  Parrot       26.0
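
The effect of group_keys is easiest to see in the shape of the resulting index. The sketch below selects the value column explicitly before calling apply(); that selection is an assumption made for portability, since newer pandas releases warn about or exclude the grouping column when apply() operates on it. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Max Speed': [380., 370., 24., 26.]})

# Select the value column so the function never sees the grouping column.
with_keys = df.groupby('Animal', group_keys=True)[['Max Speed']].apply(lambda x: x)
without_keys = df.groupby('Animal', group_keys=False)[['Max Speed']].apply(lambda x: x)

# group_keys=True adds the group key as an extra outer index level.
print(with_keys.index.nlevels)

# group_keys=False keeps only the original row index.
print(without_keys.index.nlevels)
```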

Example

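One way to group rows into non-overlapping chunks is to use the index. With the default integer index, df.index // 2 assigns each consecutive pair of rows (0 and 1, 2 and 3, ...) to the same group.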

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

result = df.groupby(df.index // 2).mean()

print(result)
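
The index trick above relies on the default 0, 1, 2, ... index. As a hedged sketch, the block below repeats it and then shows a variant for a DataFrame with a non-numeric index, where the group keys are built from row positions with np.arange instead. The letter index and the names labeled and by_position are hypothetical, chosen only for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# Default index 0..9: index // 2 gives the keys 0,0,1,1,2,2,3,3,4,4.
result = df.groupby(df.index // 2).mean()
print(result['A'].tolist())

# Hypothetical variant: the index holds labels, so derive the keys
# from row positions rather than from the index values.
labeled = df.set_index(pd.Index(list('abcdefghij')))
by_position = labeled.groupby(np.arange(len(labeled)) // 2).mean()
print(by_position['A'].tolist())
```

Both calls average each consecutive pair of rows; changing the divisor changes the chunk size.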

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html

pandas.DataFrame.groupby — pandas 1.5.3 documentation

License: Attribution, NonCommercial, NoDerivatives

GOATLAB
