Removing stopwords from a file

I want to remove stop words from the Data column in my file. I have filtered the rows down to those where the end-user is speaking, but the stop words are not removed by usertext.apply(lambda x: [word for word in x if word not in stop_words]). What am I doing wrong?
import pandas as pd
from stop_words import get_stop_words
df = pd.read_csv("F:/textclustering/data/cleandata.csv", encoding="iso-8859-1")
usertext = df[df.Role.str.contains("End-user",na=False)][['Data','chatid']]
stop_words = get_stop_words('dutch')
clean = usertext.apply(lambda x: [word for word in x if word not in stop_words])
print(clean)
First, could you 1) print 'stop_words', and 2) try 'clean = usertext.apply(lambda x: [])' to check whether it removes all the words? (Only for testing.) –
Data [] chatid [] dtype: object ['aan', 'al', 'alles', 'als', 'altijd', 'andere', 'ben', 'bij', 'daar', 'dan', 'dat', 'de', 'der', 'deze', 'die', 'dit', 'doch', 'doen', 'door', 'dus', 'een', 'eens', 'en', 'er', 'ge', 'geen', 'geweest', 'haar', 'had', 'heb', 'hebben', 'heeft', 'hem', 'het', 'hier', 'hij', 'hoe', 'hun', 'iemand', 'iets', 'ik', 'in', 'is', 'ja', 'je', 'kan', 'kon', 'kunnen', 'maar', 'me', 'meer', 'men', 'met', 'mij', 'mijn', 'moet', 'na', 'naar', 'niet', 'niets', 'nog', 'nu', 'of', 'om', 'omdat', ...] That is the result of both. – DataNewB
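One possible reading of the output above: DataFrame.apply calls the function once per column by default, so x is a whole column (a Series), and "for word in x" walks over complete cell values rather than individual words. A full sentence is never equal to a single stop word, so nothing gets filtered. Below is a minimal sketch of applying the filter per cell on the Data column instead. The sample rows, the short stop word set, the lowercasing, the whitespace split and the helper name remove_stop_words are all assumptions made for illustration; they stand in for cleandata.csv and get_stop_words('dutch').

```python
import pandas as pd

# Hypothetical sample rows standing in for cleandata.csv (assumption).
df = pd.DataFrame({
    "Role": ["End-user", "Agent", "End-user"],
    "Data": ["ik heb een vraag", "hoe kan ik helpen", "dat is niet goed"],
    "chatid": [1, 1, 2],
})

# Small hand-written subset of Dutch stop words (assumption);
# the question obtains the full list from get_stop_words('dutch').
stop_words = {"ik", "heb", "een", "dat", "is", "niet"}

# Same filter as in the question; .copy() so a new column can be added safely.
usertext = df[df.Role.str.contains("End-user", na=False)][["Data", "chatid"]].copy()


def remove_stop_words(text):
    """Split one cell into words and drop those found in stop_words."""
    # Lowercasing and whitespace splitting are simplifying assumptions.
    return [word for word in str(text).lower().split() if word not in stop_words]


# Apply to the Data column only, so the function receives one cell at a time.
usertext["clean"] = usertext["Data"].apply(remove_stop_words)
print(usertext)
```

Converting stop_words to a set is optional, but it makes each membership check cheaper than scanning a list. If punctuation is attached to words in the real data, a proper tokenizer would probably be needed instead of split().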