[Data Science] Matrix Representation of Documents (DTM and TF-IDF)
Tokenization with CountVectorizer

import sklearn
print(sklearn.__version__)

from sklearn.feature_extraction.text import CountVectorizer

vector = CountVectorizer()
text = ['Text mining, also referred to as text data mining, similar to text analytics, is the process of deriving high-quality information from text.']
vector.fit_transform(text).toarray()

array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 4,..
2022. 9. 29.