SOLVING THE TEXT CLASSIFICATION PROBLEM USING NATURAL LANGUAGE PROCESSING AND MACHINE LEARNING METHODS

M. O. Sperkach; D. Yu. Yuzvak

Abstract

This article examines the practical application of natural language processing and machine learning methods to the text classification problem. It describes the activity process automated by the text classification system under development, formulates the problem statement, and describes the methods used to solve it. Conclusions are drawn on the applicability of machine learning algorithms to this problem, and results on the effectiveness of machine learning models built on different algorithms are reported. The combination of natural language processing and machine learning methods is found to be an effective way to solve the text classification problem.

Keywords: machine learning, text processing, natural language processing, model, text classification

Sperkach M., PhD of Technical Sciences, Associate Professor; Yuzvak D. Solving the text classification problem using natural language processing and machine learning methods / National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”

This work is licensed under a Creative Commons Attribution 4.0 International License.