Dr. Iskander Akhmetov presented a paper on machine learning methods for Kazakh morphology at the 3rd International Conference on Problems of Informatics, Electronics and Radio Engineering.
Abstract:
Kazakh is an agglutinative language in which words are formed by sequentially attaching morphemes, each carrying specific grammatical information. This morphological complexity poses significant challenges for computational linguistics, particularly in natural language processing tasks such as part-of-speech tagging, syntactic parsing, and machine translation. This paper provides a comprehensive review of the most widely used machine learning approaches applied to Kazakh morphology. It compares rule-based methods, classical machine learning techniques such as hidden Markov models and conditional random fields, and recent deep learning approaches, including recurrent neural networks, long short-term memory networks, and transformer-based architectures such as BERT. The paper also explores the applicability of data-driven approaches and reviews existing research on Kazakh morphology, offering a comparative analysis in terms of efficiency, scalability, and accuracy. Finally, it discusses future directions, focusing on expanding annotated datasets and developing models tailored to low-resource languages such as Kazakh to improve their representation in natural language processing tasks.
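
To make the agglutinative structure mentioned in the abstract concrete, here is a minimal illustrative sketch in Python. It is not taken from the paper: the `ROOTS` and `SUFFIXES` tables and the `segment` function are hypothetical, hand-written for three example words, and the greedy suffix stripping is only a toy stand-in for the rule-based and statistical analyzers the review covers.

```python
# Toy lexicon (hypothetical and far from complete): surface form -> gloss.
ROOTS = {"үй": "house", "бала": "child", "мектеп": "school"}
SUFFIXES = {
    "лар": "PL", "лер": "PL", "тар": "PL", "тер": "PL",
    "ымыз": "POSS.1PL", "іміз": "POSS.1PL",
    "да": "LOC", "де": "LOC",
    "ға": "DAT", "ге": "DAT",
    "дан": "ABL", "ден": "ABL",
}


def segment(word):
    """Strip known suffixes from the right until a known root remains.

    Returns a list of (morpheme, gloss) pairs, or None if the word
    cannot be analyzed with the toy lexicon above.
    """
    morphemes = []
    rest = word
    while rest not in ROOTS:
        # Try longer suffixes first so that a short suffix does not
        # shadow a longer one sharing the same ending.
        for suffix in sorted(SUFFIXES, key=len, reverse=True):
            if rest.endswith(suffix) and len(rest) > len(suffix):
                morphemes.insert(0, (suffix, SUFFIXES[suffix]))
                rest = rest[: -len(suffix)]
                break
        else:
            return None  # no known suffix matched
    return [(rest, ROOTS[rest])] + morphemes


if __name__ == "__main__":
    for w in ["үйлерімізде", "балаларға", "мектептерден"]:
        print(w, "->", segment(w))
```

For example, the word үйлерімізде ("in our houses") decomposes into the root үй plus plural, first-person-plural possessive, and locative suffixes. A real analyzer must also handle vowel harmony, consonant assimilation, and ambiguity, which is where the approaches surveyed in the paper come in.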