Aziz, Romila and Anwar, Muhammad Waqas and Jamal, Muhammad Hasan and Bajwa, Usama Ijaz and Kuc Castilla, Ángel Gabriel and Uc-Rios, Carlos (carlos.uc@unini.edu.mx) and Bautista Thompson, Ernesto (ernesto.bautista@unini.edu.mx) and Ashraf, Imran (2023) Real Word Spelling Error Detection and Correction for Urdu Language. IEEE Access. p. 1. ISSN 2169-3536
Text
Real_Word_Spelling_Error_Detection_and_Correction_for_Urdu_Language.pdf (3MB). Available under License Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Spelling errors generally fall into two types: non-word errors and real-word errors. Non-word errors are misspelled words that do not exist in the lexicon, while real-word errors are misspelled words that exist in the lexicon but are used out of context in a sentence. The lexicon-based lookup approach is widely used for non-word errors, but it cannot handle real-word errors because they require contextual information. In contrast to English, real-word error detection and correction for low-resourced languages like Urdu is an unexplored area. This paper presents a real-word spelling error detection and correction approach for the Urdu language. We develop an extensive lexicon of 593,738 words and use this lexicon to develop a dataset for real-word errors comprising 125,562 sentences and 2,552,735 words. Based on the developed lexicon and dataset, we then develop a contextual spell checker that detects and corrects real-word errors. For the real-word error detection phase, word-gram features are used along with five machine learning classifiers, achieving a precision, recall, and F1-score of 0.84, 0.79, and 0.81, respectively. We also test the proposed approach with a 40% error density. For real-word error correction, the Damerau-Levenshtein distance is used along with the n-gram model for further ranking of the suggested candidate words, achieving an accuracy of up to 83.67%.
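The correction step described in the abstract (edit-distance candidates, then n-gram ranking) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the restricted "optimal string alignment" variant of the Damerau-Levenshtein distance, a bigram count table keyed on the preceding word, and a toy English lexicon chosen for readability. The names `osa_distance`, `candidates`, and `rank_candidates` are hypothetical.

```python
from collections import Counter


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Counts insertions, deletions, substitutions, and transpositions of
    two adjacent characters, each at cost 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "form" <-> "from"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def candidates(word, lexicon, max_dist=1):
    """Lexicon words within max_dist edits of word (excluding word itself)."""
    return [w for w in lexicon if w != word and osa_distance(word, w) <= max_dist]


def rank_candidates(prev_word, word, lexicon, bigram_counts, max_dist=1):
    """Order candidates by bigram count with the previous word (assumed
    ranking signal), then by edit distance, then alphabetically."""
    return sorted(
        candidates(word, lexicon, max_dist),
        key=lambda w: (-bigram_counts[(prev_word, w)], osa_distance(word, w), w),
    )


# Toy data (assumed, for illustration only)
lexicon = {"form", "from", "farm", "him", "letter", "the", "a"}
bigram_counts = Counter({("letter", "from"): 5, ("the", "farm"): 3})

# "a letter form him": "form" is a valid word used out of context
print(rank_candidates("letter", "form", lexicon, bigram_counts))
```

Here both "from" and "farm" are one edit away from "form", so the edit distance alone cannot separate them; the context count for ("letter", "from") is what moves "from" to the top of the suggestion list.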
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | Real-word errors, spelling correction, spelling detection, spell checker |
| Subjects: | Subjects > Engineering |
| Divisions: | Europe University of Atlantic > Research > Scientific Production; Fundación Universitaria Internacional de Colombia > Research > Scientific Production; Ibero-american International University > Research > Scientific Production; Ibero-american International University > Research > Scientific Production; Universidad Internacional do Cuanza > Research > Scientific Production |
| Depositing User: | Sr Bibliotecario |
| Date Deposited: | 14 Sep 2023 09:41 |
| Last Modified: | 14 Sep 2023 09:41 |
| URI: | http://repositorio.funiber.org/id/eprint/8800 |