Parliamentary corpora are pertinent language resources for a variety of subject matters. They enable researchers to analyse data from different perspectives, including linguistics, political science, computational linguistics, and history. The availability of parliamentary corpora in specific languages facilitates the analysis of those languages. In Malaysia, a parliamentary corpus had not been developed, and the Malaysian Hansard portal could not be used comprehensively to analyse linguistic patterns, semantic shift, or discourse. The Malaysian Hansard Corpus (MHC) was initially created to provide a comprehensive and digitally accessible collection of parliamentary documents written in Malay and English, compiled from the Malaysian Parliamentary Reports (Malaysian Hansard). MHC contains documents from the House of Representatives (Dewan Rakyat) of the Malaysian Parliament. The documents date from Parliament 1 (1959) to the current Parliament 14 (2019).

This article presents a systematic content analysis of three religious-conservative and two pro-secular newspapers in Turkey in 1996–2004, and discusses some findings and their implications regarding elite values and democratization: considerable internal pluralism within both religious-conservative and pro-secular elites; general consensus on democracy but not on the application of democratic norms to specific issues and groups other than one's own; a division of values on religion, secularism, and social pluralism; political value change in favor of liberal democracy but social conservatism among religious-conservative elites; fragmentation and relative cynicism, but not necessarily authoritarianism, among pro-secular elites; and weak ideational change on the Kurdish issue. The article argues that the press plays a significant political role as a site where elite values change or are reproduced through discussion, deliberation, or silence. Values affect and are affected by political developments.

Optical Character Recognition (OCR) is a computational technology that recognises printed characters using photoelectric devices and computer software. It works by converting previously scanned images or texts into machine-readable, editable text. Numerous OCR tools are available for commercial and research use, some free of charge and others requiring purchase. An OCR tool can enhance the accuracy of the results, which also depends on pre-processing and subdivision (segmentation) algorithms. This study investigates the performance of OCR tools in converting the Parliamentary Reports of Hansard Malaysia for the development of the Malaysian Hansard Corpus (MHC). To compare four OCR tools, the study converted ten Parliamentary Reports, totalling 62 pages, and measured the conversion accuracy and error rate of each tool. All of the tools were used to convert Adobe Portable Document Format (PDF) files into plain text (txt) files. The objective of the study is to give an overview, based on accuracy and error rate, of how each OCR tool works and how it can assist corpus building. The study indicates that the tools differ in the accuracy and error rates with which they convert whole documents from PDF into plain text. It proposes that corpus building becomes easier and more manageable when a researcher understands how an OCR tool works and can therefore choose the best tool before corpus development begins.
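
The text does not say how accuracy and error rate were computed. One common way to compare OCR output against a hand-corrected reference is the character error rate (CER): the edit distance between the two texts divided by the length of the reference. The sketch below assumes that metric. The function names and the tool labels (`tool_a`, `tool_b`, `tool_c`) are hypothetical and are not taken from the study.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions
    needed to turn `ref` into `hyp` (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # drop a reference character
                curr[j - 1] + 1,     # add an extra OCR character
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def char_error_rate(ref: str, hyp: str) -> float:
    """Assumed metric: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return levenshtein(ref, hyp) / len(ref)


def rank_tools(ref: str, outputs: dict) -> list:
    """Return (tool name, CER) pairs, lowest error rate first."""
    scores = [(name, char_error_rate(ref, text)) for name, text in outputs.items()]
    return sorted(scores, key=lambda pair: pair[1])


# Hypothetical example: one reference line and three made-up OCR outputs.
reference = "Dewan Rakyat"
ocr_outputs = {
    "tool_a": "Dewan Rakyat",
    "tool_b": "Dewan Rakvat",
    "tool_c": "Dewan Rkyt",
}
ranking = rank_tools(reference, ocr_outputs)
for name, cer in ranking:
    print(f"{name}: CER = {cer:.3f}")
```

Accuracy can then be reported as `1 - CER` where CER is at most 1. In practice the reference would be a manually verified transcription of each converted page rather than a single line.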