Digitizing the Minority Language Documents in Vietnam by Using Unicode

The use of custom fonts in documents written in Vietnam's ethnic minority languages is a major obstacle to digitization and to the development of information systems. Such documents are difficult to display, store, process, and exchange over the internet or between computers that do not have the same font installed. These difficulties have hindered digitization and the development of information systems in ethnic minority areas of Vietnam. To overcome them, this paper proposes a solution for encoding the character sets of Vietnam's ethnic minority languages in Unicode. The solution is applied to processing the language of the Ede ethnic minority, specifically to using Unicode fonts in documents and to converting documents written with custom fonts to Unicode fonts.

Unicode, encoding, natural language processing, minority language processing, Unicode font.
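
As a rough illustration of converting a document from a custom font to Unicode, the following Python sketch applies a character-by-character mapping table and then normalizes the result. It is only a minimal sketch under stated assumptions: the table `LEGACY_TO_UNICODE` and the function `legacy_to_unicode` are hypothetical names, and the legacy code points on the left-hand side are invented for the example. They are not the mapping of any real Ede font, which would have to be read from that font's own glyph table.

```python
import unicodedata

# Hypothetical mapping table. The keys are INVENTED legacy slots (the code
# points a custom font might reuse to draw its own glyphs); the values are
# the Unicode characters those glyphs are assumed to represent.
LEGACY_TO_UNICODE = {
    "\u00e3": "\u0115",  # assumed legacy slot -> e with breve
    "\u00f5": "\u014f",  # assumed legacy slot -> o with breve
    "\u00fb": "\u016d",  # assumed legacy slot -> u with breve
    "\u00e7": "\u010d",  # assumed legacy slot -> c with caron
    "\u00fe": "\u0180",  # assumed legacy slot -> b with stroke
}


def legacy_to_unicode(text, table=LEGACY_TO_UNICODE):
    """Replace each legacy character using the table, then normalize to NFC."""
    translated = text.translate(str.maketrans(table))
    # NFC composes base letters with combining marks, so the same letter
    # always ends up as the same code point sequence after conversion.
    return unicodedata.normalize("NFC", translated)
```

Characters that are not in the table pass through unchanged, so plain ASCII text in a mixed document is left as it is. A real converter would also need to know which runs of text were set in the custom font, since the same byte means different things in different fonts.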