Character Encoding
Information, resources and products related to international character encoding, national character sets and character conversion issues.
Entries
Tips & Techniques for Foreign Content on the Web http://tlt.its.psu.edu/suggestions/international/ Pennsylvania State University's guide to reading and publishing different languages on the web. Includes details of various encoding systems and links.
3rdpageSearch http://code.cside.com/3rdpage/ Front end to several search engines and portals that allows you to enter queries in various character sets.
IANA: Character Sets http://www.iana.org/assignments/character-sets The official names for character sets that may be used in the Internet and referred to in Internet documentation - held at the Internet Assigned Numbers Authority.
HTML Document Representation http://www.w3.org/TR/REC-html40/charset.html Chapter covering document character sets and encodings in HTML from the World Wide Web Consortium's HTML 4.0 Specification.
EKI Letter Database http://www.eki.ee/letter/ Query character sets, encoding, codepages and Unicode information in an easy-to-use web form. Held at the Institute of the Estonian Language.
HTML Validation: Using Character Encodings http://www.htmlhelp.com/tools/validator/charset.html How to validate HTML documents in various character encodings.
Characters and Encodings http://www.cs.tut.fi/~jkorpela/chars/ A tutorial on character code issues in digital processing and transfer of text data, on the Internet or otherwise. Includes tables and a detailed listing of control codes. In English and Finnish.
MS Windows characters in HTML http://www.cs.tut.fi/~jkorpela/www/windows-chars.html A review of the HTML authoring problems caused by some special characters which belong to MS Windows character set but not to ISO Latin 1. Includes technical details and substitution tables. In English and Finnish.
LangBox International http://www.langbox.com/ Code tables for ISO 8859-6, ASMO 449 plus, ASMO 708 (Arabic) and ISO 8859-8 (Hebrew) and further information about the company's work in multilingual UNIX.
ScientificPublications.com: Czyborra.com Mirror http://www.unicodecharacter.com/ Mirror of Roman Czyborra's work on character sets and encoding systems. In English and German.
WhatAsciiCode.com http://www.whatasciicode.com/ Quick reference and searchable ASCII code and conversion tables.
ASCII and EBCDIC Compared http://www.dynamoo.com/technical/ascii-ebcdic.htm A comparison of these two basic encoding systems, with tables.
ISO 639 Language Names http://xml.coverpages.org/iso639a.html The standard names for use in SGML and XML, including a complete list of language name codes.
Basis Technology: Presentations and Papers http://www.basistech.com/knowledge%2Dcenter/ A wide range of articles on Unicode, East Asian localization and Internationalization issues.
World Wide Web Consortium http://www.w3.org/International/O-charset.html Covers code tables, Unicode, HTML and XML, links to other resources, and discusses internationalization and localization issues relating to character sets.
Tutorial: Shady Characters http://webreference.com/html/tutorial17/ A tutorial that explains HTML character sets, character encodings and character references from Webreference.com.
Chilkat Charset Conversion Component http://www.chilkatsoft.com/ChilkatCharset.asp A character set conversion component for Unicode, Japanese, Chinese, Korean, Cyrillic, Arabic, Hebrew, Thai, Vietnamese and all Western languages.
ECMA: Character Code Structure and Extension Techniques http://www.ecma-international.org/publications/standards/ECMA-035.HTM ECMA-35 specifies the structure of 8-bit and 7-bit codes that provide for the coding of character sets. Includes a detailed PDF document.
Dan's Web Tips: Characters and Fonts http://www.dantobias.com/webtips/char.html Hints and tips about character sets and fonts in web development. Includes links to related resources.
Xceed Binary Encoding Library http://www.xceedsoft.com/products/binEncod/ A library for Windows developers that allows applications to encode binary data and files into text and vice-versa.
An Early History of Character Set Standardization http://homepages.cwi.nl/~dik/english/codes/stand.html Covers the beginnings of the ASCII standards from ASCII-1963 onwards and information on Cyrillic, Japanese, Korean, Thai and Vietnamese encoding systems, including various localized versions of EBCDIC. With tables and links to other resources.
A Brief History of Character Codes http://tronweb.super-nova.co.jp/characcodehist.html A concise history of the development of character encoding in Western and East Asian languages, including ASCII, EBCDIC, Unicode and TRON.
Subcategories
Arabic Arabic script encodings, including Arabic, Persian/Farsi and Kurdish.
Chinese Simplified and Traditional Chinese character encoding systems.
CJKV CJKV stands for Chinese, Japanese, Korean, and Vietnamese. The acronym describes East Asian languages whose writing systems contain more than 256 individual characters and therefore need more than one byte per character. CJKV is a term used in globalization; this category deals with the process in general. Individual language categories exist for specific languages.
Cyrillic Used by many languages, including Russian, Ukrainian, Bulgarian, Macedonian, Serbian, Belorussian, Kurdish, Kazakh, Kyrgyz, Mongolian and Uzbek.
Greek Modern Greek and Coptic character sets. Although Greek is a well-known modern language, Coptic is a ceremonial language still in use in the Middle East.
Hangul Hangul is the Korean alphabet, related in some ways to Chinese, but otherwise unique to Korea and similar in structure to many Indo-European alphabet systems.
Hebrew Hebrew, Yiddish and Ladino alphabets.
Indic Bengali, Devanagari, Gujarati, Gurmukhi, Hindi, Kannada, Khmer, Lao, Malayalam, Marathi, Nepali, Oriya, Sanskrit, Sinhala, Tamil, Telugu, Tibetan and Thai character sets use variations of Brahmi-derived Indic scripts.
Japanese Japanese uses several writing systems, from the traditional Kanji to the Latin-derived Romaji, and various character encoding systems to represent them.
Latin Used by Afrikaans, Albanian, Aymara, Azeri, Balinese, Basque, Breton, Catalan, Cornish, Danish, Dutch/Nederlands, English, Esperanto, Finnish, French, Gaelic, German, Icelandic, Indonesian, Irish, Italian, Malaysian, Manx, Norwegian, Portuguese, Spanish, Swedish, Tagalog, Vietnamese, Welsh and many other languages.
Native American There are many languages native to North and South America, such as Cree, Navajo, Mayan, Aztec, Incan and Inuit (Inuktitut).
Unicode Unicode is the standard character encoding system that allows the correct display and entry of virtually all characters of every language in the world.
ASCII ASCII is the American Standard Code for Information Interchange, and is a format for storing and communicating characters.
Vietnamese Although Vietnamese is linguistically related to other East Asian and Pacific Rim languages, it uses a variation of the Latin alphabet for written communication.
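The byte-size point made in the CJKV, ASCII and Unicode descriptions above can be checked with a short Python sketch. It is only an illustration: the sample characters and the helper name `encoded_length` are assumptions chosen for this example, and the encodings are a small selection of those shipped with Python's standard codecs.

```python
from typing import Optional

# Illustrative sample characters (assumptions, not taken from the listing).
SAMPLES = {
    "Latin letter": "A",
    "Vietnamese letter": "ệ",
    "Hangul syllable": "한",
    "CJK ideograph": "漢",
}


def encoded_length(ch: str, encoding: str) -> Optional[int]:
    """Return how many bytes `ch` needs in `encoding`, or None if the
    encoding has no representation for that character."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    # Print a small table: one row per (character, encoding) pair.
    for label, ch in SAMPLES.items():
        for enc in ("ascii", "latin-1", "euc_kr", "shift_jis", "utf-8", "utf-16-le"):
            print(f"{label:18} {enc:10} {encoded_length(ch, enc)}")
```

Single-byte character sets such as ASCII or ISO Latin 1 return None for the Hangul and CJK samples, while the multi-byte East Asian encodings and the Unicode encodings spend two or more bytes on them.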
Related categories
Esperanto Various encoding systems are used to represent letters and other characters on computers, in email, on the WWW and so on.
Farsi The subcategories of this category relate to character encoding.
Hindi Related to character encoding.
Japanese This category covers topics related to character codes.
Russian
Data Formats A set of specifications that defines the way different types of data should be stored in computer systems for use by applications or the end user. This category is mainly for technical specifications of data formats, although it also tries to help you find relevant software for, and examples of, these data formats.
Encoding The characters in an XML document can be encoded in different formats. XML assumes a Unicode encoding (UTF-8 or UTF-16) by default, but other encodings can be used if they are declared in the XML declaration at the beginning of the document.
Fonts
Neighbour categories
Companies
Computer Aided Translation Machine Translation (MT), Computer Aided Translation (CAT) and Translation Memory (TM).
Conferences
Education
Employment Links to resources for jobs in globalization and the recruitment pages of companies involved in the industry.
FAQs, Help, and Tutorials
Industry Glossaries
Internationalization
Internet This category is the umbrella category for links to resources in the areas of software globalization, internationalization and localization for the internet.
Language Specific
Localization
Mailing Lists
News and Media
Open Tag Format
Operating Systems
Programming Languages
Publications This category lists links to books, magazines and journals about globalization and related topics.
Software and Tools
Testing and QA