How Do Character Sets Work?

A character set is a collection of symbols that represent the printable characters of a language. Many standard character sets are in use today. The right character set for a message depends on its language and on which character sets your recipients' mail clients are likely to support. Some languages are supported by several character sets, and some character sets cover several languages. The character set used to compose a message is also influenced by your operating system, web browser, and text editor. Before choosing a character set, make sure your recipients are likely to have it installed and that their email clients can display messages in alternate character sets. Note that some web-based email clients may not support international character sets.
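One way to see how language and character set interact is to check which candidate character sets can represent a given piece of text at all. The sketch below uses Python's built-in codecs; the helper `charsets_that_fit` and the list of candidates are hypothetical, chosen only for illustration, and are not part of any mail client.

```python
# Hypothetical helper: report which candidate character sets can encode a text.
# The candidate list is an illustrative assumption; Python's codec registry
# accepts all of these names ("windows-1251" is an alias of cp1251).
def charsets_that_fit(text: str, candidates: list[str]) -> list[str]:
    fits = []
    for name in candidates:
        try:
            text.encode(name)  # raises if any character has no mapping
        except UnicodeEncodeError:
            continue
        fits.append(name)
    return fits


CANDIDATES = ["us-ascii", "iso-8859-1", "iso-8859-5", "koi8-r", "windows-1251", "utf-8"]

# German text: covered by ISO-8859-1 (Latin-1) and UTF-8 only.
print(charsets_that_fit("Grüße aus Köln", CANDIDATES))
# ['iso-8859-1', 'utf-8']

# Russian text: covered by three different Cyrillic character sets, plus UTF-8.
print(charsets_that_fit("Привет, мир", CANDIDATES))
# ['iso-8859-5', 'koi8-r', 'windows-1251', 'utf-8']
```

The second result shows a single language supported by several character sets; UTF-8 fits both samples because it can encode every Unicode character.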

Extended characters are characters outside the standard 7-bit ASCII set, which defines 128 characters (codes 0 through 127). Extended characters are the ones that usually differ from one character set to another, while the 7-bit ASCII characters are included in most character sets used for email messages and web pages.
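The following Python sketch picks out the extended characters in a string (anything with a code above 127) and shows why those are the characters that vary: the same byte value decodes to different characters under different character sets, while 7-bit ASCII bytes decode identically. The helper name `extended_characters` is hypothetical.

```python
# Hypothetical helper: list the characters outside 7-bit ASCII.
def extended_characters(text: str) -> list[str]:
    """Return the characters in text whose code is above 127."""
    return [ch for ch in text if ord(ch) > 127]


print(extended_characters("Résumé for José"))  # ['é', 'é', 'é']

# One byte outside the ASCII range means different things in different sets.
byte = b"\xe9"
print(byte.decode("iso-8859-1"))    # é  (Western European)
print(byte.decode("windows-1251"))  # й  (Cyrillic)

# Bytes in the 7-bit ASCII range decode the same way in both sets.
ascii_bytes = b"Hello"
print(ascii_bytes.decode("iso-8859-1") == ascii_bytes.decode("windows-1251"))  # True
```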

Encoding converts a message that contains 8-bit characters into one that uses only 7-bit characters. Most English-language messages use only 7-bit characters; 8-bit characters represent special symbols and the additional letters needed by other languages. Because some older mail servers cannot handle 8-bit messages properly, messages that contain 8-bit characters are often encoded to 7 bits using either the quoted-printable or the base64 encoding method. This encoding is optional, however.
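A minimal Python sketch of the check and conversion described above: it tests whether a body contains any 8-bit bytes, then encodes it with the standard library's `quopri` and `base64` modules and confirms that both results are 7-bit and decode back to the original. The helper name `is_7bit` and the sample text are hypothetical.

```python
import base64
import quopri


# Hypothetical helper: does every byte fit in 7 bits?
def is_7bit(data: bytes) -> bool:
    """True if every byte has a value below 0x80."""
    return all(b < 0x80 for b in data)


# German text in ISO-8859-1 contains 8-bit bytes for ü, ß and ö.
body = "Grüße aus Köln\n".encode("iso-8859-1")
print(is_7bit(body))  # False

qp = quopri.encodestring(body)    # quoted-printable
b64 = base64.encodebytes(body)    # base64, with MIME-style line breaks
print(is_7bit(qp), is_7bit(b64))  # True True

# Both encodings are lossless: decoding restores the original bytes.
assert quopri.decodestring(qp) == body
assert base64.decodebytes(b64) == body
```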

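Quoted-printable encoding is suited to messages in which most characters can already be represented in 7 bits. Only the 8-bit characters and a few special 7-bit characters (such as the equals sign) are encoded, each as an equals sign followed by two hexadecimal digits, so the encoded message body remains mostly readable. Because each encoded byte takes three characters, a message with many 8-bit characters grows considerably under quoted-printable; in that case base64, which increases the size by roughly one third regardless of content, may be more space efficient.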
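The trade-off can be measured directly. This Python sketch encodes one mostly-ASCII sample and one mostly-8-bit sample with both methods and compares the encoded lengths. The `encoded_sizes` helper and the sample strings are hypothetical, and exact byte counts depend on line-wrapping details, so only the relative sizes matter here.

```python
import base64
import quopri


# Hypothetical helper: encoded length under each method.
def encoded_sizes(body: bytes) -> dict[str, int]:
    """Length in bytes of body after quoted-printable and base64 encoding."""
    return {
        "quoted-printable": len(quopri.encodestring(body)),
        "base64": len(base64.encodebytes(body)),
    }


# Only the 8-bit byte and the equals sign are escaped; the rest stays readable.
print(quopri.encodestring("café = coffee".encode("iso-8859-1")))
# b'caf=E9 =3D coffee'

# Mostly 7-bit text: quoted-printable comes out smaller.
english = ("Meet me at the café on Friday. " * 10).encode("iso-8859-1")
print(encoded_sizes(english))

# Mostly 8-bit text: every Cyrillic byte becomes three characters under
# quoted-printable, so base64 comes out smaller.
russian = ("Привет, мир! " * 10).encode("koi8-r")
print(encoded_sizes(russian))
```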

Note that mail-merge tags will not work properly in message bodies that are encoded with quoted-printable or base64, because the merge tags are encoded along with the rest of the body and are corrupted as a result. Some merge tags may appear to work in quoted-printable messages because they happen to contain no characters that require encoding. However, the data that replaces such a tag is then interpreted as quoted-printable, which can corrupt it.

For this reason, the default encoding for messages is 8-bit, which means that characters are left unchanged. Mail servers that cannot handle 8-bit messages should be extremely rare, and it is highly unlikely that anyone who is accustomed to receiving messages in alternate character sets uses a mail server that does not support 8-bit messages.
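The merge-tag problem can be reproduced in a few lines of Python. This sketch assumes a hypothetical `{{Name}}` tag syntax and a hypothetical, naive byte-for-byte `merge` function; real mail-merge tools use their own tag formats and substitution logic.

```python
import base64
import quopri

# Hypothetical merge-tag syntax; real mail-merge tools define their own.
TAG = b"{{Name}}"


# Hypothetical naive merge that is unaware of any body encoding.
def merge(body: bytes, value: bytes) -> bytes:
    """Replace every occurrence of TAG with value."""
    return body.replace(TAG, value)


body = "Dear {{Name}}, your café order has shipped.".encode("iso-8859-1")

# base64: the tag is scrambled (base64 output never contains "{" or "}"),
# so the merge finds nothing to replace.
b64_body = base64.encodebytes(body)
print(TAG in b64_body)  # False

# quoted-printable: this tag needs no encoding, so it survives intact
# and the merge appears to work...
qp_body = quopri.encodestring(body)
print(TAG in qp_body)  # True

# ...but the inserted value is decoded as quoted-printable along with the
# body, so "Ann=42" comes back as "AnnB" (=42 is the escape for "B").
merged = quopri.decodestring(merge(qp_body, b"Ann=42"))
print(merged.decode("iso-8859-1"))  # Dear AnnB, your café order has shipped.

# With an unencoded 8-bit body, the value is inserted exactly as given.
print(merge(body, b"Ann=42").decode("iso-8859-1"))  # Dear Ann=42, your café order has shipped.
```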