When Emacs reads a file, it determines the encoding, reads the file, decodes it into an internal representation, and stores the coding-system used in a variable to be used when saving the file. When saving, the buffer is encoded using the stored coding-system and written to the file again.

You can change the encoding to use for the file when saving using ‘C-x C-m f’ . You can also force this immediately by using ‘C-x C-m c <encoding> RET C-x C-w RET’ . You can list all available encodings with ‘M-x list-coding-systems’ .

You can force Emacs to read a file in a specific encoding with ‘C-x RET c C-x C-f’ . If you opened a file and EMACS determined the encoding incorrectly, you can use ‘M-x revert-buffer-with-coding-system’ , to reload the file with a named encoding.

For characters covered by ISO 8859, you can interconvert most encodings in Emacs 21.3, courtesy of the code in ucs-tables.el. – fx

Maybe some more explanation is needed, here. In an Emacs running in a Latin-1 locale, create a buffer containing the letter ‘ö’ . Save. The modeline indicates Latin-1 via the ‘1’ . Now save using ‘C-x C-m c latin-2 RET C-x C-w RET’ . The modeline indicates Latin-2 via the ‘2’ . Kill the buffer, reopen it. It displays correctly, but the modeline indicates Latin-1 again. When and why did Emacs do the change from Latin-2 back to Latin-1? Does Locale take precedence over ‘C-x C-m c’ ?

If, in a Latin-1 environment, you visit a non-ASCII file that doesn’t contain bytes in the range #x80 to #x9f, it is decoded as Latin-1 unless its encoding is specified explicitly somehow. The character `ö’ has the same code point in Latin-1 and Latin-2, which is why it ‘displays correctly’ . See M-x list-charset-chars and C-u C-x = .

Partial Recoding

Sometimes you need to recode parts of a buffer. Here is an example: You are using Gnus to read mail, and somebody sends you a Word document. You use the AntiWord trick to automatically insert the output of antiword into your buffer. Normally, a Gnus “Article” buffer has the coding system undecided . The antiword output might be inserted using the wrong coding system. On my system, I might end up with something like this:

Mit freundschaftlichen GrÃ¼ssen und den besten WÃ¼nschen fÃ¼r 2004, Aikido Dojo ZÃ¼rich

But what I want is this:

Mit freundschaftlichen Grüssen und den besten Wünschen für 2004, Aikido Dojo Zürich

It seems that the process output was decoded as Latin-1 instead of UTF-8. I want to recode it! To that effect, use M-x recode-region . The command recode-region is part of MULE as of Emacs 22.1; here is a surrogate for older Emacsen:

( defun recode-region (start end &optional coding-system) "Replace the region with a recoded text." (interactive "r

\z Coding System (utf-8): " ) (setq coding-system (or coding-system 'utf-8)) ( let ((buffer-read-only nil) (text (buffer-substring start end))) (delete-region start end) (insert (decode-coding-string (string-make-unibyte text) coding-system))))

Now I can mark the attachment in the buffer and use M-x recode-region to recode it as UTF-8. The important part is that I need to convert the old text into “unibyte” representation. Without it, I will get the bytes used for the emacs-mule coding-system encoded as UTF-8.

Forcing windows-1252 coding

Symptom: some files that used to be opened with the right coding under Emacs 21 are now opened with raw coding under Emacs 23. This is especially true with some files that had french accents that are now shown with codes such as \340 for “acute a”.

Root cause: unknown.

proposed “Solutions” seen for this problem: this does not work in my case: (prefer-coding-system ‘windows-1252)

Since Emacs is not able to guess the coding for these types of files, here are 3 ways to address the problem.

1) On a file by file basis: reopen the file by forcing the coding with this utility function:

( defun has-revisit-file-with-coding-windows- 1252 () "Re-opens currently visited file with the windows-1252 coding. (By: hassansrc at gmail dot com) Example: the currently opened file has french accents showing as codes such as: french: t \3 42ches et activit \3 40s ( \3 40 is shown as a unique char) then execute this function: has-revisit-file-with-coding-windows-1252 consequence: the file is reopened with the windows-1252 coding with no other action on the part of the user. Hopefully, the accents are now shown properly. Otherwise, find another coding... " (interactive) ( let ((coding-system-for-read 'windows-1252) (coding-system-for-write 'windows-1252) (coding-system-require-warning t) (current-prefix-arg nil)) (message "has: Reopened file with coding set to windows-1252" ) (find-alternate-file buffer-file-name) ) )

Other ways to deal with accents that appear as codes (ex:\340 for acute e) when visiting files:

2)Intrusive way: put this at the beginning of the specific file that shows the problem :

3) General solution: apply this recipe to all *.txt files (put it in your .emacs file):

(modify-coding-system-alist 'file " \\ .txt \\ '" 'windows-1252)

These 3 solutions worked well under Emacs23 on Windows 7.

HassanSrc

CategoryInternationalization