alt.hn

3/16/2026 at 4:09:50 AM

ASCII and Unicode quotation marks (2007)

https://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html

by exvi

3/16/2026 at 8:21:49 AM

In the '90s, MS-Windows by default used its own 8-bit character set CP-1252 that was a superset of ISO 8859-1 with a few additional characters, including left and right single and double quotation marks in 8859-1's unused code positions.

Microsoft Word used to "auto-correct" the ASCII codes 0x22 and 0x27 to those.

Also, ISO 8859-1 was the default character set for the web, and MS Word was in common use for making simple web pages... but without narrowing from CP-1252 to ISO 8859-1.

This had the effect that when you browsed one of those pages in a browser on another operating system, the quotation marks rendered as empty boxes ( = illegal character).
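
A minimal Python sketch of the mismatch described above. The sample bytes are made up for illustration; the only assumption is that a Word-era page stored its curly double quotes as the CP-1252 bytes 0x93 and 0x94.

```python
import unicodedata

# Hypothetical page fragment as CP-1252 bytes: 0x93 and 0x94 are the
# left and right double quotation marks in that code page.
raw = b"\x93quoted\x94"

# Decoded as CP-1252, the bytes become real quotation marks.
as_cp1252 = raw.decode("cp1252")

# Decoded as ISO 8859-1, the same bytes land in the C1 control range
# (U+0080..U+009F), which has no printable glyphs.
as_latin1 = raw.decode("latin-1")

print(repr(as_cp1252))
print(repr(as_latin1))
print(unicodedata.category(as_latin1[0]))  # general category of the first character
```

A browser that took the declared ISO 8859-1 label literally would be in the second situation, which fits the empty boxes.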

by Findecanor

3/16/2026 at 7:31:53 AM

I was wondering why `this hideous quotation style' is used in so many places. Good historical window.

by xigoi

3/16/2026 at 8:24:06 AM

The explanation seems to be that it looked good in some old fonts. But I think it was always some kind of abuse. On old typewriters the accent keys were usually used for accents (é è). They didn't move the carriage, so using them for apostrophes wasn't that comfortable and interrupted the writing flow. Accent + space looks a bit like a quotation mark, but the right place for an accent is usually on top of a letter.

by adornKey

3/16/2026 at 11:18:09 AM

Special quotation marks sometimes end up in filenames (usually when I have saved a web page) and I hate trying to tab-complete or write globs for those. Of course it is equally annoying with other random Unicode characters not present on my keyboard, but it mostly happens with quotation marks. (Yes, the solution is to copy-paste those characters into the terminal, but that is the annoying part: having to do that instead of just typing the next character.)
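
One possible workaround, sketched in Python: compute ASCII-quoted versions of such filenames so they can be renamed once and typed normally afterwards. The names `QUOTE_MAP`, `ascii_quotes` and `plan_renames` are hypothetical helpers, and the sketch only plans renames without touching the filesystem.

```python
# Hypothetical mapping from typographic quotes to their ASCII stand-ins.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})

def ascii_quotes(name: str) -> str:
    """Return the name with typographic quotes replaced by ASCII ones."""
    return name.translate(QUOTE_MAP)

def plan_renames(names):
    """Return (old, new) pairs for the names that would change."""
    return [(n, ascii_quotes(n)) for n in names if ascii_quotes(n) != n]

if __name__ == "__main__":
    # Example names only; in practice these could come from os.listdir().
    for old, new in plan_renames(["notes.txt", "It\u2019s a page.html"]):
        print(f"{old!r} -> {new!r}")
```

Feeding the pairs to `os.rename` would be the obvious next step, after checking that the new name does not collide with an existing file.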

by 1313ed01

3/16/2026 at 11:30:36 AM

This problem was solved by Plan 9 (roughly 1990) where there was a compose key to turn sequences into Unicode characters. Say compose-f-a to get ∀. This was all configurable in /lib/keyboard.

On so-called modern X11 or Wayland based systems (Linux or *BSD), there is a similar feature called XCompose. Worse syntax, but still functional.

by black_knight

3/16/2026 at 12:09:01 PM

Being able to configure your system to type the characters really doesn't solve the problem. In particular, if you get data (including metadata such as filenames) from someone else, you need to recognize the characters, both to do the configuration and then actually type them. And characters are not glyphs. There are all kinds of cases where simply looking at something doesn't and can't tell you what characters are in it.
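
As a rough illustration of the characters-versus-glyphs point, here is a small Python sketch that asks the Unicode database what a string actually contains. The helper name `describe` is made up for this example.

```python
import unicodedata

def describe(text: str):
    """Return (code point, Unicode name) for each character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]

if __name__ == "__main__":
    # Three characters that can look nearly identical in many fonts.
    for sample in ["'", "\u2019", "\u02bc"]:
        print(describe(sample))
```

This only helps once the text is in a place where you can run code on it; it does not help with reading the characters off the screen.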

by zahlman

3/16/2026 at 3:11:51 PM

And just to point out that there is WinCompose, for Windows, and a somewhat janky but usable solution using Karabiner Elements and macos-compose for Mac.

by vincent-manis

3/16/2026 at 8:49:15 AM

Some language settings (on Windows) will automatically replace '' and "" with ʻʼ and “”, as that is the correct style in the configured language. There are also low quotes that can be used, but it seems that a normal comma and a "double comma" are usually used as the code points (U+002C, U+201E): ,’ „”.

This really messed me up when I started programming, since those quotes do not work in a language that expects a matching pair of the same character, even though they may render with the same glyph. This is one of the many reasons I have my systems set to English.

I agree that for a normal writing environment it may be advantageous to have it auto replace since it is also just easier to hit the same key twice and have it auto open/close.

by trashb

3/16/2026 at 9:15:54 AM

I have never encountered that behaviour outside of Microsoft Word and its alternatives; I've always had this happen application-side. Is this an IME thing? Or a non-Unicode-compatible code page? Because I don't think there's any other Windows-side automatic replacement of that type.

Many blog engines online will also try to be helpful and replace quotes with smart quotes, which makes copy-pasting source code from tutorials quite a pain.
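
A minimal Python sketch for checking a pasted snippet before running it: it reports where typographic quotes occur instead of silently replacing them. `SMART_QUOTES` and `find_smart_quotes` are hypothetical names, and the set covers only the four common curly quotes.

```python
# The four common curly quotes; extend if your sources use others.
SMART_QUOTES = "\u2018\u2019\u201c\u201d"

def find_smart_quotes(source: str):
    """Return (line, column, code point) for each curly quote, 1-based."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SMART_QUOTES:
                hits.append((lineno, col, f"U+{ord(ch):04X}"))
    return hits

if __name__ == "__main__":
    # Example of code mangled by a blog engine.
    snippet = "x = 1\nprint(\u201chello\u201d)\n"
    for lineno, col, code in find_smart_quotes(snippet):
        print(f"line {lineno}, column {col}: {code}")
```

Reporting instead of rewriting is deliberate here: a curly quote inside a string literal may be intentional, so the decision is left to the reader.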

by jeroenhd

3/16/2026 at 10:21:39 AM

From what I remember (it was a while back), it happened in Notepad, Notepad++ and Geany. I was probably using Windows XP or 7 at the time. I remember the only way I could fix it was to change the language and keyboard settings to international English.

I can't seem to replicate the behavior now on Windows 11, even with the same language and keyboard layout (system language set to English), so perhaps I'm misremembering? It seems that this keyboard layout does enable typing ¨ (U+00A8), so perhaps I am confusing it with that, and with some editors (Word etc.) doing the opening/closing replacement.

An IME is not used, though interestingly Asian languages use yet other quotation marks, for example Japanese: 「x」 and 『x』.

by trashb

3/16/2026 at 8:09:11 AM

But how can I enter ’ (U+2019) on macOS (US layout) without going through some magic incantation? It’s impossible!

by fainpul

3/16/2026 at 9:12:54 AM

Option+] will produce ‘, Option+Shift+] will produce ’. Similarly, “” can be produced with Option(+Shift)+[. Alternatively, Option+Shift+e will produce ´.

by jeroenhd

3/16/2026 at 10:21:55 AM

Ugh, that is such a bad arrangement of the four combos. It should obviously have been [ for left and ] for right (just as [ and ] are a pair), and Shift to turn single into double (just as Shift turns ' into ").

by chrismorgan

3/16/2026 at 12:40:00 PM

I’ve looked at the ASCII tables to try to figure out ¿what were they thinking? and suspect it has to do with option \ and | being « and » (and euro key for ` and ´ being one key’s base and shift).

See the ASCII table in the article, for example. I've considered that the thing done wrong is that ] and { should have been swapped. But then < and > and ( and ) beg me to differ.

by Terretta