Texturizing and such
I’ve always refused to follow the US English typography style on this site.
In that style, periods and commas go inside the closing quotation mark, even when it doesn’t make sense to do so.
I want my site to make sense. So I started looking at WP’s wptexturize() function (© Matt). As you can read here, I’ve been thinking about doing that for a while now.
Like I said, the problem with wptexturize() is that it fails whenever a closing quotation mark is directly followed by anything other than whitespace (or the end of the text), such as a comma or a period. Here’s an example.
My favourite tracks are "40 ft", "Cheating on You", "Michael", and, of course, "Take Me Out".
Run through wptexturize(), this would become:
My favourite tracks are “40 ft", “Cheating on You", “Michael", and, of course, “Take Me Out".
After modifying the function, it outputs:
My favourite tracks are “40 ft”, “Cheating on You”, “Michael”, and, of course, “Take Me Out”.
Which is typographically (more) correct as far as I know.
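To make the difference concrete, here is a minimal sketch that isolates just the double-quote handling. The two function names are hypothetical and exist only for this illustration; the two regular expressions and the catch-all replacement are the ones used in wptexturize(), with everything else left out.

```php
<?php
// Hypothetical helpers for illustration only; not part of WordPress.

// Stock rules: an opening quote needs whitespace (or the start of the text)
// before it, a closing quote needs whitespace (or the end of the text) after it.
function texturize_quotes_stock($text) {
    $text = preg_replace('/(\s|\A)"(?!\s)/', '$1“', $text);
    $text = preg_replace('/"(\s|\Z)/', '”$1', $text);
    return $text;
}

// Patched rules: whatever straight double quote is still left over
// is assumed to be a closing quote.
function texturize_quotes_patched($text) {
    return str_replace('"', '”', texturize_quotes_stock($text));
}

$sample = 'My favourite tracks are "40 ft", "Michael", and, of course, "Take Me Out".';

echo texturize_quotes_stock($sample), "\n";   // closing quotes before , and . stay straight
echo texturize_quotes_patched($sample), "\n"; // closing quotes are curled as well
```

The closing-quote rule never fires in the sample because every closing quote is followed by a comma or a period rather than whitespace, which is exactly the situation the catch-all is meant to cover.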
Actually, I didn’t do that much to accomplish this… Just follow these instructions if you want to do this too.
- Find this line (if you’re using WordPress, the wptexturize() function is declared in wp-includes/functions-formatting.php):
  $curl = preg_replace('/(d+)x(\d+)/', "$1×$2", $curl);
- Add the following code right under it:
  $curl = str_replace('"', '”', $curl);
Here’s how the function should look after applying the modification.
<?php
function wptexturize($text) {
    $output = '';
    // Capture tags and everything inside them
    $textarr = preg_split("/(<.*>)/Us", $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $stop = count($textarr); $next = true; // loop stuff
    for ($i = 0; $i < $stop; $i++) {
        $curl = $textarr[$i];
        if (isset($curl[0]) && '<' != $curl[0] && $next) { // If it's not a tag
            $curl = str_replace('---', '—', $curl);
            $curl = str_replace('--', '–', $curl);
            $curl = str_replace("...", '…', $curl);
            $curl = str_replace('``', '“', $curl);
            // This is a hack, look at this more later. It works pretty well though.
            $cockney = array("'tain't", "'twere", "'twas", "'tis", "'twill", "'til", "'bout", "'nuff", "'round");
            $cockneyreplace = array("’tain’t", "’twere", "’twas", "’tis", "’twill", "’til", "’bout", "’nuff", "’round");
            $curl = str_replace($cockney, $cockneyreplace, $curl);
            $curl = preg_replace("/'s/", '’s', $curl);
            $curl = preg_replace("/'(\d\d(?:’|')?s)/", "’$1", $curl);
            $curl = preg_replace('/(\s|\A|")\'/', '$1‘', $curl);
            $curl = preg_replace('/(\d+)"/', '$1″', $curl);
            $curl = preg_replace("/(\d+)'/", '$1′', $curl);
            $curl = preg_replace("/(\S)'([^'\s])/", "$1’$2", $curl);
            // Opening double quote: preceded by whitespace or the start of the text
            $curl = preg_replace('/(\s|\A)"(?!\s)/', '$1“', $curl);
            // Closing double quote: followed by whitespace or the end of the text
            $curl = preg_replace('/"(\s|\Z)/', '”$1', $curl);
            $curl = preg_replace("/'([\s.]|\Z)/", '’$1', $curl);
            $curl = preg_replace("/\(tm\)/i", '™', $curl);
            $curl = preg_replace("/\(c\)/i", '©', $curl);
            $curl = preg_replace("/\(r\)/i", '®', $curl);
            $curl = str_replace("''", '”', $curl);
            $curl = preg_replace('/(d+)x(\d+)/', "$1×$2", $curl);
            // Added: any straight double quote still left is treated as a closing quote
            $curl = str_replace('"', '”', $curl);
        } elseif (strstr($curl, '<code') || strstr($curl, '<pre') || strstr($curl, '<kbd') || strstr($curl, '<style') || strstr($curl, '<script')) {
            // strstr is fast
            $next = false;
        } else {
            $next = true;
        }
        $output .= $curl;
    }
    return $output;
}
?>
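One thing to keep in mind: the added line cannot tell an opening quote from a closing one, so any straight quote the earlier rules skip ends up as a closing quote. The sketch below shows one such case, a quote opened right after a parenthesis. The function name is hypothetical and only the three double-quote rules from the listing above are reproduced, so treat it as an illustration rather than a full test of wptexturize().

```php
<?php
// Hypothetical helper for illustration only; it reproduces just the
// double-quote rules of the modified function, in the same order.
function quote_rules_with_fallback($text) {
    // Opening quote: preceded by whitespace or the start of the text
    $text = preg_replace('/(\s|\A)"(?!\s)/', '$1“', $text);
    // Closing quote: followed by whitespace or the end of the text
    $text = preg_replace('/"(\s|\Z)/', '”$1', $text);
    // Catch-all: every remaining straight quote becomes a closing quote
    return str_replace('"', '”', $text);
}

// The quote after "(" has no whitespace before it, so the opening rule
// skips it and the catch-all curls it the wrong way.
echo quote_rules_with_fallback('Franz Ferdinand ("Take Me Out")'), "\n";

// A quote opened after a space is still handled by the opening rule.
echo quote_rules_with_fallback('I said "Take Me Out"'), "\n";
```

For text that only ever opens quotes after a space, which covers most of what I write here, the catch-all does what I want.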
Comments (16)
Listed below are the responses for this entry.
Trackbacks & Pingbacks (1)
Listed below are resources on the web that mention this article.
- Roblog: Texturize Fix: Rob Miller’s Blog:
[…] June 2004 Texturize Fix Filed under: Random Stuff — Rob @ 11:38 am Mathibus offers a fix for the quotation “bug” in Texturize. […]
- Pingback made on June 27th, 2004 @ 10:27 am
Ok, a couple of things… I started twitching when I saw the commas on the wrong side of the quotes. :eyes:
Next, you’re missing a few commas:
In a list, you always separate each item with a comma. (Unless you’re using a specified writing standard that allows for the missing comma, but most modern English writing styles require the final comma.)
“Of course” is a dependent clause and needs the separating commas.
Thanks for pointing that out, I edited the post.
The commas aren’t really on the wrong side of the quotes in my opinion (is this an opinion thing? :lol:) — as I mentioned, I just don’t follow the US English typography style (yet?). It really doesn’t make sense to make these commas go inside the end quotation mark, at least not to me. That song isn’t entitled “Cheating on You comma”, right?
I think Mathias is right.
Luke, despite the fact that I swallow the U.S. typography style in doing the paper (eheh), I totally agree with Mathias. Those commas go outside the quotation marks, dude. It just makes sense semantically: the comma isn’t a part of what’s being quoted, so it goes outside, unless it’s dialogue.
This is what I get for letting my editors (Lissa) know about my blog… They follow me around and make comments about my editing remarks. :P
This is one of those times when I think it is an opinion thing. I just don’t like the way it looks. The flow of the list is preserved when the punctuation is on the inside of the quotes. It’s a typographical thing; looks overrule semantics.
Thanks, I’d noticed this problem but never got round to fixing it myself, you saved me a job :)
Luke: not even that, proper (at least British) semantics dictate that the punctuation should go outside of the quotation marks unless it is contained in what’s being quoted.
Right, but we all know the British can’t speak English worth a hoot.
I mean, really. How modern can you be if you call a flashlight a torch? ;)
luke, you deserve a verbal beating for that.
any british words the americans have, you’ll find they have fewer syllables, as that makes them easier to pronounce. you should be able to work out what i’m getting at here.
Actually, I seem to be missing your point. Are you saying the British are more lazy than the Americans thus they can’t handle saying flashlight instead of torch? Or that the English language didn’t originate in Britain? My joke was based on the fact that English is from there, but our languages are very different. Lighten up a bit…you’ll have more fun because in the game of life none of us make it out alive. :ta:
Leaving the quotes and comma location aside, my version of WP (1.3-pre-alpha) already has the correct quotes (inclined to the left and to the right, that is). The problem you are describing might only apply to WP 1.2.
My two cents. Keep the change! :)
Actually, Mathias, you were right and I was wrong. I have done the modifications you recommended and, fair enough, the quotes are fixed. Thanks!
P.S.: I have approved your comment over .US :)
You got me there for a second 8-) I started thinking I had reinvented the wheel.
Actually, I hope this does get implemented in WP, as I think it’s part of localization: not every country’s typography style is US English–like.
Yep that’s a bug that was introduced with the last version of WP. Sorry! The good news is I’ll fix it before 1.3. The bad news is your fix breaks other cases (curling quotes where they shouldn’t be curled).
BTW, I like what you’ve done with the comment form around here, especially the highlighting of the new comment.
I don’t fully understand where quotes should not be curled… :s
Thanks for the comment form compliments :) In case you’re interested in redesigning it the way I did, you might want to read this post.
This has now been fixed in CVS.