Full Version: What's Up Doc?
BoF
QUOTE
No, I will answer that. Obviously, you havent seen the show (which is called Lil Bush, BTW), or youd know lil Hillary (and even lil' Bill) is there with the rest of the political gang. laugh.gif


More and more I'm seeing the type characters highlighted. This is annoying. Why is this happening? How can it be corrected?

Here's a link to the post I took this from, but I've seen it elsewhere.

http://www.americasdebate.com/forums/index...st&p=219825
Mike
Hm.

I'm not sure why this is happening. ermm.gif

I'll take a look and see if I can track it down, although I probably won't be able to get to it until the weekend.

Thanks BoF!

smile.gif

Mike
Doclotus
Whew! For a second there I thought this thread was about me blush.gif mrsparkle.gif
logophage
It's probably a unicode/utf-8 encoding/decoding problem
Mike
QUOTE(logophage @ Jul 2 2007, 08:59 PM) *
It's probably a unicode/utf-8 encoding/decoding problem

Yeah, I figured it was along those lines. I'm thinking it's being caused by the RTE/WYSIWYG editor.

They've released 2.3.1 (we're running 2.2.2) and it fixed-- believe it or not-- 192 bugs that are present in the version we're running. ohmy.gif

I'll get it upgraded, eventually... heh!

smile.gif

Mike
Amlord
What exactly is the error? The reason I ask is because I have very little knowledge of coding, but I do have a crappy little website for my daughter's dance school. I always get weird coding errors where odd symbols (mostly a goofy looking capital A) show up. It isn't related to formatting as far as I can tell.
BaphometsAdvocate
QUOTE(BoF @ Jul 2 2007, 05:41 PM) *
QUOTE
No, I will answer that. Obviously, you havent seen the show (which is called Lil Bush?, BTW), or youd know lil Hillary (and even lil' Bill) is there with the rest of the political gang. laugh.gif


More and more I'm seeing the type characters highlighted. This is annoying. Why is this happening? How can it be corrected?

Here's a link to the post I took this from, but I've seen it elsewhere.

http://www.americasdebate.com/forums/index...st&p=219825

Oh man... I thought this was about all the doctors who are showing up as terrorists in the recent UK/Glasgow bombing plots....
Seamus
QUOTE(Amlord @ Jul 2 2007, 09:49 PM) *
What exactly is the error? The reason I ask is because I have very little knowledge of coding, but I do have a crappy little website for my daughter's dance school. I always get weird coding errors where odd symbols (mostly a goofy looking capital A) show up. It isn't related to formatting as far as I can tell.

Here's a good tutorial, but it's TLDR and a bit out of date.

The problem is with how computers encode special text characters into binary numbers. When your site gets its special characters mixed up, chances are the character encoding isn't consistent from end-to-end; the Web browser or server was expecting text encoded with one character set, but instead was given text encoded in a slightly different character set. Mike's having a problem because a JavaScript-generated form or form element isn't honoring his encoding choices due to a bug, but that's not the most common problem... here's how to start fixing the most common sources of character encoding problems:

If you're using a text editor or Dreamweaver to build your pages, you'll need to find out what "character encoding" system it's using: the most common Web standards are UTF-8 (preferred, more characters) and ISO-8859-1 (fallback for software that can't handle multibyte characters). Most text editors that cost money will let you pick from a long list of alternatives, while other editors only use the default character set supplied by their proprietary OS (windows-1252 or mac-roman, among others), so this is something you'll have to look up in your software. The official list of alternatives is very long.

Regardless of what encoding your editor uses, your Web page markup must be set to match your editor. Assuming you're using XHTML-Transitional formatting, there are at least three places you'll need to set your character encoding, illustrated in this sample page wherever you see "utf-8":

CODE
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <title>Untitled</title>
</head>
<body>
<form accept-charset="utf-8" action="url.cgi" method="get" enctype="application/x-www-form-urlencoded" id="fn" name="fn">
...
</form>
</body>
</html>


To be clear, some (most?) editing software won't automatically switch to UTF-8 just because your Web page markup is asking it to... if your editor won't let you pick from a menu, then you'll have to replace "utf-8" in the example above with whatever encoding the editor is forcing on you, which is probably the proprietary OS default you'll need to look up. It's a pain in the rear, but c'est la vie.

You'll also need to be sure your server software (PHP/Perl scripts, MySQL/SQL database, Apache/IIS, etc.) is configured to accept the encoding scheme and generate the right HTTP headers. How to do that can vary greatly, so you'll probably have to look it up in your documentation. You won't usually have trouble if you can use ISO-8859-1 (a.k.a. ISO-Latin-1) or UTF-8 (a.k.a. 8-bit Unicode, no BOM). The HTTP header your server software should generate is:
CODE
Content-Type: text/html; charset=utf-8
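where you can replace "utf-8" with the file's actual encoding scheme. If you can't control your HTTP headers, check what charset (if any) your server is already sending: a charset in the HTTP header takes precedence over the meta element in the first example, so the meta element only helps when the header leaves the charset out.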
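
Since the server side varies so much, here's only a rough sketch of the idea in Python (a stand-in for whatever PHP/Perl you actually run; the names app and CHARSET are made up for illustration). It's a bare-bones WSGI application that sends the header above and encodes the page body with the same charset it advertises:

```python
# Minimal WSGI application that declares its encoding in the HTTP header.
CHARSET = "utf-8"  # assumption: swap in whatever encoding your files really use


def app(environ, start_response):
    # Encode the body with the same charset the header advertises,
    # so the browser never has to guess.
    body = "<p>caf\u00e9</p>".encode(CHARSET)
    headers = [
        ("Content-Type", f"text/html; charset={CHARSET}"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To poke at it locally, Python's standard wsgiref.simple_server.make_server can serve app on a port of your choosing. The point is just that the header and the actual bytes come from the same setting.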

About the worst thing to do is to leave the encoding completely unspecified. That way, your text editor, server software, and everyone's individual browsers are left to their own devices to make some random guess. They almost always guess wrong.

Beware that if you cut text from some arbitrary document and paste it into your Web page source code or a form element, your text editor or Web browser has to be smart enough to convert the character encoding for you automatically; otherwise, you'll need to fix problems manually by re-typing the problem characters, or run the text through separate conversion software (a hassle, I just use editors with conversion features built-in).
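
If you do end up converting by hand, the conversion itself is tiny. Here's a sketch in Python, assuming you already know the source encoding (the function name transcode is hypothetical, and cp1252 is just Python's name for windows-1252):

```python
def transcode(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Decode bytes from the source encoding, then re-encode them in the target."""
    return data.decode(source).encode(target)


# A curly apostrophe is the single byte 0x92 in windows-1252,
# but three bytes (E2 80 99) in UTF-8.
pasted = b"it\x92s"
converted = transcode(pasted, "cp1252")
```

The hard part isn't the conversion, it's knowing what the source encoding really was. If you guess wrong here, you just bake the garbage in.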

If you notice one special character is being replaced with two or more special characters, then you may have a Web form accepting UTF-8, which the server expects to be ISO-8859-1 or some proprietary one-byte-per-character format, so the multibyte characters get misinterpreted as multiple single-byte characters. I've heard this called a Unicode avalanche, because the characters seem to multiply or snowball on multiple trips between client and server. In most simpler sites, there's not really a Unicode avalanche, just an inconsistency in encoding.
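
You can reproduce the effect in a few lines of Python. This is only a simulation of the mismatch, not what any particular forum software does, and the function name is made up:

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


original = "caf\u00e9"               # "café", 4 characters
once = misread_as_latin1(original)   # 5 characters: the e-acute became two
twice = misread_as_latin1(once)      # 7 characters: the avalanche continues

# As long as nothing else has mangled the text, one round of damage
# can be undone by reversing the mistake.
repaired = once.encode("iso-8859-1").decode("utf-8")
```

After one bad trip, the e-acute shows up as a capital A with a tilde followed by a copyright sign, which is where the goofy looking capital A comes from. Each further trip makes it worse.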

Geeky Details: The first 128 characters are usually consistent with ASCII, but the other characters can vary greatly; each OS has at least one proprietary way to do it, and most of them support several alternatives, occasionally for different languages. The three preferred varieties are ISO-8859-1, UTF-16, and UTF-8. The ISO-8859-1 variety uses only one byte per character, so it can encode at most 256 characters (and some of those positions are set aside for control codes rather than printable characters). The UTF-16 variety uses two bytes for most characters, which covers 65,536 code points, and four bytes (a surrogate pair) for everything beyond that; most of us avoid it on the Web because it's less efficient for English and other Latin alphabets. UTF-8 strikes a happy medium by using single bytes for the ASCII characters, then two to four bytes for everything else in Unicode, including the accented Latin characters from the upper half of ISO-8859-1. The world is moving to UTF-8, but some older software only supports ISO-8859-1.
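
If you want to see the byte counts for yourself, here's a small Python sketch (the helper name encoded_length is made up; utf-16-le is used so the count doesn't include a BOM):

```python
def encoded_length(ch: str, encoding: str):
    """Return how many bytes ch needs in the given encoding, or None if it can't be encoded."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


samples = {
    "A": "A",                    # plain ASCII letter
    "e-acute": "\u00e9",         # in the upper half of ISO-8859-1
    "euro sign": "\u20ac",       # not in ISO-8859-1 at all
    "G clef": "\U0001d11e",      # beyond the first 65,536 code points
}

table = {
    name: tuple(encoded_length(ch, enc) for enc in ("iso-8859-1", "utf-16-le", "utf-8"))
    for name, ch in samples.items()
}
```

Each entry in table is (ISO-8859-1 bytes, UTF-16 bytes, UTF-8 bytes), with None where the encoding simply has no slot for that character.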
Mike
I swapped the character set for our html pages from iso-8859-1 to UTF-8, and it seems to have fixed the problem (at least in this topic).

Our database is latin1_swedish_ci.

Anyone see any problems with this, technical or otherwise?

smile.gif

Mike
Amlord
Thanks Seamus!
Seamus
You're welcome, Amlord.

Mike, the latin1_swedish_ci collation is in iso-8859-1 (latin1), so you'd need to upgrade your DB to UTF-8 for UTF-8 to work with older posts, and your software may not take well to that (more info). Databases with mixed encodings can be a nightmare to sort out later.

Another option is to stay with latin1 and convert incoming form input from UTF-8 back to iso-8859-1 server-side before inserting text into the DB. In Perl, you can use Unicode::String. In PHP, iconv() or the mbstring extension's mb_convert_encoding() function can be helpful. The built-in htmlentities() can help stop the avalanche, but it won't try to remap characters to latin1. (edited to add... btw, this isn't a very robust option. It's better to get the client to encode text the way you need it, but Ajax stuff frequently overlooks encoding issues by assuming everyone's already using UTF-8).
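
To show the shape of that conversion, here's a sketch in Python rather than Perl or PHP (the function name is hypothetical, and this is not how the forum software does it). Characters that latin1 can't represent are turned into numeric character references, which is my own choice for the example, so they still display in HTML instead of being dropped:

```python
def utf8_form_input_to_latin1(raw: bytes) -> bytes:
    """Decode UTF-8 form input and re-encode it as ISO-8859-1 for a latin1 database."""
    text = raw.decode("utf-8")
    # xmlcharrefreplace swaps anything outside ISO-8859-1 for an
    # HTML-style reference such as &#8217; rather than raising an error.
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


accented = utf8_form_input_to_latin1(b"caf\xc3\xa9")     # e-acute fits in latin1
curly = utf8_form_input_to_latin1(b"it\xe2\x80\x99s")    # curly apostrophe does not
```

Note that the stored text then contains markup-like references, so anything that later searches or measures the text will see &#8217; rather than an apostrophe. That's part of why this isn't a very robust option.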

Then again, waiting for the update might be good. smile.gif
Mike
Ugh.

It looks like option one will be better. Our software package has 250,000+ lines of code, and I'm not about to take what is currently a 50+ hour upgrade process and complicate it even more with character set code hacks.

I'll have to get with my host and see if/when I can do this.

Thanks,

Mike
Paladin Elspeth
While you're at it, Mike, could you check into the reason why there are little squares within the quotation on my signature line? Thanks.
Mike
OK, I've reverted back to iso-8859-1 so I don't cause more problems by switching to UTF-8. I'll look at all the bugs that were fixed, and see if they relate to the bugs we're experiencing.

QUOTE(Paladin Elspeth @ Jul 3 2007, 03:21 PM) *
While you're at it, Mike, could you check into the reason why there are little squares within the quotation on my signature line? Thanks.

I see no squares, PE. ermm.gif

Mike
Amlord
QUOTE(Mike @ Jul 3 2007, 04:46 PM) *
QUOTE(Paladin Elspeth @ Jul 3 2007, 03:21 PM) *
While you're at it, Mike, could you check into the reason why there are little squares within the quotation on my signature line? Thanks.

I see no squares, PE. ermm.gif

Mike

I saw them before, but now they are gone. Hmm....
BoF
Mike and Seamus,

Your knowledge of coding has left me a bit jealous. mrsparkle.gif

I just posted something. Apostrophes were showing up as the same extraneous characters that were in the sample I posted yesterday.

I edited the extraneous characters out, replaced the apostrophes, reposted and it worked fine.

Members may have to do a little more careful editing until we get this resolved.
Paladin Elspeth
The little squares were beside the hyphens within the quotation. I'm glad they're gone in any case. Thanks for validating what I said, Amlord! flowers.gif