Full Version: What's Up Doc?
BoF
QUOTE
No, I will answer that. Obviously, you havent seen the show (which is called Lil Bush, BTW), or youd know lil Hillary (and even lil' Bill) is there with the rest of the political gang. laugh.gif


More and more I'm seeing the type characters highlighted. This is annoying. Why is this happening? How can it be corrected?

Here's a link to the post I took this from, but I've seen it elsewhere.

http://www.americasdebate.com/forums/index...st&p=219825
Mike
Hm.

I'm not sure why this is happening. ermm.gif

I'll take a look and see if I can track it down, although I probably won't be able to get to it until the weekend.

Thanks BoF!

smile.gif

Mike
Doclotus
Whew! For a second there I thought this thread was about me blush.gif mrsparkle.gif
logophage
It's probably a unicode/utf-8 encoding/decoding problem
Mike
QUOTE(logophage @ Jul 2 2007, 08:59 PM) *
It's probably a unicode/utf-8 encoding/decoding problem

Yeah, I figured it was along those lines. I'm thinking it's being caused by the RTE/WYSIWYG editor.

They've released 2.3.1 (we're running 2.2.2) and it fixed-- believe it or not-- 192 bugs that are present in the version we're running. ohmy.gif

I'll get it upgraded, eventually... heh!

smile.gif

Mike
Amlord
What exactly is the error? The reason I ask is because I have very little knowledge of coding, but I do have a crappy little website for my daughter's dance school. I always get weird coding errors where odd symbols (mostly a goofy looking capital A) show up. It isn't related to formatting as far as I can tell.
BaphometsAdvocate
QUOTE(BoF @ Jul 2 2007, 05:41 PM) *
QUOTE
No, I will answer that. Obviously, you havent seen the show (which is called Lil Bush?, BTW), or youd know lil Hillary (and even lil' Bill) is there with the rest of the political gang. laugh.gif


More and more I'm seeing the type characters highlighted. This is annoying. Why is this happening? How can it be corrected?

Here's a link to the post I took this from, but I've seen it elsewhere.

http://www.americasdebate.com/forums/index...st&p=219825

Oh man... I thought this was about all the doctors who are showing up as terrorists in the recent UK/Glasgow bombing plots....
Seamus
QUOTE(Amlord @ Jul 2 2007, 09:49 PM) *
What exactly is the error? The reason I ask is because I have very little knowledge of coding, but I do have a crappy little website for my daughter's dance school. I always get weird coding errors where odd symbols (mostly a goofy looking capital A) show up. It isn't related to formatting as far as I can tell.

Here's a good tutorial, but it's TL;DR and a bit out of date.

The problem is with how computers encode special text characters into binary numbers. When your site gets its special characters mixed up, chances are the character encoding isn't consistent from end-to-end; the Web browser or server was expecting text encoded with one character set, but instead was given text encoded in a slightly different character set. Mike's having a problem because a JavaScript-generated form or form element isn't honoring his encoding choices due to a bug, but that's not the most common problem... here's how to start fixing the most common sources of character encoding problems:

If you're using a text editor or Dreamweaver to build your pages, you'll need to find out what "character encoding" system it's using: the most common Web standards are UTF-8 (preferred, more characters) and ISO-8859-1 (fallback for software that can't handle multibyte characters). Most text editors that cost money will let you pick from a long list of alternatives, while other editors only use the default character set supplied by their proprietary OS (windows-1252 or mac-roman, among others), so this is something you'll have to look up in your software. The official list of alternatives is very long.

Regardless of what encoding your editor uses, your Web page markup must be set to match your editor. Assuming you're using XHTML-Transitional formatting, there are at least three places you'll need to set your character encoding, illustrated in this sample page wherever you see "utf-8":

CODE
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <title>Untitled</title>
</head>
<body>
<form accept-charset="utf-8" action="url.cgi" method="get" enctype="application/x-www-form-urlencoded" id="fn" name="fn">
...
</form>
</body>
</html>


To be clear, some (most?) editing software won't automatically switch to UTF-8 just because your Web page markup is asking it to... if your editor won't let you pick from a menu, then you'll have to replace "utf-8" in the example above with whatever encoding the editor is forcing on you, which is probably the proprietary OS default you'll need to look up. It's a pain in the rear, but c'est la vie.

You'll also need to be sure your server software (PHP/Perl scripts, MySQL/SQL database, Apache/IIS, etc.) is configured to accept the encoding scheme and generate the right HTTP headers. How to do that can vary greatly, so you'll probably have to look it up in your documentation. You won't usually have trouble if you can use ISO-8859-1 (a.k.a. ISO-Latin-1) or UTF-8 (a.k.a. 8-bit Unicode, no BOM). The HTTP header your server software should generate is:
CODE
Content-Type: text/html; charset=utf-8
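where you can replace "utf-8" with the file's actual encoding scheme. If you can't control your HTTP headers, don't worry too much about it; browsers fall back on the meta element in the first example when the HTTP header doesn't name a charset. Just be aware that if the server does send a charset in the header, it wins over the meta element, so the two need to agree.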
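
As a rough illustration of sending that header from server-side code, here's a minimal Python sketch using the standard WSGI calling convention. The `app` function name and the sample page body are made up for the example; PHP, Perl, or your Web server's own configuration will each have a different way of doing the same thing.

```python
def app(environ, start_response):
    """Tiny WSGI app that announces its encoding in the HTTP header."""
    # Encode the page with the same charset that the header declares.
    body = "<p>caf\u00e9 \u2014 it\u2019s fine</p>".encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, something like `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()` from the Python standard library should serve it (the port is arbitrary).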

About the worst thing to do is to leave the encoding completely unspecified. That way, your text editor, server software, and everyone's individual browsers are left to their own devices to make some random guess. They almost always guess wrong.

Beware that if you cut text from some arbitrary document and paste it into your Web page source code or a form element, your text editor or Web browser has to be smart enough to convert the character encoding for you automatically; otherwise, you'll either need to fix problems manually by re-typing the problem characters, or by using separate conversion software (a hassle, I just use editors with conversion features built-in).

If you notice one special character is being replaced with two or more special characters, then you may have a Web form accepting UTF-8, which the server expects to be ISO-8859-1 or some proprietary one-byte-per-character format, so the multibyte characters get misinterpreted as multiple single-byte characters. I've heard this called a Unicode avalanche, because the characters seem to multiply or snowball on multiple trips between client and server. In most simpler sites, there's not really a Unicode avalanche, just an inconsistency in encoding.
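
To see the effect for yourself, here's a minimal Python sketch (my own illustration, not anything from the board software). The `misread` helper is a made-up name, and I'm assuming the receiving end decodes the bytes as windows-1252 (`cp1252` in Python), which is what browsers generally do when a page claims ISO-8859-1.

```python
def misread(text, actual="utf-8", assumed="cp1252"):
    """Encode text one way, then decode the bytes as if they were another."""
    return text.encode(actual).decode(assumed)


original = "it\u2019s"      # "it's" typed with a curly apostrophe (U+2019)

once = misread(original)    # one mismatched trip between client and server
twice = misread(once)       # a second mismatched trip: the "avalanche"

for label, value in (("original", original), ("once", once), ("twice", twice)):
    print(f"{label:8} {len(value):2} chars  {value!r}")
```

One mismatched trip turns the single curly apostrophe into the three characters `â€™`; a second trip inflates each of those again, which is the snowballing described above.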

Geeky Details: The first 128 characters are usually consistent with ASCII, but the other characters can vary greatly; each OS has at least one proprietary way to do it, and most of them support several alternatives, occasionally for different languages. The three preferred varieties are ISO-8859-1, UTF-16, and UTF-8. The ISO-8859-1 variety uses only one byte per character, so it can encode at most 256 characters (and it actually reserves some of those for control codes). The UTF-16 variety uses two bytes for most characters (65,536 possible values) and a four-byte pair for everything beyond that, so it can cover all of Unicode, but most of us avoid it on the Web because it's less efficient for English and other Latin alphabets. UTF-8 strikes a happy medium by using single bytes to encode the ASCII characters, then by using two to four bytes to encode everything else in Unicode, including the accented characters in the upper half of ISO-8859-1. The world is moving to UTF-8, but some older software only supports ISO-8859-1.
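
If you'd rather check the byte counts than take my word for it, here's a small Python sketch. The sample characters and the `byte_length` helper are just my picks for illustration; `utf-16-le` is used so the byte-order mark doesn't inflate the numbers.

```python
samples = {
    "A (ASCII letter)": "A",
    "e-acute (in ISO-8859-1)": "\u00e9",
    "em-dash (not in ISO-8859-1)": "\u2014",
    "musical G clef (beyond the first 65,536)": "\U0001D11E",
}


def byte_length(char, encoding):
    """Return the encoded size in bytes, or None if the charset lacks the character."""
    try:
        return len(char.encode(encoding))
    except UnicodeEncodeError:
        return None


for label, char in samples.items():
    sizes = {enc: byte_length(char, enc) for enc in ("iso-8859-1", "utf-16-le", "utf-8")}
    print(label, sizes)
```

The e-acute is the interesting row: it's one byte in ISO-8859-1 but two in UTF-8, which is exactly why text saved in one and read as the other comes out garbled.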
Mike
I swapped the character set for our html pages from iso-8859-1 to UTF-8, and it seems to have fixed the problem (at least in this topic).

Our database is latin1_swedish_ci.

Anyone see any problems with this, technical or otherwise?

smile.gif

Mike
Amlord
Thanks Seamus!
Seamus
You're welcome, Amlord.

Mike, the latin1_swedish_ci collation is in iso-8859-1 (latin1), so you'd need to upgrade your DB to UTF-8 for UTF-8 to work with older posts, and your software may not take well to that (more info). Databases with mixed encodings can be a nightmare to sort out later.

Another option is to stay with latin1 and convert incoming form input from UTF-8 back to iso-8859-1 server-side before inserting text into the DB. In Perl, you can use Unicode::String. In PHP, iconv() or the mbstring extension's mb_convert_encoding() function can be helpful. The built-in htmlentities() can help stop the avalanche, but it won't try to remap characters to latin1. (edited to add... btw, this isn't a very robust option. It's better to get the client to encode text the way you need it, but Ajax stuff frequently overlooks encoding issues by assuming everyone's already using UTF-8).
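
Just to show the shape of that second option, here's a sketch in Python rather than Perl or PHP (purely as an illustration; the board itself doesn't run Python). The function name `utf8_form_to_latin1` is made up, and it assumes the stored text is later written into an HTML page as-is, so numeric character references will render correctly.

```python
def utf8_form_to_latin1(raw: bytes) -> bytes:
    """Re-encode UTF-8 form input as ISO-8859-1 bytes for a latin1 database.

    Characters that ISO-8859-1 can't represent become numeric HTML
    references (for example &#8217;) instead of being dropped or garbled.
    """
    text = raw.decode("utf-8")  # raises UnicodeDecodeError on malformed input
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


if __name__ == "__main__":
    submitted = "caf\u00e9 \u2019".encode("utf-8")  # what a UTF-8 form would send
    print(utf8_form_to_latin1(submitted))
```

The accented letter maps straight onto its latin1 byte, while the curly apostrophe, which latin1 doesn't have, is kept as a character reference. If the input isn't really UTF-8, the decode step fails loudly rather than storing junk.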

Then again, waiting for the update might be good. smile.gif
Mike
Ugh.

It looks like option one will be better. Our software package has 250,000+ lines of code, and I'm not about to take what is currently a 50+ hour upgrade process and complicate it even more with character set code hacks.

I'll have to get with my host and see if/when I can do this.

Thanks,

Mike
Paladin Elspeth
While you're at it, Mike, could you check into the reason why there are little squares within the quotation on my signature line? Thanks.
Mike
OK, I've reverted back to iso-8859-1 so I don't cause more problems by switching to UTF-8. I'll look at all the bugs that were fixed, and see if they relate to the bugs we're experiencing.

QUOTE(Paladin Elspeth @ Jul 3 2007, 03:21 PM) *
While you're at it, Mike, could you check into the reason why there are little squares within the quotation on my signature line? Thanks.

I see no squares, PE. ermm.gif

Mike
Amlord
QUOTE(Mike @ Jul 3 2007, 04:46 PM) *
QUOTE(Paladin Elspeth @ Jul 3 2007, 03:21 PM) *
While you're at it, Mike, could you check into the reason why there are little squares within the quotation on my signature line? Thanks.

I see no squares, PE. ermm.gif

Mike

I saw them before, but now they are gone. Hmm....
BoF
Mike and Seamus,

Your knowledge of coding has left me a bit jealous. mrsparkle.gif

I just posted something. Apostrophes were showing up as the same extraneous characters that were in the sample I posted yesterday.

I edited the extraneous characters out, replaced the apostrophes, reposted and it worked fine.

Members may have to do a little more careful editing until we get this resolved.
Paladin Elspeth
The little squares were beside the hyphens within the quotation. I'm glad they're gone in any case. Thanks for validating what I said, Amlord! flowers.gif
entspeak
QUOTE(Mike @ Jul 3 2007, 02:46 PM) *
OK, I've reverted back to iso-8859-1 so I don't cause more problems by switching to UTF-8. I'll look at all the bugs that were fixed, and see if they relate to the bugs we're experiencing.


Just a note that this problem is still occurring - and is still frustrating. smile.gif

This problem generally occurs when replying to or editing a post that has been typed using something other than the editor on this site and then copied in or if the text was copied from some other outside source - a pdf document, website article, etc...

HAHA! Temporary workaround. For those having a problem with this, here is a website that will convert UTF-8 text to iso-8859-1. If you copy your text into this converter and choose iso-8859-1 as the destination character set, it will do the conversion. I have tested this and it works. smile.gif

The site:
Online charset/codepage conversion
Curmudgeon
When I saw this question, I recalled one of the standard pranks that would be played on a new apprentice. Someone would always give the newbie a list of parts to pick up from a stockroom, and include, "1 pint of updoc."
Maybe Maybe Not
QUOTE(entspeak @ Jan 1 2010, 11:57 AM) *
Just a note that this problem is still occurring - and is still frustrating. smile.gif

This problem generally occurs when replying to or editing a post that has been typed using something other than the editor on this site and then copied in or if the text was copied from some other outside source - a pdf document, website article, etc...

HAHA! Temporary workaround. For those having a problem with this, here is a website that will convert UTF-8 text to iso-8859-1. If you copy your text into this converter and choose iso-8859-1 as the destination character set, it will do the conversion. I have tested this and it works. smile.gif

The site:
Online charset/codepage conversion
I appreciate the help.

I've tried removing formatting. I've tried pasting my posts into Notepad and changing the font and formatting. I've tried switching between standard and rich text editor (which at least had the advantage of showing me when a problem will be evident). I haven't found anything foolproof yet.

I'll give the converter you suggest a try ...
Curmudgeon
QUOTE(Maybe Maybe Not @ Jan 1 2010, 03:37 PM) *
I've tried removing formatting. I've tried pasting my posts into Notepad and changing the font and formatting. I've tried switching between standard and rich text editor (which at least had the advantage of showing me when a problem will be evident). I haven't found anything foolproof yet.

I have problems with editing a post, and being timed out before I am finished. I also find that sometimes I want to include more than a single citation to an earlier point in the thread. My solution has been to start my post, copy and paste to Microsoft Word, edit to my heart's content, copy and paste back to the original thread, do a CTRL A, and then format the entire post to the original Verdana 2. I've either found a workaround, or been very lucky.

I did have a computer set up with two monitors to facilitate that sort of stunt, but it caught fire. One of these days, I'll take one of my computers in and have a professional set it up for me. It's so much more satisfactory than using half screen images...
entspeak
QUOTE(Curmudgeon @ Jan 1 2010, 11:27 PM) *
QUOTE(Maybe Maybe Not @ Jan 1 2010, 03:37 PM) *
I've tried removing formatting. I've tried pasting my posts into Notepad and changing the font and formatting. I've tried switching between standard and rich text editor (which at least had the advantage of showing me when a problem will be evident). I haven't found anything foolproof yet.

I have problems with editing a post, and being timed out before I am finished. I also find that sometimes I want to include more than a single citation to an earlier point in the thread. My solution has been to start my post, copy and paste to Microsoft Word, edit to my heart's content, copy and paste back to the original thread, do a CTRL A, and then format the entire post to the original Verdana 2. I've either found a workaround, or been very lucky.

I did have a computer set up with two monitors to facilitate that sort of stunt, but it caught fire. One of these days, I'll take one of my computers in and have a professional set it up for me. It's so much more satisfactory than using half screen images...


Yes, changing the fonts doesn't solve the problem I've had. I tried copying some of the troubling text into Word and changing the font to Verdana, but that doesn't work... as the thread indicates, it's a character set problem and not a font problem. UTF-8 text has to be converted into iso-8859-1. The problem in this discussion really only occurs if someone else has used UTF-8 text in their post and you try to reply, or if you've used UTF-8 text (copied from another source that is using that character set) in your post and try to go back in to edit.
Maybe Maybe Not
QUOTE(entspeak @ Jan 2 2010, 01:43 AM) *
The problem in this discussion really only occurs if someone else has used UTF-8 text in their post and you try to reply, or if you've used UTF-8 text (copied from another source that is using that character set) in your post and try to go back in to edit.
I'm sure this is the issue.

In the abortion thread I've copied and pasted extensively from a .pdf file of the White Paper and from the HTML version of Roe on Cornell's Legal Information Institute's website. Either one or both must be using the different character set.
Maybe Maybe Not
Arrrrgghhh!!

Now I'm not so sure the fault lies entirely with characters copied from another website.

In my most recent post in the abortion thread, I replied to a post using the "Reply" button at the bottom of the other member's post. I copied this into MS Word, and composed and edited my response within Word. I copied and pasted my response into the standard editor on this site. When I switched to the rich text editor to check for extraneous characters (as I have taken to doing in order to detect any problems), there they were! Only a few, and only (so far as I can tell) when I used dashes, ellipses, and single quotation marks.

Is this a problem with MS Word? I'm using Office 2003.
entspeak
QUOTE(Maybe Maybe Not @ Jan 9 2010, 05:59 PM) *
Arrrrgghhh!!

Now I'm not so sure the fault lies entirely with characters copied from another website.

In my most recent post in the abortion thread, I replied to a post using the "Reply" button at the bottom of the other member's post. I copied this into MS Word, and composed and edited my response within Word. I copied and pasted my response into the standard editor on this site. When I switched to the rich text editor to check for extraneous characters (as I have taken to doing in order to detect any problems), there they were! Only a few, and only (so far as I can tell) when I used dashes, ellipses, and single quotation marks.

Is this a problem with MS Word? I'm using Office 2003.


I know that MS Word sometimes converts hyphens into en-dashes or em-dashes (the longer ones), and these get corrupted because the ISO character set for the site doesn't include them. It may be the same thing for the others.
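
For anyone comfortable running a script, here's a rough Python sketch of the manual cleanup people have been doing by hand. The mapping table and the `plain_punctuation` name are my own choices for illustration; it only covers the dashes, ellipses, and curly quotes mentioned in this thread, not every character Word can produce.

```python
# Typographic characters that Word's AutoFormat commonly inserts,
# mapped to plain-ASCII stand-ins (the stand-ins are a choice for this sketch).
SMART_TO_ASCII = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en-dash
    "\u2014": "--",   # em-dash
    "\u2026": "...",  # horizontal ellipsis
})


def plain_punctuation(text: str) -> str:
    """Swap Word-style punctuation for ASCII so it fits in ISO-8859-1."""
    return text.translate(SMART_TO_ASCII)


if __name__ == "__main__":
    pasted = "\u201cWait\u2026 it\u2019s here\u2014now\u201d"
    cleaned = plain_punctuation(pasted)
    print(cleaned)
    cleaned.encode("iso-8859-1")  # no longer raises UnicodeEncodeError
```

Text cleaned this way contains nothing outside ASCII for those characters, so it should paste into an ISO-8859-1 page without picking up stray symbols.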
Lasher
QUOTE(BoF @ Jul 3 2007, 04:08 PM) *
Mike and Seamus,

Your knowledge of coding has left me a bit jealous. mrsparkle.gif

I just posted something. Apostrophes were showing up as the same extraneous characters that were in the sample I posted yesterday.

I edited the extraneous characters out, replaced the apostrophes, reposted and it worked fine.

Members may have to do a little more careful editing until we get this resolved.

What does extraneous mean?

QUOTE(Maybe Maybe Not @ Jan 9 2010, 06:59 PM) *
Arrrrgghhh!!

Now I'm not so sure the fault lies entirely with characters copied from another website.

In my most recent post in the abortion thread, I replied to a post using the "Reply" button at the bottom of the other member's post. I copied this into MS Word, and composed and edited my response within Word. I copied and pasted my response into the standard editor on this site. When I switched to the rich text editor to check for extraneous characters (as I have taken to doing in order to detect any problems), there they were! Only a few, and only (so far as I can tell) when I used dashes, ellipses, and single quotation marks.

Is this a problem with MS Word? I'm using Office 2003.

Golly, that all sounds complicated.