Thomas Koch

Since: Jan 16, 2009 Posts: 14

Posted: Mon Aug 10, 2009 8:10 am Post subject: default character encoding for everything in debian Archived from groups: linux>debian>devel

Hi,
I have an issue: I forgot to set the character encoding of Tomcat to UTF-8
after reinstalling a server.
Now, before I report a wishlist(?) bug against Tomcat, I want to ask (and invite
discussion): shouldn't UTF-8 be the default character set everywhere? Then, when
installing a package from Debian, I could assume that wherever a character encoding
can be set, it is set to UTF-8.
MySQL would be another example; to my knowledge it uses isoXYZ as its default
character encoding.
Best regards,
Thomas Koch, http://www.koch.ro
--
To UNSUBSCRIBE, email to debian-devel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org

Giacomo A. Catenazzi

Since: Jun 20, 2005 Posts: 44

Posted: Mon Aug 10, 2009 9:10 am Post subject: Re: default character encoding for everything in debian

Thomas Koch wrote:
> Hi,
>
> I have an issue: I forgot to set the character encoding of Tomcat to UTF-8
> after reinstalling a server.
> Now, before I report a wishlist(?) bug against Tomcat, I want to ask (and invite
> discussion): shouldn't UTF-8 be the default character set everywhere? Then, when
> installing a package from Debian, I could assume that wherever a character encoding
> can be set, it is set to UTF-8.
> MySQL would be another example; to my knowledge it uses isoXYZ as its default
> character encoding.
There are different problems.
Future Debian systems will have UTF-8 as the default charset.
Look at the debian-policy archives.
A lot of Debian files will be encoded in UTF-8 (control, changelog
and manpages), and converted to the needed charset at runtime.
But for databases there are different issues. I think the best solution
is to do it the way MediaWiki does: the UTF-8 data is stored as a binary blob. It is
difficult to keep database engines and system libraries synchronized, and
it is also difficult to implement support for all Unicode characters.
But let's concentrate on the first task: having good UTF-8 support
in all programs, terminals, etc.
ciao
cate

Norbert Preining

Since: Oct 13, 2004 Posts: 838

Posted: Mon Aug 10, 2009 4:10 pm Post subject: Re: default character encoding for everything in debian

On Mon, 10 Aug 2009, Roger Leigh wrote:
> Of course there's a penalty for certain operations. But UTF-8 is about
> as compact as an extended encoding is going to get.
Rubbish. You know why in Japan and other Asian countries UTF8 is not
so common? Because many of their glyphs need 4 (four!) bytes, while
for example jis-2022 (AFAIR) is much more compact.
We are not living in an ASCII world anymore.
Best wishes
Norbert
-------------------------------------------------------------------------------
Dr. Norbert Preining <preining@logic.at> Vienna University of Technology
Debian Developer <preining@debian.org> Debian TeX Group
gpg DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
-------------------------------------------------------------------------------
CHICAGO (n.)
The foul-smelling wind which precedes an underground railway train.
--- Douglas Adams, The Meaning of Liff

Russ Allbery

Since: Nov 17, 2005 Posts: 897

Posted: Mon Aug 10, 2009 4:10 pm Post subject: Re: default character encoding for everything in debian

Josselin Mouette <joss@debian.org> writes:
> Now we could concentrate on removing from the archive programs without
> proper UTF8 support.
There are, sadly, some very useful programs with no adequate replacement
that don't have UTF-8 support. tf5, for instance.
--
Russ Allbery (rra@debian.org) <http://www.eyrie.org/~eagle/>

Philipp Kern

Since: Jun 22, 2009 Posts: 8

Posted: Mon Aug 10, 2009 4:10 pm Post subject: Re: default character encoding for everything in debian

On 2009-08-10, Norbert Preining <preining@logic.at> wrote:
> On Mon, 10 Aug 2009, Roger Leigh wrote:
>> Of course there's a penalty for certain operations. But UTF-8 is about
>> as compact as an extended encoding is going to get.
> Rubbish. You know why in Japan and other Asian countries UTF8 is not
> so common? Because many of their glyphs need 4 (four!) bytes, while
> for example jis-2022 (AFAIR) is much more compact.
> We are not living in an ASCII world anymore.
Is it really because of the size? We are not living in a byte-beancounting
world anymore. At worst you double the *text* size (we're not talking
about images or anything, which are far larger), going from the 2 bytes
that you need anyway to four. ISO 2022 also wastes one bit per byte
to be 7-bit safe. If I read the Wikipedia article correctly, at least
the JP escape sequence only needs to be put into the document once. (Well,
or maybe several times, switching back and forth, if you're embedding
Latin-encoded words in the text.)
Maybe I'm an ignorant European, but I'm not sure that equation still
holds. Of course there are certain tradeoffs, with Latin characters
being the privileged few that get a short encoding, but that doesn't
make UTF-8 so bad per se that it deserves to be called "rubbish".
Kind regards,
Philipp Kern
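To put numbers on the byte counts debated above: UTF-8 uses 1 byte for ASCII, 2 bytes up to U+07FF, 3 bytes for the rest of the Basic Multilingual Plane (BMP) and 4 bytes only above U+FFFF. Kana and everyday kanji lie in the BMP, so they take 3 bytes in UTF-8 versus 2 in EUC-JP or Shift_JIS; 4 bytes are needed only for supplementary-plane characters such as the CJK Extension B ideographs. A minimal C sketch of that rule; the function name utf8_len_for is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: bytes needed to encode one code point in UTF-8,
 * following the RFC 3629 ranges.  Surrogates (U+D800..U+DFFF) are not
 * rejected here, to keep the sketch short. */
int utf8_len_for(uint32_t cp)
{
    if (cp < 0x80)     return 1;  /* ASCII */
    if (cp < 0x800)    return 2;  /* Latin-1 Supplement, Greek, Cyrillic, ... */
    if (cp < 0x10000)  return 3;  /* rest of the BMP, incl. kana and common kanji */
    if (cp < 0x110000) return 4;  /* supplementary planes, e.g. CJK Extension B */
    return -1;                    /* beyond the Unicode range */
}
```

For example, utf8_len_for(0x65E5) (日) is 3, while utf8_len_for(0x20B9F), a CJK Extension B ideograph, is 4.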

Norbert Preining

Since: Oct 13, 2004 Posts: 838

Posted: Mon Aug 10, 2009 4:10 pm Post subject: Re: default character encoding for everything in debian

On Mon, 10 Aug 2009, Philipp Kern wrote:
> >> Of course there's a penalty for certain operations. But UTF-8 is about
> >> as compact as an extended encoding is going to get.
[...]
> make UTF-8 bad per se to call it "rubbish".
I didn't call UTF-8 itself rubbish; I am a strong proponent of UTF-8
myself. I only called rubbish the quoted claim that it is "about as compact
as an extended encoding is going to get".
OTOH, I agree that UTF-8 is the way to go in general computing; I have had
too much pain with all those local encodings around the world.
Best wishes
Norbert
-------------------------------------------------------------------------------
Dr. Norbert Preining <preining@logic.at> Vienna University of Technology
Debian Developer <preining@debian.org> Debian TeX Group
gpg DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
-------------------------------------------------------------------------------
HUTTOFT (n.)
The fibrous algae which grows in the dark, moist environment of
trouser turn-ups.
--- Douglas Adams, The Meaning of Liff

Samuel Thibault

Since: May 08, 2009 Posts: 47

Posted: Mon Aug 10, 2009 9:10 pm Post subject: Re: default character encoding for everything in debian

Harald Braumann wrote on Tue 11 Aug 2009 01:33:58 +0200:
> Or do you mean the user pays the price, because if the encoding is set
> to UTF-8 then performance would suffer? In that case, I'd love to see
> some real life numbers. I doubt the difference would be noticeable.
Google for "utf-8 grep performance loss".
Samuel

Gunnar Wolf

Since: Nov 12, 2004 Posts: 200

Posted: Tue Aug 11, 2009 3:10 pm Post subject: Re: default character encoding for everything in debian

Harald Braumann said [Tue, Aug 11, 2009 at 01:33:58AM +0200]:
> > There are a lot of users out there that are not willing to pay the
> > price for increased generality.
>
> Don't you mean s/users/programmers? As a user I don't see what price I
> pay. I only see advantages in having a consistent encoding. Which,
> btw., doesn't have to be UTF-8. In an ideal world every programme would
> adhere to LC_CTYPE. But if the encoding has to be configured then I
> would also prefer UTF-8 as the default.
>
> Of course, for the programmer there might be a price to pay. And if
> he's not willing to pay it, he can't be forced, anyway.
>
> Or do you mean the user pays the price, because if the encoding is set
> to UTF-8 then performance would suffer? In that case, I'd love to see
> some real life numbers. I doubt the difference would be noticeable.
Yes, performance will suffer. We enjoyed many decades of blissfully
ignoring the difference between a character and a byte. So, while
length(str) in any language up to the 1990s was a mere subtraction,
now we must go through the string checking each byte to see if it is a
Unicode marker and subtract the appropriate number of bytes. Also,
for a very long time we didn't really care much what a buffer
contained: everything could be printed, even if it had control
characters that made you beep (with the occasional control sequence
re-injecting output into the terminal as input). Now... well, printing
an unprintable string can cause segfaults in some cases.
--
Gunnar Wolf • gwolf@gwolf.org • (+52-55)5623-0154 / 1451-2244
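The extra work Gunnar describes can be illustrated briefly: with a single-byte encoding the length in characters is just end minus start, but in UTF-8 every continuation byte has the bit pattern 10xxxxxx, so counting code points means visiting every byte and skipping those. A minimal C sketch, assuming the input is already valid UTF-8; the function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: number of code points in a NUL-terminated UTF-8
 * string.  Continuation bytes (10xxxxxx) are skipped; every other byte
 * starts a new code point.  Assumes the input is valid UTF-8. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++)
        if ((*p & 0xC0) != 0x80)
            n++;
    return n;
}
```

For example, utf8_codepoints("caf\xc3\xa9") returns 4, although the string occupies 5 bytes.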

Gunnar Wolf

Since: Nov 12, 2004 Posts: 200

Posted: Tue Aug 11, 2009 3:10 pm Post subject: Re: default character encoding for everything in debian

Norbert Preining said [Mon, Aug 10, 2009 at 08:55:27PM +0200]:
> On Mon, 10 Aug 2009, Roger Leigh wrote:
> > Of course there's a penalty for certain operations. But UTF-8 is about
> > as compact as an extended encoding is going to get.
>
> Rubbish. You know why in Japan and other Asian countries UTF8 is not
> so common? Because many of their glyphs need 4 (four!) bytes, while
> for example jis-2022 (AFAIR) is much more compact.
>
> We are not living in an ASCII world anymore.
It's not so much about the size as about backwards
compatibility. We users of Latin-based alphabets migrate easily to
UTF-8, with occasional problems where we use diacritics. East Asian
encodings are _completely_ incompatible with UTF-8, so it is not a
matter of tolerating broken text every now and then. Everything just
breaks completely.
--
Gunnar Wolf • gwolf@gwolf.org • (+52-55)5623-0154 / 1451-2244
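The difference between occasional and total breakage can be seen with a small validator. Latin-1 text is mostly ASCII, so read as UTF-8 it only fails at the first accented byte; EUC-JP text fails at its very first byte. A minimal C sketch; the function name is hypothetical, and it checks only the lead/continuation byte structure, not overlong forms or surrogates. The test inputs assume "naïve" in Latin-1 (ï is 0xEF) and "日本" in EUC-JP (0xC6 0xFC 0xCB 0xDC).

```c
#include <stddef.h>

/* Hypothetical helper: offset of the first byte that does not begin a
 * structurally well-formed UTF-8 sequence, or -1 if the buffer is valid.
 * Simplification: overlong forms and surrogates are not rejected. */
long utf8_first_invalid(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        size_t need;                      /* continuation bytes expected */
        if (b < 0x80)                need = 0;
        else if ((b & 0xE0) == 0xC0) need = 1;
        else if ((b & 0xF0) == 0xE0) need = 2;
        else if ((b & 0xF8) == 0xF0) need = 3;
        else return (long)i;              /* stray continuation or bad lead */
        if (i + need >= len)
            return (long)i;               /* sequence cut off by end of buffer */
        for (size_t k = 1; k <= need; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return (long)i;           /* expected a 10xxxxxx byte */
        i += need + 1;
    }
    return -1;
}
```

On those inputs the Latin-1 string first fails at offset 2 (the ï), while the EUC-JP string fails at offset 0.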

Samuel Thibault

Since: May 08, 2009 Posts: 47

Posted: Tue Aug 11, 2009 4:10 pm Post subject: Re: default character encoding for everything in debian

Gunnar Wolf wrote on Tue 11 Aug 2009 13:28:08 -0500:
> while length(str) in any language up to the 1990s was a mere
> subtraction, now we must go through the string checking each byte to
> see if it is a Unicode marker and subtract the appropriate number of
> bytes.
Not necessarily. Any sane implementation should just use wchar_t, and
length is a subtraction again. The width of the text is another matter, but
that's a problem for TrueType rendering anyway. What remains costly is
the conversion, which in principle only happens when talking to
other programs (files, sockets, etc.).
Samuel
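Samuel's suggestion amounts to decoding once, at the I/O boundary, into fixed-width units. In a UTF-8 locale the standard mbstowcs() performs that conversion into wchar_t; the sketch below instead decodes by hand into uint32_t (UCS-4), so that it does not depend on which locales are installed. The function name is hypothetical, malformed input is simply rejected, and overlong forms are not checked.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode UTF-8 into UCS-4 code points.  Returns the
 * number of code points written, or (size_t)-1 if the input is malformed
 * or `out` is too small.  After this one-time conversion, the length is
 * the element count and the n-th character is simply out[n].
 * Simplification: overlong forms and surrogates are not rejected. */
size_t utf8_to_ucs4(const unsigned char *s, size_t len, uint32_t *out, size_t cap)
{
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char b = s[i];
        uint32_t cp;
        size_t extra;                              /* continuation bytes */
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return (size_t)-1;                    /* invalid lead byte */
        if (i + extra >= len || n == cap)
            return (size_t)-1;                     /* truncated, or no room */
        for (size_t k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return (size_t)-1;                 /* not a continuation byte */
            cp = (cp << 6) | (uint32_t)(s[i + k] & 0x3F);
        }
        out[n++] = cp;
        i += extra + 1;
    }
    return n;
}
```

The cost Samuel mentions is confined to this function; code that works on the resulting array gets length and indexing for free.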

Bernd Eckenfels

Since: May 21, 2009 Posts: 5

Posted: Tue Aug 11, 2009 5:10 pm Post subject: Re: default character encoding for everything in debian

In article <20090811182041.GD19541@cajita.gateway.2wire.net> you wrote:
> encodings are _completely_ incompatible with UTF-8, so it is not a
> matter of tolerating broken text every now and then. Everything just
> breaks completely.
Or everything works out of the box, when you use it correctly...
Regards
Bernd

Bernd Eckenfels

Since: May 21, 2009 Posts: 5

Posted: Tue Aug 11, 2009 5:10 pm Post subject: Re: default character encoding for everything in debian

In article <20090811183800.GE5487@const.famille.thibault.fr> you wrote:
> Not necessarily. Any sane implementation should just use wchar_t
Which could be UTF-16 and therefore still have complicated length semantics.
And even with UTF-32 there are combining characters. Sadly. But the length
could be defined in code units; it's just a question of how useful that is.
Regards
Bernd

Bastian Blank

Since: Nov 21, 2004 Posts: 774

Posted: Tue Aug 11, 2009 5:10 pm Post subject: Re: default character encoding for everything in debian

On Tue, Aug 11, 2009 at 09:40:35PM +0200, Bernd Eckenfels wrote:
> In article <20090811183800.GE5487@const.famille.thibault.fr> you wrote:
> > Not necessarily. Any sane implementation should just use wchar_t
> Which could be UTF-16 and therefore still have complicated length semantics.
No, wchar_t is UCS-4 (or UCS-2 in esoteric implementations like
Windows).
Bastian
--
Phasers locked on target, Captain.

Samuel Thibault

Since: May 08, 2009 Posts: 47

Posted: Tue Aug 11, 2009 5:10 pm Post subject: Re: default character encoding for everything in debian

Bernd Eckenfels wrote on Tue 11 Aug 2009 21:40:35 +0200:
> In article <20090811183800.GE5487@const.famille.thibault.fr> you wrote:
> > Not necessarily. Any sane implementation should just use wchar_t
>
> Which could be UTF-16 and therefore still have complicated length semantics.
??
wchar_t may be 32 or 16 bits wide (in the latter case it can't express
Unicode beyond U+FFFF), but it's still meant to have simple length semantics.
> And even with UTF-32 there are combining characters.
Each of which counts as one character. Then there is the problem of
rendering width, of course, but as I said, that problem exists anyway as
soon as you have a font with varying letter widths; string manipulation
itself doesn't pose any problem.
> But the length could be defined in code units - its just a question
> how usefull it is.
Of course. It's rarely useful to take character width into account
yourself, unless you are rendering on a tty, but then speed usually
doesn't matter and you can afford to call wcswidth() on your string
as late as possible.
Samuel
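To make the combining-character case concrete: "é" can be stored precomposed as U+00E9, or decomposed as U+0065 followed by U+0301 COMBINING ACUTE ACCENT. The decomposed form is two UTF-32 code units for what a reader sees as one letter. The C sketch below is a deliberately crude approximation that only knows the Combining Diacritical Marks block (U+0300 to U+036F); real code would consult the Unicode character database, or wcwidth()/wcswidth() when terminal columns are what matter. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count code points, but do not count those in the
 * Combining Diacritical Marks block (U+0300..U+036F), which attach to the
 * preceding base character.  Other combining marks are not recognized. */
size_t count_base_chars(const uint32_t *cps, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (!(cps[i] >= 0x0300 && cps[i] <= 0x036F))
            count++;
    return count;
}
```

Both spellings of "é" then count as one character, even though the decomposed one has two code units.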

Jakub Wilk

Since: Nov 12, 2005 Posts: 93

Posted: Tue Aug 11, 2009 6:10 pm Post subject: Re: default character encoding for everything in debian

* Bastian Blank <waldi@debian.org>, 2009-08-11, 22:24:
>> > Not necessarily. Any sane implementation should just use wchar_t
>> Which could be UTF-16 and therefore still have complicated length semantics.
>
>No, wchar_t is UCS-4 (or UCS-2 in esoteric implementations like
>Windows).
And in the most esoteric (while still conforming to the C standard)
implementations it is not related to Unicode at all.
--
Jakub Wilk
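Standard C offers a portable check for Jakub's point: since C99, an implementation that defines the macro __STDC_ISO_10646__ promises that wchar_t values are ISO 10646 (Unicode) code points; if it is not defined, no such promise is made. A minimal sketch; the helper names are hypothetical, and the 21-bit check reflects that code points go up to U+10FFFF.

```c
#include <limits.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: 1 if the implementation guarantees that wchar_t
 * values are ISO 10646 code points (C99 __STDC_ISO_10646__), else 0. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: 1 if wchar_t has at least the 21 bits needed for
 * U+10FFFF.  A 16-bit wchar_t (as on Windows) stops at U+FFFF. */
int wchar_holds_all_code_points(void)
{
    return sizeof(wchar_t) * CHAR_BIT >= 21;
}
```

On a typical Linux system both functions return 1; on an implementation where the first returns 0, wchar_t may be unrelated to Unicode.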

Adam Borowski

Since: May 22, 2006 Posts: 84

Posted: Tue Aug 11, 2009 7:10 pm Post subject: Re: default character encoding for everything in debian

On Mon, Aug 10, 2009 at 09:04:37PM +0100, Roger Leigh wrote:
> If having a C.UTF-8 locale always available for system services is
> required for them to fully support UTF-8, then that needs adding to
> glibc.
It would also bring a significant speed increase. Since nearly everything
calls setlocale(), having the locale built in speeds up the typical process
startup sequence by 20%! And that's 20% of the whole thing from fork(),
through linking, up to getopt(), so it's not a speedup to sneeze at.
I'm speaking about having the locale supported natively by glibc, of course;
what the udeb does is merely ship a generated locale file.
> For a locale available after /usr is mounted, a simple localedef
> invocation is all that's needed; for all times, after starting init,
> it needs the tables compiling into glibc as for the standard C locale.
> I've been looking at how to do the latter, but I'm not expert with the
> "3-level" locale tables and other glibc internals, so if anyone who
> knows the details of glibc locales could provide me with
> assistance/guidance here, that would be much appreciated.
>
> For reference, this is bug #522776. This would be great to have as a
> release goal for Squeeze, and (speculatively) a native C UTF-8 locale
> for Squeeze+1 to give us a default pure UTF-8 system from end-to-end.
I'm not an expert on glibc internals either, but a couple of years ago I
researched the issue a bit. Apparently, there are only two first-class
locales, C and POSIX; all others get loaded from disk. In the past,
en_US.ISO-8859-1 and ru_RU.KOI8-R were first-class ones as well, but
that's no longer the case. What I'd propose is making C.UTF-8 built in.
Another possible optimization would be building the table used by 8-bit
isalpha() etc. on the fly for all locales. Running iconv over 128 characters
is certainly faster than opening a file on disk, and (sanely) glibc doesn't
support character classification that contradicts Unicode, so this could
allow dropping the LC_CTYPE files of all other locales entirely.
--
1KB // Microsoft corollary to Hanlon's razor:
// Never attribute to stupidity what can be
// adequately explained by malice.
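For context on the C.UTF-8 proposal: a program that wants UTF-8 character handling has to request a UTF-8 locale at runtime and cope with it being absent. A minimal sketch using the standard setlocale() and POSIX nl_langinfo(CODESET); the candidate list and the function name are assumptions, and "C.UTF-8" only works where the C library provides such a locale.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: try UTF-8 variants of the C locale for LC_CTYPE,
 * falling back to plain "C", which every C implementation must accept.
 * Returns the resulting codeset name, e.g. "UTF-8" on success. */
const char *select_ctype_locale(void)
{
    static const char *const candidates[] = { "C.UTF-8", "C.utf8", "C" };
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++)
        if (setlocale(LC_CTYPE, candidates[i]) != NULL)
            break;
    return nl_langinfo(CODESET);
}
```

Each setlocale() call for a non-built-in locale may have to load locale data from disk, which is the startup cost discussed above.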

Giacomo A. Catenazzi

Since: Jun 20, 2005 Posts: 44

Posted: Wed Aug 12, 2009 3:10 am Post subject: Re: default character encoding for everything in debian

Samuel Thibault wrote:
> Gunnar Wolf wrote on Tue 11 Aug 2009 13:28:08 -0500:
>> while length(str) in any language up to the 1990s was a mere
>> subtraction, now we must go through the string checking each byte to
>> see if it is a Unicode marker and subtract the appropriate number of
>> bytes.
>
> Not necessarily. Any sane implementation should just use wchar_t, and
> length is a subtraction again.
An implementation that uses wchar_t is usually not sane; more often than
not it is (also) buggy. It is very difficult (AFAIK not impossible,
but I'm not so sure) to write portable programs (the POSIX way, i.e. with
changing locales) using wchar_t.
The only way I know to use wchar_t sanely is to stick to the simple
C standard requirements: only one runtime environment and locale.
PS: note that the binary encoding depends on the compiler environment (but
that information is not exported).
ciao
cate

Giacomo A. Catenazzi

Since: Jun 20, 2005 Posts: 44

Posted: Wed Aug 12, 2009 3:10 am Post subject: Re: default character encoding for everything in debian

Bastian Blank wrote:
> On Tue, Aug 11, 2009 at 09:40:35PM +0200, Bernd Eckenfels wrote:
>> In article <20090811183800.GE5487@const.famille.thibault.fr> you wrote:
>>> Not necessarily. Any sane implementation should just use wchar_t
>> Which could be UTF-16 and therefore still have complicated length semantics.
>
> No, wchar_t is UCS-4 (or UCS-2 in esoteric implementations like
> Windows).
No, wchar_t is locale-dependent (per POSIX).
BTW on gcc:
-fwide-exec-charset=charset
Set the wide execution character set, used for wide string and
character constants. The default is UTF-32 or UTF-16, whichever
corresponds to the width of wchar_t. As with -fexec-charset, charset can
be any encoding supported by the system's iconv library routine;
however, you will have problems with encodings that do not fit exactly
in wchar_t.
Note that the default encoding is UTF-8, thus giving a UTF-32 wchar_t
on most developer machines.
ciao
cate
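Whatever wording the GCC manual should use, the practical effect of the default can be checked directly. A minimal sketch, assuming GCC (or a compatible compiler) with the default -fwide-exec-charset: each element of a wide string literal then holds the code point itself, so wcslen() counts characters. The names are hypothetical, and the characters are written as universal character names to keep the source ASCII-only.

```c
#include <stddef.h>
#include <wchar.h>

/* "é日本" written with universal character names.  With GCC's default wide
 * execution charset, each wchar_t element is the character's code point. */
static const wchar_t sample[] = L"\u00e9\u65e5\u672c";

/* Number of wchar_t elements before the terminating L'\0'. */
size_t sample_length(void)
{
    return wcslen(sample);
}
```

All three characters lie in the BMP, so the result is the same whether wchar_t is 16 or 32 bits wide.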

Samuel Thibault

Since: May 08, 2009 Posts: 47

Posted: Wed Aug 12, 2009 5:10 am Post subject: Re: default character encoding for everything in debian

Giacomo A. Catenazzi wrote on Wed 12 Aug 2009 08:03:30 +0200:
> Bastian Blank wrote:
> > On Tue, Aug 11, 2009 at 09:40:35PM +0200, Bernd Eckenfels wrote:
> >> In article <20090811183800.GE5487@const.famille.thibault.fr> you wrote:
> >>> Not necessarily. Any sane implementation should just use wchar_t
> >> Which could be UTF-16 and therefore still have complicated length semantics.
> >
> > No, wchar_t is UCS-4 (or UCS-2 in esoteric implementations like
> > Windows).
>
> No, wchar_t is locale-dependent (per POSIX).
What do you mean? The compiler can't know the locale in advance, so the
width and endianness can't depend on it. The values might depend on the
locale, yes, but that's not a problem as long as you convert to UTF-8
before communicating with other applications.
On sane systems (and Debian systems are), it's just always UCS-4.
> BTW on gcc:
>
> -fwide-exec-charset=charset
> Set the wide execution character set, used for wide string and
> character constants.
It hurts when I shoot myself in the foot.
> The default is UTF-32 or UTF-16, whichever corresponds to the width of
> wchar_t.
This documentation is bogus BTW. It should read "UCS-4 or UCS-2".
> Note that the default encoding is UTF-8, thus giving a UTF-32 wchar_t
> on most developer machines.
I don't understand this sentence.
Samuel

Samuel Thibault

Since: May 08, 2009 Posts: 47

Posted: Wed Aug 12, 2009 5:10 am Post subject: Re: default character encoding for everything in debian

Giacomo A. Catenazzi wrote on Wed 12 Aug 2009 07:54:33 +0200:
> Samuel Thibault wrote:
> > Gunnar Wolf wrote on Tue 11 Aug 2009 13:28:08 -0500:
> >> while length(str) in any language up to the 1990s was a mere
> >> subtraction, now we must go through the string checking each byte to
> >> see if it is a Unicode marker and subtract the appropriate number of
> >> bytes.
> >
> > Not necessarily. Any sane implementation should just use wchar_t, and
> > length is a subtraction again.
>
> An implementation that uses wchar_t is usually not sane; more often than
> not it is (also) buggy.
Why? It's just about using the wide functions instead of the usual ones.
> PS: note that the binary encoding depends on the compiler environment (but
> that information is not exported).
See my other mail. A lot of things can be made to depend on the
compiler environment.
Samuel