especially in fidonet where dominating codepage is CP866
Which completely ignores most Russian users, who are more than likely using CP1251.
The above quote assumes that Russian sysops are content keeping fidonet within a small group of users of Russian abandonware rather than opening
it up to the larger group of actual Russian users.
Also, like everywhere else in the world, utf-8 is used on the majority of
systems on the internet, which will likely spread to the majority of
users ... eventually. One could easily argue that universal adoption of utf-8 text messaging only makes economic sense, since the exact same
capable device could be used anywhere by anyone no matter what their
native language(s) happen to be.
By the way, this tagline is 8 bit Russian except NOT CP866 but instead CP1251 which is the most likely to happen in real usage, and as a
matter of fact I have witnessed it happen on a Russian website in the
past. It should say the same as the 8 bit Greek (CP737) tagline except
in Russian.
Using CP1251 exactly where? And why not KOI8-R? Or even
ISO-8859-5?
But you don't care, do you?
Using CP1251 exactly where? And why not KOI8-R? Or even
ISO-8859-5?
Do you have any statistics to back any of that up? I only went by what I've witnessed in the past.
At one time the majority of Russian sites appeared to be KOI8-R based
but not anymore. I see mostly UTF-8 these days and only one of them required a CHRS kludge. At the time I wondered how many hoops had to be jumped through to cripple the Apache module to 'accept' that silliness. :::snicker:::
It doesn't matter. It makes absolutely zero difference if I do or don't care.
Using CP1251 exactly where? And why not KOI8-R? Or even
ISO-8859-5?
Do you have any statistics to back any of that up? I only went by
what I've witnessed in the past.
Why do I need to back anything up? It was your assumption that most
Russian users are using CP1251.
At one time the majority of Russian sites appeared to be KOI8-R
based but not anymore. I see mostly UTF-8 these days
If you don't care that your messages could not be read correctly then
you can continue to write with bogus or missing codepage information.
But if it was in my power I would have restricted such traffic and
treated it as a case of 1.3.5 with all its consequences.
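To illustrate why bogus or missing codepage information matters, here is a minimal Python sketch (not from the thread; the sample word is an arbitrary choice). It encodes one Russian word as CP1251 and then decodes the same bytes under each of the charsets named above, using only codec names from Python's standard library.

```python
# Sample text is hypothetical; any Cyrillic string shows the same effect.
text = "Привет"
raw = text.encode("cp1251")  # the bytes a CP1251 editor would put in the message

# Decode the identical bytes the way readers with other charset assumptions would.
codecs = ("cp1251", "cp866", "koi8_r", "iso8859_5")
decoded = {codec: raw.decode(codec) for codec in codecs}

for codec, shown in decoded.items():
    print(f"{codec:10} {shown!r}")
```

Only the reader who guesses the writer's codepage gets the original word back; without a correct charset declaration the other readers have nothing to tell them which guess is right.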
All I can say is that I have never seen a message encoded in CP1251 in Fidonet. What I see is CP437, CP850, Latin-1, and CP866.
While it is true that most of the World Wide Web has migrated to UTF-8,
we should keep in mind that Fidonet is not the Internet. As you have pointed out, the vast majority of Fidonet messages still use 8 bit encoding. The UTF-8 evangelists in here may wish it different, but the reality is that we still have a long way to go before UTF-8 will be the dominant encoding for Fidonet and it is quite likely that will never happen.
If you don't care that your messages could not be read correctly then
you can continue to write with bogus or missing codepage information.
But if it was in my power I would have restricted such traffic and
treated it as a case of 1.3.5 with all its consequences.
See the new area rules published earlier today.
* Origin: Blóf Tønón
The UTF-8 evangelists in here may wish it different, but the reality
is that we still have a long way to go before UTF-8 will be the
dominant encoding for Fidonet and it is quite likely that will never
happen.
If we want widespread UTF-8 usage in fidonet then we must fix the
lack of unicode-capable editors first.
P.S. What's written there?
* Origin: Blóf Tønón
There are some strange 0x7F characters:
0620 20 4F 72 69 67 69 6E 3A 20 42 6C 7F C3 B3 66 20
0630 54 C3 B8 6E 7F C3 B3 6E 20 28 32 3A 32 38 30 2F
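For anyone who wants to check the dump themselves, here is a small Python sketch (an illustration, not the tool that produced the dump) that turns the two lines above back into bytes, decodes them as UTF-8 and reports where the 0x7F bytes sit. The helper name `dump_to_bytes` is made up for this example.

```python
# The two dump lines quoted above: an offset column followed by 16 byte values.
dump = """\
0620 20 4F 72 69 67 69 6E 3A 20 42 6C 7F C3 B3 66 20
0630 54 C3 B8 6E 7F C3 B3 6E 20 28 32 3A 32 38 30 2F
"""

def dump_to_bytes(text):
    """Drop the leading offset on each line and join the remaining hex pairs."""
    out = bytearray()
    for line in text.splitlines():
        parts = line.split()
        out += bytes.fromhex("".join(parts[1:]))
    return bytes(out)

raw = dump_to_bytes(dump)
decoded = raw.decode("utf-8")
del_positions = [i for i, b in enumerate(raw) if b == 0x7F]

print(repr(decoded))
print("0x7F at offsets:", [hex(0x0620 + i) for i in del_positions])
```

The bytes are valid UTF-8, so the stray 0x7F bytes sit in front of each `C3 B3` pair rather than inside a multi-byte sequence.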
If we want widespread UTF-8 usage in fidonet then we
must fix the lack of unicode-capable editors first.
There is something you can do now to promote UTF-8 in Fidonet.
You can join the UTF-8 nodelist project. For some time now, ZC2
distributes a UTF-8 version of the nodelist in parallel with the
old ASCII only list.
So far participation is limited, but it would be a great push for
the project if R50 were to join. IIRC RC50 has said that he would
join if five NCs in R50 would express the wish to join. (Or was
it four?)
NC5075 has already stated he would join and he has an
utfn5075.nnn ready for submission. So how about doing some
lobbying among your fellow NCs in R50?
So far participation is limited, but it would be a great push
for the project if R50 were to join. IIRC RC50 has said that he
would join if five NCs in R50 would express the wish to join.
(Or was it four?)
Yes, five:
area://R50.SYSOP?msgid=2:5000/363+58b7c32a
Obviously something went wrong in code translation. There should
be no '7F' and the code should be "C4 B3", not "C3 B3".
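As a quick check of the two byte sequences mentioned here, the following Python sketch (an illustration only) decodes each pair as UTF-8 and prints the resulting code point and its Unicode name.

```python
import unicodedata

# Decode each two-byte sequence as UTF-8 and look up what character it is.
found = {}
for seq in ("C3 B3", "C4 B3"):
    ch = bytes.fromhex(seq).decode("utf-8")
    found[seq] = (f"U+{ord(ch):04X}", unicodedata.name(ch))
    print(seq, *found[seq])
```

So a single differing lead byte is the difference between the o with acute that arrived and the ij ligature that was meant.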
...
With Unicode it came back. So I thought this was good for my
origin line. But apparently there is a snake in the grass....
* Origin: BlЇf TЫnЇn
area://UTF-8?msgid=2:280/5555+58b86a8d
* Origin: Blijf Tønijn
Though I'm not a big fan of the idea of UTF-8 nodelist project in
its current state because:
1) It is a completely separate version of the nodelist. So it is prone to out of sync issues with the ascii-only
version.
2) IMHO it must also contain latin-only representations.
Ability to see native representation of information is very good, but
to properly understand it you must know that language.
If you don't know it then you just couldn't read it and most likely couldn't type it.
So it renders it basically useless.
I understand the desire to keep compatibility with old format but
maybe we need to also introduce xml/json version which would contain
both ascii and utf-8 representations if they are different.
But despite that, why not join...
Just asked R50C about current state of NC's counter and expressed my desire to join.
Though I'm not a big fan of the idea of UTF-8 nodelist project
in its current state because:
1) It is a completely separate version of the nodelist. So it is
prone to out of sync issues with the ascii-only version.
2) IMHO it must also contain latin-only representations.
Ability to see native representation of information is very good,
but to properly understand it you must know that language.
I understand the desire to keep compatibility with old format but
maybe we need to also introduce xml/json version which would
contain both ascii and utf-8 representations if they are
different.
Just asked R50C about current state of NC's counter and expressed
my desire to join.
MvdV> Any response?
Sadly, that method wouldn't work for cyrillic names as there are several standards for name transliteration. Wiki mentions that there are at least five. On top of that, users often prefer mixed versions and add something of their own, e.g. a 'v' ending becomes 'off'.
Yes. So maybe for now the UTF-8 version needs to be a derivative of the ascii version, not a completely parallel replacement. It must be generated from ascii segments and additional information like native names, locations and so on.
I can hardly imagine anyone in 2017 who doesn't know the latin alphabet at transliteration level, but I can easily imagine someone who doesn't know the cyrillic alphabet. Not to mention asian languages or, for example, hebrew.
Yes, so as I suggested above we can use ASCII version as a source for "extended" unicode version. And virtually nothing prevents generation of additional versions with different format.
I have been looking into Romanisation of Cyrillic names. Indeed
there are a number of standards. Just to make a start, I looked at
the ICAO translation, that is used to translate Cyrillic names to
Latin for use in machine readable passports.
There is some Perl code to do the translation from Cyrillic to
Latin. I tried a reverse routine, that is not complete yet, and it
probably does a disgusting translation, but at least the same comes
out when translated back to Latin. A norm for Fidonet could be to
use the ICAO method for the nodelist as well.
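A rough idea of what such a Cyrillic to Latin routine looks like, written here in Python rather than the Perl mentioned above. This is only a sketch: the table covers the Russian alphabet only and is my recollection of the ICAO Doc 9303 recommendation, so it should be checked against the published table before being used for anything real. Ukrainian and Belarusian letters are not covered, and the function name `romanise` and the sample surnames are made up for the example.

```python
# ASSUMPTION: Russian letters only, values as recalled from ICAO Doc 9303.
# Verify against the official table before relying on it.
ICAO_RU = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "i", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "ie", "ы": "y", "ь": "", "э": "e", "ю": "iu", "я": "ia",
}

def romanise(text):
    """Transliterate Russian Cyrillic to Latin, one letter at a time."""
    out = []
    for ch in text:
        latin = ICAO_RU.get(ch.lower())
        if latin is None:
            out.append(ch)                  # pass through anything not in the table
        elif ch.isupper():
            out.append(latin.capitalize())  # "Щ" becomes "Shch", not "SHCH"
        else:
            out.append(latin)
    return "".join(out)

for name in ("Иванов", "Щербаков", "Ёлкин", "Элкин"):
    print(name, "->", romanise(name))
```

Note that several Cyrillic letters share one Latin output in this table, so two different names can end up with the same Latin spelling. A reverse routine can therefore be made consistent (the same Latin comes out again), but it cannot always recover the original spelling.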
I can hardly imagine anyone in 2017 who doesn't know the latin
alphabet at transliteration level, but I can easily imagine
someone who doesn't know the cyrillic alphabet. Not to mention
asian languages or, for example, hebrew.
Let's take it one step at a time. There are very few participants
in Fidonet that require Asian languages. Cyrillic is difficult
enough with the differences between Russia, Ukraine, and Belarus.
As for hebrew, all sysops in R40 are of Russian origin.
Yes, so as I suggested above we can use ASCII version as a
source for "extended" unicode version. And virtually nothing
prevents generation of additional versions with different
format.
It is far easier to degrade an extended version, than it is to
enrich a limited one.
Sadly, that method wouldn't work for cyrillic names
as there are several standards for name transliteration. Wiki
mentions that there are at least five. On top of that, users often prefer
mixed versions and add something of their own, e.g. a 'v' ending becomes 'off'.
And it doesn't solve the problem if one of the segments were lost in transit or not processed by nodelist generator.
Yes. So maybe for now the UTF-8 version needs to be a derivative
of the ascii version, not a completely parallel replacement. It must be generated from ascii segments and additional information like native names, locations and so on.
Just asked R50C about current state of NC's counter and
expressed my desire to join.
Nope, not yet.
It is far easier to degrade an extended version, than it is to enrich
a limited one.
Sysops already provided their latin name representations; forcing any auto transliteration script on them isn't a wise move. It can lead to frustration and even anger from sysops whose names suddenly changed without their request. If that thing takes off then I'll just set up a table in a database and store both representations there. Then just generate both versions from that database.
Let's not assume anything based on the current population. It is the point of the UTF-8 nodelist to support virtually any alphabet, isn't it?
Maybe tomorrow there will be plenty of chinese friends there, who
knows?
Maybe, but as Michael mentioned, the ascii-only version won't go anywhere. I think currently it's virtually impossible to change the segment format and to convince Ward to change his software to produce various versions of the nodelist from that one source. So the only way we can go for now is building utf-8 versions from the ascii-only nodelist and extending it with additional information if we don't want to deal with out of sync issues. Maybe in the distant future we can swap them to how it must be: building a limited version from the extended one. But we are not there yet.
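A minimal sketch of that "derive, don't duplicate" idea in Python, under some loud assumptions: the ASCII segment stays the only source, and a separate overlay (here a plain dict, it could just as well be the database table mentioned above) holds native-script values keyed by node number within one net segment. The function name `enrich`, the overlay layout and all node data below are hypothetical. The only thing taken from the nodelist format is the comma-separated field order (keyword, number, name, location, sysop, ...) and the use of underscores for spaces.

```python
# Field positions in a nodelist entry: keyword, number, name, location, sysop, ...
NAME, LOCATION, SYSOP = 2, 3, 4

def enrich(ascii_line, overlay):
    """Return the UTF-8 variant of one ASCII nodelist line.

    overlay maps a node number (as text) to native-script replacements;
    fields without an override keep their ASCII value.
    """
    fields = ascii_line.split(",")
    if ascii_line.startswith(";") or len(fields) < 5:
        return ascii_line  # comments and malformed lines pass through untouched
    native = overlay.get(fields[1], {})
    for index, key in ((NAME, "name"), (LOCATION, "location"), (SYSOP, "sysop")):
        if key in native:
            fields[index] = native[key].replace(" ", "_")  # nodelist uses _ for spaces
    return ",".join(fields)

# Hypothetical segment and overlay, for illustration only.
ascii_segment = [
    ";Example net segment",
    ",999,Example_BBS,Moscow,Ivan_Ivanov,-Unpublished-,300,IBN",
    ",998,Other_BBS,Tula,Petr_Petrov,-Unpublished-,300,IBN",
]
overlay = {"999": {"location": "Москва", "sysop": "Иван Иванов"}}

utf8_segment = [enrich(line, overlay) for line in ascii_segment]
print("\n".join(utf8_segment))
```

Because every UTF-8 line is computed from its ASCII line, a node that is added, changed or removed in the ASCII segment cannot get out of sync; a missing overlay entry simply leaves the ASCII values in place.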
Sysops already provided their latin name representations; forcing any
auto transliteration script on them isn't a wise move.
It can lead to frustration and even anger from sysops whose names
suddenly changed without their request.
If that thing takes off then I'll just set up a table in a database and
store both representations there. Then just generate both
versions from that database.
So the only way we can go for now is building utf-8 versions from the ascii-only nodelist and extending it with additional information if
we don't want to deal with out of sync issues.