• oh those russians

    From Maurice Kinal@1:153/7001 to Konstantin Kuzov on Tue Feb 28 01:40:03 2017
    -={ 2017-02-28 01:40:03.265507043+00:00 }=-

    Hey Konstantin!

    especially in fidonet where dominating codepage is CP866

    Which completely ignores most Russian users which are more than likely using CP1251. The above quote assumes that Russian sysops are content keeping fidonet within a small group of users of Russian abandonware rather than open it up to the larger group of actual Russian users. Also, like everywhere else in the world, utf-8 is the majority of systems on the internet which will likely spread to the majority of users ... eventually. One could easily argue that universal adoption of utf-8 text messaging only makes economic sense since the exact same capable device could be used anywhere by anyone no matter what their native language(s) happen to be.

    By the way, this tagline is 8 bit Russian, not in CP866 but in CP1251, which is the most likely to happen in real usage; as a matter of fact I have witnessed it happen on a Russian website in the past. It should say the same as the 8 bit Greek (CP737) tagline except in Russian.

    Life is good,
    Maurice

    ...
    --- GNU bash, version 4.4.12(1)-release (x86_64-atom-linux-gnu)
    * Origin: Little Mikey's Brain - Ladysmith BC, Canada (1:153/7001)
  • From Konstantin Kuzov@2:5019/40.1 to Maurice Kinal on Tue Feb 28 16:03:42 2017
    Greetings, Maurice!

    especially in fidonet where dominating codepage is CP866
    Which completely ignores most Russian users which are more than likely using CP1251.

    Using CP1251 exactly where? And why not KOI8-R? Or even ISO-8859-5?

    The above quote assumes that Russian sysops are content keeping fidonet within a small group of users of Russian abandonware rather than open
    it up to the larger group of actual Russian users.

    Until the majority of users switch to modern software that supports unicode, it isn't possible to switch to UTF-8. How do you want to convince them of the necessity of such a move?
    Using CP1251 or KOI8-R can also cause issues if the receiver doesn't set up his xlat tables right. And there is no benefit in using another 8bit encoding instead of CP866 in any case.

    Also, like everywhere else in the world, utf-8 is the majority of
    systems on the internet which will likely spread to the majority of
    users ... eventually. One could easily argue that universal adoption of utf-8 text messaging only makes economic sense since the exact same
    capable device could be used anywhere by anyone no matter what their
    native language(s) happen to be.

    No one disputes that UTF-8 is better than 8bit encodings. But in fidonet there are currently many technical obstacles blocking its use.

    By the way, this tagline is 8 bit Russian except NOT CP866 but instead CP1251 which is the most likely to happen in real usage, and as a
    matter of fact I have witnessed it happen on a Russian website in the
    past. It should say the same as the 8 bit Greek (CP737) tagline except
    in Russian.

    Which turned to garbage because of the bogus CHRS kludge. But you don't care, do you?

    --- Claws Mail 3.14.0 (GTK+ 2.24.30; x86_64-pc-linux-gnu)
    * Origin: Via 2:5019/40 NNTP (GaNJaNET STaTi0N, Smolensk) (2:5019/40.1)
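
A minimal Python sketch of the failure described above: text written in CP1251 but decoded with the CP866 table (for instance because the CHRS kludge is wrong or missing) comes out as garbage, even though the bytes are intact. The sample word is an assumption chosen for illustration; it is not the tagline from the original message.

```python
# Sample text is illustrative only, not the original tagline.
original = "Привет"

# The sender writes CP1251 bytes ...
raw = original.encode("cp1251")

# ... but the reader decodes them with the CP866 table.
garbled = raw.decode("cp866")

# The bytes are undamaged, so decoding with the right table recovers the text.
recovered = garbled.encode("cp866").decode("cp1251")

print(garbled)
print(recovered)
```

The same reasoning applies to KOI8-R or any other 8 bit Cyrillic encoding: the reader can only pick the right table if the message declares it correctly.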
  • From Maurice Kinal@1:153/7001 to Konstantin Kuzov on Tue Feb 28 14:34:10 2017
    -={ 2017-02-28 14:34:10.244872030+00:00 }=-

    Hey Konstantin!

    Using CP1251 exactly where? And why don't KOI8-R? Or even
    ISO-8859-5?

    Do you have any statistics to back any of that up? I only went by what I've witnessed in the past.

    At one time the majority of Russian sites appeared to be KOI8-R based but not anymore. I see mostly UTF-8 these days and only one of them required a CHRS kludge. At the time I wondered how many hoops had to be jumped through to cripple the Apache module to 'accept' that silliness. :::snicker:::

    But you don't care, do you?

    It doesn't matter. It makes absolutely zero difference if I do or don't care.

    Life is good,
    Maurice

    ... Se ðe him ealne weg ondræt, se bið swylce he sy ealne weg cwellende.
    He who is always afraid is like one who is always dying.
    --- GNU bash, version 4.4.12(1)-release (x86_64-atom-linux-gnu)
    * Origin: Little Mikey's Brain - Ladysmith BC, Canada (1:153/7001)
  • From Konstantin Kuzov@2:5019/40.1 to Maurice Kinal on Wed Mar 1 13:35:58 2017
    Greetings, Maurice!

    Using CP1251 exactly where? And why don't KOI8-R? Or even
    ISO-8859-5?
    Do you have any statistics to back any of that up? I only went by what I've witnessed in the past.

    Why do I need to back anything up? It was your assumption that most Russian users are using CP1251.

    At one time the majority of Russian sites appeared to be KOI8-R based
    but not anymore. I see mostly UTF-8 these days and only one of them required a CHRS kludge. At the time I wondered how many hoops had to be jumped through to cripple the Apache module to 'accept' that silliness. :::snicker:::

    What apache module? A webserver shouldn't care about the content it serves, just as fidonet mailers don't care about the content of bundles.
    If a site administrator used crutches like mod_charset_lite then his backends were lacking. For example an old mysql database.

    It doesn't matter. It makes absolutely zero difference if I do or don't care.

    If you don't care that your messages cannot be read correctly then you can continue to write with bogus or missing codepage information. But if it were in my power I would restrict such traffic and treat it as a case of 1.3.5 with all its consequences.

    --- Claws Mail 3.14.0 (GTK+ 2.24.30; x86_64-pc-linux-gnu)
    * Origin: Via 2:5019/40 NNTP (GaNJaNET STaTi0N, Smolensk) (2:5019/40.1)
  • From Michiel van der Vlist@2:280/5555 to Konstantin Kuzov on Wed Mar 1 16:33:47 2017
    Hello Konstantin,

    On Wednesday March 01 2017 13:35, you wrote to Maurice Kinal:

    Using CP1251 exactly where? And why don't KOI8-R? Or even
    ISO-8859-5?

    Do you have any statistics to back any of that up? I only went by
    what I've witnessed in the past.

    Why I need to back anything up? It was yours assumption that most
    Russian users using CP1251.

    Indeed, it was he who claimed widespread use of CP1251. The onus of proof is on the one making the claim.

    All I can say is that I have never seen a message encoded in CP1251 in Fidonet. What I see is CP437, CP850, Latin-1, and CP866.

    At one time the majority of Russian sites appeared to be KOI8-R
    based but not anymore. I see mostly UTF-8 these days

    While it is true that most of the World Wide Web has migrated to UTF-8, we should keep in mind that Fidonet is not the Internet. As you have pointed out, the vast majority of Fidonet messages still use 8 bit encoding. The UTF-8 evangelists in here may wish it different, but the reality is that we still have a long way to go before UTF-8 will be the dominant encoding for Fidonet and it is quite likely that will never happen.

    If you don't care that you messages could not be read correctly then
    you can continue to write with bogus or missing codepage information.
    But if it was in my power I would have restricted such traffic and
    treat it as a case of 1.3.5 with all its consequences.

    See the new area rules published earlier today.


    Cheers, Michiel

    --- GoldED+/W32-MSVC 1.1.5-b20130111
    * Origin: Blóf Tønón (2:280/5555)
  • From Konstantin Kuzov@2:5019/40.1 to Michiel van der Vlist on Thu Mar 2 12:37:16 2017
    Greetings, Michiel!

    All I can say is that I have never seen a message encoded in CP1251 in Fidonet. What I see is CP437, CP850, Latin-1, and CP866.

    A decade ago KOI8-R/KOI8-U messages were pretty common in some areas.
    Today many non-windows users still set up their golded for KOI8-R<->CP866 recoding despite the fact that most of them also use a luit/screen/terminal translator for UTF-8.

    While it is true that most of the World Wide Web has migrated to UTF-8,
    we should keep in mind that Fidonet is not the Internet. As you have pointed out, the vast majority of Fidonet messages still use 8 bit encoding. The UTF-8 evangelists in here may wish it different, but the reality is that we still have a long way to go before UTF-8 will be the dominant encoding for Fidonet and it is quite likely that will never happen.

    If we want widespread UTF-8 usage in fidonet then we must fix the lack of unicode-capable editors first, especially console ones. Currently there is basically only msged, which after golded feels too limited and ugly. After that, enforce UTF-8 usage on echoes distributed through backbones. Don't accept echoes with rules restricting writing to specific 8bit codepages. Treat complaints about not being able to read UTF-8 messages as 1.3.5. If there is enough UTF-8 traffic, users will switch to unicode-capable software and the problem will be solved.

    But in reality that is not much easier than adopting a new fidonet policy.

    If you don't care that you messages could not be read correctly then
    you can continue to write with bogus or missing codepage information.
    But if it was in my power I would have restricted such traffic and
    treat it as a case of 1.3.5 with all its consequences.
    See the new area rules published earlier today.

    He-he... Nice one. ^_^

    But I was talking about a higher level restriction. Like treating it as a technical violation in any echoarea, and if the writers of such messages don't want to fix the problem, then just excommunicating them.

    P.S. What's written there?
    * Origin: Blóf Tønón

    There are some strange 0x7F characters:
    0620 20 4F 72 69 67 69 6E 3A 20 42 6C 7F C3 B3 66 20
    0630 54 C3 B8 6E 7F C3 B3 6E 20 28 32 3A 32 38 30 2F

    --- Claws Mail 3.14.0 (GTK+ 2.24.30; x86_64-pc-linux-gnu)
    * Origin: Via 2:5019/40 NNTP (GaNJaNET STaTi0N, Smolensk) (2:5019/40.1)
  • From Michiel van der Vlist@2:280/5555 to Konstantin Kuzov on Thu Mar 2 19:46:51 2017
    Hello Konstantin,

    On Thursday March 02 2017 12:37, you wrote to me:

    The UTF-8 evangelists in here may wish it different, but the reality
    is that we still have a long way to go before UTF-8 will be the
    dominant encoding for Fidonet and it is quite likely that will never
    happen.

    If we want for widespread UTF-8 usage in fidonet then we must fix the
    lack of unicode-capable editors first.

    There is something you can do now to promote UTF-8 in Fidonet. You can join the UTF-8 nodelist project. For some time now, ZC2 has distributed a UTF-8 version of the nodelist in parallel with the old ASCII only list. It is called DAILYUTF. Link into the DAILYUTF file area here...

    So far participation is limited, but it would be a great push for the project if R50 were to join. IIRC RC50 has said that he would join if five NCs in R50 would express the wish to join. (Or was it four?) NC5075 has already stated he would join and he has an utfn5075.nnn ready for submission. So how about doing some lobbying among your fellow NCs in R50?


    Cheers, Michiel

    --- GoldED+/W32-MSVC 1.1.5-b20130111
    * Origin: Blf Tnn (2:280/5555)
  • From Michiel van der Vlist@2:280/5555 to Konstantin Kuzov on Thu Mar 2 22:17:28 2017
    Hello Konstantin,

    On Thursday March 02 2017 12:37, you wrote to me:

    P.S. What's written there?
    * Origin: Blóf Tønón

    It is supposed to read "Blijf Tonijn" with the 'ij' replaced by the ligature 'ĳ' (code point U+0133) and the 'o' replaced by an o with slash ('ø').

    There are some strange 0x7F characters:
    0620 20 4F 72 69 67 69 6E 3A 20 42 6C 7F C3 B3 66 20
    0630 54 C3 B8 6E 7F C3 B3 6E 20 28 32 3A 32 38 30 2F

    Obviously something went wrong in code translation. There should be no '7F' and the code should be "C4 B3", not "C3 B3".

    The 'ij' is the 25th letter in the Dutch alphabet. It takes the place of the 'y' in the English alphabet.

    https://graphemica.com/%C4%B3

    Today it is normally written as the digraph 'ij' but on old Dutch typewriters it was one character. There is no glyph for it in any of the 8 bit character sets that I know of and as a result the Dutch ligature 'ij' became archaic with the coming of computers.

    With Unicode it came back. So I thought this was good for my origin line. But apparently there is a snake in the grass....



    Cheers, Michiel

    --- GoldED+/W32-MSVC 1.1.5-b20130111
    * Origin: Blijf Tønijn (2:280/5555)
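
The hex dump discussed above can be checked directly. The following Python sketch decodes the quoted bytes as UTF-8 and compares the sequence that arrived (C3 B3) with the one expected for U+0133 (C4 B3). It only shows what the bytes decode to; it makes no claim about which program in the chain altered them.

```python
# Bytes copied from the two hex dump lines quoted in the thread.
dump = bytes.fromhex(
    "20 4F 72 69 67 69 6E 3A 20 42 6C 7F C3 B3 66 20 "
    "54 C3 B8 6E 7F C3 B3 6E 20 28 32 3A 32 38 30 2F"
)

# Decode as UTF-8; repr() makes the stray 0x7F (DEL) control characters visible.
text = dump.decode("utf-8")
print(repr(text))

# What actually arrived versus what the ligature should have been.
arrived = bytes.fromhex("C3 B3").decode("utf-8")   # U+00F3, o with acute
expected = "\u0133".encode("utf-8")                # UTF-8 bytes of the ij ligature

print(f"arrived: U+{ord(arrived):04X}, expected bytes: {expected.hex(' ').upper()}")
```

The two sequences differ only in the first byte (C3 versus C4), and each damaged ligature is preceded by a 0x7F byte.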
  • From Konstantin Kuzov@2:5019/40.1 to Michiel van der Vlist on Fri Mar 3 10:06:54 2017
    Greetings, Michiel!

    If we want for widespread UTF-8 usage in fidonet then we
    must fix the lack of unicode-capable editors first.
    There is something you can do now to promote UTF-8 in Fidonet.
    You can join the UTF-8 nodelist project. For some time now, ZC2
    distributes a UTF-8 version of the nodelist in parallel with the
    old ASCII only list.

    Though I'm not a big fan of the idea of the UTF-8 nodelist project in its current state because:
    1) It is a completely separate version of the nodelist. So it is prone to out of sync issues with the ascii-only version.
    2) IMHO it must also contain latin-only representations. The ability to see the native representation of information is very good, but to properly understand it you must know that language. If you don't know it then you just can't read it and most likely can't type it. So that renders it basically useless. I understand the desire to keep compatibility with the old format, but maybe we also need to introduce an xml/json version which would contain both ascii and utf-8 representations where they differ.

    But despite that why not to join...

    So far participation is limited, but it would be a great push for
    the project if R50 were to join. IIRC RC50 has said that he would
    join if five NCs in R50 would express the wish to join. (Or was
    it four?)

    Yes, five:
    area://R50.SYSOP?msgid=2:5000/363+58b7c32a

    NC5075 has already stated he would join and he has an
    utfn5075.nnn ready for submission. So how about doing some
    lobbying among your fellow NCs in R50?

    Just asked R50C about current state of NC's counter and expressed my desire to join.

    --- Claws Mail 3.14.0 (GTK+ 2.24.30; x86_64-pc-linux-gnu)
    * Origin: Via 2:5019/40 NNTP (GaNJaNET STaTi0N, Smolensk) (2:5019/40.1)
  • From Konstantin Kuzov@2:5019/40.1 to Konstantin Kuzov on Fri Mar 3 10:14:44 2017
    Greetings, Konstantin!

    So far participation is limited, but it would be a great push
    for the project if R50 were to join. IIRC RC50 has said that he
    would join if five NCs in R50 would express the wish to join.
    (Or was it four?)
    Yes, five:
    area://R50.SYSOP?msgid=2:5000/363+58b7c32a

    Oops... wrong link, that's the right one: area://R50.SYSOP?msgid=2:5020/715.1+57ee36a5

    --- Claws Mail 3.14.0 (GTK+ 2.24.30; x86_64-pc-linux-gnu)
    * Origin: Via 2:5019/40 NNTP (GaNJaNET STaTi0N, Smolensk) (2:5019/40.1)
  • From Konstantin Kuzov@2:5019/40.1 to Michiel van der Vlist on Fri Mar 3 13:23:38 2017
    Greetings, Michiel!

    Obviously something went wrong in code translation. There should
    be no '7F' and the code should be "C4 B3", not "C3 B3".
    ...
    With Unicode it came back. So I thought this was good for my
    origin line. But apparently there is a snake in the grass....

    So all the messages that you wrote in 2016 contained the correct one.
    It started happening recently, with the message about the extra space.
    There is also a message with this variation:

    * Origin: BlЇf TЫnЇn
    area://UTF-8?msgid=2:280/5555+58b86a8d

    But it looks like it is fixed now ^_^
    * Origin: Blijf Tønijn

    --- Claws Mail 3.14.0 (GTK+ 2.24.30; x86_64-pc-linux-gnu)
    * Origin: Via 2:5019/40 NNTP (GaNJaNET STaTi0N, Smolensk) (2:5019/40.1)
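
One plausible reading of the 'Ы' in the variation quoted above, offered as an assumption since the thread does not say what actually happened: 'ø' is a single byte in CP850, and that same byte is 'Ы' in CP866. The Python sketch below shows only this one coincidence; it does not explain the 'Ї' that replaced the ligature.

```python
# 'ø' occupies one byte in CP850 ...
byte = "ø".encode("cp850")

# ... and a reader using the CP866 table shows that byte as a Cyrillic letter.
as_cp866 = byte.decode("cp866")

print(byte.hex(), as_cp866)
```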
  • From Nicholas Boel@1:154/10 to Michiel van der Vlist on Fri Mar 3 06:51:52 2017
    On 3/2/2017 3:17 PM, Michiel van der Vlist -> Konstantin Kuzov wrote:

    MvdV> With Unicode it came back. So I thought this was good for my origin
    MvdV> line. But apparently there is a snake in the grass....

    MvdV> --- GoldED+/W32-MSVC 1.1.5-b20130111
    MvdV> * Origin: Blijf Tønijn (2:280/5555)

    I was wondering the same when I first read the message Konstantin replied to. This one came through fine, though.

    --
    Regards,
    Nick

    --- Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45
    * Origin: thePharcyde_ distribution system (1:154/10)
  • From Michiel van der Vlist@2:280/5555 to Konstantin Kuzov on Fri Mar 3 14:50:25 2017
    Hello Konstantin,

    On Friday March 03 2017 10:06, you wrote to me:

    Though I'm not a big fan of the idea of UTF-8 nodelist project in
    its current state because: 1) It is completely separated version of nodelist. So it is prone to out of sync issues with ascii-only
    version.

    Yes, there is that risk. For my own net, I have minimised the risk by only editing the UTF-8 version and deriving the ASCII version from the UTF-8 version with a SED script. So they are always in sync.

    http://www.vlist.eu/downloads/fidolist/ascifi.sed

    If we were to redesign Fidonet from the start, there are a lot of things we would do differently. One of them would be the nodelist. As it is, we are stuck with decisions made in the past. Changing the nodelist format would require the cooperation of everyone in Fidonet and that is just not going to happen. There are too many that use software that depends on the nodelist as it is. So the ASCII only nodelist is not going away for the foreseeable future, and everything else - whatever it is - will have to be a parallel distribution. No way around it.

    2) IMHO it must also contain latin-only representations.
    Ability to see native representation of information is very good, but
    to properly understand it you must know that language.

    But that goes both ways. Someone from Russia, or any other country with a native language that uses the Cyrillic alphabet, who does not know English or any other language that uses the Latin alphabet, can not properly understand the information in the ASCII only nodelist either.

    English may still be the official language of Fidonet, but it is no longer the dominant language of Fidonet. It has not been the dominant language for many years. Well over a decade I'd say. Same for the alphabet. Latin is no longer the dominant alphabet in Fido, most messages are written in Cyrillic these days. Don't be shy. Your alphabet deserves to be seen. If not in the official ASCII only nodelist, then at least in the UTF-8 version.

    If you doesn't know it then you just couldn't read it and most likely couldn't type it.

    That need not be true. I do not master the Russian language, but I have familiarised myself with the Cyrillic alphabet. I can read names in Cyrillic. .. sort of..

    I can even type it, I have installed the proper keyboard driver. I may be the exception. But in order to read non Latin characters, one does not need to be able to type them.

    So it renders it basically useless.

    I disagree that it is useless.

    I have Golded compile both the ASCII nodelist and a non ASCII nodelist derived from the UTF nodelist. That way the nodelist lookup in Golded shows me both the ASCII and the non ASCII version of the sysop names. You could do the same for Cyrillic names.

    I understand the desire to keep compatibility with old format but
    maybe we need to also introduce xml/json version which would contain
    both ascii and utf-8 representations if they are different.

    At present I think that is a bridge too far. Plus that we would still have the sync problem, the old ASCII only nodelist is not going away. Not for a long time.

    But despite that why not to join...

    Indeed, that is what is available now, so why not use it? So welcome to the club!

    Just asked R50C about current state of NC's counter and expressed my desire to join.

    Any response?

    Do you know any other NCs in R50 that maybe would like to join?


    Cheers, Michiel

    --- GoldED+/W32-MSVC 1.1.5-b20130111
    * Origin: Blijf Tønijn (2:280/5555)
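
The approach described above, editing only the UTF-8 list and deriving the ASCII one from it, can be sketched in Python as a character by character substitution. The table below is a hypothetical stand-in: the contents of the actual ascifi.sed script are not shown in the thread, so the mappings are assumptions chosen to match the origin line example.

```python
# Hypothetical substitution table; the real ascifi.sed may differ.
TO_ASCII = str.maketrans({
    "\u0133": "ij",  # ij ligature becomes two letters
    "ø": "o",
    "é": "e",
    "ü": "u",
})


def asciify(line: str) -> str:
    """Replace each listed character; leave everything else untouched."""
    return line.translate(TO_ASCII)


utf_line = "Bl\u0133f Tøn\u0133n"
ascii_line = asciify(utf_line)
print(ascii_line)
```

Because the ASCII text is always computed from the UTF-8 text, the two cannot drift apart through manual editing, which is the point made above.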
  • From Konstantin Kuzov@2:5019/40.1 to Michiel van der Vlist on Sun Mar 5 20:12:14 2017
    Konnichi wa, *Michiel-kun*! Aogu manako oyobi uketamawaru waga koe!
    Tomodachi _Michiel van der Vlist_ tsukuru airon _Konstantin Kuzov_
    Nichiji - /*03 17 14:50*/, Daizai - /*UTF-8 nodelist*/:

    Though I'm not a big fan of the idea of UTF-8 nodelist project
    in its current state because: 1) It is completely separated
    version of nodelist. So it is prone to out of sync issues with
    ascii-only version.

    MvdV> Yes, there is that risk. For my own net, I have minimised the risk by
    MvdV> only editing the UTF-8 version and derive the ASCII version from the
    MvdV> UTF-8 version with a SED script. So they are always in sync.
    MvdV> http://www.vlist.eu/downloads/fidolist/ascifi.sed

    Sadly, that method wouldn't work for cyrillic names as there are several standards for name transliteration. Wiki mentions that there are at least five. On top of that, users often prefer mixed versions and add something of their own, like a 'v' ending becoming 'off'.

    And it doesn't solve the problem if one of the segments were lost in transit or not processed by nodelist generator.

    MvdV> When we would redesign Fidonet from the start, there are a lot of
    MvdV> things we would do different. One of them would be the nodelist. As it
    MvdV> is, we are stuck with decisions made in the past. Changing the
    MvdV> nodelist format will involve the cooperation of everyone in Fidonet
    MvdV> and that is just not going to happen. There are too many that use
    MvdV> software that depends on the nodelist as it is. So the ASCII only
    MvdV> nodelist as it is, is not going away for the foreseeable future, so
    MvdV> everything else - whatever it is - will have to be a parallel
    MvdV> distribution. No way around it.

    Yes. So maybe for now the UTF-8 version needs to be derivative version of ascii version, not the completely parallel replacement. It must be generated from ascii segments and additional information like native names, locations and so on.

    2) IMHO it must also contain latin-only representations.
    Ability to see native representation of information is very good,
    but to properly understand it you must know that language.

    MvdV> But that goes both ways. Someone from Russia, or any other country
    MvdV> with a native language that uses the Cyrillic alphabet, who does not
    MvdV> know English or any other language that uses the Latin alphabet, can
    MvdV> not properly understand the information in the ASCII only nodelist
    MvdV> either.

    I can hardly imagine a person who in 2017 doesn't know the latin alphabet at transliteration level. But I can easily imagine one who doesn't know the cyrillic alphabet. And I'm not even talking about asian languages or, for example, hebrew.

    I understand the desire to keep compatibility with old format but
    maybe we need to also introduce xml/json version which would
    contain both ascii and utf-8 representations if they are
    different.

    MvdV> At present I think that is a bridge too far. Plus that we would still
    MvdV> have the sync problem, the old ASCII only nodelist is not going away.
    MvdV> Not for a long time.

    Yes, so as I suggested above we can use ASCII version as a source for "extended" unicode version. And virtually nothing prevents generation of additional versions with different format.

    Just asked R50C about current state of NC's counter and expressed
    my desire to join.
    MvdV> Any response?

    Nope, not yet.

    MvdV> Do you know any other NCs in R50 that maybe would like to join?

    Sadly, no. I haven't been particularly active in fidonet for the last 7-8 years. Most of the people I knew have long since retired from fidonet.

    Ganbatte, *Michiel*!

    [_N0SF3R@TU_]
    ... GoldED-NSF/LNX 1.1.5-b20140107 (Linux 4.10.1-gentoo iF6M63)
    --- #[Kaori Sekken: Master.NoSFeRaTU[@]Gmail.com] [Kumi Nyaa]#
    * Origin: Ojisan, oriru mottekuru suna oyobi korosu sagaru kabe (2:5019/40.1)
  • From Kees van Eeten@2:280/5003.4 to Konstantin Kuzov on Mon Mar 6 00:13:14 2017
    Hello Konstantin!

    05 Mar 17 20:12, you wrote to Michiel van der Vlist:

    Sadly, that method wouldn't work for cyrillic names as there are several standards for names transliteration. Wiki mentions that there are at least five. Above that users often prefer using mixed versions and add something from themselves like 'v' ending becomes 'off'.

    I have been looking into Romanisation of Cyrillic names. Indeed there
    are a number of standards. Just to make a start, I looked at the ICAO
    translation, that is used to translate Cyrillic names to Latin for
    use in machine readable passports.

    There is some Perl code to do the translation from Cyrillic to Latin.
    I tried a reverse routine, that is not complete yet, and it probably does
    a disgusting translation, but at least the same comes out when translated
    back to Latin. A norm for Fidonet could be to use the ICAO method for the
    nodelist as well.

    Yes. So maybe for now the UTF-8 version needs to be derivative version of ascii version, not the completely parallel replacement. It must be generated from ascii segments and additional information like native names, locations and so on.

    One way or the other.

    I hardly imagine a man who in 2017 doesn't know latin alphabet on transliteration level. But easily imagine the one who doesn't know cyrillic alphabet. I don't even say anything about any asian languages or for example hebrew.

    Let's take it one step at a time. There are very few participants in Fidonet
    that require Asian languages. Cyrillic is difficult enough with the
    difference between Russia, Ukraine, and Belarus.

    As for hebrew, all sysops in R40 are of Russian origin.

    Yes, so as I suggested above we can use ASCII version as a source for "extended" unicode version. And virtually nothing prevents generation of additional versions with different format.

    It is far easier to degrade an extended version, than it is to enrich a
    limited one.

    Kees

    --- GoldED+/LNX 1.1.5
    * Origin: As for me, all I know is that, I know nothing. (2:280/5003.4)
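
The property described above, that a reverse routine may produce the wrong Cyrillic spelling yet still yield the same Latin text when translated back, can be illustrated with a small Python sketch. The table is a partial, illustrative mapping and an assumption; it is not the ICAO table, and it handles lowercase input only.

```python
# Partial, illustrative Cyrillic-to-Latin table (an assumption, NOT the ICAO table).
TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ж": "zh", "з": "z", "и": "i", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t",
    "у": "u", "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh",
    "э": "e", "й": "i",
}

# Reverse table: when several Cyrillic letters share a Latin form, keep the first.
FROM_LATIN = {}
for cyr, lat in TO_LATIN.items():
    FROM_LATIN.setdefault(lat, cyr)


def to_latin(text: str) -> str:
    """Transliterate lowercase Cyrillic; unknown characters pass through."""
    return "".join(TO_LATIN.get(ch, ch) for ch in text)


def to_cyrillic(text: str) -> str:
    """Greedy reverse: try a two-letter match first, then a single letter."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in FROM_LATIN:
            out.append(FROM_LATIN[pair])
            i += 2
        else:
            out.append(FROM_LATIN.get(text[i], text[i]))
            i += 1
    return "".join(out)


latin = to_latin("эхо")        # both 'э' and 'е' map to 'e'
back = to_cyrillic(latin)      # so the reverse cannot know which one was meant
print(latin, back, to_latin(back))
```

The reverse step loses the distinction between letters that share a Latin form, which is one reason an automatic script cannot reproduce the spelling a sysop actually uses.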
  • From Konstantin Kuzov@2:5019/40.1 to Kees van Eeten on Mon Mar 6 10:11:10 2017
    Greetings, Kees!

    I have been looking into Romanisation of Cyrillic names. Indeed
    there are a number of standards. Just to make a start, I looked at
    the ICAO translation, that is used to translate Cyrillic names to
    Latin for use in machine readable passports.
    There is some Perl code to do the translation from Cyrillic to
    Latin. I tried a reverse routine, that is not complete yet, and it
    probably does a disgusting translation, but at least the same comes
    out when translated back to Latin. A norm for Fidonet could be to
    use the ICAO method for the nodelist as well.

    Sysops have already provided the latin representation of their names; forcing any auto transliteration script on them isn't a wise move. It can lead to frustration and even anger from sysops whose names suddenly changed without their request.
    If that thing takes off then I'll just set up a table in a database and store both representations there. Then just generate both versions from that database.

    I hardly imagine a man who in 2017 doesn't know latin
    alphabet on transliteration level. But easily imagine the
    one who doesn't know cyrillic alphabet. I don't even say
    anything about any asian languages or for example hebrew.
    Let's take it one step at a time. There are very few participants
    in Fidonet that require Asian languages. Cyrillic is difficult
    enough with the difference between Russia, Ukraine, and Belarus.
    As for hebrew, all sysops in R40 are of Russian origin.

    Let's not assume anything based on the current population. The point of the UTF-8 nodelist is to allow virtually any alphabet, isn't it? Maybe tomorrow there will be plenty of chinese friends there, who knows? And they would be dealing with a crippled legacy no better than the one we currently have. The ability to read any information is important. And English as the "global" language is best suited for that.

    Yes, so as I suggested above we can use ASCII version as a
    source for "extended" unicode version. And virtually nothing
    prevents generation of additional versions with different
    format.
    It is far easier to degrade an extended version, than it is to
    enrich a limited one.

    Maybe, but as Michiel mentioned the ascii-only version isn't going away. I think currently it's virtually impossible to change the segment format and to convince Ward to change his software to produce various versions of the nodelist from that one source. So the only way we can go for now, if we don't want to deal with out of sync issues, is building utf-8 versions from the ascii-only nodelist and extending it with additional information. Maybe in the distant future we can swap them to how it should be - building a limited version from an extended one. But we are not there yet.

    --- Claws Mail 3.14.0 (GTK+ 2.24.30; x86_64-pc-linux-gnu)
    * Origin: Via 2:5019/40 NNTP (GaNJaNET STaTi0N, Smolensk) (2:5019/40.1)
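
The database idea mentioned above, storing both representations side by side and generating each list from the same rows, might look roughly like this in Python with sqlite3. The table name, column names, node number, and the one-line output format are all assumptions for illustration; a real nodelist segment has more fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sysops ("
    " node TEXT PRIMARY KEY,"      # hypothetical node number
    " name_ascii TEXT NOT NULL,"   # spelling chosen by the sysop
    " name_native TEXT NOT NULL)"  # native spelling
)
conn.execute(
    "INSERT INTO sysops VALUES (?, ?, ?)",
    ("2:9999/1", "Konstantin_Kuzov", "Константин_Кузов"),
)

ALLOWED_COLUMNS = {"name_ascii", "name_native"}


def render(column: str) -> list[str]:
    """Build one simplified list line per node from the chosen name column."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown column: {column}")
    rows = conn.execute(f"SELECT node, {column} FROM sysops ORDER BY node")
    return [f"{node},{name}" for node, name in rows]


ascii_list = render("name_ascii")
utf8_list = render("name_native")
print(ascii_list)
print(utf8_list)
```

Since both lists come from the same rows, neither spelling is produced by a transliteration script, and the two outputs always cover the same set of nodes.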
  • From Michiel van der Vlist@2:280/5555 to Konstantin Kuzov on Mon Mar 6 12:51:49 2017
    Hello Konstantin,

    On Sunday March 05 2017 20:12, you wrote to me:

    MvdV>> Yes, there is that risk. For my own net, I have minimised the
    MvdV>> risk by only editing the UTF-8 version and derive the ASCII
    MvdV>> version from the UTF-8 version with a SED script. So they are
    MvdV>> always in sync.
    MvdV>> http://www.vlist.eu/downloads/fidolist/ascifi.sed

    Sadly, that method wouldn't work for cyrillic names

    I realize that a simple character by character substitution that I use for my
    net would not work in your case. It was just an example of how I did it.

    as there are several standards for names transliteration. Wiki
    mentions that there are at least five. Above that users often prefer
    using mixed versions and add something from themselves like 'v' ending becomes 'off'.

    It would ultimately be the responsibility of the NC or in the case of large nets, the hub coordinator to do it in such a way that satisfies the needs of the sysops in the net.

    I do not think there is a universal solution and I do not think there need to be one. In the case of small nets like yours, an individual name by name replacement will do. Like:

    s/Смоленск/Smolensk/
    s/Константин_Кузов/Konstantin_Kuzov/
    s/Андрей_Пахоментов/Andrey_Pakhomentov/
    s/алексей_сивакофф/alexey_sivakoff/

    And it doesn't solve the problem if one of the segments were lost in transit or not processed by nodelist generator.

    In the rare case that one of the two is lost and the other is processed, it will be solved the next day, as the UTF list is distributed on a daily basis.

    Where is that pioneer spirit of the founding fathers of Fidonet? Did Tom Jennings first make an exhaustive list of what could go wrong, or did he just launch the Fidonet Project?

    We can spend all day inventing problems, or we can just jump aboard and see where it gets us.

    Yes. So maybe for now the UTF-8 version needs to be derivative version
    of ascii version, not the completely parallel replacement. It must be generated from ascii segments and additional information like native names, locations and so on.

    Whatever the NC of the net finds convenient. I see no justification for "regulating". We can offer suggestions, but let the NC do the job his way.

    Just asked R50C about current state of NC's counter and
    expressed my desire to join.

    MvdV>> Any response?

    Nope, not yet.

    So keep nagging him. ;-)


    Cheers, Michiel

    --- GoldED+/W32-MINGW 1.1.5-b20110320
    * Origin: Blijf Tønijn (2:280/5555)
  • From Michiel van der Vlist@2:280/5555 to Kees van Eeten on Mon Mar 6 16:09:39 2017
    Hello Kees,

    On Monday March 06 2017 00:13, you wrote to Konstantin Kuzov:

    It is far easier to degrade an extended version than it is to enrich
    a limited one.

    That is what I thought too. And so I derive the net 280 ASCII segment from the UTF segment. But what works for me may not work for someone else. I certainly don't want to impose my way of doing things on others. The main reason I do it this way is that I am lazy. This way I just have to edit one file when making changes.
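
    For Latin-script names, degrading UTF to ASCII can be sketched in a few lines of Python. This is not the ascifi.sed script; it is an assumed alternative that uses Unicode decomposition plus a small explicit table for letters that do not decompose (the entries in `EXPLICIT` are examples only). Anything still outside ASCII, such as Cyrillic, comes out as "?", which shows why the reverse direction needs a hand-made table.

```python
import unicodedata

# Letters that Unicode decomposition leaves untouched need explicit rules.
# These entries are illustrative assumptions, not a complete table.
EXPLICIT = str.maketrans({"ø": "o", "Ø": "O", "ß": "ss", "æ": "ae"})


def degrade(text):
    """Reduce text to ASCII: explicit rules first, then strip accents."""
    text = text.translate(EXPLICIT)
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Whatever is still non-ASCII cannot be degraded automatically.
    return stripped.encode("ascii", errors="replace").decode("ascii")
```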

    My general principle is "do not regulate what needs no regulation". So far the UTF and ASCII lists getting out of sync is just a theoretical problem. Automatic translation from UTF to ASCII or the other way around may be convenient for some, and just maintaining two separate lists may be convenient for others. Let's just leave it to the NCs. Let them decide what method is best for them and the sysops in their net.


    Cheers, Michiel

    --- GoldED+/W32-MSVC 1.1.5-b20161221
    * Origin: Blijf Tønijn (2:280/5555)
  • From Kees van Eeten@2:280/5003.4 to Konstantin Kuzov on Mon Mar 6 15:54:04 2017
    Hello Konstantin!

    06 Mar 17 10:11, you wrote to me:

    Sysops already provided the latin representation of their names; forcing them through any auto transliteration script isn't a wise move. It can lead to frustration and even anger from sysops whose names suddenly changed without their request. If that thing takes off then I'll just set up a table in a database and will store both representations there. Then just generate both versions from that database.

    I suppose that is the ultimate solution to keep everybody happy. But I am
    experimenting with what a UTF-8 nodelist would look like, and doing that
    I do not care about personal sentiments.

    Let's not assume anything based on the current population. The point of a UTF-8 nodelist is to support virtually any alphabet, isn't it?

    In the end we will all have heaven on earth. What some of us are doing now
    is demonstrating that it is worthwhile to strive for it.

    Maybe tomorrow there will be plenty of Chinese friends there, who
    knows?

    The currently available Fidonet software does not make that a likely event,
    and history has shown that this style of communication does not fit with their
    culture. For upcoming communities, there are far better ways of social
    networking available.

    Maybe, but as Michael mentioned, the ascii-only version won't go anywhere. I think currently it's virtually impossible to change the segment format and to convince Ward to change his software to produce various versions of the nodelist from that one source. So the only way we can go for now is building utf-8 versions from the ascii-only nodelist and extending it with additional information, if we don't want to deal with out of sync issues. Maybe in the distant future we can swap them to how it should be: building a limited version from the extended one. But we are not there yet.

    Sync issues are just that; we have lived with them ever since we changed from
    an IC issued nodelist to the ZC issued versions. With the current daily issues,
    they are inevitable. And who cares about being briefly out of sync, when NCs fail to
    correct errors for two or three months?

    As for the use of a UTF-8 nodelist with current software, I also see very
    little use. But if there is some progress to be made, one has to start
    somewhere. It is a chicken and egg problem. No one will tackle the full
    solution. Software that can make use of UTF-8 will not fall out of the
    blue sky. Some steps have to be made that will challenge others
    to take the next step.

    I do not believe UTF-8 will conquer Fidonet, but it is interesting to see
    if it can be done at all.

    Kees

    --- GoldED+/LNX 1.1.5
    * Origin: As for me, all I know is that, I know nothing. (2:280/5003.4)
  • From Michiel van der Vlist@2:280/5555 to Konstantin Kuzov on Mon Mar 6 16:29:09 2017
    Hello Konstantin,

    On Monday March 06 2017 10:11, you wrote to Kees van Eeten:

    Sysops already provided the latin representation of their names; forcing them
    through any auto transliteration script isn't a wise move.

    Nobody is forcing anyone. It is up to the NCs. Automatic transliteration may be a good idea for some; for others it may not.

    It can lead to frustration and even anger from sysops whose names
    suddenly changed without their request.

    Indeed, and that should be avoided.

    If that thing takes off then I'll just set up a table in a database and
    will store both representations there. Then just generate both
    versions from that database.

    Sounds good...
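
    A rough sketch of that idea in Python with SQLite, under stated assumptions: the table layout, the column names and the function `build_lists` are invented for illustration, and the output lines are shortened to number and name rather than full nodelist entries. A sysop without a native spelling falls back to the Latin one in the UTF-8 list.

```python
import sqlite3


def build_lists(rows):
    """rows: iterable of (node_number, latin_name, native_name_or_None)."""
    db = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per sysop, both spellings side by side.
    db.execute(
        "CREATE TABLE sysop ("
        "node INTEGER PRIMARY KEY, latin TEXT NOT NULL, native TEXT)"
    )
    db.executemany("INSERT INTO sysop VALUES (?, ?, ?)", rows)
    ascii_list, utf8_list = [], []
    query = "SELECT node, latin, native FROM sysop ORDER BY node"
    for node, latin, native in db.execute(query):
        ascii_list.append(f",{node},{latin}")
        # Use the native spelling when the sysop supplied one.
        utf8_list.append(f",{node},{native or latin}")
    db.close()
    return ascii_list, utf8_list
```

    Because both lists come from the same rows, neither spelling is ever produced by automatic transliteration, and the two lists cannot drift apart in their node numbers.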

    So the only way we can go for now is building utf-8 versions from the ascii-only nodelist and extending it with additional information, if
    we don't want to deal with out of sync issues.

    I think you are looking for a solution to a nonexistent problem. As yet there is no sync problem. Let us cross that bridge when we get to it. Maybe there is no bridge.


    Cheers, Michiel

    --- GoldED+/W32-MSVC 1.1.5-b20161221
    * Origin: Blijf Tønijn (2:280/5555)