Bug on the message in Chinese (Bugs)

by Solis ⌂ @, China, Monday, November 21, 2016, 16:51 (2684 days ago)

When I submit a message in Chinese, the system always tells me:

Error!
The word "xxx..." is too long.

So, can you fix this bug in the next version?

thanks!

[image]

Bug on the message in Chinese, a few questions

by Auge ⌂, Monday, November 21, 2016, 17:34 (2684 days ago) @ Solis

Hello

When I submit a message in Chinese, the system always tells me:

First question: Where did you receive the error message? Was it in your own installation or here in the project forum? If it is your own forum, you can increase the value of the setting text_word_maxlength. But I am not at all convinced that this is a bug.

[image]

The box at the top (#1) contains the error message with the overlong word. The text does not seem to contain any spaces when I compare it to the first two lines of text in the textarea in box #2. There seem to be bigger gaps between the characters. This might be related to the font used, but I can't judge that.

Are there spaces " " between the words? Do Chinese spaces differ from the spaces in Western/Latin-based character systems?

Tschö, Auge

--
Trenne niemals Müll, denn er hat nur eine Silbe!

Chinese with English punctuation marks

by Solis ⌂ @, China, Wednesday, November 23, 2016, 23:32 (2682 days ago) @ Auge

test
This text is Chinese, but with English punctuation marks.

随着社会的发展,各国政治,经济,文化及科学技术交流合作日益增多,这就迫切需要一种国际共同语,以解决世界民族语言的繁杂性所造成的交流障碍. 但现在任何一种民族语言都无法担此重任.这是因为一方面,选用任何一种民族语言,都会对其它民族造成感情上的伤害与事实上的不平等;另一方面,任何一种民 族语都是在长期复杂的历史条件下形成的,不可避免的包含很多不规则 ,不合理的成分,给其它民族的学习者带来极大困难.所以,这种国际共同语必须是中立 ,不属于任何民族,并且简单易学,富有表现力的.而世界语正是这样一种语言.

Chinese with Chinese punctuation marks

by Solis ⌂ @, China, Wednesday, November 23, 2016, 23:38 (2682 days ago) @ Solis

随着社会的发展,各国政治、经济、文化及科学技术交流合作日益增多,这就迫切需要一种国际共同语,以解决世界民族语言的繁杂性所造成的交流障碍。 但现在任何一种民族语言都无法担此重任。这是因为一方面,选用任何一种民族语言,都会对其它民族造成感情上的伤害与事实上的不平等;另一方面,任何一种民 族语都是在长期复杂的历史条件下形成的,不可避免的包含很多不规则 、不合理的成分,给其它民族的学习者带来极大困难。所以,这种国际共同语必须是中立 、不属于任何民族,并且简单易学、富有表现力的。而世界语正是这样一种语言。

A few questions about the Chinese writing system

by Auge ⌂, Thursday, November 24, 2016, 08:55 (2682 days ago) @ Solis

Hello Solis

[edit] I should read all new entries before answering one. So please forget my questions. :-)[/edit]

Thank you for supporting my search for a solution.

I copied and pasted your texts from both entries into a capable editor, played around a bit, and now, after the playtime, I have a few questions.

First, I can't read the text. Second, I am trying to understand the rules of notation.

1. Do you write from left to right or from right to left?
2. In both entries the punctuation marks (comma, semicolon, question mark, exclamation mark) are written without a space before or after the marks. Is this the normal notation for writing Chinese?
3. "," = comma | "。" = punctuation mark (dot) | "、" = a letter?

If my assumption in the second question is correct, then each of your texts is counted as one whole word. In languages that are written with Latin-, Greek- or Cyrillic-based alphabets it is common that a punctuation mark is followed by a space (blank), and in some cases there is also a space in front of the mark. So the length of a word is counted from space to space, and the words are much shorter than in your examples.
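To illustrate the counting described above, here is a minimal Python sketch. It is not the forum's actual code (my little forum is written in PHP); the function name longest_word_length is made up for this example, and the only assumption is that words are separated by whitespace.

```python
def longest_word_length(text: str) -> int:
    """Length of the longest run of non-whitespace characters in text."""
    # str.split() without arguments splits at any whitespace.
    return max((len(word) for word in text.split()), default=0)


latin = "So the words are counted from space to space."
# Excerpt from the test posting: Chinese text without any spaces.
chinese = "这就迫切需要一种国际共同语,以解决世界民族语言的繁杂性所造成的交流障碍."

print(longest_word_length(latin))                    # 7 ("counted")
print(longest_word_length(chinese) == len(chinese))  # True: one single "word"
```

With a whitespace-only separator, the "word" length of an unspaced Chinese paragraph is simply the length of the whole paragraph.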

Tschö, Auge

--
Trenne niemals Müll, denn er hat nur eine Silbe!

About text in Chinese (appended to the bug report)

by Solis ⌂ @, China, Thursday, November 24, 2016, 00:23 (2682 days ago) @ Solis
edited by Auge, Thursday, November 24, 2016, 11:46

In this forum (http://mylittleforum.net/forum),
when I tested the forum with this article:
http://reto.cn/php/blog/wordpress/kio_estas_esperanto
the system said:
Error!
[image]

So I replaced all Chinese punctuation marks with English punctuation marks and added a space after each punctuation mark. After that, the message was submitted.

Two points on Chinese:
1. There are no spaces between Chinese words (Chinese characters).
2. There are no spaces after Chinese punctuation marks.

Punctuation marks of the two languages:
English , . ; "" !
Chinese ,。;“”!
[image]
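The workaround described above (replace the Chinese punctuation marks with English ones and add a space after each) could be sketched in Python roughly like this. The function name and the mapping table are assumptions made for this example; the table covers only some of the marks, and quotation marks are left untouched.

```python
# Hypothetical mapping: each Chinese mark becomes its English
# counterpart followed by a space.
PUNCTUATION_MAP = {
    "\uff0c": ", ",  # fullwidth comma
    "\u3001": ", ",  # dun hao (enumeration comma)
    "\u3002": ". ",  # ideographic full stop
    "\uff1b": "; ",  # fullwidth semicolon
    "\uff01": "! ",  # fullwidth exclamation mark
}
_TABLE = str.maketrans(PUNCTUATION_MAP)


def westernize_punctuation(text: str) -> str:
    """Replace Chinese punctuation marks with English marks plus a space."""
    return text.translate(_TABLE)


sample = "但现在任何一种民族语言都无法担此重任。这是因为一方面,选用任何一种民族语言"
print(westernize_punctuation(sample))
```

After this replacement, a check that splits at spaces sees one "word" per clause instead of one per paragraph.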

This is the Chinese text, but with English punctuation marks and with a space after each mark:

什么是世界语 Kio Estas Esperanto

世界语 (Esperanto) 是波兰的柴门霍夫博士于1887年公布的一种国际辅助语方案.

随着社会的发展, 各国政治, 经济, 文化及科学技术交流合作日益增多, 这就迫切需要一种国际共同语, 以解决世界民族语言的繁杂性所造成的交流障碍. 但现在任何一种民族语言都无法担此重任. 这是因为一方面, 选用任何一种民族语言, 都会对其它民族造成感情上的伤害与事实上的不平等;另一方面, 任何一种民 族语都是在长期复杂的历史条件下形成的, 不可避免的包含很多不规则 , 不合理的成分, 给其它民族的学习者带来极大困难. 所以, 这种国际共同语必须是中立 , 不属于任何民族, 并且简单易学, 富有表现力的. 而世界语正是这样一种语言.

世界语书写形式采用拉丁字母, 同一个字母在任何单词中的发音都是相同的. 只要学会了28个字母, 掌握了拼音规则, 即可读出和写出任何单词. 世界语的词汇源于自然语言 (主要是印欧语系所属语言)中的国际化成分. 它还通过附加前缀, 后缀及词根间的组合来构成新词. 这不但增强了它的构词能力, 而且大大减少了基本词汇量, 减 轻了人们记忆单词的负担. 世界语的语法规则是在印欧语系的基础上制定的, 简单明了而又不失严谨, 其基本文规只有16条. 由于世界语的上述特点, 使得它比任何一种外语都容易学习. 在世界语的单词中, 每一个字母的音值始终不变, 句子中也没有连读现象, 因而口语及听力比较容易掌握. 熟悉了语法规则, 掌握2000个左右的词根及词缀, 即可自由地通信, 交谈, 毫无困难地进行交流.

随着人类社会的发展, 各民族间的交往也越来越频繁, 人们对国际语的需求也会越来越迫切, 因而从发展前景来看, 世界语有着广泛的前途.

多少人说世界语?没有确切的数字. 大约一千人的母语之一是世界语, 数万人在交往中使它, 十几万人时不时地用它, 几百万人曾经或正在学习这种国际辅助语. 我国学习或曾经学习过世界语的有大约四, 五十万人. 虽然相对于其它一 些大的民族语言, 现在世界语只能算一种小语种, 但这种语言分布地域却很广. 学了世界语可以和世界各地, 各民族的人进行交流.

对世界语感兴趣者, 可与中华全国世界语协会联系:

100037 北京 百万庄路24号 中华全国世界语协会 电话:(010)68326682

或在网上访问: http://verdareto.com http://zh.wikipedia. org/zh-cn/世界语

中国世界语网站绿网
Verda Reto
la ĉina esperanta retejo
http://reto.cn
http://verdareto.com

About text in Chinese (appended to the bug report)

by Auge ⌂, Thursday, November 24, 2016, 11:45 (2681 days ago) @ Solis
edited by Auge, Thursday, November 24, 2016, 15:08

Hello Solis

I appended your entry to the bug report, because it gives some insight into the problem.

As I assumed earlier in this thread, the notation of written Chinese without spaces is the source of the problem. The code, in contrast, assumes a space as the separator. That's why a whole paragraph is counted as one word, and that is (in most cases) far too long for the standard setting of text_word_maxlength of 90 characters (and for all other selectable values).

One solution could be to let a value of 0 for the setting text_word_maxlength disable the length check. But this might have side effects, and it is questionable whether comparable settings like name_word_maxlength etc. would then have to be disableable too. That's IMHO a real can of worms.

A second solution would be a "setting" (key-value pair; maybe optional) in the language file that provides the character(s) that separate words in the respective language.
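As a rough Python sketch of this first idea (not the forum's PHP code; the function name find_too_long_word and its signature are hypothetical), a length check in which 0 switches the check off could look like this:

```python
from typing import Optional


def find_too_long_word(text: str, max_length: int) -> Optional[str]:
    """Return the first word longer than max_length, or None.

    Assumption for this sketch: max_length == 0 means "check disabled".
    """
    if max_length == 0:
        return None
    for word in text.split():  # words are separated by whitespace only
        if len(word) > max_length:
            return word
    return None


paragraph = "字" * 100  # 100 characters without a single space
print(find_too_long_word(paragraph, 90) is not None)  # True: rejected
print(find_too_long_word(paragraph, 0))               # None: check disabled
```

The sketch covers only one setting; the open question raised above, whether the comparable settings need the same switch, is not addressed by it.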

Tschö, Auge

--
Trenne niemals Müll, denn er hat nur eine Silbe!

About text in Chinese (appended to the bug report)

by Solis ⌂ @, China, Xinjiang, Sunday, December 04, 2016, 12:13 (2671 days ago) @ Auge

Thanks for your help. Because my English is poor, I am writing to you in simple English. (I can speak only Chinese and Esperanto.)

1. Yes, we write Chinese from left to right, as you write English.

2. Yes, in normal Chinese text there are no spaces before or after the punctuation marks. In the text, a mark takes up a position of its own, just like a Chinese ideogram.

3. "、" is not a letter; it is a punctuation mark. In Chinese, its name is 顿号 [dùn hào]. It is a slight-pause mark used to set off items in a series, that is, between parallel words or short phrases. If you can read German, you can visit the page https://de.wikipedia.org/wiki/Aufz%C3%A4hlungskomma

4. Can you read Chinese on your computer? If not, you can download a free Chinese font from here:
https://sourceforge.net/projects/wqy/files/

Thank you very much, and I must say, my little forum is the best forum I have used.

About text in Chinese (appended to the bug report)

by Auge ⌂, Sunday, December 04, 2016, 14:19 (2671 days ago) @ Solis

Hello

2. Yes, in normal Chinese text there are no spaces before or after the punctuation marks.

We introduced a new item in the language files named word_delimiters, engineered by Milo. The setting contains nothing (is empty) or a list of delimiters. So it will be possible to cut words at spaces and at delimiters with the next release.
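To show what cutting words at spaces and at delimiters means in practice, here is a small Python sketch. It is only an illustration, not the released implementation: the function split_words is made up for this example, and the delimiter string below is an example value, not the actual content of word_delimiters in any language file.

```python
import re
from typing import List


def split_words(text: str, word_delimiters: str = "") -> List[str]:
    """Split text at whitespace and at every character in word_delimiters."""
    # Character class made of whitespace plus the (escaped) delimiters.
    # An empty word_delimiters string falls back to whitespace only.
    pattern = "[\\s" + re.escape(word_delimiters) + "]+"
    return [word for word in re.split(pattern, text) if word]


# Example delimiters (assumption): dun hao, fullwidth comma, ideographic full stop.
delimiters = "\u3001\uff0c\u3002"
text = "各国政治\u3001经济\u3001文化及科学技术交流合作日益增多\uff0c这就迫切需要一种国际共同语\u3002"

print(len(split_words(text)))         # 1: whitespace only, one long "word"
print(split_words(text, delimiters))  # several much shorter words
```

With an empty delimiter list the behaviour stays the same as before, so languages that separate words with spaces are not affected.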

3. "、" is not a letter; it is a punctuation mark. In Chinese, its name is 顿号 [dùn hào]. It is a slight-pause mark used to set off items in a series, that is, between parallel words or short phrases. If you can read German, you can visit the page https://de.wikipedia.org/wiki/Aufz%C3%A4hlungskomma

Ok, I will add it to the list of delimiters.

Thank you very much, and I must say, my little forum is the best forum I have used.

Thank you for your support.

Tschö, Auge

--
Trenne niemals Müll, denn er hat nur eine Silbe!
