TextBreak is not yet ready for end users, but it is probably of interest to software developers.
TextBreak is a language-independent text-breaking module: a program that segments text into smaller units (words, for instance) for all languages (e.g. English, Chinese, etc.) with a single engine.
Project pages
https://gna.org/projects/textbreak/
Design
Overview
Suite
Result
Implementation strategy
This diagram shows the development strategy of TextBreak. There are three sub-projects running simultaneously. Because implementing TextBreak in C is quite difficult, a prototype in Python was built before the full implementation in C. However, some modules have already been written in C, for instance Dict, which is a dictionary stored in a trie structure. To integrate these modules with the prototype, a Python binding is being built. In the last phase, the prototype will be ported to C.
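To illustrate what a dictionary in a trie structure looks like, here is a minimal sketch in Python, the language of the prototype. It is not the actual Dict module or its Python binding: the names `TrieNode`, `TrieDict`, `add` and `longest_prefix` are hypothetical and chosen only for this example. The `longest_prefix` lookup is included on the assumption that a word breaker would want to find the longest dictionary word starting at a given position; the source does not say which matching strategy TextBreak uses.

```python
class TrieNode:
    """One node of the trie: child links keyed by character, plus a word flag."""

    def __init__(self):
        self.children = {}    # maps a single character to the next TrieNode
        self.is_word = False  # True if the path from the root spells a word


class TrieDict:
    """Hypothetical trie-backed word list (illustration only, not TextBreak's Dict)."""

    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.add(word)

    def add(self, word):
        """Insert a word, creating nodes along its path as needed."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def __contains__(self, word):
        """Return True only if the exact word was inserted."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word

    def longest_prefix(self, text, start=0):
        """Return the length of the longest word that begins at text[start].

        Returns 0 when no dictionary word starts at that position.
        """
        node = self.root
        best = 0
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.is_word:
                best = i + 1 - start
        return best


if __name__ == "__main__":
    words = TrieDict(["中", "中文", "文本"])
    print("中文" in words)
    print(words.longest_prefix("中文本"))
```

Because the trie works on individual characters rather than on whitespace, the same lookup applies to scripts with and without spaces between words, which fits the single-engine goal described above.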
Status
It is not usable yet.