This software project is a work in progress.

It is not yet ready for end users, but it is probably of interest to software developers.

TextBreak is a language-independent text-breaking module: a program that segments text into smaller units (words, for instance) for all languages (e.g. English and Chinese) using a single engine.
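As a rough illustration of what "one engine for all languages" can mean, the sketch below uses greedy longest matching against a word list. This is a common dictionary-based technique chosen here as an assumption; the source does not say which algorithm TextBreak uses. The function name `segment` and its parameters are hypothetical and are not part of the TextBreak API.

```python
def segment(text, dictionary):
    """Split text into units by greedy longest matching (illustrative only).

    `dictionary` is any collection of known words. Characters that start no
    known word are emitted as single-character units.
    """
    # Longest word length bounds how far ahead we need to look (at least 1).
    max_len = max([len(word) for word in dictionary] + [1])
    units = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # A single character is always accepted so the loop makes progress.
            if length == 1 or candidate in dictionary:
                units.append(candidate)
                i += length
                break
    return units


# The same function handles different languages; only the word list changes.
english_units = segment("thecatsat", {"the", "cat", "sat"})
chinese_units = segment("我喜欢猫", {"我", "喜欢", "猫"})
```

Only the word list is language-specific here, which is one plausible reading of the language-independent design described above.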

Project pages

https://gna.org/projects/textbreak/

Design

Overview

Figure: TextBreak overview

Suite

Figure: Breaking suites

Result

Figure: Breaking result

Implementation strategy

This diagram shows the development strategy of TextBreak. Three sub-projects run simultaneously. Because implementing TextBreak in C is difficult, a prototype was built in Python before the full implementation in C. Some modules have already been written in C, however: for instance Dict, a dictionary stored in a trie structure. A Python binding is built to integrate these modules with the prototype. In the last phase, the prototype will be ported to C.

Figure: Implementation strategy of TextBreak
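To make the trie-based dictionary concrete, here is a minimal Python sketch of the data structure. It is not the Dict module, which is written in C and whose interface is not described on this page. The class name `Trie` and the methods `insert`, `contains`, and `longest_prefix` are hypothetical names chosen for illustration.

```python
class Trie:
    """A minimal character trie (illustrative; not the actual Dict module)."""

    def __init__(self):
        self.children = {}    # maps a character to the child Trie node
        self.is_word = False  # True if a stored word ends at this node

    def insert(self, word):
        node = self
        for ch in word:
            # Create the child node on first use, then descend into it.
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def contains(self, word):
        node = self
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word

    def longest_prefix(self, text, start=0):
        """Return the length of the longest stored word at text[start:], or 0."""
        node = self
        best = 0
        for offset in range(start, len(text)):
            node = node.children.get(text[offset])
            if node is None:
                break
            if node.is_word:
                best = offset - start + 1
        return best


trie = Trie()
for word in ("cat", "cats", "dog"):
    trie.insert(word)
```

A trie suits word breaking because a single walk from a given position finds every dictionary word starting there, so the longest candidate is found without testing each substring separately.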

Status

It is not usable yet.
