MoA (Morphological Analyzer)
Under construction - some links may not lead to any information yet.
Version : 0.9.0 (alpha)
Date    : 95.7.29 (Sat)
Author  : Byung-Gyu, Chang (bgjang@csone.kaist.ac.kr)
          Jae-Hoon, Kim (jhoon@csking.kaist.ac.kr)
X. Warranty
This is only version 0.9.0, and an alpha version at that. The sources
are still geared toward debugging, but the program works, so I am
releasing it without any warranty and with few help files,
because it is time to wrap up this work.
Most of the problems within reach in Korean morphological analysis
are solved. Some minor speed-up and tuning problems remain, but
they can be ignored for now.
If this work turns out to be so time-consuming that I cannot do my
study/research, I may give up developing this system.. ;P
This program follows the GNU copyleft.
0. What is MoA ?
MoA is a morphological analyzer for the Korean language.
A morphological analyzer (MA for short) is needed to build
a good(?) information retrieval tool or natural language
processing system.
For example, given the Korean sentence "Na-Nun Gong-Bu-Lul Han-Da.",
an MA will tell you that it is "Na+Nun Gong-Bu+Lul Han+Da.", split
according to each morpheme's POS (part of speech).
MoA is that kind of program and is specialized to the Korean language
only. I don't know whether the scheme used in MoA can be applied to
other languages.
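To make the example above concrete, here is a toy sketch that
reproduces the "Na+Nun Gong-Bu+Lul Han+Da." output. It is not the
algorithm used in MoA: the suffix table TOY_SUFFIXES and the function
names are hypothetical, invented to cover only this one romanized
sentence, while the real program works on KSC-5601 text and assigns
POS information.

```python
# Toy illustration only: NOT the algorithm used in MoA.
# TOY_SUFFIXES is a hypothetical table covering just the example sentence.
TOY_SUFFIXES = ("Nun", "Lul", "Da")


def split_word(word):
    """Split one romanized word into 'stem+suffix' if a known suffix matches."""
    # Keep trailing punctuation (e.g. the final period) out of the match.
    core = word.rstrip(".")
    tail = word[len(core):]
    for suffix in TOY_SUFFIXES:
        marker = "-" + suffix
        if core.endswith(marker) and len(core) > len(marker):
            stem = core[: -len(marker)]
            return stem + "+" + suffix + tail
    # No known suffix: return the word unchanged.
    return word


def analyze(sentence):
    """Analyze a sentence whose words are separated by single spaces."""
    return " ".join(split_word(w) for w in sentence.split(" "))


print(analyze("Na-Nun Gong-Bu-Lul Han-Da."))
```

A real analyzer cannot rely on a three-entry suffix list, of course;
the sketch only shows the shape of the input and output.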
1. General introduction
There is a public morphological analyzer written by Sang-Ho, Lee,
namely KTS (Korean Tagging System; you can get it from its
directory in the cair-archive).
But it has the disadvantage that it cannot be changed easily,
because it is tightly coupled with the tagging system.
So I rewrote a morphological analyzer for the Korean language (it is
my first job in the laboratory) with great help from Jae-Hoon, Kim and
Sang-Ho, Lee.
Because I did not know the previous work on this kind of algorithm
in detail, I followed the instructions from Kim. But mainly
I tried to generalize the concepts used in morphological analysis,
so that a beginner can understand them in detail and modify the
sources on his own.
2. Features
3. Limitations & Prerequisites of the system
- SunOS-KLE is required for processing with "lex", because the "lex"
in SunOS-KLE can handle the KSC-5601 character set (I
have not tried it on other OSes).
- The delimiter between words is a single space. This causes the
problem that spacing errors cannot be fixed, or can be fixed only
with great difficulty.
- Perl must be installed. Some scripts in the program use Perl to
process the source code.
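The single-space limitation can be illustrated with a small sketch.
It assumes that word splitting behaves like splitting a line on one
space character; the function name is hypothetical and the actual
tokenizer in the MoA sources is not shown here.

```python
# Hypothetical sketch: assumes words are separated by exactly one space.
def split_on_single_space(line):
    """Treat every single space as a word boundary."""
    return line.split(" ")


# Correctly spaced input gives one token per word.
print(split_on_single_space("Na-Nun Gong-Bu-Lul Han-Da."))

# A missing space leaves two words fused into one token,
# which a later stage would have to repair.
print(split_on_single_space("Na-NunGong-Bu-Lul Han-Da."))

# An extra space produces an empty token between the words.
print(split_on_single_space("Na-Nun  Gong-Bu-Lul"))
```

Under this assumption, a spacing error in the input reaches the
analyzer as a wrong token, which is why it is hard to fix afterwards.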
4. Test machine/OS
machine : SS2, OS : SunOS 4.1.2 KLE
5. Main problem
The main problems are listed in the "PROBLEMS" file.
My notes about "what to do?" are in the "TODO" file.
The TODO file is so clumsy... ;)
Y. Thanks to
Seniors :
Musicians :
Z. My future job related to this job
Home of Byung-Gyu Chang
Byoung-Gyu, Chang / bgjang@csone.kaist.ac.kr