| Abstract (English) |
This thesis analyzes guidelines for comparing the performance of Thai word separation programs. It begins by synthesizing example performance indicators and studying the characteristics of Thai documents that affect the performance of Thai word separation programs. It then collects the Thai word separation programs and algorithms that have been developed and announced and are currently in use, collects Thai reference data, including a reference dictionary for validating the accuracy of Thai word separation, and develops a measurement methodology. Finally, I measure performance using the developed methodology. Experimental results show that Longest Pattern Matching gives the most accurate word output, while the Back-Tracking Algorithm gives the fewest erroneous words. Word Usage Frequency gives the highest ratio of valid words to the number of words in its dictionary. Using an ambiguity dictionary gives the best resolution of ambiguous cases, whereas Shortest Pattern Matching gives the highest number of output words. Additionally, the data structure used for the dictionary in Thai word separation programs is found to strongly affect speed; the trie structure is currently the most popular choice because of its superior speed. |
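
To make the two ideas named in the abstract concrete, the following is a minimal sketch of longest pattern matching over a trie-based dictionary. It is not the implementation evaluated in the thesis: the names `TrieNode`, `build_trie`, and `longest_match_segment`, the four-word dictionary, and the rule of emitting an unknown character as a one-character token are all assumptions made for illustration.

```python
from typing import Dict, List


class TrieNode:
    """One node of a character trie (hypothetical helper, not from the thesis)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False  # True if the path from the root spells a dictionary word


def build_trie(words: List[str]) -> TrieNode:
    """Insert every dictionary word into a trie and return its root."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def longest_match_segment(text: str, root: TrieNode) -> List[str]:
    """Greedy left-to-right segmentation: at each position take the longest
    dictionary word that starts there."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        node = root
        longest_end = -1
        j = i
        # Walk the trie as far as the text allows, remembering the last word end.
        while j < len(text) and text[j] in node.children:
            node = node.children[text[j]]
            j += 1
            if node.is_word:
                longest_end = j
        if longest_end == -1:
            # Assumption: a character that starts no dictionary word
            # becomes a one-character token.
            longest_end = i + 1
        tokens.append(text[i:longest_end])
        i = longest_end
    return tokens


# Toy dictionary (assumed for illustration only).
trie = build_trie(["ตา", "ตาก", "กลม", "ลม"])
print(longest_match_segment("ตากลม", trie))  # ['ตาก', 'ลม']
```

The input "ตากลม" can be read either as "ตา" + "กลม" or as "ตาก" + "ลม"; the greedy rule always picks the second reading, which is the kind of ambiguous case an ambiguity dictionary is meant to resolve. The trie keeps each lookup proportional to the length of the candidate word rather than to the size of the dictionary.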