
T.C.
YEDİTEPE UNIVERSITY
FACULTY OF ENGINEERING
DEPARTMENT OF COMPUTER ENGINEERING
TEMPLATE GENERATOR USING NATURAL LANGUAGE
(TEG-NALAN)
by
Z.İlknur Karadeniz
ENGINEERING PROJECT REPORT
Approved by:
Asst. Prof. Dr. Ender Özcan (Supervisor)
Prof. Dr. Şebnem Baydere
Dr. Birol Aygün
Date of Approval: 30 / 12 / 2003
TABLE OF CONTENTS
2.1. Natural Language Processing and Machine Learning
2.2. Template and Code Generation
2.2.2. A Natural Language Interface for Programming in Java (NaturalJava)
3. NATURAL LANGUAGE PROCESSING (NLP)
4. THE COMPONENTS OF TEG-NALAN
4.1. Augmented Transition Network (ATN)
5. REQUIREMENTS AND PROGRAMMING ENVIRONMENT
6.2. Data Flow and Data Design
7. MORPHOLOGY, SYNTAX AND SEMANTICS
7.2. Class Declaration Sentences
7.3. Attribute Declaration Sentences
7.4. Method Declaration Sentences
7.5. Hierarchy Declaration Sentences
LIST OF FIGURES
Figure 2.1 Target Scenario of TOY
Figure 2.2 Results of a Query Given to TuSA
Figure 2.3 Rational Rose Interface
Figure 2.4 Architecture of NaturalJava
Figure 3.1 Turkish Letters
Figure 3.2 Turkish Morphology Example
Figure 3.3 Suffix Examples
Figure 3.4 Example Sentences (Ambiguity)
Figure 3.5 Turkish Syntactic Categories
Figure 3.6 Word Order
Figure 3.7 Inverted Sentences
Figure 4.1 ISA Hierarchy
Figure 4.2 HASA Hierarchy
Figure 6.1 Library Hierarchies in TEG-NALAN
Figure 6.2 Data Flow Diagram
Figure 6.3 Architecture of TEG-NALAN
Figure 7.1 Word Entries
Figure 7.2 Suffix Changes
Figure 7.3 Example of a Class Definition Sentence
Figure 7.4 Example of an Interface Definition Sentence
Figure 7.5 ATN for Class Declaration
Figure 7.6 Example of an Undetailed Attribute Declaration Sentence
Figure 7.7 Example of a Detailed Attribute Declaration Sentence
Figure 7.8 ATN for Attribute Declaration
Figure 7.9 ATN for Detailed Method Declaration
Figure 7.10 Method Declaration with Parameters
Figure 7.11 Example of Detailed Method Declaration Sentence
Figure 7.12 ATN for Detailed Method Declaration
Figure 7.13 Example of Hierarchy Declaration Sentence
Figure 7.14 ATN for ISA Relationship Declaration
Figure 8.1 Data Structure for Attributes
Figure 8.2 Data Structure for Methods
Figure 8.3 Data Structure for Classes
LIST OF ABBREVIATIONS
| AI | Artificial Intelligence |
| ATN | Augmented Transition Network |
| NLP | Natural Language Processing |
| TEG-NALAN | Template Generator Using Natural Language |
| TuSA | Turkish Speaking Assistant |
ACKNOWLEDGEMENTS
First, I would like to thank my advisor Ender Özcan for his guidance and encouragement from the very beginning until the end. I am truly grateful for Şadi Evren Şeker’s valuable suggestions, comments and assistance at every step of my study. Special thanks are due to Şeniz Demir for her guidance in Prolog and natural language concepts. Finally, I want to thank my family for their love and support during my entire education and life.
TEG-NALAN: Template Generator Using Natural Language
Natural Language Processing (NLP) is a subfield of Artificial Intelligence whose aim is to use computers to understand natural languages such as Turkish, English, or Italian. This report describes an intelligent natural language interface, based on the Turkish language, for creating Java class skeletons and listing classes and their members. This interface, named TEG-NALAN (Template Generator Using Natural Language), was developed as part of a project named TUJA (Java Code Generator Using Turkish), a tool for producing Java programs from Turkish sentences. TEG-NALAN relies on three main components to achieve its goal: an augmented transition network (ATN), a knowledge database and a Java code generator. The ATN, which utilizes concept hierarchies, converts Turkish sentences into instances of schemata (attribute, method or class) and inserts them into the knowledge database. The Java code generator then retrieves the required knowledge from that database and produces the output, a Java class skeleton.
ÖZET (SUMMARY)
TEG-NALAN: Java Skeleton Code Generator Using Natural Language
Natural language processing is a subfield of artificial intelligence. Its aim is to enable computers to understand the native language of their users, so that people can use their native language when interacting with computers. This report describes an intelligent natural language application based on Turkish, developed to create Java class skeletons and to list the members of each class. The application, named TEG-NALAN (Java Skeleton Code Generator Using Natural Language), is part of the TUJA (Java Code Generator Using Turkish) software tool project, which produces Java programs from Turkish sentences. TEG-NALAN first splits the given Turkish sentence into its parts; it then builds from these parts a structure that fits the meaning of the sentence and stores the required knowledge in a data warehouse. Finally, depending on the user's request, the knowledge is retrieved from the data warehouse and the generation of the Java class skeleton is completed. Afterwards, at the user's request, a Java source file can be created or the class schemata can be queried.
Programming languages are precise and mostly unambiguous, with predefined syntax and semantics. Still, a programmer spends a lot of effort learning syntactic rules while at the same time developing general programming skills. Even an experienced programmer may face the same problems when the programming language is a new one. Natural languages, on the other hand, are more declarative, flexible, powerful and rich, which makes them useful even for occasional users. In addition, the programmer may not know the language in which the resources for learning a new programming language, such as books, are written.
There are visual
tools for creating object oriented designs, furthermore, generating Java/C++
skeletal programs, such as Rational Rose (an
The next chapter is a brief summary of previous work on the topics examined in this report. In Chapter 3, natural language processing is explained at two levels: the definition of NLP and NLP in Turkish. Chapter 4 explains the AI concepts used in TEG-NALAN. Chapter 5 presents the required software packages and Prolog, the programming language in which our software is written. The design of TEG-NALAN is given in Chapter 6. The morphology, syntax and semantics of TEG-NALAN are explained in Chapter 7. In the following chapter, the knowledge database is explained. The test concepts used in TEG-NALAN are examined in Chapter 9. Finally, the results obtained are discussed and possible future work is stated.
In this chapter, some concepts and previous work related to this report are discussed. Previous related work can be categorized into two groups:
- Natural Language Processing and Machine Learning
- Template and Code Generation
2.1. Natural Language Processing and Machine Learning
2.1.1.
Çetinoğlu [4] developed the
Canan uyudu mu? (Did Canan fall asleep?)
Bilmiyorum. (I don’t know.)
Kemal uyudu mu? (Did Kemal fall asleep?)
Evet, Kemal uyudu. (Yes, Kemal fell asleep.)
Canan küçük bir çocuktur. Kemal küçük bir çocuktur. Bütün küçük çocuklar 10 saat uyurlar. (Canan is a little child. Kemal is a little child. All little children sleep for 10 hours.)
Kemal ne zaman uyudu? (When did Kemal fall asleep?)
Kemal yirmiüçte uyudu. (Kemal fell asleep at 23:00.)
Kemal ne zaman uyandı? (When did Kemal wake up?)
Kemal saat 9’da uyandı. (Kemal woke up at 9 o'clock.)
Canan kahvaltıda ne yiyecek? (What will Canan eat for breakfast?)
Bilmiyorum. (I don’t know.)
Canan kahvaltıda peynir, ekmek, zeytin yiyecek. (Canan will eat cheese, bread and olives for breakfast.)
Teşekkürler, öğrendim. (Thanks, I have learned it.)
Küçük çocuklar hariç herkes kahvaltıda çay içer. (Everybody except little children drinks tea at breakfast.)
Canan kahvaltıda ne içecek? (What will Canan drink at breakfast?)
Bilmiyorum. (I don’t know.)
Canan kahvaltıda çay içecek mi? (Will Canan drink tea at breakfast?)
Bilmiyorum. (I don’t know.)
Kaç kişi kahvaltıda çay içmeyecek? (How many people will not drink tea at breakfast?)
Bilmiyorum. (I don’t know.)
Kim kahvaltıda çay içmeyecek? (Who will not drink tea at breakfast?)
Bilmiyorum. (I don’t know.)
Kaç kişi kahvaltıda peynir, ekmek, zeytin yiyecek? (How many people will eat cheese, bread and olives for breakfast?)
Bir kişi kahvaltıda peynir, zeytin, ekmek yiyecek. (One person will eat cheese, olives and bread for breakfast.)
Kim kahvaltıda peynir, ekmek, zeytin yiyecek? (Who will eat cheese, bread and olives for breakfast?)
Canan kahvaltıda peynir, ekmek, zeytin yiyecek. (Canan will eat cheese, bread and olives for breakfast.)
Figure 2.1 Target Scenario of TOY
Şeker [9] developed a Turkish speaking assistant (TuSA) to store and retrieve appointment details. In TuSA, the appointments (whose details the user enters as Turkish sentences) are stored in takvim schemata and written to a file. A takvim schema has the following structure:
takvim(ID,Minute,Hour,Day,Month,Year,Person,Location,Subject,Duration,Recurring)
where
ID: the unique number given to the appointment.
Minute and Hour: the values that represent the time phrases.
Day, Month and Year: the date of the appointment.
Person, Location and Subject: with whom, where and about what the appointment will be.
Duration: given in either minutes or hours.
Recurring: recurring event information, as in “Aliyle iki günde bir toplantı var” (there is a meeting with Ali every two days).
If a sentence like “Haftaya Aliyle olan toplantıları göster” (show next week's meetings with Ali) is given as input, TuSA converts the query to an internal formula like takvim(_,_,_,15,9,2003,_,[ali],[yemekli,toplanti],_,_) and runs this query on the database. The results of the query, if any are found, are shown to the user in the format shown in Figure 2.2.
ID:11
Tarih:15/Eylul/2003|10:05|Pazartesi (Date: 15/September/2003|10:05|Monday)
Kisi: arkadasim ali (Person: my friend Ali)
Yer:komsumuz Canan (Location: our neighbour Canan)
Konu: yemekli bir toplanti (Subject: a meeting with a meal)
Suresi: 7 saat (Duration: 7 hours)
Semantics:
takvim(11,10,5,15,9,2003,[komsumuz,canan],[arkadasim,ali],[yemekli,bir,toplanti],7,'saat').
Figure 2.2 Results of a Query Given to TuSA
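The query step described above behaves like pattern matching over stored takvim facts, where `_` stands for "any value". The following Python sketch imitates that idea only; it is not TuSA's Prolog code. The helpers `match` and `query`, the `ANY` marker and the sample records are hypothetical, and the field order simply follows the takvim structure listed in the text.

```python
# Wildcard marker playing the role of Prolog's anonymous variable "_".
ANY = object()

# Field order as listed for the takvim schema in the text.
FIELDS = ("id", "minute", "hour", "day", "month", "year",
          "person", "location", "subject", "duration", "recurring")

def match(pattern, record):
    """True if every non-wildcard field of the pattern equals the record's field."""
    return all(p is ANY or p == r for p, r in zip(pattern, record))

def query(pattern, database):
    """Return all stored records that match the pattern, in storage order."""
    return [record for record in database if match(pattern, record)]

# Made-up sample appointments (not taken from TuSA's database).
database = [
    (11, 5, 10, 15, 9, 2003, ("ali",), ("ofis",), ("toplanti",), 2, "none"),
    (12, 30, 14, 16, 9, 2003, ("canan",), ("ev",), ("yemek",), 1, "none"),
]

# "Appointments with Ali on 15/9/2003", leaving every other field open.
pattern = (ANY, ANY, ANY, 15, 9, 2003, ("ali",), ANY, ANY, ANY, ANY)
results = query(pattern, database)
```

Exact equality is used here for simplicity; a real system would need a looser comparison so that a query for [ali] can also match a stored phrase such as [arkadasim,ali].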
2.2. Template and Code Generation
There are visual tools for creating object-oriented designs and, furthermore, for generating skeletal Java/C++ programs. They are examined in this section.
Rational Rose/C++ [10] is application software that generates C++ code from Unified Modeling Language (UML) diagrams. Rational Rose has the same notation and syntax as UML. Its notation comprises a set of specialized shapes for constructing different kinds of software diagrams, such as class diagrams, state diagrams and activity diagrams. The class skeleton generated by TEG-NALAN resembles the one generated by Rational Rose/C++. The difference is that Rational Rose generates C++ code from drawn UML diagrams, whereas TEG-NALAN generates a Java class skeleton from Turkish sentences given by the user. The Rational Rose interface can be seen in Figure 2.3.

Figure 2.3 Rational Rose Interface
2.2.2. A Natural Language Interface for Programming in Java (NaturalJava)
NaturalJava [11] is a prototype of an intelligent natural-language-based user interface for creating, modifying, and examining Java programs. The interface exploits three subsystems:
- The Sundance natural language processing system accepts English sentences as input and uses information extraction techniques to generate case frames representing program construction and editing directives.
- A knowledge-based case frame interpreter, PRISM, uses a decision tree to infer program modification operations from the case frames.
- A Java abstract syntax tree manager, TreeFace, provides the interface that PRISM uses to build and navigate the tree representation of an evolving Java program.

Figure 2.4 Architecture of NaturalJava
3. NATURAL LANGUAGE PROCESSING (NLP)
This section first gives the definition and the levels of NLP, and then discusses Turkish and some of the difficulties faced when developing NLP applications for Turkish.
Natural Language Processing (NLP) is a subfield of artificial intelligence whose ultimate aim is to enable computers to use natural languages with performance levels comparable to those of humans. Natural language communication with computers has long been a major research area of artificial intelligence (AI), both for the information it can give about intelligence in general and for its practical utility.
There has been some research on Turkish as a natural language, such as modeling its morphological structure, building sentence structures, chat robots and algorithm analyzers, but each study addresses its own separate area and it is almost impossible to combine them.
NLP depends on limiting the need for outside knowledge and human experience, on cheap computer power, and on exact knowledge of how human languages work.
NLP is organized into five levels [1]:
o Phonology (sounds of words)
o Morphology (structure of words)
o Syntax (order of words)
o Semantics (meaning of words)
o Pragmatics (use of language)
Phonology
Phonology is the study of how sounds are used in language. Every language has an alphabet of sounds that it distinguishes. These are called its phonemes, and each phoneme has one or more physical realizations called allophones. As an example, consider the “t” sounds in the words “top” and “stop”. They are physically different; however, in English these two sounds are allophones of the same phoneme, because the language does not distinguish them.
Morphology
Morphology is the study of word formation. Every language has two kinds of word formation processes: inflection, which provides the various forms of any single word, like singular “man” and plural “men”, and derivation, which creates new words from old ones. For example, the creation of “dogcatcher” from “dog”, “catch”, and “-er” is a derivational process.
Syntax
Syntax is the lowest level at which human language is constantly creative. People do not often create new speech sounds or new words, but everyone who speaks a language is constantly inventing new sentences that he or she has never heard before. In this respect, syntax is quite unlike phonology or morphology.
Semantics
Semantics is the level at which language makes contact with the real world. As a field of study, semantics has only recently started to mature. For a long time it was unclear how to describe the meanings of natural-language utterances. Suitable tools have now been provided by mathematical logic and set theory, and since the 1970s the study of semantics has made great strides.
Pragmatics
Pragmatics is the use of language in context. The boundary between semantics and pragmatics is uncertain, and different authors use the terms somewhat differently. In general, pragmatics includes aspects of communication that go beyond the literal truth conditions of each sentence.
The aim of the study reported here is to build a software infrastructure for the computational processing of Turkish that smoothly integrates the above-mentioned levels and that can be used in the construction of various Natural Language Processing applications for the language. The Prolog logic programming language was selected for the implementation.
Since the set of all possible Turkish sentences is huge, we have chosen to reduce it by selecting a special subset of Turkish. This subset is given in APPENDIX – A.
Turkish [5] is a member of the Ural-Altaic language family. This section analyses Turkish from a linguistic perspective and shows its important aspects. Turkish is characterized by certain morphophonemic, morphotactic, and syntactic features: vowel harmony, agglutination of all-suffixing morphemes, free order of constituents, and head-final structure of phrases.
Turkish uses Latin characters. There are 29 letters in the Turkish alphabet, divided into two categories: vowels and consonants. As seen in Figure 3.1, there are 8 vowels and 21 consonants. At a further level, vowels can be divided into sub-categories according to their phonetic properties or shape. Similarly, consonants also have sub-categories, some of which are fricative, nasal or liquid.
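The letter counts above can be checked with a few lines of code. The following Python sketch is only an illustration (TEG-NALAN itself is written in Prolog); the names `ALPHABET`, `VOWELS`, `CONSONANTS` and `classify` are hypothetical, and only lowercase letters are handled.

```python
# Lowercase Turkish alphabet (29 letters); uppercase handling is left out on purpose.
ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz"
VOWELS = set("aeıioöuü")
CONSONANTS = set(ALPHABET) - VOWELS

# One common phonetic sub-division of the vowels.
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")

def classify(letter):
    """Return 'vowel' or 'consonant' for a lowercase Turkish letter."""
    if letter in VOWELS:
        return "vowel"
    if letter in CONSONANTS:
        return "consonant"
    raise ValueError(f"not a lowercase Turkish letter: {letter!r}")
```

The back/front split is the distinction that vowel harmony relies on when a suffix is attached to a word.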
Turkish Morphology
Turkish morphology is complicated to build applications on, because Turkish is an agglutinative language [6] whose word structures are formed by productive affixation of derivational and inflectional suffixes to root words. This extensive use of suffixes makes the morphological parsing of words rather complicated and results in ambiguous lexical interpretations in many cases.
Figure 3.2 Turkish Morphology Example
For example, in Figure 3.2, “annesi” (his or her mother) may be interpreted as their child. This type of ambiguity can be resolved at the phrase and sentence levels with the help of agreement requirements, though this is not always possible.
Let us look at the examples in Figure 3.3. The first word takes several suffixes, and although the root is a verb (“gör”, see) it turns into a noun. In the second word, when the suffix “-a” is added to the word “ağaç” (tree), the letter “ç” turns into “c”. The last one is an example of letter drop: when the suffix “-ıyor” is added to the verb “ağla” (cry), the letter “a” drops and the verb becomes “ağlıyor”.
· görünürlerde → gör + ün + ür + ler + de
· ağaca → ağaç + a
· ağlıyor → ağla + ıyor
Figure 3.3 Suffix Examples
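The three suffix examples can be reproduced with two simplified surface rules. The Python sketch below is an illustration under strong assumptions, not the morphological analyzer used in TEG-NALAN: the functions `attach` and `build` are hypothetical, consonant softening is applied to every stem ending in p, ç, t or k (real Turkish has exceptions), and the vowel drop is limited to suffixes ending in "yor".

```python
VOWELS = set("aeıioöuü")
# Final consonants that soften before a vowel-initial suffix (simplified rule).
SOFTEN = {"p": "b", "ç": "c", "t": "d", "k": "ğ"}

def attach(stem, suffix):
    """Attach one suffix to a stem, applying two simplified surface rules."""
    if suffix[0] in VOWELS:
        if stem[-1] in VOWELS and suffix.endswith("yor"):
            # Vowel drop, as in "ağla" + "ıyor" -> "ağlıyor".
            stem = stem[:-1]
        elif stem[-1] in SOFTEN:
            # Consonant softening, as in "ağaç" + "a" -> "ağaca".
            stem = stem[:-1] + SOFTEN[stem[-1]]
    return stem + suffix

def build(root, suffixes):
    """Attach a sequence of suffixes to a root, one after another."""
    word = root
    for suffix in suffixes:
        word = attach(word, suffix)
    return word
```

For instance, `build("gör", ["ün", "ür", "ler", "de"])` concatenates the morphemes of the first example without triggering either rule.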
Moreover, typical heuristics used in English to disambiguate between noun and verb readings of the same lexical form (such as checking whether the previous word is a determiner) are in general not applicable in Turkish, because the determiner (“bir” in Turkish) may also function as an adverb. Consider the sentences in Figure 3.4. In the first sentence “giderim” means “my expense”; in the other sentence it is used as the verb “I go”. An NLP application should recognize this type of morphological structure.

Figure 3.4 Example Sentences (Ambiguity)
Syntactic Categories
As shown in Figure 3.5, these are nouns, proper nouns, compound nouns, adjectives, verbs, adverbs and conjunctions. Notice that determiners are not in this list.

Figure 3.5 Turkish Syntactic Categories
Word Order
The typical order of words in Turkish is subject-object-verb (SOV). However, orders other than SOV are also commonly used. In Turkish, the grammatical function of a noun phrase (NP) is determined regardless of its position in the sentence. Therefore, the typical word order can change freely without affecting the grammar of the sentence; only the verb keeps its position at the end of the sentence. Figure 3.6 gives an example of this situation.

Figure 3.6 Word Order
All three sentences have the same meaning. The first one is an example of the typical word order. In the second one the subject “I” is emphasized. Although the order of the subject and the object changes, the position of the verb remains unchanged in all three sentences. If the verb is also moved from its typical place (the end of the sentence), the sentence is called an inverted sentence. Figure 3.7 shows an example of inverted sentences. Inverted sentences are generally used to emphasize the verb. However, this type of change in word order changes the grammatical functions in the sentence; in other words, such sentences do not mean the same thing, whereas the grammatical functions remain unchanged if only the order of the noun phrases is changed.

Figure 3.7 Inverted Sentences
4. THE COMPONENTS OF TEG-NALAN
In this section, some AI concepts (“Augmented Transition Network”, “Concept Hierarchy”, “Schemata”) [13] utilized in the project are explained.
4.1. Augmented Transition Network (ATN)
Augmented Transition Networks (ATNs) were developed in an attempt to provide a practical framework for natural-language understanding. In order to combine parsing with semantic analysis, it should be possible to attach semantic routines to specific parts of the parsing mechanism or grammar. An ATN offers the following advantages: (1) The basic parsing scheme is easy to understand; the grammatical information is represented in a transition network, and consequently an ATN is relatively easy to design. (2) Semantic analysis proceeds simultaneously with syntactic analysis, and semantics may easily be used to constrain parsing and resolve ambiguities.
The ATN framework does not place any restrictions on the kinds of actions one can specify. Thus, by deciding to use an ATN, one does not narrow the design alternatives for a system very much. However, the ATN approach seems to provide enough structure to a natural-language system to be helpful.
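To make the idea of attaching semantic actions to arcs concrete, the following Python sketch shows a very small ATN. It is an assumption-laden illustration and not the network used in TEG-NALAN: the states, the word categories (`noun`, `det`, `class_word`), the registers and the sample sentence are all hypothetical, and the input is assumed to be already tagged with categories.

```python
def set_name(registers, word):
    """Semantic action: remember the noun as the name being declared."""
    registers["name"] = word

def set_kind(registers, word):
    """Semantic action: record that the sentence declares a class."""
    registers["kind"] = "class"

# state -> list of arcs (category to consume, next state, semantic action or None)
NETWORK = {
    "S0": [("noun", "S1", set_name)],
    "S1": [("det", "S2", None), ("class_word", "END", set_kind)],
    "S2": [("class_word", "END", set_kind)],
}

def parse(tokens, state="S0", registers=None):
    """Follow arcs depth-first; return the filled registers or None on failure."""
    registers = dict(registers or {})
    if state == "END":
        return registers if not tokens else None
    if not tokens:
        return None
    word, category = tokens[0]
    for arc_category, next_state, action in NETWORK.get(state, []):
        if arc_category != category:
            continue
        new_registers = dict(registers)  # copy, so a failed arc leaves no trace
        if action:
            action(new_registers, word)
        result = parse(tokens[1:], next_state, new_registers)
        if result is not None:
            return result
    return None

# Toy input: "araba bir sınıf" with hand-assigned categories.
semantics = parse([("araba", "noun"), ("bir", "det"), ("sınıf", "class_word")])
```

Because the registers are filled while the arcs are traversed, syntactic and semantic analysis proceed together, which is the property highlighted in advantage (2).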
Much of our knowledge about the world is organized hierarchically. We group all the “things” we know of into classes and sets. These classes are grouped into superclasses, and the superclasses into even bigger ones. With most of these classes we associate names, which we use to identify them. There is a group we call “dogs” and another we call “cats”. These are grouped, with some other classes, into a superclass called “mammals”. Plants, minerals, machines, emotions, information, and ideas are treated similarly. Much of our knowledge consists of an understanding of the inclusion relationship on all these classes and cognizance of various properties shared by all members of particular classes. “All horses have four legs” states that the property “has four legs” is shared by each member of the class of horses.
The inclusion relation on a set of classes is very important in AI. “A bear is a mammal” expresses that the class of bears is a subclass of the class of mammals, as in Figure 4.1. For this reason, the data structures used to represent inclusion relations are often called “ISA” hierarchies.

Figure 4.1 ISA Hierarchy
“All horses have four legs” expresses that the class of horses has a member “legs” and that all horses have this property. The data structures used to represent member relations are often called “HASA” hierarchies.

Figure 4.2 HASA Hierarchy
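The two kinds of hierarchy can be sketched with plain dictionaries. The Python example below is illustrative only: the table contents (including the "fur" property) and the helpers `is_a` and `properties` are hypothetical, and letting subclasses inherit HASA properties through the ISA chain is an assumption of this sketch.

```python
# ISA: each class points to its direct superclass.
ISA = {"bear": "mammal", "horse": "mammal", "mammal": "animal"}

# HASA: members (properties) declared directly on a class.
HASA = {"horse": {"legs": 4}, "mammal": {"fur": True}}

def is_a(cls, ancestor):
    """True if cls equals ancestor or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = ISA.get(cls)
    return False

def properties(cls):
    """Collect HASA members along the ISA chain; nearer classes override."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = ISA.get(cls)
    merged = {}
    for c in reversed(chain):  # start from the most general class
        merged.update(HASA.get(c, {}))
    return merged
```

With these tables, `is_a("bear", "animal")` holds through two ISA links, and `properties("horse")` combines the member declared on horses with the one declared on mammals.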

To represent knowledge, some organizational structures are required, and the schema is one of them. A schema commonly consists of two parts: a name and a list of attribute-value pairs. The attributes are sometimes called “slot names” and the values “fillers”. An example schema representation is given below in Table 1.
| Slots | Fillers |
| FRAME NAME | KITCHEN |
| DISHWASHER | (5,4) |
| FRIDGE-LOC | (2,1) |
| STOVE-LOC | (3,5) |
Table 1 Schema representing a kitchen with its attributes
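A schema of this kind maps naturally onto a named collection of slot/filler pairs. The following Python sketch restates Table 1 in code; the `Schema` class and its `fill` and `get` methods are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Schema:
    """A schema: a frame name plus a mapping from slot names to fillers."""
    name: str
    slots: dict = field(default_factory=dict)

    def fill(self, slot, filler):
        """Store (or overwrite) the filler of a slot."""
        self.slots[slot] = filler

    def get(self, slot, default=None):
        """Return the filler of a slot, or a default if the slot is empty."""
        return self.slots.get(slot, default)

# The kitchen schema of Table 1.
kitchen = Schema("KITCHEN")
kitchen.fill("DISHWASHER", (5, 4))
kitchen.fill("FRIDGE-LOC", (2, 1))
kitchen.fill("STOVE-LOC", (3, 5))
```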
5. REQUIREMENTS and PROGRAMMING ENVIRONMENT
TEG-NALAN is application software written completely in Prolog. In order to run TEG-NALAN on your machine, you need to install the SWI-Prolog software package. It does not require any special hardware; all you need is a computer capable of running Prolog.
TEG-NALAN is mainly implemented in SWI-Prolog, a free-software Prolog compiler licensed under the GNU Lesser General Public License. It is among the most commonly used Prolog compilers, especially for educational purposes. SWI-Prolog can be obtained from [7], where both Linux and Windows versions are available.
Among all the programming languages available today, Prolog [2] [12] may be the most suitable for natural language processing purposes. Here are some reasons:
· It is possible to define, build, and modify large, complex data structures easily. This makes it easy to represent syntactic and semantic structures and lexical entries in Prolog.
· List manipulation is well supported in Prolog, and lists are the preferred data structures for representing natural language structures at any level.
· The program can examine and modify itself. The user is able to make modifications to the program as well as to the knowledge base dynamically as the program runs.
· Prolog is based on first-order predicate logic. Logic rules and knowledge representation are integrated within the system. Extensions to this logic are relatively easy to implement.
· The ability to store the knowledge base in terms of predicates and facts allows the programmer to easily integrate query systems using rules of inference.
· The depth-first search algorithm is built into Prolog and is easily used in all kinds of parsers. In fact, Prolog has a built-in and ready-to-use parser. These features of Prolog ease the implementation of morphological-level and syntactic-level parsing algorithms and improve efficiency relative to a hand-coded depth-first parser.
· The backtracking property of Prolog means that the user does not need to explicitly handle the alternatives of a clause. Whenever a clause fails, Prolog backtracks to find an alternative solution that does not fail. This property can also be used to find the entire solution set of a given query.
· Pattern matching (unification) is built into Prolog. With this property, arguments of data structures can be constructed in different steps within the clauses and without any strict order.
· Many Prolog programs are reversible; that is, they can work in both directions with no or only slight changes to the code, in the sense that the output arguments of a Prolog predicate can be used as input arguments for the same predicate in another call. This feature allows us to develop applications that can perform not only analysis but also generation with the same code.
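The depth-first search and backtracking behaviour described in the list can be imitated in other languages with generators. The Python sketch below enumerates every way of splitting a word into a root followed by suffixes; it only illustrates what Prolog provides for free and is not code from TEG-NALAN. The tiny `ROOTS` and `SUFFIXES` lexicons and the function names are hypothetical, and no suffix-ordering rules are checked.

```python
# Toy lexicon, for illustration only.
ROOTS = {"gör", "görün"}
SUFFIXES = {"ün", "ür", "ler", "de"}

def segment(rest, parts, lexicon):
    """Yield every way to split `rest` into lexicon entries (depth-first)."""
    if not rest:
        yield parts
        return
    for end in range(1, len(rest) + 1):
        piece = rest[:end]
        if piece in lexicon:
            # Commit to this piece; when the recursion is exhausted, the
            # loop continues with the next candidate, which is backtracking.
            yield from segment(rest[end:], parts + (piece,), lexicon)

def analyses(word):
    """Yield all root + suffix segmentations of a word."""
    for end in range(1, len(word) + 1):
        root = word[:end]
        if root in ROOTS:
            yield from segment(word[end:], (root,), SUFFIXES)
```

Calling `list(analyses("görünürlerde"))` collects the entire solution set, much as asking Prolog for all answers to a query would; with this lexicon the word has two analyses, one for each root.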
Lisp shares only a few of these advantages. Conventional languages such as Pascal and C lack all of them. Of course, natural language processing can be done in any programming language, but it is much easier in Prolog than in the others.
Besides all of these advantages, choosing Prolog also has some disadvantages. The backtracking property sometimes causes the generation of unwanted solutions. To prevent such problems, the cut predicate '!' is used, which introduces an artificial restriction. The use of such restrictions prevents the user from testing whether the formalism completely represents the entire theory it is based on.
Another disadvantage of using Prolog can be efficiency. In most cases, Prolog programs run slower than ones implemented in procedural languages such as C or Pascal.
However, the advantages listed above are so important and convenient for NLP that Prolog is still a highly suitable and widely used programming language for it, in spite of its disadvantages.
In this section the main design issues considered in TEG-NALAN are explained. These are Component Level Design, Data Flow and Data Design.
TEG-NALAN is composed of several libraries, which are shown in Figure 6.1 below. TEG-NALAN depends on two main libraries, tuja and sohbetson.
Figure 6.1 Library Hierarchies in TEG-NALAN
- Tuja is the main library of our program. It contains the vital components of TEG-NALAN, which can be seen in Figure 6.3 (ATN Manager, Knowledge Database, Java Code Generator).
- Sohbetson includes the main Turkish grammar rules. As an example:
Sentence → noun phrase + verb phrase.
This file was originally written by [4]. However, all of the grammar rules were changed in order to adapt them to TEG-NALAN. Semantic creation, which is an important part of TEG-NALAN, is also implemented in Sohbetson.
- Morphoson is used as a morpheme database (Turkish dictionary). TEG-NALAN is a dictionary-based application. Therefore all words needed in TEG-NALAN (not only nouns but also verbs, adjectives, numbers, everything) should be defined beforehand. Otherwise the program is not able to recognize the given word.
- The Formula library contains rules that transform the given semantic input into a set of Prolog facts and rules. The semantic formulas are applied in this file. Sohbetson needs this file to finalize the semantics.
- Arcson is the implementation of Oflazer’s finite state machine [8] for Turkish. Inside the file there are arc rules to realize
- Misc contains some predicates and rules that are at the design and test level. Compared with the other libraries, it is not as stable.
6.2. Data Flow and Data Design
TEG-NALAN takes a Turkish sentence, applies Turkish grammar rules to create a meaningful semantic representation, and finally creates Java class templates and writes them into an output Java file, as in Figure 6.2.

TEG-NALAN achieves its task with the following components:
- ATN Manager
An augmented transition network (ATN) is developed for the TEG-NALAN interface. HASA relationship is used for composing classes and
- Knowledge Database
The knowledge database keeps the whole class hierarchy resulting from the object-oriented design and the skeleton of each class.
- Java Code Generator
Retrieving information from the database, the Java Code Generator generates the Java class skeleton.

Figure 6.3 Architecture of TEG-NALAN
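The last step of this architecture, turning stored class knowledge into a Java class skeleton, can be sketched as follows. This Python example is an illustration under assumptions and not TEG-NALAN's Prolog generator: the layout of the `knowledge` dictionary, the sample class `Araba` with its parent `Tasit`, and the function `java_skeleton` are hypothetical, and only methods without parameters or return values are emitted.

```python
# A toy knowledge database: class name -> class schema (hypothetical layout).
knowledge = {
    "Araba": {
        "parent": "Tasit",                               # ISA relationship
        "attributes": [("private", "Motor", "motor")],  # HASA relationship
        "methods": [("public", "calis")],
    },
}

def java_skeleton(name, schema):
    """Render one class schema as the text of a Java class skeleton."""
    header = f"public class {name}"
    if schema.get("parent"):
        header += f" extends {schema['parent']}"
    lines = [header + " {"]
    for access, type_name, attr in schema.get("attributes", []):
        lines.append(f"    {access} {type_name} {attr};")
    for access, method in schema.get("methods", []):
        # Only void, parameterless methods, so the empty body stays valid Java.
        lines.append(f"    {access} void {method}() {{")
        lines.append("    }")
    lines.append("}")
    return "\n".join(lines)

source = java_skeleton("Araba", knowledge["Araba"])
```

Keeping the schemata separate from the rendering step mirrors the split between the Knowledge Database and the Java Code Generator: the same stored knowledge could also be listed or queried without generating code.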
7. MORPHOLOGY, SYNTAX AND SEMANTICS
The morphology of TEG-NALAN is inherited from a previous Prolog-based project, TuSA [9].
For the syntax part, some grammatical rules, which can be seen in APPENDIX – B, were created. All the possible syntax types are supported in order to create an abstract model representing the classes. More sample sentences, beyond those given in this section, can be seen in APPENDIX – A.
Sentences accepted by our program can be categorized into four different groups:
– Class Declaration Sentences
– Attribute Declaration Sentences
– Method Declaration Sentences
– Hierarchy Declaration Sentences
At the start, the required Turkish words were added to morphoson in tr_morph_entry form. This is because our program is dictionary-based software, and the assumption is made that there is a library containing all Turkish words in tr_morph_entry form. Figure 7.1 shows examples of different types of tr_morph_entry. In this example, “İlknur” is used as a proper noun, “liste” (list) as a noun, “tüket” (consume) as a verb and “yüksek” (high) as an adjective.
tr_morph_entry has eight parameters. Three of them are explicitly left empty to leave space for improvement. The first parameter is a string that shows the root type of the word. The second parameter is a list of two elements: the first is the word in list form, and the second is another list holding the type and the semantics. The third, fourth and fifth arguments are left empty. The sixth parameter is the last vowel and the seventh is the last letter. The final parameter is the state value.
The last vowel, the last letter and the state value are used to determine the suffixes that the word may take. The last vowel and the last letter are saved because some suffixes change according to the word they follow.
Figure 7.2 illustrates some examples of suffix changes due to the ending of the word. In addition, some examples of state values are shown in Table 2. Further detailed explanations of suffix addition and word formation are available in [4] [9].
tr_morph_entry('AdKök',[[l,i,s,t,e],[type(noun),sem(liste)]],_,_,_,e,e,ok).
tr_morph_entry('AdKök',[['İ',l,k,n,u,r],[type(propernoun),sem(ilknur)]],_,_,_,u,r,ok).
tr_morph_entry('FiilKök',[[t,ü,k,e,t],[type(verb),sem(tuket-al)]],_,_,_,e,t,ok).
Figure 7.1 Word Entries
→ lar (plural suffix): “araba + lar” (cars), “çiçek + ler” (flowers)
Figure 7.2 Suffix Changes
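The role of the stored last vowel can be illustrated with the plural suffix, whose form follows the last vowel of the word. The Python sketch below is a simplification for illustration only (the functions `last_vowel` and `plural` are hypothetical, and exceptional words such as some loanwords are not handled); TEG-NALAN reads the last vowel from its tr_morph_entry facts instead of computing it.

```python
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")

def last_vowel(word):
    """Return the last vowel of a lowercase word, or None if it has none."""
    for letter in reversed(word):
        if letter in BACK_VOWELS or letter in FRONT_VOWELS:
            return letter
    return None

def plural(word):
    """Attach -lar after a back vowel and -ler after a front vowel."""
    vowel = last_vowel(word)
    if vowel is None:
        raise ValueError(f"no vowel found in {word!r}")
    return word + ("lar" if vowel in BACK_VOWELS else "ler")
```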
| State Value | Description | Example |
| Ok | Regular words | “abla” (sister) |
| Specok | Standard forms of special words | “çocuk” (child) |
| Spec | Exceptional form of special words | “çocuğ” (child) |
Table 2 State Values
7.2. Class Declaration Sentences
This group of sentences is used to create a new class, as shown in Figure 7.3. Note that the declaration of abstract classes and Java interfaces is also supported, as shown in Figure 7.4.

Figure 7.3 Example of a Class Definition Sentence

Figure 7.4 Example of an Interface Definition Sentence
The part of the ATN that detects class declaration sentences is shown in Figure 7.5. Note that the figure only covers class declaration sentences, not the sentences required to define interfaces.

Figure 7.5 ATN for Class Declaration
7.3. Attribute Declaration Sentences
This group of sentences is used to define the attributes of an existing class or to define a new class with specified attributes.
The attributes of a class can be declared via two types of sentences, according to the level of detail the user wants to give:
- For undetailed programming:
Users who do not write programs in any object-oriented language, or professionals who do not want to enter detailed sentences, can use this type of sentence. These sentences do not require any information about Java primitive types or attribute access specifiers (public, private or protected). All attributes entered by the user in this way are accepted as private attributes. Each of these attributes is an instance of a user-defined object. An example of this kind of sentence and the corresponding output can be seen in Figure 7.6.
Figure 7.6 Example of an Undetailed Attribute Declaration Sentence
- For detailed programming:
Programmers who are familiar with object-oriented concepts and want to declare detailed classes can use this type of sentence. In this case, the user has the opportunity to determine the name and the access specifier of the attribute, so the attribute can be public, private or protected. Each of these attributes can be either an instance of a user-defined object or of a primitive type such as “int”. An example of this kind of sentence and the corresponding output can be seen in Figure 7.7.

Figure 7.7 Example of a Detailed Attribute Declaration Sentence
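The difference between the two sentence types comes down to which parts of an attribute are stated and which are defaulted. The Python sketch below captures that rule; it is an illustration, not TEG-NALAN's implementation, and the function names `attribute` and `java_field` as well as the sample attribute names are hypothetical.

```python
PRIMITIVES = {"byte", "short", "int", "long", "float", "double", "boolean", "char"}
ACCESS_SPECIFIERS = {"public", "private", "protected"}

def attribute(name, type_name, access=None):
    """Build an attribute schema; undetailed sentences give no access specifier."""
    access = access or "private"  # default for undetailed declarations
    if access not in ACCESS_SPECIFIERS:
        raise ValueError(f"unknown access specifier: {access!r}")
    return {
        "name": name,
        "type": type_name,
        "access": access,
        "primitive": type_name in PRIMITIVES,
    }

def java_field(attr):
    """Render an attribute schema as a Java field declaration."""
    return f"{attr['access']} {attr['type']} {attr['name']};"

undetailed = attribute("motor", "Motor")        # user-defined type, private by default
detailed = attribute("hiz", "int", "public")    # primitive type, explicit access
```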