GENERATING JAVA CLASS SKELETON USING A NATURAL LANGUAGE INTERFACE
Ender ÖZCAN, Şadi Evren ŞEKER, Zeynep İlknur KARADENİZ
{eozcan, seseker, ikaradeniz}@cse.yeditepe.edu.tr
Artificial Intelligence Laboratory (AR+I)
26 Ağustos Yerleşkesi Kayışdağı/İstanbul
Abstract
An intelligent natural language interface based on Turkish is designed for creating Java class skeletons and for listing a class and its members. This interface is developed as part of a project named TUJA, a tool for producing Java programs using Turkish sentences. Turkish sentences are converted into instances of schemata. There are three types of schemata: class definition schema, member method schema and member attribute schema. Concept hierarchies are utilized for building the classes and their hierarchical representation for Java class skeleton generation. In this paper, the details of the design and the implementation are described.
Key Words
Natural Language Processing, Class Skeleton Generator,
JAVA, Turkish, Concept hierarchies, Artificial Intelligence
1. Introduction
Programming languages are machine-processable, precise and mostly unambiguous, with predefined syntax and semantics. Still, a novice programmer spends a lot of effort learning syntactic rules while at the same time developing general programming skills. Even an experienced programmer may face the very same problems if the programming language is new to them. Natural languages, on the other hand, are more declarative, flexible, powerful and rich, and are useful even for occasional users. In addition, the programmer may not know the language in which the resources for learning a new programming language, such as books, are written.
There are visual tools for creating object-oriented designs and, furthermore, for generating Java/C++ skeletal programs, such as Rational Rose (an

2. Natural Language Processing using Turkish
NLP consists of five layers: morphology, syntax, semantics, pragmatics and phonetics. Given our scope and purposes, we have limited our work to the morphology layer and especially to the syntax and semantics layers.
Turkish is one of the most widely spoken languages in
the world, distributed over a large geographical region in
The same suffix can be attached to different words in different ways. Sometimes a vowel or a consonant towards the end of a word changes when a suffix is attached. For this reason, morphological analysis in Turkish is not straightforward, as shown in Table 1.
| Word | (Stem) + Suffixes |
| görünürlerde (in sight) | (gör) + ün + ür + ler + de (görmek - to see) |
| ağaca (towards the tree) | (ağaç) + a (tree) |
| ağlıyor (he/she/it is crying) | (ağla) + yor (ağlamak - to cry) |
Table 1. Some deformation examples in Turkish words due to suffixes.
There are seven morphological categories in Turkish: nouns, proper nouns, compound nouns, adjectives, verbs, adverbs and conjunctions.
In Turkish, another difficulty arises from the syntax. The same words can be arranged in different orders, yielding a group of sentences with the same meaning, as illustrated in Table 2. The common property of all three sentences is a feature of the Turkish language: the verb appears at the end of the sentence.
| Sentence |
| Çocuğa kitabı ben verdim |
| Çocuğa ben kitabı verdim |
| Ben çocuğa kitabı verdim |
Table 2. Turkish sentences with different syntax having the same meaning: “I gave the book to the child”.
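The observation behind Table 2 can be illustrated with a small sketch. It assumes, as a simplification, that the last word of a sentence is the verb and that the remaining words form an unordered set; under that assumption all three sentences reduce to the same form. The class and method names are hypothetical and are not part of TUJA, which relies on morphological analysis rather than on this shortcut.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of TUJA: reduces a verb-final sentence to
// "verb[sorted arguments]" so that the word order before the verb is ignored.
class WordOrderSketch {
    private static final Locale TURKISH = Locale.forLanguageTag("tr");

    static String canonical(String sentence) {
        String[] words = sentence.trim().toLowerCase(TURKISH).split("\\s+");
        // Assumption: the verb is always the last word of the sentence.
        String verb = words[words.length - 1];
        // A sorted set discards the original order of the remaining words.
        Set<String> arguments =
                new TreeSet<>(Arrays.asList(words).subList(0, words.length - 1));
        return verb + arguments;
    }

    public static void main(String[] args) {
        System.out.println(canonical("Çocuğa kitabı ben verdim"));
        System.out.println(canonical("Çocuğa ben kitabı verdim"));
        System.out.println(canonical("Ben çocuğa kitabı verdim"));
    }
}
```

All three calls print the same string, which is the sense in which the sentences are equivalent for the interface.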
3. Morphology, Syntax and Semantics
The morphology component of TUJA is inherited from a previous project, TUSA [1], which is based on PROLOG. All the possible syntax types are supported to create an abstract model representing the classes. The sentences are categorized into four groups:
– Class Declaration Sentences
– Attribute Declaration Sentences
– Method Declaration Sentences
– Hierarchy Declaration Sentences
An augmented transition network (ATN) is developed for the TUJA interface. The HAS-A relationship is used for composing classes and
This group of sentences is used to create a new class or to name an existing class, as shown in Figure 2. Note that the declaration of abstract classes and Java interfaces is also supported.

Figure 2. An example of a class definition sentence
Part of the ATN for TUJA detects class declarations, as illustrated in Figure 3.

Figure 3. ATN for class declaration
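As a rough illustration of what detecting a class declaration involves, the following sketch matches the three sentence patterns that appear in this paper (“X bir kavramdır”, “X soyut bir kavramdır” and “X diye bir kavram vardır”). It is a hand-written token matcher, far simpler than an ATN, and it is not a reconstruction of the network in Figure 3. The class name, the method name and the returned string format are assumptions made for the example, and the class name is left in Turkish rather than translated.

```java
import java.util.Locale;

// Hypothetical sketch, not the TUJA ATN: recognizes a few class declaration patterns.
class ClassDeclarationSketch {
    private static final Locale TURKISH = Locale.forLanguageTag("tr");

    // Returns "name-class" or "name-interface", or null if the sentence is not recognized.
    static String recognize(String sentence) {
        String[] w = sentence.trim().toLowerCase(TURKISH)
                .replaceAll("[.]$", "").split("\\s+");
        if (w.length < 3) {
            return null;
        }
        String name = w[0];
        int i = 1;
        boolean isAbstract = false;
        if (w[i].equals("soyut")) {          // "soyut" (abstract) marks an interface
            isAbstract = true;
            i++;
        } else if (w[i].equals("diye")) {    // "X diye ..." (called X)
            i++;
        }
        if (i >= w.length || !w[i].equals("bir")) {
            return null;
        }
        i++;
        boolean endsAsConcept =
                (i == w.length - 1 && w[i].equals("kavramdır"))
                || (i == w.length - 2 && w[i].equals("kavram") && w[i + 1].equals("vardır"));
        if (!endsAsConcept) {
            return null;
        }
        return name + "-" + (isAbstract ? "interface" : "class");
    }

    public static void main(String[] args) {
        System.out.println(recognize("Liste soyut bir kavramdır."));
        System.out.println(recognize("insan diye bir kavram vardır"));
    }
}
```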
3.2 Attribute Declaration Sentences
This group of sentences is used to define the attributes of an existing class or to define a new class with specified attributes, as shown in Figure 4.

Figure 4. An example of an attribute declaration sentence

Figure 5. ATN for attribute declaration
3.3 Method Declaration Sentences
This group of sentences is used to define the methods of predefined classes or to define a new class with specified member methods, as shown in Figure 6.

Figure 6. An example of a method declaration sentence
Part of the ATN for TUJA determines method declarations, as shown in Figure 7.

Figure 7. ATN for method declaration
3.4 Hierarchy Declaration Sentences
Our knowledge about the world can be organized hierarchically by naming each class, where a class is a set of objects with common properties. For example, cows and horses represent two different sets of objects, and mammals contains both of them as a super class. Note that cows and horses carry the properties of mammals. Similarly, objects defined in a formal object-oriented programming language can be organized hierarchically, forming a class hierarchy that supports inheritance.
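In Java, the example above might be written as follows. This is only an illustration: the field and the method are invented for the example and do not come from TUJA.

```java
// Illustrative only: the class names follow the example in the text.
class Mammal {
    boolean warmBlooded = true;   // a property assumed for the illustration

    String describe() {
        return "mammal";
    }
}

// Cow inherits everything from Mammal without adding anything.
class Cow extends Mammal {
}

// Horse inherits from Mammal and refines one inherited method.
class Horse extends Mammal {
    @Override
    String describe() {
        return "horse, a kind of " + super.describe();
    }
}

class HierarchyDemo {
    public static void main(String[] args) {
        Mammal m = new Horse();
        System.out.println(m.describe());
        System.out.println(new Cow().warmBlooded);
    }
}
```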

Figure 8. An example of a hierarchy declaration sentence
Part of the ATN for TUJA detects hierarchy declarations, as shown in Figure 9.

Figure 9. ATN for hierarchy declaration
4. Knowledge Database
The knowledge database keeps the entire class hierarchy resulting from the object-oriented design, together with the skeleton of each class. Several syntax rules are supported to represent the relationships between the classes and their members. After parsing and understanding a command, TUJA converts the new input into an appropriate instance of a schema or modifies an existing schema in the knowledge database. PROLOG provides the advantage of keeping the instances in a relational database form. There are three basic schemata supported by TUJA:
– Class Schema
– Method Schema
– Attribute Schema
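The "create a new instance or modify an existing one" behaviour can be sketched in Java as follows. TUJA keeps its schemata as PROLOG facts; the map used here is only a stand-in for that store, and every name in the sketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the knowledge database; TUJA itself stores PROLOG facts.
class KnowledgeBaseSketch {
    static final class ClassEntry {
        final String name;
        final List<String> attributes = new ArrayList<>();
        final List<String> methods = new ArrayList<>();

        ClassEntry(String name) {
            this.name = name;
        }
    }

    private final Map<String, ClassEntry> classes = new LinkedHashMap<>();

    // Returns the existing entry, or creates one if the class is not yet known.
    ClassEntry declareClass(String name) {
        return classes.computeIfAbsent(name, ClassEntry::new);
    }

    // An attribute sentence may refer to a class that has not been declared yet.
    void addAttribute(String className, String attribute) {
        declareClass(className).attributes.add(attribute);
    }

    void addMethod(String className, String method) {
        declareClass(className).methods.add(method);
    }

    int size() {
        return classes.size();
    }
}
```

Declaring a class twice leaves a single entry, so later sentences about the same class extend the existing schema instead of replacing it.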
4.1 Method Schema
In order to retrieve the prototype of a method fully, verbs are categorized into four groups:
– Consuming verbs, identifying the methods requiring a list of parameters with no return value
Example 1: “Ekmek yiyor” (He/she/it is eating bread)
Parameters: ekmek (bread)
Return: void
– Producing verbs, identifying the methods requiring no parameters and returning a value
Example 2: “At tay doğurur” (A horse gives birth to a foal)
Parameters: none
Return: tay (foal)
– Unaffecting verbs, identifying the methods requiring no parameters with no return value
Example 3: “At hızlanır” (The horse speeds up)
Parameters: none
Return: void
– Modifying verbs, identifying the methods requiring a list of parameters and returning a value
Example 4: “İnsan undan
Parameters: un, su (flour, water)
Return: ekmek (bread)
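The four groups are the four combinations of "takes parameters" and "returns a value". A minimal Java sketch of that mapping is given below; the enum and its member names are assumptions made for illustration, since TUJA represents these categories in PROLOG.

```java
// Hypothetical encoding of the four verb groups as two boolean properties.
enum VerbCategory {
    CONSUMING(true, false),     // parameters, no return value
    PRODUCING(false, true),     // no parameters, a return value
    UNAFFECTING(false, false),  // neither parameters nor a return value
    MODIFYING(true, true);      // parameters and a return value

    final boolean takesParameters;
    final boolean returnsValue;

    VerbCategory(boolean takesParameters, boolean returnsValue) {
        this.takesParameters = takesParameters;
        this.returnsValue = returnsValue;
    }

    // Finds the category that matches the shape of a method prototype.
    static VerbCategory of(boolean takesParameters, boolean returnsValue) {
        for (VerbCategory c : values()) {
            if (c.takesParameters == takesParameters && c.returnsValue == returnsValue) {
                return c;
            }
        }
        throw new IllegalStateException("all four combinations are covered");
    }
}
```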
There is a special case for modifying verbs:
“İnsan kuş avlar” (Man hunts for bird)
In this example, the question “what is the result of the hunt?” is easily answered as “bird”. Similarly, the question “what does man hunt for?” is again answered as “bird”. Both the return value and the parameter are therefore the same, namely “bird”.
The ATN parses Turkish sentences, and if a sentence defines a method, the nouns are identified during categorization as parameters or return values of the related method. Then the data structure shown in Figure 10 is generated. Using Example 4, TUJA yields:
method(cook, [flour, water], [bread], public)
In Java, a method may have only one return value, whereas a natural language sentence may name more than one, as in “insan undan sudan ekmek borek pisirir” (Man cooks bread and pastry from flour and water). For this reason the return values are kept as a list. In such a case two methods have to be produced, one with the return type bread and the other with the return type pastry. This would amount to method overloading, but Java does not allow overloaded methods that take the same parameters and differ only in their return types. We have solved this problem by giving the two methods different names.
method( MethodName, ListofParameters, ListofReturnValues, MethodSpecifier )
Figure 10. Data structure for methods
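The treatment of several return values can be sketched as follows. The paper states only that different method names are used; the naming scheme below (appending the return type to the method name) and the use of the noun itself as the Java type are assumptions made for illustration, as are the class and method names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: turns one method schema into one or more Java signatures.
class MethodSchemaSketch {
    static List<String> signatures(String name, List<String> parameters,
                                   List<String> returnValues, String specifier) {
        List<String> declared = new ArrayList<>();
        for (String p : parameters) {
            declared.add(capitalize(p) + " " + p);   // assumed: type is named after the noun
        }
        String parameterList = String.join(", ", declared);

        List<String> result = new ArrayList<>();
        if (returnValues.size() <= 1) {
            String type = returnValues.isEmpty() ? "void" : capitalize(returnValues.get(0));
            result.add(specifier + " " + type + " " + name + "(" + parameterList + ");");
        } else {
            // Same parameters with different return types cannot be overloaded in Java,
            // so each return value gets its own method name (assumed naming scheme).
            for (String r : returnValues) {
                result.add(specifier + " " + capitalize(r) + " " + name + capitalize(r)
                        + "(" + parameterList + ");");
            }
        }
        return result;
    }

    static String capitalize(String word) {
        return word.substring(0, 1).toUpperCase(Locale.ROOT) + word.substring(1);
    }

    public static void main(String[] args) {
        for (String s : signatures("cook", Arrays.asList("flour", "water"),
                Arrays.asList("bread", "pastry"), "public")) {
            System.out.println(s);
        }
    }
}
```

With a single return value the method keeps its original name, so the renaming only applies where overloading would otherwise be required.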
4.2 Attribute Schema and Class Schema
attribute( AttributeName, AttributeType, AttributeSpecifier )
Figure 11. Data structure for attributes
The class schema keeps the information about classes. Its general form is shown below:
class( ClassName, InheritedClass, ListofAttributes, ListofMethods, ListofImplementedInterfaces, ClassSpecifier )
Figure 12. Data structure for classes
Here, ClassName keeps the name of the class, and InheritedClass keeps the name of the class from which this class inherits. ListofAttributes holds the attributes of the class as a list, because a class may have more than one attribute. Similarly, the methods are kept as a list. As previously mentioned in section “3.4. Hierarchy Declaration Sentences”, interfaces are supported, and the names of the implemented interfaces are kept in the list of implemented interfaces. The ClassSpecifier field keeps the access specifier of the class (e.g. public, private, protected).
As an example, the following input to TUJA generates the output in Figure 13:
“insan diye bir kavram vardır” (there is a concept called human)
“insan ekmek et ve balık yer” (human eats bread, meat and fish)
“insan balık avlar” (human hunts for fish)
“insanin kilosu boyu vardır” (human has weight and height)
“insan bir canlıdır” (human is a living thing)
“insan trafik kurallarına uyar” (human obeys the traffic rules)
class(insan, canli, [attribute(weight, weight, _), attribute(height, height, _)], [method(eat, [bread, meat, fish], [], _), method(hunt, [fish], [fish], _)], traffic, _)
Figure 13. An example data structure generated by TUJA from a given input.
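To show how such a class schema could be turned into a Java class skeleton, a minimal rendering sketch is given below. It is not TUJA's generator: the class and method names are hypothetical, identifiers are capitalized by hand in the example call, unbound specifiers are rendered as package-private, and non-void methods are given a placeholder body that returns null.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: renders the fields of a class schema as Java source text.
class ClassSkeletonSketch {
    static final class Attribute {
        final String name, type;
        Attribute(String name, String type) { this.name = name; this.type = type; }
    }

    static final class Method {
        final String name, returnType, parameters;
        Method(String name, String returnType, String parameters) {
            this.name = name; this.returnType = returnType; this.parameters = parameters;
        }
    }

    static String render(String className, String inherited, List<Attribute> attributes,
                         List<Method> methods, List<String> interfaces) {
        StringBuilder sb = new StringBuilder("class ").append(className);
        if (inherited != null) {
            sb.append(" extends ").append(inherited);
        }
        if (!interfaces.isEmpty()) {
            sb.append(" implements ").append(String.join(", ", interfaces));
        }
        sb.append(" {\n");
        for (Attribute a : attributes) {
            sb.append("    ").append(a.type).append(' ').append(a.name).append(";\n");
        }
        for (Method m : methods) {
            // Placeholder bodies keep the generated skeleton syntactically valid.
            String body = m.returnType.equals("void") ? "{ }" : "{ return null; }";
            sb.append("    ").append(m.returnType).append(' ').append(m.name)
              .append('(').append(m.parameters).append(") ").append(body).append('\n');
        }
        return sb.append("}\n").toString();
    }

    // The schema of Figure 13, with type names capitalized by hand (an assumption).
    static String example() {
        return render("Insan", "Canli",
                Arrays.asList(new Attribute("weight", "Weight"),
                              new Attribute("height", "Height")),
                Arrays.asList(new Method("eat", "void", "Bread bread, Meat meat, Fish fish"),
                              new Method("hunt", "Fish", "Fish fish")),
                Arrays.asList("Traffic"));
    }

    public static void main(String[] args) {
        System.out.print(example());
    }
}
```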
5. Experiments
A sample run is performed to test TUJA. Turkish sentences describing a linked list are entered into the system, and an output is generated for each sentence, as listed below.
Liste soyut bir kavramdır. (List is an abstract concept) [list-interface]^definition
Liste elemanı sonaekle diye bir metoda sahiptir. (List inserts element into tail) [list-[ element]- (insertLast-al)]^method_interface
Liste elemanı başaekle diye bir metoda sahiptir. (List inserts element into head) [list-[ element]- (insertFirst-al)]^method_interface
Liste elemanı sıralıekle diye bir metoda sahiptir. (List inserts element sorted) [list-[ element]- (insertSorted-al)]^method_interface
Liste elemanı sonrakineekle diye bir metoda sahiptir. (List inserts next element) [list-[ element]- (insertNext-al)]^method_interface
Liste eleman çıkarır. (List removes element) [list-[ element]- (remove-al)]^method_interface
Liste durumdan eleman bulur. (List finds element from position) [list-[position]- element - (find-ver)]^method_interface
Liste yazar. (List prints) [list-(print-void)]^method_interface
Bağlıliste bir kavramdır. (Linkedlist is a concept) [linkedlist-class]^definition
Her bağlıliste bir listedir. (All linkedlists are a list) [linkedlist-list]^relation
Bağlıliste head adında Element tipinde korunan bir özelliğe sahiptir. (Linkedlist has a protected attribute whose name is head and the type is Element) [linkedlist-[head-element-protected]-class]^attribute
Bağlıliste tail adında Element tipinde korunan bir özelliğe sahiptir. (Linkedlist has a protected attribute whose name is tail and the type is element) [linkedlist-[tail-element-protected]-class]^attribute
Bağlıliste next adında Element tipinde korunan bir özelliğe sahiptir. (Linkedlist has a protected attribute whose name is next and the type is element) [linkedlist-[next-element-protected]-class]^attribute
Bağlıliste previous adında Node tipinde korunan bir özelliğe sahiptir. (Linkedlist has a protected attribute whose name is previous and the type is element) [linkedlist-[previous-element-protected]-class]^attribute
Bağlı-liste ölçü ile oluşur. (Linkedlist is composed of size) [linkedlist-[size-size-protected]-class]^attribute
Element bir kavramdır. (Element is a concept) [element-class]^definition
Elemanın eleman adında korunan bir tamsayısı vardır. (Element has a protected integer whose name is element) [element-[element-int-protected]-class]^attribute
Elemanın next adında eleman tipinde korunan bir özelliği vardır. (Element has a protected attribute whose name is next and the type is Element) [element-[next-element-protected]-class]^attribute
6. Conclusion
Natural languages are far removed from human-designed computer programming languages. Since there are many computer-illiterate people around the world, our project can help them to create well-formed program designs by typing only simple sentences.
A formal language and a natural language can be regarded as an example of incommensurability in the sense of Thomas Kuhn. Bridging the two in a bilingual program is therefore an important step, although there is still much work to be done.
7. Acknowledgement
Special thanks to Prof. Dr. A. C. Cem SAY from
References
[1] M.U. Karakas, E. Inan, Current Status in Turkish Code Table Problem, Bilisim,
[2] O. N. Darcan, An intelligent database interface for Turkish, M.Sc. Thesis,
[3] S. Demir, Improved treatment of word meaning in a Turkish conversational agent, M.Sc. Thesis,
[4] S.E. Seker, Türkçe Doğal Dil Arayüzlü Bir Kişisel Takvim Programının, Tasarım ve Kodlamasi, TAINN 2003,
[5] S.E. Seker, A Personal Assistant with A Natural Language Interface in Turkish, M.Sc. Thesis, Yeditepe University, Istanbul, 2003.
[6] K. Köymen
[7] M.A. Covington, Natural Language Processing for Prolog Programmers, (Englewood Cliffs, NJ: Prentice-Hall, 1994)
[8] A. Cetinoglu, Prolog Based Natural Language Processing Infrastructure for Turkish, M.Sc. Thesis,
[9] J. Weizenbaum, ELIZA: A Computer Program for the Study of Natural Language Communication between Man and Machine, ACM Press, NY, USA, 1983, 23-28