GENERATING JAVA CLASS SKELETON USING

A NATURAL LANGUAGE INTERFACE

 

Ender ÖZCAN, Şadi Evren ŞEKER, Zeynep İlknur KARADENİZ

{eozcan, seseker, ikaradeniz}@cse.yeditepe.edu.tr

Yeditepe University, Department of Computer Engineering

Artificial Intelligence Laboratory (AR+I)

26 Ağustos Yerleşkesi Kayışdağı/İstanbul

Turkey

 


Abstract

 

An intelligent natural language interface based on the Turkish language is designed for creating Java class skeletons and for listing classes and their members. This interface is developed as part of a project named TUJA, a tool for producing Java programs from Turkish sentences. Turkish sentences are converted into instances of schemata. There are three types of schemata: the class definition schema, the member method schema and the member attribute schema. Concept hierarchies are utilized for building the classes and their hierarchical representation for Java class skeleton generation. In this paper, the details of the design and the implementation are described.

 

Key Words

Natural Language Processing, Class Skeleton Generator, JAVA, Turkish, Concept hierarchies, Artificial Intelligence

 

1.     Introduction

 

Programming languages are machine-processable, precise and mostly unambiguous, with predefined syntax and semantics. Still, a novice programmer spends a lot of effort learning syntactic rules while at the same time developing general programming skills. Even an experienced programmer may face the very same problems if the programming language is new to him or her. Natural languages, on the other hand, are more declarative, flexible, powerful and richer, and are useful even for occasional users. Moreover, the programmer may not know the language in which the learning resources for a new programming language, such as books, are written.

 

There are visual tools, such as Rational Rose (an IBM product), for creating object-oriented designs and, furthermore, for generating skeletal Java/C++ programs. Turkish to Java (TUJA) is a natural language processing (NLP) application designed with two modes of operation, each of which is to be implemented as a phase. The first phase involves building an interface for creating a skeletal Java program, including all classes, their attributes (data) and the prototypes of the member methods of each class. The second phase involves extending the functionality of the same interface to convert each skeletal class into a full Java program by allowing users to express them in Turkish sentences. In this paper, the details of the first phase of the TUJA project are described. TUJA accepts Turkish sentences describing a class, a member method or a member attribute of a class through a conversational front end. The input is then fed into an augmented transition network (ATN) for parsing and semantic analysis. At the end of this process, the knowledge database is updated using the current command. Knowledge is represented using schemata. At any time, the user can ask TUJA to produce the skeletal Java code and save it into a file. The architecture of TUJA is illustrated in Figure 1.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


Figure 1. Framework of TUJA

2.     Natural Language Processing using Turkish

 

NLP consists of five layers: morphology, syntax, semantics, pragmatics and phonetics. Given our scope and purposes, we have limited our work to morphology and, especially, to the syntax and semantics layers.

 

Turkish is one of the most widely spoken languages in the world, distributed over a large geographical region in Europe and Asia, as pointed out in [1] based on United Nations sources. Note that there are many Turkish dialects, such as the Azeri, the Türkmen, the Tartar, the Uzbek, the Baskurti, the Nogay, the Kyrgyz, the Kazakh, the Yakuti and the Cuvas. Turkish is similar to Mongolian, Manchu-Tungus and Korean, which belong to the same family of languages: the Altaic branch of the Ural-Altaic family. The Turkish alphabet consists of 29 letters based on the Latin alphabet, including 8 vowels. Turkish words obey vowel harmony. Words do not have gender. In Turkish sentences, adjectives precede nouns. Unfortunately, only a few natural language applications ([2]-[5]) are based on the Turkish language, owing to its agglutinative nature.

 

The same suffix can be attached to different words in different ways. Sometimes a vowel or a consonant towards the end of a word changes when a suffix is attached. For this reason, morphological analysis in Turkish is not straightforward, as shown in Table 1.

 

Word                              (Stem) + Suffixes

görünürlerde (in sight)           (gör) + ün + ür + ler + de (görmek - to see)

ağaca (towards the tree)          (ağaç) + a (tree)

ağlıyor (he/she/it is crying)     (ağla) + yor (ağlamak - to cry)

Table 1. Some deformation examples in Turkish words due to suffixes.

There are seven morphological categories in Turkish: nouns, proper nouns, compound nouns, adjectives, verbs, adverbs and conjunctions.

 

In Turkish, another difficulty arises from the syntax. The same words can be arranged in different orders, yielding a group of sentences with the same meaning, as illustrated in Table 2. The common property of all three sentences is a feature of the Turkish language: the verb appears at the end of the sentence.

 

Sentence

Çocuğa kitabı ben verdim

Çocuğa ben kitabı verdim

Ben çocuğa kitabı verdim

Table 2. Turkish sentences with different syntax having the same meaning: “I gave the book to the child”.

3.     Morphology, Syntax and Semantics

 

The morphology component of TUJA is inherited from a previous project, TUSA [1], based on PROLOG. All possible syntax types are supported in order to create an abstract model representing the classes. The sentences are categorized into four groups:

– Class Declaration Sentences

– Attribute Declaration Sentences

– Method Declaration Sentences

– Hierarchy Declaration Sentences

An augmented transition network (ATN) is developed for the TUJA interface. The HASA relationship is used for composing classes and the ISA relationship is used for building the class hierarchy.

 

3.1 Class Declaration Sentences

 

This group of sentences is used to create a new class or to name an existing class, as shown in Figure 2. Note that declarations of abstract classes and Java interfaces are also supported.

 

 

 

 

 

 


Figure 2. An example of a class definition sentence

Part of the ATN for TUJA detects class declarations as illustrated in Figure 3.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


Figure 3. ATN for class declaration

3.2 Attribute Declaration Sentences

 

This group of sentences is used to define the attributes of an existing class or to define a new class with specified attributes, as shown in Figure 4.

The HAS relation represents the inclusion relationship, determining the elements included by an object. In other words, the HAS relation is used to define the members of a class. Part of the ATN for TUJA detects attribute declarations, as shown in Figure 5.

 

 

 

 

 

 

 

 

 

 


Figure 4. An example of an attribute declaration sentence

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


Figure 5.  ATN for attribute declaration

3.3 Method Declaration Sentences

 

This group of sentences is used to define the methods of predefined classes or to define a new class with specified member methods as shown in Figure 6.

 

 

 

 

 

 

 

 


Figure 6. An example of a method declaration sentence

Part of the ATN for TUJA determines method declarations as shown in Figure 7.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


Figure 7. ATN for method declaration

3.4 Hierarchy Declaration Sentences

 

Our knowledge about the world can be organized hierarchically using a naming convention in which each class includes a set of objects with common properties. For example, cows and horses represent two different sets of objects, and mammals contains both of them as a super class. Note that cows and horses carry the properties of mammals. Similarly, objects defined in a formal object-oriented programming language can be organized hierarchically, forming a class hierarchy that supports inheritance.

 

The ISA hierarchy is used to represent the inclusion relationship between classes. A class can be defined as a subset of two or more super classes (multiple inheritance). Since the goal is to generate Java class skeletons and Java does not support multiple inheritance of classes, such sentences are converted into Java class templates under the assumption that one of the super classes is a class and the rest are interfaces.
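The mapping from several super classes onto a single `extends` clause plus an `implements` list can be sketched as follows. This is only an illustration: the class `IsaMapper`, its `header` method and the rule that the first super type not already known to be an interface becomes the super class are assumptions made here, not TUJA's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper that turns ISA relations into a Java class header. */
final class IsaMapper {
    private IsaMapper() {}

    /**
     * Builds a class header from the super types named in hierarchy sentences.
     * knownInterfaces holds the names that were declared as interfaces earlier.
     */
    static String header(String className, List<String> superTypes, Set<String> knownInterfaces) {
        String superClass = null;
        List<String> interfaces = new ArrayList<>();
        for (String type : superTypes) {
            if (superClass == null && !knownInterfaces.contains(type)) {
                superClass = type;      // at most one type may follow "extends"
            } else {
                interfaces.add(type);   // every other super type is treated as an interface
            }
        }
        StringBuilder sb = new StringBuilder("class ").append(className);
        if (superClass != null) {
            sb.append(" extends ").append(superClass);
        }
        if (!interfaces.isEmpty()) {
            sb.append(" implements ").append(String.join(", ", interfaces));
        }
        return sb.toString();
    }
}
```

For instance, `IsaMapper.header("Horse", List.of("Mammal", "Runner"), Set.of("Runner"))` yields `class Horse extends Mammal implements Runner`, where `Runner` is an invented interface name used only for this example.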

 

 

 

 

 

 

 

 


Figure 8. An example of a hierarchy declaration sentence

Part of the ATN for TUJA detects hierarchy declarations, as shown in Figure 9.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


Figure 9. ATN for ISA relationship declaration

4.     Knowledge Database

 

The knowledge database keeps the entire class hierarchy resulting from the object-oriented design, together with the skeleton of each class. Several syntax rules are supported to represent the relationships between the classes and their members. After parsing and understanding a command, TUJA converts the new input into an instance of the appropriate schema or modifies an existing schema instance in the knowledge database. PROLOG provides the advantage of keeping the instances in a relational database form. TUJA supports three basic schemata:

Class Schema

Method Schema

Attribute Schema

 

ISA hierarchy is embedded into the Class schema. TUJA assumes that in general, a noun in a sentence refers to a class or an object, and a verb refers to a method.

 

4.1 Method Schema

 

In order to retrieve the prototype of a method fully, verbs are categorized into four groups:

Consuming verbs, identifying the methods requiring a list of parameters with no return value

Example 1:

“Ekmek yiyor” (He/she/it is eating bread)

Parameters: Ekmek (bread)

Return: void

 

Producing verbs, identifying the methods requiring no parameters and returning a value

Example 2:

“At tay doğurur” (A horse gives birth to a foal)

Parameters: none

Return: tay (foal)

 

Unaffecting verbs, identifying the methods requiring no parameters with no return value

Example 3:

“At hızlanır” (The horse speeds up)

Parameters: none

Return: void

 

Modifying verbs, identifying the methods requiring a list of parameters and returning a value

Example 4:

“İnsan undan sudan ekmek pişirir” (Man cooks bread from flour and water)

Parameters: un, su (flour, water)

Return: ekmek (bread)

 

There is a special case for modifying verbs:

“İnsan kuş avlar” (Man hunts for bird)

In this example, the question “what is the result of the hunt?” can be easily answered as “bird”. Similarly, the question “what does man hunt for?” can be answered as “bird” again. It is obvious that both the return value and parameter are the same, which is “bird”.
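The four verb groups differ only in whether the resulting method takes parameters and whether it returns a value, so they can be encoded compactly. The following is a minimal sketch; the enum `VerbCategory` and its members are names chosen here for illustration and do not come from TUJA itself.

```java
/** Hypothetical encoding of the four verb groups by their effect on the method prototype. */
enum VerbCategory {
    CONSUMING(true, false),    // parameters, no return value ("Ekmek yiyor")
    PRODUCING(false, true),    // no parameters, a return value
    UNAFFECTING(false, false), // no parameters, no return value
    MODIFYING(true, true);     // parameters and a return value

    final boolean takesParameters;
    final boolean returnsValue;

    VerbCategory(boolean takesParameters, boolean returnsValue) {
        this.takesParameters = takesParameters;
        this.returnsValue = returnsValue;
    }

    /** Looks up the category that matches the shape of a parsed sentence. */
    static VerbCategory of(boolean takesParameters, boolean returnsValue) {
        for (VerbCategory c : values()) {
            if (c.takesParameters == takesParameters && c.returnsValue == returnsValue) {
                return c;
            }
        }
        throw new AssertionError("unreachable: all four combinations are covered");
    }
}
```

Under this encoding, the special case of “İnsan kuş avlar” falls into `MODIFYING`, with the same noun serving as both the parameter and the return value.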

 

The ATN parses Turkish sentences and, if a sentence defines a method, identifies each noun during categorization as either a parameter or a return value of that method. Then the data structure shown in Figure 10 is generated. Using Example 4, TUJA yields:

method(cook, [flour, water], [bread], public)

 

A Java method can have only one return value, whereas a natural language sentence may name several, as in “insan undan sudan ekmek börek pişirir” (Man cooks bread and pastry from flour and water). For this reason, return values are kept as a list. In such a case two methods must be produced, one with the return type bread and the other with the return type pastry. Java, however, does not allow methods with the same name and parameters to be overloaded on return type alone, so the two methods have to be distinguished. We have solved this problem by giving the methods different names.

method( MethodName,
        ListofParameters,
        ListofReturnValues,
        MethodSpecifier )

 

 
 

 

 

 

 


Figure 10. Data structure for methods
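As a rough illustration of how a method schema could be turned into Java prototypes, including the case of several return values, consider the sketch below. The class `MethodSchema`, the convention of naming a type after its noun and the scheme of appending the return type to the method name are all assumptions made for this example; the paper only states that different method names are used.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical Java counterpart of the method(...) term. */
final class MethodSchema {
    final String name;
    final List<String> parameters;
    final List<String> returnValues;
    final String specifier;

    MethodSchema(String name, List<String> parameters, List<String> returnValues, String specifier) {
        this.name = name;
        this.parameters = parameters;
        this.returnValues = returnValues;
        this.specifier = specifier;
    }

    /** Returns one prototype per return value, or a single void prototype if there is none. */
    List<String> toPrototypes() {
        List<String> prototypes = new ArrayList<>();
        if (returnValues.isEmpty()) {
            prototypes.add(prototype("void", name));
        } else if (returnValues.size() == 1) {
            prototypes.add(prototype(capitalize(returnValues.get(0)), name));
        } else {
            // Java cannot overload on return type alone, so each method gets its own name.
            for (String ret : returnValues) {
                prototypes.add(prototype(capitalize(ret), name + capitalize(ret)));
            }
        }
        return prototypes;
    }

    private String prototype(String returnType, String methodName) {
        List<String> params = new ArrayList<>();
        for (String p : parameters) {
            params.add(capitalize(p) + " " + p); // assumption: the type is named after the noun
        }
        return specifier + " " + returnType + " " + methodName
                + "(" + String.join(", ", params) + ");";
    }

    private static String capitalize(String s) {
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }
}
```

With the schema from Example 4, `new MethodSchema("cook", List.of("flour", "water"), List.of("bread"), "public").toPrototypes()` gives the single prototype `public Bread cook(Flour flour, Water water);`. With the return list `[bread, pastry]` it gives two prototypes named `cookBread` and `cookPastry`.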

 

4.2 Attribute Schema and Class Schema

 

Attributes are kept in the attribute schema, whose general form is shown below:

attribute( AttributeName,
           AttributeType,
           AttributeSpecifier )

 
 

 

 

 

 


Figure 11. Data structure for attributes

 

The class schema keeps the information about classes. The general schema of a class is shown below:

class( ClassName,
       InheritedClass,
       ListofAttributes,
       ListofMethods,
       ListofImplementedInterfaces,
       ClassSpecifier )

 

 

 
 

 

 

 

 

 

 

 


Figure 12. Data structure for classes

 

Here, ClassName keeps the name of the class and InheritedClass keeps the name of the class from which this class inherits. ListofAttributes holds the attributes of this class as a list, because a class may have more than one attribute. Similarly, the methods are kept in list form. As mentioned in Section 3.4, “Hierarchy Declaration Sentences”, interfaces are supported, and the interfaces implemented by the class are kept in the list of implemented interfaces. The ClassSpecifier field keeps the access specifier of the class (e.g. public, private, protected).

As an example, the following input to TUJA generates the output in Figure 13:

“insan diye bir kavram vardır” (there is a concept called human)

“insan ekmek et ve balık yer” (human eats bread, meat and fish)

“insan balık avlar” (human hunts for fish)

“insanin kilosu boyu vardır” (human has weight and height)

“insan bir canlıdır” (human is a living thing)

“insan trafik kurallarına uyar” (human obeys the traffic rules)

class(insan, canli, [attribute(weight, weight, _), attribute(height, height, _)], [method(eat, [bread, meat, fish], [], _), method(hunt, [fish], [fish], _)], traffic, _)

 
 

 

 

 

 

 


Figure 13. An example data structure generated by TUJA from the given input.
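To illustrate how a class schema of this form could be rendered as a Java class skeleton, a minimal sketch is given below. The class `ClassSchema`, the use of string arrays for attributes and methods, the empty string standing for an unbound specifier (`_`) and the `return null;` placeholder bodies are all assumptions made for this example rather than a description of TUJA's code generator.

```java
import java.util.List;

/** Hypothetical Java counterpart of the class(...) term. */
final class ClassSchema {
    final String name;
    final String superClass;          // null when nothing is inherited
    final List<String[]> attributes;  // each entry: {name, type}
    final List<String[]> methods;     // each entry: {returnType, name, parameterList}
    final List<String> interfaces;
    final String specifier;           // empty string when unspecified

    ClassSchema(String name, String superClass, List<String[]> attributes,
                List<String[]> methods, List<String> interfaces, String specifier) {
        this.name = name;
        this.superClass = superClass;
        this.attributes = attributes;
        this.methods = methods;
        this.interfaces = interfaces;
        this.specifier = specifier;
    }

    /** Renders the schema as the source text of a skeletal Java class. */
    String toSkeleton() {
        StringBuilder sb = new StringBuilder();
        if (!specifier.isEmpty()) {
            sb.append(specifier).append(' ');
        }
        sb.append("class ").append(name);
        if (superClass != null) {
            sb.append(" extends ").append(superClass);
        }
        if (!interfaces.isEmpty()) {
            sb.append(" implements ").append(String.join(", ", interfaces));
        }
        sb.append(" {\n");
        for (String[] a : attributes) {
            sb.append("    ").append(a[1]).append(' ').append(a[0]).append(";\n");
        }
        for (String[] m : methods) {
            // Placeholder bodies: non-void methods return null so the skeleton is well-formed.
            String body = "void".equals(m[0]) ? "{ }" : "{ return null; }";
            sb.append("    ").append(m[0]).append(' ').append(m[1])
              .append('(').append(m[2]).append(") ").append(body).append('\n');
        }
        sb.append("}\n");
        return sb.toString();
    }
}
```

Feeding in the data of Figure 13 (class `insan`, super class `canli`, interface `traffic`, attributes `weight` and `height`, methods `eat` and `hunt`) produces a header of the form `class insan extends canli implements traffic`, followed by one line per attribute and one stub per method.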

5.     Experiments

 

A sample run is performed to test TUJA. Turkish sentences describing a linked list are entered into the system. Each sentence is listed below together with its English gloss and the representation generated by TUJA.

Liste soyut bir kavramdır.

(List is an abstract concept)

[list-interface]^definition

 

Liste elemanı sonaekle diye bir metoda sahiptir.

(List inserts element into tail)

[list-[ element]- (insertLast-al)]^method_interface

 

Liste elemanı başaekle diye bir metoda sahiptir.

(List inserts element into head)

[list-[ element]- (insertFirst-al)]^method_interface

 

Liste elemanı sıralıekle diye bir metoda sahiptir.

(List inserts element sorted)

[list-[ element]- (insertSorted-al)]^method_interface

 

Liste elemanı sonrakineekle diye bir metoda sahiptir.

(List inserts next element)

[list-[ element]- (insertNext-al)]^method_interface

 

Liste eleman çıkarır.

(List removes element)

[list-[ element]- (remove-al)]^method_interface

 

Liste durumdan eleman bulur.

(List finds element from position)

[list-[position]- element - (find-ver)]^method_interface

 

Liste yazar.

(List prints)

[list-(print-void)]^method_interface

 

 

Bağlıliste bir kavramdır.

(Linkedlist is a concept)

[linkedlist-class]^definition

 

Her bağlıliste bir listedir.

(Every linkedlist is a list)

[linkedlist-list]^relation

 

Bağlıliste head adında Element tipinde korunan bir özelliğe sahiptir.

(Linkedlist has a protected attribute whose name is head and the type is Element)

[linkedlist-[head-element -protected]-class]^attribute

 

Bağlıliste tail adında Element tipinde korunan bir özelliğe sahiptir.

(Linkedlist has a protected attribute whose name is tail and the type is element)

[linkedlist-[tail-element-protected]-class]^attribute

 

Bağlıliste next adında Element tipinde korunan bir özelliğe sahiptir.

(Linkedlist has a protected attribute whose name is next and the type is element)

[linkedlist-[next-element-protected]-class]^attribute

 

Bağlıliste previous adında Node tipinde korunan bir özelliğe sahiptir.

(Linkedlist has a protected attribute whose name is previous and the type is element)

[linkedlist-[previous-element-protected]-class]^attribute

 

 

Bağlı-liste ölçü ile oluşur.

(Linkedlist is composed of size)

[linkedlist-[size-size-protected]-class]^attribute

 

 
 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Element bir kavramdır.

(Element is a concept)

[element-class]^definition

 

Elemanın eleman adında korunan bir tamsayısı vardır.

(Element has a protected integer whose name is element)

[element-[element-int-protected]-class]^attribute

 

Elemanın next adında eleman tipinde korunan bir özelliği vardır.

(Element has a protected attribute whose name is next and the type is Element)

[element-[next-element-protected]-class]^attribute
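For orientation, a Java skeleton consistent with the representations above might look like the following. It is a hand-written illustration, not the output produced by TUJA: the placeholder types `Position` and `Size`, the `public` modifiers and the empty method bodies are assumptions introduced here so that the sketch compiles.

```java
// Hand-written illustration; NOT the output produced by TUJA.
class Position { }  // placeholder type, not defined in the sample input
class Size { }      // placeholder type, not defined in the sample input

class Element {
    protected int element;
    protected Element next;
}

interface List {
    void insertLast(Element element);
    void insertFirst(Element element);
    void insertSorted(Element element);
    void insertNext(Element element);
    void remove(Element element);
    Element find(Position position);
    void print();
}

class LinkedList implements List {
    protected Element head;
    protected Element tail;
    protected Element next;
    protected Element previous;
    protected Size size;

    // Empty bodies: a skeleton only fixes the prototypes.
    public void insertLast(Element element) { }
    public void insertFirst(Element element) { }
    public void insertSorted(Element element) { }
    public void insertNext(Element element) { }
    public void remove(Element element) { }
    public Element find(Position position) { return null; }
    public void print() { }
}
```

The abstract concept becomes an interface, the ISA relation becomes an `implements` clause, and each attribute sentence contributes one protected field.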

 

 

 
 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


6.     Conclusion

 

Natural languages are far removed from human-designed computer programming languages. Since many people around the world are not computer literate, our project can help them create sound software designs merely by typing simple sentences.

A formal language and a natural language can be seen as an example of incommensurability in the sense of Thomas Kuhn. Bridging the two in a bilingual program is therefore an important improvement, although much work remains to be done.

 

7.     Acknowledgement

Special thanks to Prof. Dr. A. C. Cem SAY from Boğaziçi University for providing the morphological analyzer and great support.

 

References

 

[1]      M.U.Karakas, E. Inan, Current Status in Turkish Code Table Problem, Bilisim, Bogazici University, Istanbul, 1996

[2]      O. N. Darcan, An intelligent database interface for Turkish, M.Sc. Thesis, Bogazici University, Istanbul, 1991

[3]      S. Demir, Improved treatment of word meaning in a Turkish conversational agent, M.Sc. Thesis, Bogazici University, Istanbul, 2003.

[4]      S.E. Seker, Türkçe Doğal Dil Arayüzlü Bir Kişisel Takvim Programının, Tasarım ve Kodlamasi, TAINN 2003, Canakkale, Turkey, accepted.

[5]      S.E. Seker, A Personal Assistant with A Natural Language Interface in Turkish, M.Sc. Thesis, Yeditepe University, Istanbul, 2003.

[6]      K. Köymen

[7]      M.A. Covington, Natural Language Processing for Prolog Programmers, (Englewood Cliffs, NJ:Prentice-Hall, 1994)

[8]      A. Cetinoglu, Prolog Based Natural Language Processing Infrastructure for Turkish, M.Sc. Thesis, Bogazici University, 2001

[9]      J. Weizenbaum, ELIZA: A Computer Program for the Study of Natural Language Communication between Man and Machine, ACM Press, NY, USA, 1983, 23-28