GENERATING JAVA CLASS SKELETON USING A NATURAL LANGUAGE INTERFACE

Ender ÖZCAN, Şadi Evren ŞEKER, Zeynep İlknur KARADENİZ

Yeditepe University, Department of Computer Engineering

Artificial Intelligence Laboratory (AR+I)

26 Ağustos Yerleşkesi Kayışdağı/İstanbul

Turkey

Abstract. An intelligent natural language interface based on the Turkish language is designed for creating a Java class skeleton and for listing a class and its members. The interface is developed as part of a project named TUJA, a tool for producing Java programs from Turkish sentences. Turkish sentences are converted into instances of schemata that represent classes and their members. Concept hierarchies are used to build the classes and their hierarchical representation for Java class skeleton generation. This paper describes the details of the design and the implementation and provides a sample run.

1   Introduction

Programming languages are machine processable, precise and mostly unambiguous, with predefined syntax and semantics. Still, a novice programmer spends considerable effort learning syntactic rules while also developing general programming skills. Even an experienced programmer may face the same problems when the programming language is new to them. Natural languages, on the other hand, are more declarative, flexible, powerful and rich, which makes them useful even for occasional users. In addition, a programmer may not know the language in which the resources for learning a new programming language, such as books, are written.

There are visual tools, such as Rational Rose (an IBM product), for creating object-oriented designs and, furthermore, generating skeletal Java/C++ programs. Turkish to Java (TUJA) is a natural language processing (NLP) application designed with two modes of operation, where each mode is to be implemented as a phase. The first phase involves building an interface for creating a skeletal Java program, including all classes, their attributes (data) and the prototypes of the member methods of each class. The second phase involves extending the same interface so that each skeletal class can be converted into a full Java program from descriptions expressed in Turkish sentences. This paper describes the first phase of the TUJA project. TUJA accepts Turkish sentences describing a class, a member method or a member attribute of a class through a conversational front end. The input is then fed into an augmented transition network (ATN) [1] for parsing and semantic analysis. At the end of this process, the knowledge database is updated according to the current command. Knowledge is represented using schemata. At any instant, the user can ask TUJA to produce the Java skeletal code and save it to a file. The architecture of TUJA is illustrated in Fig. 1.


Fig. 1. Framework of TUJA.
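The last stage of the framework, turning a stored schema into skeletal Java code, can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the class `ClassSchema`, its fields and the method `toJavaSkeleton` are hypothetical names introduced here, and the paper does not specify TUJA's actual schema layout or output format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical schema for one class; the field names are illustrative only.
final class ClassSchema {
    final String name;
    boolean isAbstract;   // would be set by a class declaration sentence
    String parent;        // ISA relation; null when there is no superclass
    final List<String[]> attributes = new ArrayList<>(); // {type, name}
    final List<String[]> methods = new ArrayList<>();    // {return type, name}

    ClassSchema(String name) {
        this.name = name;
    }

    ClassSchema attribute(String type, String attrName) {
        attributes.add(new String[] {type, attrName});
        return this;
    }

    ClassSchema method(String returnType, String methodName) {
        methods.add(new String[] {returnType, methodName});
        return this;
    }

    // Renders the schema as Java source: fields plus stubbed member methods.
    String toJavaSkeleton() {
        StringBuilder sb = new StringBuilder("public ");
        if (isAbstract) {
            sb.append("abstract ");
        }
        sb.append("class ").append(name);
        if (parent != null) {
            sb.append(" extends ").append(parent);
        }
        sb.append(" {\n");
        for (String[] a : attributes) {
            sb.append("    private ").append(a[0]).append(' ').append(a[1]).append(";\n");
        }
        for (String[] m : methods) {
            // The stub body compiles for any return type.
            sb.append("    public ").append(m[0]).append(' ').append(m[1])
              .append("() { throw new UnsupportedOperationException(); }\n");
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        ClassSchema car = new ClassSchema("Car");
        car.parent = "Vehicle";
        car.attribute("int", "speed").method("void", "accelerate");
        System.out.print(car.toJavaSkeleton());
    }
}
```

In this sketch each accepted sentence would update one `ClassSchema` instance, and the skeleton is rendered only on request, which mirrors the "produce the code at any instant" behaviour described above.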

2   Natural Language Processing using Turkish

NLP consists of five layers: morphology, syntax, semantics, pragmatics and phonetics ([2], [10]). Given our scope and purposes, we have limited our work to the morphology layer and, especially, the syntax and semantics layers.

Turkish is one of the most widely spoken languages in the world, distributed over a large geographical region in Europe and Asia, as pointed out in [6] based on United Nations sources. Note that there are many Turkish dialects, such as Azeri, Türkmen, Tatar, Uzbek, Bashkir, Nogay, Kyrgyz, Kazakh, Yakut and Chuvash. Turkish is similar to Mongolian, Manchu-Tungus and Korean, which belong to the same family of languages: the Altaic branch of the Ural-Altaic family. The Turkish alphabet is based on the Latin alphabet and has 29 letters, including 8 vowels. Turkish words exhibit vowel harmony and have no grammatical gender. In Turkish sentences, adjectives precede nouns. Unfortunately, there are only a few natural language applications ([1], [3], [5], [7]-[9]) based on the Turkish language, owing to its agglutinative nature.

The same suffix can be attached to different words in different ways. Sometimes a vowel or a consonant towards the end of a word changes when a suffix is attached. For this reason, morphological analysis in Turkish is not straightforward, as shown in Table 1.

Table 1. Some deformation examples in Turkish words due to suffixes.

 

Word                             (Stem) + Suffixes
görünürlerde (in sight)          (gör) + ün + ür + ler + de (görmek - to see)
ağaca (towards the tree)         (ağaç) + a (tree)
ağlıyor (he/she/it is crying)    (ağla) + yor (ağlamak - to cry)
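The ağaç + a → ağaca row of Table 1 can be illustrated with a small sketch that attaches the dative suffix to a noun stem. It is a deliberately simplified assumption, not TUJA's morphological analyzer (which is inherited from TUSA): the class `DativeSketch` and its methods are hypothetical, input is assumed to be a non-empty lowercase stem, and the softening rule is a heuristic that over-applies to some words. Turkish letters are written as Unicode escapes so that the source compiles under any file encoding.

```java
// Simplified sketch of attaching the Turkish dative suffix -(y)a / -(y)e.
final class DativeSketch {
    private static final String BACK_VOWELS = "a\u0131ou";       // a, dotless i, o, u
    private static final String FRONT_VOWELS = "ei\u00F6\u00FC"; // e, i, o-umlaut, u-umlaut

    static boolean isVowel(char c) {
        return BACK_VOWELS.indexOf(c) >= 0 || FRONT_VOWELS.indexOf(c) >= 0;
    }

    static int vowelCount(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (isVowel(s.charAt(i))) {
                n++;
            }
        }
        return n;
    }

    // Final consonant softening: p->b, c-cedilla->c, t->d, k->soft g.
    static char soften(char c) {
        switch (c) {
            case 'p': return 'b';
            case '\u00E7': return 'c';
            case 't': return 'd';
            case 'k': return '\u011F';
            default: return c;
        }
    }

    static String dative(String stem) {
        // Vowel harmony: the suffix vowel follows the last vowel of the stem.
        char suffixVowel = 'e';
        for (int i = stem.length() - 1; i >= 0; i--) {
            char c = stem.charAt(i);
            if (isVowel(c)) {
                suffixVowel = BACK_VOWELS.indexOf(c) >= 0 ? 'a' : 'e';
                break;
            }
        }
        char last = stem.charAt(stem.length() - 1);
        if (isVowel(last)) {
            return stem + 'y' + suffixVowel; // buffer consonant between two vowels
        }
        // Heuristic assumption: soften only stems with more than one vowel.
        if (vowelCount(stem) > 1) {
            stem = stem.substring(0, stem.length() - 1) + soften(last);
        }
        return stem + suffixVowel;
    }

    public static void main(String[] args) {
        // Prints the dative form of the stem meaning "tree" (Table 1, second row).
        System.out.println(dative("a\u011Fa\u00E7"));
    }
}
```

Running an analyzer in the opposite direction, from surface form back to stem, has to undo such changes, which is why the stem cannot be found by simply stripping the suffix. The vowel change in ağla + yor → ağlıyor is a different rule and is not covered by this sketch.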

 

There are seven morphological categories in Turkish: nouns, proper nouns, compound nouns, adjectives, verbs, adverbs and conjunctions. Another difficulty in Turkish arises from the syntax. The same words can be arranged in different orders, yielding a group of sentences with the same meaning, as illustrated in Table 2. The common property of all three sentences is a feature of the Turkish language: the verb appears at the end of the sentence.

Table 2. Turkish sentences with different syntax having the same meaning.

 

Sentence (I gave the book to the child)

Çocuğa kitabı ben verdim

Çocuğa ben kitabı verdim

Ben çocuğa kitabı verdim
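The reason the three orderings in Table 2 share one meaning is that the role of each word is carried by its case suffix rather than by its position: çocuğa is dative, kitabı is accusative and ben is nominative. The sketch below illustrates this under the assumption that the words have already been morphologically analyzed; the tokens are English glosses, and `WordOrderSketch`, `Tag` and `roles` are hypothetical names, not part of TUJA.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class WordOrderSketch {
    enum Tag { NOMINATIVE, ACCUSATIVE, DATIVE, VERB }

    // A word that is assumed to be morphologically analyzed already.
    static final class Token {
        final String gloss;
        final Tag tag;

        Token(String gloss, Tag tag) {
            this.gloss = gloss;
            this.tag = tag;
        }
    }

    // Assigns roles from the tags alone; word position is never consulted.
    static Map<String, String> roles(List<Token> sentence) {
        Map<String, String> frame = new TreeMap<>();
        for (Token t : sentence) {
            switch (t.tag) {
                case NOMINATIVE: frame.put("subject", t.gloss); break;
                case ACCUSATIVE: frame.put("object", t.gloss); break;
                case DATIVE: frame.put("recipient", t.gloss); break;
                case VERB: frame.put("action", t.gloss); break;
                default: break;
            }
        }
        return frame;
    }

    public static void main(String[] args) {
        Token child = new Token("child", Tag.DATIVE);
        Token book = new Token("book", Tag.ACCUSATIVE);
        Token me = new Token("I", Tag.NOMINATIVE);
        Token gave = new Token("gave", Tag.VERB);
        // The three orderings of Table 2 produce the same role frame.
        System.out.println(roles(Arrays.asList(child, book, me, gave)));
        System.out.println(roles(Arrays.asList(child, me, book, gave)));
        System.out.println(roles(Arrays.asList(me, child, book, gave)));
    }
}
```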

3   Morphology, Syntax and Semantics

It is assumed that the reader is familiar with object-oriented programming terminology. The morphology component of TUJA is inherited from a previous project, TUSA [6], based on PROLOG. The sentences are categorized into four groups: (a) Class Declaration Sentences, (b) Attribute Declaration Sentences, (c) Method Declaration Sentences and (d) Relation Declaration Sentences. All possible syntax types are supported to create an abstract model representing the classes. An augmented transition network (ATN) is developed for the TUJA interface. The HASA relationship is used for composing classes and the ISA relationship is used for building the class hierarchy. TUJA assumes that, in general, a noun in a sentence refers to a class, an interface or an object, and a verb refers to a method.
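One way to picture how ISA and HASA relations accumulate into a class model is the following sketch. It is an illustration under assumptions rather than TUJA's knowledge base: the class `ConceptHierarchy`, its method names and the cycle check are all hypothetical choices made here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical store for ISA (inheritance) and HASA (composition) relations.
final class ConceptHierarchy {
    private final Map<String, String> isA = new LinkedHashMap<>();        // child -> parent
    private final Map<String, List<String>> hasA = new LinkedHashMap<>(); // owner -> members

    void addIsA(String child, String parent) {
        // Reject relations that would make a class its own ancestor.
        if (child.equals(parent) || ancestors(parent).contains(child)) {
            throw new IllegalArgumentException("cyclic ISA: " + child + " -> " + parent);
        }
        isA.put(child, parent);
    }

    void addHasA(String owner, String member) {
        List<String> members = hasA.get(owner);
        if (members == null) {
            members = new ArrayList<>();
            hasA.put(owner, members);
        }
        members.add(member);
    }

    // Superclasses of a class, nearest first.
    List<String> ancestors(String name) {
        List<String> chain = new ArrayList<>();
        for (String p = isA.get(name); p != null; p = isA.get(p)) {
            chain.add(p);
        }
        return chain;
    }

    // Members declared by the class itself plus those inherited through ISA.
    List<String> allMembers(String name) {
        List<String> result = new ArrayList<>();
        List<String> chain = ancestors(name);
        for (int i = chain.size() - 1; i >= 0; i--) {
            List<String> inherited = hasA.get(chain.get(i));
            if (inherited != null) {
                result.addAll(inherited);
            }
        }
        List<String> own = hasA.get(name);
        if (own != null) {
            result.addAll(own);
        }
        return result;
    }

    public static void main(String[] args) {
        ConceptHierarchy h = new ConceptHierarchy();
        h.addIsA("Car", "Vehicle");
        h.addHasA("Vehicle", "Engine");
        h.addHasA("Car", "Radio");
        System.out.println(h.ancestors("Car"));
        System.out.println(h.allMembers("Car"));
    }
}
```

In such a model an ISA entry would correspond to an `extends` clause in the generated skeleton and each HASA entry to a member field.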

3.1   Class Declaration Sentences

This group of sentences is used to create a new class or to name an existing class, as shown in Fig. 2b. Note that declarations of abstract classes and Java interfaces are also supported.


Fig. 2. (a) ATN, (b) some sample sentences for class declaration sentences.

The part of the TUJA ATN that detects class declarations is illustrated in Fig. 2a. The Nominalverb component, alone or combined with the Modifier component in the ATN, determines whether a class is abstract or not.
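The idea of an ATN fragment that sets an "abstract" register when a Modifier precedes the Nominalverb can be sketched as below. The network itself is an assumption made for illustration: the states, the arc test on the modifier word and the register names are hypothetical and do not reproduce the network of Fig. 2a. Tokens are English glosses that are assumed to be tagged with their categories already.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical ATN fragment for class declaration sentences.
final class ClassDeclAtn {
    enum Cat { NOUN, MODIFIER, NOMINALVERB }

    static final class Token {
        final String word;
        final Cat cat;

        Token(String word, Cat cat) {
            this.word = word;
            this.cat = cat;
        }
    }

    // Returns the registers on acceptance, or null when the input is rejected.
    static Map<String, String> parse(List<Token> tokens) {
        Map<String, String> registers = new HashMap<>();
        registers.put("abstract", "false");
        int state = 0; // 0: start, 1: name read, 2: modifier read, 3: accept
        for (Token t : tokens) {
            if (state == 0 && t.cat == Cat.NOUN) {
                registers.put("name", t.word); // arc action: store the class name
                state = 1;
            } else if (state == 1 && t.cat == Cat.MODIFIER && t.word.equals("abstract")) {
                registers.put("abstract", "true"); // arc test passed: mark abstract
                state = 2;
            } else if ((state == 1 || state == 2) && t.cat == Cat.NOMINALVERB) {
                state = 3;
            } else {
                return null; // no arc matches the current token
            }
        }
        return state == 3 ? registers : null;
    }

    public static void main(String[] args) {
        List<Token> sentence = Arrays.asList(
            new Token("Vehicle", Cat.NOUN),
            new Token("abstract", Cat.MODIFIER),
            new Token("is-a-class", Cat.NOMINALVERB));
        System.out.println(parse(sentence));
    }
}
```

The registers are what distinguishes this from a plain finite-state recognizer: the parse yields the class name and the abstract flag, which could then be used to update a class schema.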

3.2   Attribute Declaration Sentences

This group of sentences is used to define the attributes of an existing class or to define a new class with specified attributes, as shown in Fig. 3. The HAS relation represents the inclusion relationship, determining the elements included by an object. In other words, the HAS relation is used to define the members of a class. The part of the TUJA ATN that detects attribute declarations is also shown in Fig. 3.

3.3   Method Declaration Sentences

This group of sentences is used to define the methods of predefined classes or to define a new class with specified member methods, as shown in Fig. 4b. The part of the TUJA ATN that detects method declarations is shown in Fig. 4a.