Donnerstag, 21. Juni 2007

Different Ways to Define the Operational Semantics of a Language Based on its Meta-Model

Since meta-modelling became a successful method to define abstract language syntax, many research groups started to define the operational semantics of language also based on a meta-model. A variety of methods have been developed. This article briefly explains three different categories of methods and asks the question of which of the three approaches is the best one.

1. Structural Operational Semantics and Meta-Modelling

What is the abstract structure of operational semantics definitions: Plotkin's [1] classical approach to operational semantics is based on state transition systems. All the possible configurations of a runnable system are defined through states. There are transitions between states; they define which configuration may be followed by which other configuration. A system behaviour is one path leading from a starting configuration, following transitions. A language semantics can then be defined with (1) a set of configurations/states (this includes all possible programs + runtime information like variable assignments, memory, etc.); (2) possible transitions between configurations; and (3) something that selects one (or more [non-determinism]) behaviours, where the initial configuration/state is given by a program or model written in the language.

How can operational semantics be defined based on a meta-model? Meta-modelling allows to define configurations/states. A meta-model is an object-oriented data model that describes a set of models consisting of a finite number of objects, links between object, and attributes of objects. Meta-models can be used to define the abstract syntax of a language (remember that the model or program is one part of a configuration/state) and it can also be used to define data-structures that hold runtime information (like variable assignments, memory, etc.).

2. Three Ways of Defining Transitions and Behaviours Based on a Meta-Model

Even though we know that configurations can be described by meta-models, there are still several ways to define possible transitions and system behaviours. We want to briefly introduce three different approaches.

2.1 Code-Generation

Based on the language concepts given in a meta-model, a code generation can be used to generate executable code from a model. This code can use a mixture of model and programming language variables/states to define a system configuration. The executed code describes state changes by changing the model and/or changing variables within the generated program.

2.2 Action Languages

Meta-modelling defines a set of predefined actions. These actions are things like: create a object, set an attribute, create a link, delete an object, etc. These actions describe very small, atomic changes in a model. Actions can be scheduled from various forms of action languages. Examples are UML activities or most imperative programming languages. Such languages use expressions over models to create decisions and parameters for the actions. Blocks written in those action languages can be linked to the meta-model on two modularisation levels. One possibility is that operations are implemented in an action languages. In this case one operation implementation can call other operations. The whole semantics descriptions feels and behaves like a normal object-oriented program. The other possibility is to provide a behaviour implementation for classes, where each object (class-instance) runs its behaviour once it got created/instantiated.

2.3 Model Transformations

Another approach is model transformation. Using a transformation language, one can define more complicated model transitions. We are here focusing on rule-based transformations, where each rule describes a transition between two models. Such a rule usually selects a part in model that shall be replaced (lhs) and a sub-model (rhs) that is used to replace the lhs. Model transformation allow more complex, user defined transitions between configurations/states.

For rule-based transformations also several ways of scheduling exists. For example, transformations can also be choreographed using action languages, where the execution of a transformation rule becomes an action. Transformation rules can also be grouped with logical operations: one rule can only be applied when another rule could already be applied, or several rules that can be fired are alternatives to each other, where one of the rules is either selected non-deterministically or via rule prioritisation.

3. What is the Best Way of Describing Operational Semantics?

Maybe we should first ask: is there a best way of describing operational semantics? All the methods that we just described, with all their variations that exist out there, have their own advocates, convinced of the advantages of their method. Also do all research groups have their own examples that seem to proof how efficient and elegant their methods are. But how can we really determine what the characteristics of a method really are and for what kind of languages each method is the most suitable?

We plan to take a genuine DSL, that was developed at our institute, and we want to create operational semantics for that languages with the three different approaches. The researched DSL describes stream based data processing. It already has an operational semantics based on scheme-code generation. During our studies on an agile language engineering process we will create an action language-based operational semantics with the method presented here. Furthermore, we will create a transformation-based operational semantics, using a QVT implementation.

We expect to see that different parts of the language will cause troubles when implemented with the three different methods. We hope that we can identify certain language concept characteristics that may favor the one or other method. Furthermore we expect the three solutions to have different qualities when it comes to size of the semantics description, execution performance, scalability, reusability, easyness of combining the language with other or with a concrete system platfrom, and so on and so on.

Dienstag, 19. Juni 2007

Problems for Textual Model Notations

Several frameworks and meta-tools to combine context-free text notations with meta-models exist. But the current research seems to be focused mainly on those syntactical structures that can be defined by context-free grammars solely. It is ignored that most languages, even though defined with context-free grammars, contain non context-free features. This is especially troublesome when you accept that meta-models describe certain non context-free features and therefore provide a different expressive power than grammars.

1. Textual notations are more than context free syntax

One could say that meta-models, in the sense of object-oriented data-models, e.g. class diagrams, were introduced to get a hold of the definition of graphical modelling languages. No matter if this reflects the "historical"-facts or not, meta-modelling allows abstract syntax definitions that are fundamentally different from definitions based on context-free grammars. Where such grammars basically define ordered-tree structures (better classified as term-like structures), meta-models describe graph-like structures.

Nowadays, with all the fancy meta-model based transformations, code-generation, model analysis, testing, integration, ..., and MDA tools available, we like to use meta-modelling for the definition for all kind of languages. This also includes traditional text-based languages. Therefore, meta-modelling has eventually entered the technological space of textual syntax, which makes dealing with the differences between grammars and meta-models an important research subject.

There two different approaches to meta-model based text languages. First, we still think of the language in terms of grammars and create a meta-model from a grammar. This way, we subsequently attach all meta-model benefits to a regular context-free grammar based language [xText, grammarware]. The second approach starts with the understanding that a language is best designed by concept, in a meta-modelling fashion. Here we create a textual notation for a formerly developed meta-model [TCS, TEF].

In both approaches promising results were achieved when it comes to reflect the expressive power of grammars. In other words, as long as we are concentrating on terms, creating models from text and text from models is implementable. This is usually based on some kind of mapping between grammars and meta-models.

As in traditional compiler design, syntax analysis (dealing with text based on a grammar) is followed by static semantic analysis (dealing with language features that exceed the power of context-free grammars). From the meta-modelling perspective this means two things. First, meta-models define references that allow them to describe graphs instead of trees. Second, model conditions, for example OCL constraint, that further narrow what valid meta-model instances are. Both aspects have corresponding concept s in the grammar world. The first one can be called name resolution. A resulting parse tree is augmented with attributes which connect nodes in a parse tree and basically turn this tree into a graph. The second are well-formedness rules that are based on the same principles than constraints in meta-modelling.

Consequently the next research question for both approaches (meta-models from grammars as well as textual notations for a meta-model) is how to integrate name-resolution and other static semantic checks.

2. Going beyond context free syntax

Our framework [TEF] allows to create eclipse-based text editors from a meta-model. Each of these editors continuesly parses the user input and creates a meta-model conform model from it. This part of TEF is based on work in [TCS, grammarware] and basically covers everything that is context-free. And now is the question how we incorporate the non context-free parts.

We divided the text reconciliation process into three phases. The first one is parsing, creating a model that reflects the tree structures of the text. The second phase is name-resolution. It transforms the model into a graph structure. In the third phase we check all constraints on the model. In all phases we report errors in the model back to the user: eclipse-like as a red underline with hover-message and ruler marking. Interesting for us are now the name-resolution and constraint checking phases.

2.1 Name resolution

At the moment TEF allows to provide the Java implementation of a name resolution algorithm. This implementation has to work on the following parameters : a model that does not yet contain references; a model element that describes the name that is to be resolved (a single name or a more complex identifier), and the surrounding context element. Name resolution has to return the element that is referenced by the name. For the future we try to replace this Java implementations by OCL expressions. This allows to describe name resolution on a higher level of abstraction. The challenge is to provide a technique thats is hopefully independent from the actual method of name resolution. Such methods could be symbol tables, local search, inside-out, recursive decent, etc.

2.2 Code-Completion

Closely related to name resolution is code-completion. Each name (reference, identifier, etc.) that the user may write in an editor is potentially subject to code-completion. Thus, code-completion refers to completing a name. Code-completion therefore means to provide a set of valid names/references in a given context. There are two problems to be solved.

First, from the context it is not clear of what kind the reference is to be completed. For example, think of Java: when you start to type an identifier in an expression, it is not clear if it becomes a field name or an operation name. This can be solved by parsing to the point of code-completion and then analyse the current parse stack to retrieve the possible reductions. These reduction possibilities reflect the possible kinds of elements that are requested.

The second problem is that you usually cannot successfully parse the document when the user requests code-assist, because the user is just in the middle of typing the document. Since you cannot parse the document, you cannot provide the necessary context information. You have to implement some sort of error recovery to at least generate a partial model from the text. This partial model has to serve as the source for possible name declarations. In other words, error recovery becomes an important requirement.

2.4 Syntactic sugar in the context of name resolution

Very often a language syntax provides different notations for the same model elements. Take Java again: a member variable "foo" can be referenced by "foo" from within the body of a member method, or it can be referenced as a field of the "this" variable using "". Where the more implicit notation "foo" has a concrete definition in the context-free grammar, no such thing exists in the meta-model. Therefore at some point, the implicit notation "foo" has to be resolved or replaced by its actual meaning "". This is especially troublesome because the string "foo" could also refer to a local variable. Whether "foo" means "" or just "foo" is a matter of static semantics. Currently TEF allows to define several meta-model bindings for the same syntactical constructs. Which binding is finally chosen depends on name resolution. If "foo" can be successfully resolved to a local variable name it becomes a reference to this local variable. If not, it is assumed that "foo" actually means "" and it is tried to find the name "foo" among the members of "this".

2.3 Constraint checking

This is rather straight forward. The model resulting from parsing and name resolution is checked based on all meta-model invariants and report an error when an invariant is violated.

3. Conclusion

Context-free grammars describe trees, meta-model graphs. Therefore, name-resolution is a trouble-some but very important part of creating a model from text. It is a necessity for both context-free syntax to meta-model approaches. In combination with semantic text editors it requires error recovery to provide reasonable code-completion.

Montag, 11. Juni 2007

Easy Language Interpreters

In a recently published tutorial, we demonstrate a new way of defining an operational semantics and therefore automatically create an interpreter for meta-model based languages. The tutorial models a composite state automaton language including a simulator for such automatons. This article provides a motivation and overview about the presented method.

How complicated has it to be to create an interpreter for a language? An interpreter simply realizes the operational semantics of a language and allows to execute models or programs in that language. There are a lot of methods tailored for a mathematical precise semantics definition, and some of them, for example ASMs, even allow to automatically derive an interpreter from a semantics description. But in the end these methods are suited for language understanding. They can, for example, be used to proof properties like static safety. But methods like this are usually to cumbersome to be efficient. Hence, they are debilitating the design process of a language. When you want to try new language features, test for the user acceptance of a new behavior, or simply need a language tool fast, formal methods are a burden and not help.

We propose a technique that allows you to derive a language interpreter quickly: without learning a new formalism or new methodology. We simply take existing modeling techniques and combine them towards a new application: defining the operational semantics of a language. We start with regular meta-modeling, for example with MOF. We use A MOF 2.0 for Java to do this. When you have a meta-model for a language, you can create operations within this meta-model. These operations declaration will act as the interface to the language behavior. The implementations of operations, can modify a model. Operation implementations can apply simple actions like creating an element, set a value to an attribute, etc. When executed, each operation therefore leads to a sequence of model changes. Of course, operations can call other operations and one operation as to act as the main operation.

Viewed from this operations and implementations perspective, a model is simply an object-oriented program. From the semantics perspective we have defined a meta-model for possible system states (the MOF meta-model) and the operations and implementations define a sequence of transitions (model changes lead to a sequence of models). This is analog to the classical structural operational semantics definition as introduced by Plotkin in the early 80s.

How can you implement the operations in a meta-model. You can, of course, simply provide Java implementations. Ifyou want to use a more abstract behavior description, we provide another method based on UML Activities and OCL. We propose an UML Activitiy like language that uses predefined actions (create, set, delete, call another operation, etc.) as atomic activities. Control flow in this language is created with OCL expressions, and the actions can be parameterized with OCL expressions. The activity language can deal with operation parameters and return values.

With both implementation methods, Java and activities, you have a choice. Activities provide the higher abstraction level and are more readable. Java implementations perform better when executed and allow to use other Java APIs. With Java implementations you are not restricted to model actions. You can therefore create semantics that are connected to underlying platform. This makes it easier to integrate your language and models with your target platform. You can also mix the use of activities and Java and implement one operation with Java and another with activities.

What do you gain compared to other methods. Traditional mathematical methods are to cumbersome to be effective in the language design process. Other meta-model based approaches require you to learn a new action language. We only use existing languages and methodologies. If you know meta-modeling, object-orientation, activities, OCL, and Java, you can immediately start to create language and interpreter. The use of meta-modeling, especially with the CMOF model, allows very flexible and reusable language definitions. It even allows a pattern approach for languages. The system state is a model (instance of your meta-model) and you can apply other modeling techniques to the system state: invariants and conditions, model transformations, model based tests, or XMI based persistence. We provide a pragmatic way to model operational semantics and create language interpreters. Its an ideal complement to technologies language GMF, openArchitectureWare, and other language workbenches.

If you want to learn more, try our tutorial and this paper on our method.