Dienstag, 19. Juni 2007

Problems for Textual Model Notations

Several frameworks and meta-tools to combine context-free text notations with meta-models exist. But the current research seems to be focused mainly on those syntactical structures that can be defined by context-free grammars solely. It is ignored that most languages, even though defined with context-free grammars, contain non context-free features. This is especially troublesome when you accept that meta-models describe certain non context-free features and therefore provide a different expressive power than grammars.

1. Textual notations are more than context free syntax


One could say that meta-models, in the sense of object-oriented data-models, e.g. class diagrams, were introduced to get a hold of the definition of graphical modelling languages. No matter if this reflects the "historical"-facts or not, meta-modelling allows abstract syntax definitions that are fundamentally different from definitions based on context-free grammars. Where such grammars basically define ordered-tree structures (better classified as term-like structures), meta-models describe graph-like structures.

Nowadays, with all the fancy meta-model based transformations, code-generation, model analysis, testing, integration, ..., and MDA tools available, we like to use meta-modelling for the definition for all kind of languages. This also includes traditional text-based languages. Therefore, meta-modelling has eventually entered the technological space of textual syntax, which makes dealing with the differences between grammars and meta-models an important research subject.

There two different approaches to meta-model based text languages. First, we still think of the language in terms of grammars and create a meta-model from a grammar. This way, we subsequently attach all meta-model benefits to a regular context-free grammar based language [xText, grammarware]. The second approach starts with the understanding that a language is best designed by concept, in a meta-modelling fashion. Here we create a textual notation for a formerly developed meta-model [TCS, TEF].

In both approaches promising results were achieved when it comes to reflect the expressive power of grammars. In other words, as long as we are concentrating on terms, creating models from text and text from models is implementable. This is usually based on some kind of mapping between grammars and meta-models.

As in traditional compiler design, syntax analysis (dealing with text based on a grammar) is followed by static semantic analysis (dealing with language features that exceed the power of context-free grammars). From the meta-modelling perspective this means two things. First, meta-models define references that allow them to describe graphs instead of trees. Second, model conditions, for example OCL constraint, that further narrow what valid meta-model instances are. Both aspects have corresponding concept s in the grammar world. The first one can be called name resolution. A resulting parse tree is augmented with attributes which connect nodes in a parse tree and basically turn this tree into a graph. The second are well-formedness rules that are based on the same principles than constraints in meta-modelling.

Consequently the next research question for both approaches (meta-models from grammars as well as textual notations for a meta-model) is how to integrate name-resolution and other static semantic checks.

2. Going beyond context free syntax

Our framework [TEF] allows to create eclipse-based text editors from a meta-model. Each of these editors continuesly parses the user input and creates a meta-model conform model from it. This part of TEF is based on work in [TCS, grammarware] and basically covers everything that is context-free. And now is the question how we incorporate the non context-free parts.

We divided the text reconciliation process into three phases. The first one is parsing, creating a model that reflects the tree structures of the text. The second phase is name-resolution. It transforms the model into a graph structure. In the third phase we check all constraints on the model. In all phases we report errors in the model back to the user: eclipse-like as a red underline with hover-message and ruler marking. Interesting for us are now the name-resolution and constraint checking phases.

2.1 Name resolution

At the moment TEF allows to provide the Java implementation of a name resolution algorithm. This implementation has to work on the following parameters : a model that does not yet contain references; a model element that describes the name that is to be resolved (a single name or a more complex identifier), and the surrounding context element. Name resolution has to return the element that is referenced by the name. For the future we try to replace this Java implementations by OCL expressions. This allows to describe name resolution on a higher level of abstraction. The challenge is to provide a technique thats is hopefully independent from the actual method of name resolution. Such methods could be symbol tables, local search, inside-out, recursive decent, etc.

2.2 Code-Completion

Closely related to name resolution is code-completion. Each name (reference, identifier, etc.) that the user may write in an editor is potentially subject to code-completion. Thus, code-completion refers to completing a name. Code-completion therefore means to provide a set of valid names/references in a given context. There are two problems to be solved.

First, from the context it is not clear of what kind the reference is to be completed. For example, think of Java: when you start to type an identifier in an expression, it is not clear if it becomes a field name or an operation name. This can be solved by parsing to the point of code-completion and then analyse the current parse stack to retrieve the possible reductions. These reduction possibilities reflect the possible kinds of elements that are requested.

The second problem is that you usually cannot successfully parse the document when the user requests code-assist, because the user is just in the middle of typing the document. Since you cannot parse the document, you cannot provide the necessary context information. You have to implement some sort of error recovery to at least generate a partial model from the text. This partial model has to serve as the source for possible name declarations. In other words, error recovery becomes an important requirement.

2.4 Syntactic sugar in the context of name resolution

Very often a language syntax provides different notations for the same model elements. Take Java again: a member variable "foo" can be referenced by "foo" from within the body of a member method, or it can be referenced as a field of the "this" variable using "this.foo". Where the more implicit notation "foo" has a concrete definition in the context-free grammar, no such thing exists in the meta-model. Therefore at some point, the implicit notation "foo" has to be resolved or replaced by its actual meaning "this.foo". This is especially troublesome because the string "foo" could also refer to a local variable. Whether "foo" means "this.foo" or just "foo" is a matter of static semantics. Currently TEF allows to define several meta-model bindings for the same syntactical constructs. Which binding is finally chosen depends on name resolution. If "foo" can be successfully resolved to a local variable name it becomes a reference to this local variable. If not, it is assumed that "foo" actually means "this.foo" and it is tried to find the name "foo" among the members of "this".

2.3 Constraint checking

This is rather straight forward. The model resulting from parsing and name resolution is checked based on all meta-model invariants and report an error when an invariant is violated.

3. Conclusion

Context-free grammars describe trees, meta-model graphs. Therefore, name-resolution is a trouble-some but very important part of creating a model from text. It is a necessity for both context-free syntax to meta-model approaches. In combination with semantic text editors it requires error recovery to provide reasonable code-completion.

Keine Kommentare: