Tuesday, July 3, 2007

From What We Code Complete

Code completion is an important part of modern development environments for textual languages. Over the past decades, the proposals offered to the user have evolved from simple hippie completion (every word in the open document is offered as a proposal) to sophisticated proposals rooted in the semantics of the language being edited. But when we say code completion is based on language semantics, which semantics are suitable for providing it? For which languages can we offer code completion, and for which can we not?
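The simple word-based completion mentioned above can be sketched in a few lines. This is only an illustration in Python; the function name `word_proposals` and the sample document are made up for this example.

```python
import re

def word_proposals(text: str, prefix: str) -> list[str]:
    """Propose every distinct word in `text` that starts with `prefix`."""
    # No language knowledge at all: anything that looks like a word counts.
    words = set(re.findall(r"[A-Za-z_]\w*", text))
    return sorted(w for w in words if w.startswith(prefix) and w != prefix)

document = "total = 0\nfor item in items:\n    total += item.price\n"
print(word_proposals(document, "it"))  # ['item', 'items']
```

Note that `price` would be proposed after any prefix `pr`, regardless of whether a property of that name exists at the cursor position. That is exactly the gap semantic completion tries to close.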

The "Normal" Case, e.g. A Single Java File

In every case, including the normal one, we need a more sophisticated model of the written text than the text itself. The edited text has to be parsed and transformed into a model that allows semantic analysis. One possibility is to use a parse tree together with a library that can resolve variables, types, and basically everything else that can be referenced and is therefore a candidate for code completion. On that basis, a single edited file can be parsed and the gathered information used for code-completion proposals within that file.
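As a rough sketch of this idea, the following uses Python and its standard `ast` module in place of a Java parser. The helper names `defined_names` and `complete` are invented for this example. It assumes the source parses without errors, which a half-typed file often will not, and it ignores scoping rules entirely.

```python
import ast

def defined_names(source: str) -> set[str]:
    """Collect names the file defines: functions, classes, imports,
    assigned variables, and parameters."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
        elif isinstance(node, ast.alias):
            # "import os.path" binds the name "os"
            names.add((node.asname or node.name).split(".")[0])
    return names

def complete(source: str, prefix: str) -> list[str]:
    """Propose defined names that start with `prefix`."""
    return sorted(n for n in defined_names(source) if n.startswith(prefix))

source = '''
import os.path
def parse_file(path):
    parsed = open(path).read()
    return parsed
class Parser:
    pass
'''
print(complete(source, "pa"))  # ['parse_file', 'parsed', 'path']
```

Unlike plain word completion, only names that are actually defined are proposed. A real editor would additionally need an error-tolerant parser and scope-aware name resolution.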

Using a Model That Spans Multiple Sources, e.g. A Java Project

This uses the same technique as the normal case, but the model covers more than one file: the models of all files organized in a project are combined. This includes not only the source files, but also names and types from library files, compiled object files, and so on.
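Combining the per-file models might look roughly like the sketch below, again in Python with the standard `ast` module. The file names and the helpers `top_level_names`, `build_project_index`, and `complete` are hypothetical. Only top-level definitions of source files are indexed; libraries and compiled files are left out.

```python
import ast

def top_level_names(source: str) -> set[str]:
    """Names a file makes visible to other files (top-level definitions only)."""
    names = set()
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
    return names

def build_project_index(files: dict[str, str]) -> dict[str, set[str]]:
    """Merge the per-file models into one index: name -> files defining it."""
    index: dict[str, set[str]] = {}
    for filename, source in files.items():
        for name in top_level_names(source):
            index.setdefault(name, set()).add(filename)
    return index

def complete(index: dict[str, set[str]], prefix: str) -> list[tuple[str, list[str]]]:
    """Propose matching names together with the files they come from."""
    return sorted((name, sorted(origin)) for name, origin in index.items()
                  if name.startswith(prefix))

project = {
    "shapes.py": "class Circle:\n    pass\n\ndef circumference(r):\n    return 6.28 * r\n",
    "main.py": "count = 0\n\ndef compute():\n    pass\n",
}
index = build_project_index(project)
print(complete(index, "c"))
# [('circumference', ['shapes.py']), ('compute', ['main.py']), ('count', ['main.py'])]
```

Keeping the originating file with each name lets an editor show where a proposal comes from or add the matching import.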

Using Context Information from External Models, e.g. OCL

Some languages, like OCL, do not only use names and types defined in the language itself. OCL, for example, is used to define expressions over models, so it references into those models using their names. Code completion in OCL therefore often means proposing the names of properties and operations from external models.
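To illustrate, here is a minimal Python sketch that completes a dotted navigation expression against an external model. The model is a made-up dictionary, not a real metamodel API, and the function name `complete_navigation` is hypothetical. Collections and OCL's arrow operator are ignored for simplicity.

```python
# Hypothetical external model: class name -> {feature name: feature type}.
MODEL = {
    "Account": {"owner": "Person", "balance": "Real", "bank": "Bank"},
    "Person": {"name": "String", "age": "Integer", "accounts": "Account"},
    "Bank": {"name": "String"},
}

def complete_navigation(context: str, expression: str) -> list[str]:
    """Propose features for the last, partially typed segment of a dotted
    navigation expression such as 'self.owner.na'."""
    *path, prefix = expression.split(".")
    if path and path[0] == "self":
        path = path[1:]  # 'self' refers to the context class itself
    current = context
    for segment in path:
        # Follow each navigation step to the type of the referenced feature.
        current = MODEL.get(current, {}).get(segment)
        if current is None:
            return []  # unknown feature: nothing sensible to propose
    return sorted(f for f in MODEL.get(current, {}) if f.startswith(prefix))

print(complete_navigation("Account", "self.ba"))      # ['balance', 'bank']
print(complete_navigation("Account", "self.owner."))  # ['accounts', 'age', 'name']
```

None of the proposed names appear in the expression text itself; they all come from the model the expression is written against.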

Using Context Information from a Runtime System, e.g. Python

Some languages have only weak static typing or none at all. When you edit a program in such a language before it runs, you do not know exactly what the types of your variables, parameters, and so on are. And without type information, you do not know which properties, operations, or other features the values in those variables provide. In such cases the development environment has to allow editing the program at runtime: you run the program, stop the execution at some point, and then start to edit. Now the situation is completely different. Since you are at runtime, the environment can read the types from the current values of variables and parameters. From this runtime context an editor can provide code completion using runtime types.
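In Python, this kind of runtime inspection can be sketched with the built-in `dir()`. The class `Invoice` and the functions `load` and `runtime_proposals` are invented for this example. The dictionary passed as `namespace` stands in for the local variables of a program stopped in a debugger.

```python
def runtime_proposals(namespace: dict, expression: str) -> list[str]:
    """Complete 'name.prefix' by inspecting the live object bound to `name`."""
    name, _, prefix = expression.partition(".")
    if name not in namespace:
        return []
    # dir() lists the attributes the actual runtime value provides.
    return sorted(attr for attr in dir(namespace[name])
                  if attr.startswith(prefix) and not attr.startswith("_"))

class Invoice:
    def __init__(self):
        self.total = 0
        self.tax_rate = 0.19

def load():
    # Nothing in the source declares what this returns.
    return Invoice()

value = load()
print(runtime_proposals({"value": value}, "value.t"))  # ['tax_rate', 'total']
```

Before the program runs, an editor has no declaration telling it what `value` holds; once the program is stopped, the live object answers the question directly. The standard library's `rlcompleter` module applies a similar idea in the interactive interpreter.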

Conclusion

Depending on the language semantics, the sources for code-completion proposals vary. This can of course make developing code completion more or less challenging. But you would probably have to search really hard to find a language that does not allow any intelligent code completion at all.