UAST - Unified Abstract Syntax Tree
UAST (Unified Abstract Syntax Tree) is an abstraction layer on the PSI of different JVM languages. It provides a unified API for working with common language elements like classes and method declarations, literal values, and control flow operators.
Motivation
Different JVM languages have their own PSI, but many IDE features like inspections, gutter markers, reference injection, and many others work the same way for all these languages.
Using UAST allows providing features that will work across all supported JVM languages using a single implementation.
The presentation "Writing IntelliJ Plugins for Kotlin" offers a thorough overview of using UAST in real-world scenarios.
When should I use UAST?
For plugins that should work the same way for all JVM languages.
Some known examples are:
Which languages are supported?
Java: full support
Kotlin: full support
Scala: beta, but full support
Groovy: declarations only, method bodies not supported
What about modifying PSI?
UAST is a read-only API. There are experimental UastCodeGenerationPlugin and JvmElementActionsFactory classes, but they are currently not recommended for external usage.
Working with UAST
The base element of UAST is UElement. All common base sub-interfaces are located in the declarations and expressions directories of the uast module.
All these sub-interfaces provide methods to get information about common syntax elements: UClass about class declarations, UIfExpression about conditional expressions, and so on.
PSI to UAST Conversion
To obtain UAST for a given PsiElement of one of the supported languages, use the UastFacade class or UastContextKt.toUElement().
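As a minimal Kotlin sketch of this basic conversion (assuming the code runs inside a plugin that has the UAST module available), the extension functions from UastContext.kt can be called directly on a PsiElement. The helper names toUast and toUMethod are hypothetical and exist only for illustration.

```kotlin
import com.intellij.psi.PsiElement
import org.jetbrains.uast.UElement
import org.jetbrains.uast.UMethod
import org.jetbrains.uast.toUElement
import org.jetbrains.uast.toUElementOfType

// Hypothetical helper: returns the UAST element for the given PSI element,
// or null when the element has no UAST representation.
fun toUast(element: PsiElement): UElement? = element.toUElement()

// Hypothetical helper: same idea, but narrowed to a method declaration.
// Returns null when the element is not converted to a UMethod.
fun toUMethod(element: PsiElement): UMethod? = element.toUElementOfType<UMethod>()
```

Both helpers return null rather than throwing, so callers should treat a missing UAST element as a normal outcome.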
To convert a PsiElement to a specific UElement, use one of the following approaches:
for simple conversion:
Java: UastContextKt.toUElement(element, UCallExpression.class);
Kotlin: element.toUElement(UCallExpression::class.java)
for conversion to one of several given options:
Java: UastFacade.INSTANCE.convertElementWithParent(element, new Class[]{UInjectionHost.class, UReferenceExpression.class});
Kotlin: UastFacade.convertElementWithParent(element, UInjectionHost::class.java, UReferenceExpression::class.java)
in some cases, a PsiElement could represent several UElements. For instance, the parameter of a primary constructor in Kotlin is a UField and a UParameter at the same time. When all options are needed, use:
Java: UastFacade.INSTANCE.convertToAlternatives(element, new Class[]{UField.class, UParameter.class});
Kotlin: UastFacade.convertToAlternatives(element, UField::class.java, UParameter::class.java)
UAST to PSI Conversion
Sometimes it's required to get from the UElement back to the sources of the underlying language. For that purpose, the UElement#sourcePsi property returns the corresponding PsiElement of the original language.
The sourcePsi is a "physical" PsiElement, and it is mostly used for getting text ranges in the original file (e.g., for highlighting). Avoid casting the sourcePsi to specific classes because it means falling back from the UAST abstraction to the language-specific PSI. Some UElements are "virtual" and thus do not have a sourcePsi. For some UElements, the sourcePsi could be different from the element from which the UElement was obtained.
Also, there is a UElement#javaPsi property that returns a "Java-like" PsiElement. It is a "fake" PsiElement that lets different JVM languages emulate the Java language to keep compatibility with the Java API. For instance, when calling MethodReferencesSearch.search(PsiMethod), only Java natively provides PsiMethod; other JVM languages thus provide a "fake" PsiMethod via UMethod#javaPsi.
Note that UElement#javaPsi is physical for Java only. Thus UElement#sourcePsi should be used to obtain a text range or an anchor element for inspection warnings and gutter marker placement.
In short:
sourcePsi:
is physical: represents a real existing PsiElement in the sources of the original language
can be used for highlighting, PSI modifications, creating smart-pointers, etc.
should not be cast unless absolutely required (for instance, handling a language-specific case)
javaPsi:
should be used only as a representation of JVM-visible declarations: PsiClass, PsiMethod, PsiField for getting their names, types, parameters, etc., or to pass them to methods that accept Java-PSI declarations
not guaranteed to be physical: could not exist in sources
is not modifiable: calling modification methods could throw exceptions for non-Java languages
Note: both sourcePsi and javaPsi can be converted back to the UElement.
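The division of labor between the two properties can be sketched in Kotlin as follows. This is an illustration under the assumption that a UMethod is already at hand; the helper names anchorFor, jvmDeclarationOf, and backToUast are hypothetical.

```kotlin
import com.intellij.psi.PsiElement
import com.intellij.psi.PsiMethod
import org.jetbrains.uast.UMethod
import org.jetbrains.uast.toUElementOfType

// Physical element to attach a warning or gutter icon to.
// Prefers the name identifier and falls back to the whole declaration;
// the result is null for "virtual" elements that have no sourcePsi.
fun anchorFor(method: UMethod): PsiElement? =
    method.uastAnchor?.sourcePsi ?: method.sourcePsi

// Java-like view of the declaration, suitable for APIs that accept PsiMethod.
// It may not be physical, so it should not be used for text ranges.
fun jvmDeclarationOf(method: UMethod): PsiMethod = method.javaPsi

// Both kinds of PSI can be converted back to UAST.
fun backToUast(method: UMethod): UMethod? =
    method.sourcePsi?.toUElementOfType<UMethod>()
        ?: method.javaPsi.toUElementOfType<UMethod>()
```

Note that none of these helpers cast sourcePsi to a language-specific class, which keeps the code independent of the underlying language.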
UAST Visitors
In UAST there is no unified way to get the children of a UElement (though it is possible to get its parent via UElement#uastParent). Thus, the only way to walk the UAST as a tree is to pass a UastVisitor to the UElement.accept() method.
Note: there is a convention in UAST visitors that a visitor will not be passed to children if visit*() returns true. Otherwise, the UastVisitor will continue walking into the children.
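The return-value convention can be illustrated with a short Kotlin sketch built on AbstractUastVisitor, whose default visit methods return false. The function name collectCallNames is hypothetical, and the choice to skip nested classes is only an example of pruning the walk.

```kotlin
import org.jetbrains.uast.UCallExpression
import org.jetbrains.uast.UClass
import org.jetbrains.uast.UMethod
import org.jetbrains.uast.visitor.AbstractUastVisitor

// Hypothetical helper: collects the names of calls made in a method body,
// ignoring anything declared in local or anonymous classes.
fun collectCallNames(method: UMethod): List<String> {
    val names = mutableListOf<String>()
    method.accept(object : AbstractUastVisitor() {
        override fun visitCallExpression(node: UCallExpression): Boolean {
            node.methodName?.let { names += it }
            // false: keep descending, since arguments may contain nested calls
            return false
        }

        // true: the visitor is not passed to the children of nested classes
        override fun visitClass(node: UClass): Boolean = true
    })
    return names
}
```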
A UastVisitor can be converted to a PsiElementVisitor using UastVisitorAdapter or UastHintedVisitorAdapter. The latter is preferable as it offers better performance and more predictable results.
As a general rule, it's recommended to abstain from using UastVisitor: if you don't need to process many UElements of different types and if the structure of elements is not very important, then it is better to walk the PSI tree using PsiElementVisitor and convert each PsiElement to its corresponding UAST explicitly via UastContextKt.toUElement().
UAST Performance Hints
UAST is not a zero-cost abstraction: some methods could be unexpectedly expensive for some languages, so be careful with optimizations because they could have the opposite effect.
Converting to UElement could also require resolving for some languages in some cases, which again could be unexpectedly expensive. Converting to UAST should be performed only when necessary. For instance, converting the whole PsiFile to UFile and then walking it solely to collect UMethod declarations is inefficient. Instead, walk the PsiFile and convert each encountered matching element to UMethod explicitly.
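The recommended approach of walking the PSI and converting on demand might look like the Kotlin sketch below. It assumes PsiRecursiveElementWalkingVisitor is an acceptable way to traverse the file; the function name collectMethods is hypothetical, and the sketch has not been tuned for any particular language.

```kotlin
import com.intellij.psi.PsiElement
import com.intellij.psi.PsiFile
import com.intellij.psi.PsiRecursiveElementWalkingVisitor
import org.jetbrains.uast.UMethod
import org.jetbrains.uast.toUElement

// Hypothetical helper: walks the PSI tree and converts only those elements
// that can be represented as UMethod, instead of converting the whole file.
fun collectMethods(file: PsiFile): List<UMethod> {
    val result = mutableListOf<UMethod>()
    file.accept(object : PsiRecursiveElementWalkingVisitor() {
        override fun visitElement(element: PsiElement) {
            // The typed conversion returns null for elements that are not methods.
            element.toUElement(UMethod::class.java)?.let { result += it }
            super.visitElement(element) // continue into child PSI elements
        }
    })
    return result
}
```

Passing the required type to toUElement() lets the conversion reject non-matching elements early instead of building UAST for them.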
UAST is lazy when you pass visitors to UElement.accept() or get UElement#uastParent.
For heavy performance optimization, consider using UastLanguagePlugin.getPossiblePsiSourceTypes() to pre-filter PsiElements before converting them to UAST.
UAST Caveats
ULiteralExpression should not be used for strings
ULiteralExpression represents literal values like numbers, booleans, and strings. Although string values are also literals, ULiteralExpression is not very handy for working with them. For instance, it doesn't handle Kotlin's string interpolations. To process string literals when evaluating their value or to perform language injection, use UInjectionHost instead.
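A minimal Kotlin sketch of reading a string value through UInjectionHost is shown below. The helper name stringValueOf is hypothetical; the result is nullable because the value may not be computable, for example when an interpolated part cannot be evaluated.

```kotlin
import com.intellij.psi.PsiElement
import org.jetbrains.uast.expressions.UInjectionHost
import org.jetbrains.uast.toUElement

// Hypothetical helper: returns the evaluated string for a string literal,
// or null when the element is not a string host or cannot be evaluated.
fun stringValueOf(element: PsiElement): String? {
    val host = element.toUElement(UInjectionHost::class.java) ?: return null
    return host.evaluateToString()
}
```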
sourcePsi and javaPsi, psi and UElement as PSI
For historical reasons, the relations between UElement and PsiElement are complicated. Some UElements implement PsiElement; for instance, UMethod implements PsiMethod. Using a UElement as a PsiElement is strongly discouraged, and Plugin DevKit provides a corresponding inspection (Plugin DevKit | Code | UElement as PsiElement usage). This inheritance is considered deprecated and might be removed in the future.
Also, there is the UElement#psi property; it returns the same element as either javaPsi or sourcePsi. As it is hard to guess which one will be returned, it is also deprecated.
Thus sourcePsi and javaPsi should be the only ways to obtain a PsiElement from a UElement. See the UAST to PSI Conversion section.
Should I use UMethod or PsiMethod, UClass or PsiClass?
UAST provides a unified way to represent JVM-compatible declarations via UMethod, UField, UClass, and so on. But at the same time, all JVM language plugins implement PsiMethod, PsiClass, and so on to be compatible with Java. These implementations can be obtained via the UElement#javaPsi property.
So the question is: "What should I use to represent a Java declaration in my code?" The answer is: we encourage using PsiMethod and PsiClass as common interfaces for Java declarations regardless of the JVM language, and discourage exposing the UAST interfaces in the API.
Note: for method bodies, there are no such alternatives, so exposing, for instance, UExpression is not discouraged. Still, consider exposing the raw PsiElement instead.
UAST/PSI Tree Structure Mismatch
UAST is an abstraction level on top of the PSI of different languages and tries to build a unified tree (see Inspecting UAST Tree). As a result, the tree structure could seriously diverge between UAST and the original language, so preserving ancestor-descendant relations is not guaranteed.
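For instance, walking up the UAST parents of a UElement and collecting each sourcePsi could produce a different result than walking up the PSI parents of that element's sourcePsi. The two results could be different, not only in the number of elements, but also in their order.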
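The comparison can be sketched in Kotlin with two small functions that build the ancestor chains both ways. The function names ancestorsViaUast and ancestorsViaPsi are hypothetical; the sketch only demonstrates how to obtain the two sequences and makes no claim about where they diverge for a given language.

```kotlin
import com.intellij.psi.PsiElement
import org.jetbrains.uast.UElement

// Ancestor chain taken from the UAST tree, mapped back to source PSI.
// Elements without sourcePsi ("virtual" ones) are dropped.
fun ancestorsViaUast(uElement: UElement): List<PsiElement> =
    generateSequence(uElement) { it.uastParent }
        .mapNotNull { it.sourcePsi }
        .toList()

// Ancestor chain taken directly from the PSI tree of the original language.
fun ancestorsViaPsi(uElement: UElement): List<PsiElement> =
    generateSequence(uElement.sourcePsi) { it.parent }
        .toList()
```

Code that relies on one chain being a subsequence of the other is therefore fragile and should be avoided.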
Using UAST in Plugins
To register extensions applicable to UAST, specify language="UAST" in plugin.xml.
Inspecting UAST Tree
To inspect UAST Tree, invoke internal action.
Inspections
Use AbstractBaseUastLocalInspectionTool as the base class and specify language="UAST" in the registration. If the inspection targets only a subset of the default types (UFile, UClass, UField, and UMethod), specify the relevant UElement types as hints in the overloaded constructor to improve performance.
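As a hedged Kotlin sketch, an inspection that is interested only in methods could pass UMethod as the hint and override checkMethod(). The class name, the five-parameter threshold, and the message text are hypothetical and chosen purely for illustration; the registration in plugin.xml is not shown.

```kotlin
import com.intellij.codeInspection.AbstractBaseUastLocalInspectionTool
import com.intellij.codeInspection.InspectionManager
import com.intellij.codeInspection.LocalQuickFix
import com.intellij.codeInspection.ProblemDescriptor
import com.intellij.codeInspection.ProblemHighlightType
import org.jetbrains.uast.UMethod

// Hypothetical inspection: warns about methods with more than five parameters.
// The UMethod hint declares that only method declarations are of interest.
class TooManyParametersInspection : AbstractBaseUastLocalInspectionTool(UMethod::class.java) {

    override fun checkMethod(
        method: UMethod,
        manager: InspectionManager,
        isOnTheFly: Boolean
    ): Array<ProblemDescriptor>? {
        if (method.uastParameters.size <= 5) return null
        // Anchor the warning on a physical element; skip "virtual" methods.
        val anchor = method.uastAnchor?.sourcePsi ?: method.sourcePsi ?: return null
        val descriptor = manager.createProblemDescriptor(
            anchor,
            "Method has more than five parameters",
            isOnTheFly,
            LocalQuickFix.EMPTY_ARRAY,
            ProblemHighlightType.GENERIC_ERROR_OR_WARNING
        )
        return arrayOf(descriptor)
    }
}
```

The warning is anchored on sourcePsi rather than javaPsi because only the former is guaranteed to be physical for non-Java languages.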
Line Marker
Use UastUtils.getUParentForIdentifier() or, for annotations, UAnnotationUtils.getIdentifierAnnotationOwner() to obtain a suitable "identifier" element (see Line Marker Provider for details).
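In Kotlin, getUParentForIdentifier() is available as a top-level function, so the lookup can be sketched as below. The helper name methodDeclaredBy is hypothetical, and the sketch assumes the line marker provider is handed a leaf identifier element.

```kotlin
import com.intellij.psi.PsiElement
import org.jetbrains.uast.UMethod
import org.jetbrains.uast.getUParentForIdentifier

// Hypothetical helper: returns the method whose name identifier is the given
// leaf element, or null when the element is not a method name identifier.
fun methodDeclaredBy(identifier: PsiElement): UMethod? =
    getUParentForIdentifier(identifier) as? UMethod
```

A line marker provider could call such a helper for each element it receives and create a marker only when the result is non-null.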