Custom language plugin - generating parser with PsiGen?

Hi,

I'm looking into writing a plugin for adding custom language support.

I have looked through the documentation for generating "tokens / lexer / parser" compatible with the ReSharper API.

So far I have only been able to generate the tokens and the lexer, and I'm having trouble understanding how the .psi files work. Is there any documentation on them? (By the way, I have also installed the PsiPlugin, so editing them in Visual Studio is pretty cool :) )

I do understand that they express grammars (in combination with ReSharper API concepts), but I haven't been able to run PsiGen to generate any successful output so far :/


I wrote this simple .psi test file to check how it works:

options {
  parserPackage="SimplePsiTest.Psi.Psi.Parsing";
  parserClassName="SimpleParserGenerated";
  psiInterfacePackageName="SimplePsiTest.Psi.Psi.Tree";
  psiStubsPackageName="SimplePsiTest.Psi.Psi.Tree.Impl";
  psiStubsBaseClass="SimpleCompositeElement";
  tokenTypePrefix="";
  parserTargetSubdir="Parsing/Psi";
  psiInterfacesTargetSubdir="Psi/Psi/Tree";
  psiStubsTargetSubdir="Psi/Psi/Tree/Impl";
  elementTypePrefix="";
  visitorClassName="TreeNodeVisitor";
  visitorMethodSuffix="";
  testTargetSubdir="Psi/Test/Psi";
  disableReflection;
  separateHierarchies;
  "treeElementClassFQName"="JetBrains.ReSharper.Psi.ExtensionsAPI.Tree.TreeElement";
  "compositeElementClassFQName"="SimpleCompositeElement";
  "psiElementVisitorClassFQName"="SimplePsiTest.Psi.Psi.Tree.TreeNodeVisitor";
  "tokenTypeClassFQName"="SimplePsiTest.Psi.Psi.Parsing.SimpleTokenType";
  "visitMethodPrefix"="Visit";
  "lexerClassName"="JetBrains.ReSharper.Psi.Parsing.ILexer";
  "psiClassesPrefix"="";
  "psiElementClassFQName"="SimplePsiTest.Psi.Psi.Tree.IPsiTreeNode";
  customImplPackage="SimplePsiTest.Psi.Psi.Tree.Impl.Custom";
  customInterfacePackage="SimplePsiTest.PsiPlugin.Psi.Psi.Tree.Custom";
  "interfaceNamePrefix"="I";
  "tokenElementClassFQName"="JetBrains.ReSharper.Psi.Tree.ITokenNode";   
  "customImplSuffix"="";
  "objectClassFQName"="System.Object";
  tokenBitsetThreshold=4;
  elementTypeBaseClass="SimplePsiTest.Psi.Psi.Tree.SimpleCompositeNodeType";
  parserMessagesClass="SimplePsiTest.Psi.Psi.Parsing.SimpleParserMessages";
  generateWorkingPsi;
}


errorhandling simpleFile options {
stubBase="SimpleFileElement";
}
:
    PUBLIC
    CLASS
    IDENTIFIER
    LBRACE
    RBRACE
  ;



The following are defined in tokens.xml:

PUBLIC
CLASS
IDENTIFIER
LBRACE
RBRACE



However, when running this with the PsiGen tool, I'm getting the following error:

PS E:\csharp\libs\resharper sdk\ReSharperSDK-8.0.952\tools\psigen> .\PsiGen -v -s "tmppsi\" "tmppsi\simple.psi"
Follows for simpleFile:System.NullReferenceException: Object reference not set to an instance of an object.
   at JetBrains.ReSharper.PsiGen.Grammar.PrintFollows(TextWriter outputStream) in c:\ReSharper\Psi.Features\tools\PsiGen\src\PsiGen\src\Grammar.cs:line 375
   at JetBrains.ReSharper.PsiGen.Grammar.GenerateParser(CodeWriter printer) in c:\ReSharper\Psi.Features\tools\PsiGen\src\PsiGen\src\Grammar.cs:line 117
   at JetBrains.ReSharper.PsiGen.Program.GenerateParser() in c:\ReSharper\Psi.Features\tools\PsiGen\src\PsiGen\src\Program.cs:line 209
   at JetBrains.ReSharper.PsiGen.Program.CompileGrammar() in c:\ReSharper\Psi.Features\tools\PsiGen\src\PsiGen\src\Program.cs:line 108
   at JetBrains.ReSharper.PsiGen.Program.Execute() in c:\ReSharper\Psi.Features\tools\PsiGen\src\PsiGen\src\Program.cs:line 50

tmppsi\simple.psi: error :An internal error has occurred. The following exception has been thrown:System.NullReferenceException: Object reference not set to an instance of an object.
   at JetBrains.ReSharper.PsiGen.Grammar.PrintFollows(TextWriter outputStream) in c:\ReSharper\Psi.Features\tools\PsiGen\src\PsiGen\src\Grammar.cs:line 375
   at JetBrains.ReSharper.PsiGen.Grammar.GenerateParser(CodeWriter printer) in c:\ReSharper\Psi.Features\tools\PsiGen\src\PsiGen\src\Grammar.cs:line 117
   at JetBrains.ReSharper.PsiGen.Program.GenerateParser() in c:\ReSharper\Psi.Features\tools\PsiGen\src\PsiGen\src\Program.cs:line 209
   at JetBrains.ReSharper.PsiGen.Program.CompileGrammar() in c:\ReSharper\Psi.Features\tools\PsiGen\src\PsiGen\src\Program.cs:line 108
   at JetBrains.ReSharper.PsiGen.Program.Execute() in c:\ReSharper\Psi.Features\tools\PsiGen\src\PsiGen\src\Program.cs:line 50
1 error(s)


Do I need to make sure the generated token classes are included in the arguments for the PsiGen tool?
Could someone point out which steps I need to follow? (Are there any dependencies between the steps?)


Also, while I'm at it, perhaps someone could help me understand what the following means in the .psi files:

errorhandling psiFile options { //What does the "errorhandling" mean?
     //These seem to be general options regarding the generation of the "class/s" that will be generated?
     stubBase="PsiFileElement";
}
extras{
     //These seem to generate extra "properties" for easy access to specific nodes in the node tree?
     get {methodName = "Interfaces" path = <psiFile:PSI_INTERFACES> }; //the <psiFile:PSI_INTERFACES> seems to specify the "name" of the element to be searched
     get {methodName = "Paths" path = <psiFile:PSI_PATHS> }; //the <psiFile:PSI_PATHS> seems to specify the "name" of the element to be searched
}
       :
       pathsDeclaration<PSI_PATHS> //What does the "<PSI_PATHS>" mean? - will this grammar rule be identified "PSI_PATHS" - so that it can be searched for by this name?
       interfacesDefinition<PSI_INTERFACES>
  ;

errorhandling interface ruleDeclaration {...} // What do the "errorhandling" and "interface" keywords mean?

private ruleDeclaration {...} // Same here: what does "private" mean?



Thank you for reading this far :) - and a bunch more if you have some pointers for me ;)

Regards
/Peter

2 comments

Hi,

As to my first question, the one about the exception being thrown:
    - I decompiled the PsiGen tool, and the exception is thrown only when the "-v" (verbose printing) flag is used. Since I don't really need it, I dropped the flag, and the tool now runs and generates the classes just fine.

So - if you are having the same issue, try running the tool without the "-v" (simple enough heh :) )

As to my other question about the format: I'm still not clear on the syntax, and some help would be nice :)
    - The "general" syntax is close enough to ANTLR syntax, so that part is fine for me, but I'm still not sure about the "extra special stuff".

So, if you are also wondering about the syntax, you can start by looking up the syntax for ANTLR; just google it and you will find lots of info.

Thank you,
/Peter


Hi Peter. Unfortunately, PsiGen is currently undocumented. The best documentation is the .psi file in the psiplugin sample in the SDK. I am actively looking at improving this situation.

As for some of your questions:

  • errorhandling changes what gets generated. If it's set, it looks to see if the currently found token is part of the "follows" list. If it isn't, it returns an error message.
  • IIRC, the PSI_PATHS and PSI_INTERFACES are the child role node types (defined as integers) that these properties are exposing. So, the PsiFile type has a property "Interfaces", which returns all child nodes that are declared as PSI_INTERFACES. Similarly, Paths returns all nodes that are of type PSI_PATHS.
  • errorhandling interface ruleDeclaration - means generate error messages + create an interface for this rule, not just a type
  • private ruleDeclarations are rule declarations that are private to the parsing process, and do not expose a node in the AST
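
To make the first two points more concrete, here is a rough Python sketch of the two ideas: a "follows" check like the one errorhandling adds, and accessors that return children by role. It is only an illustration and not what PsiGen actually emits (the real output is C#); the Node and PsiFile classes, the check_follows helper, the error text and the role values 1 and 2 are all made up for the example.

```python
from dataclasses import dataclass, field

# Hypothetical role ids. The generated code defines its own integer constants.
PSI_PATHS = 1
PSI_INTERFACES = 2


@dataclass
class Node:
    """A tree node tagged with the role it plays inside its parent."""
    name: str
    role: int = 0
    children: list = field(default_factory=list)

    def children_by_role(self, role):
        # Return only the direct children tagged with the given role.
        return [child for child in self.children if child.role == role]


class PsiFile(Node):
    # Roughly what `get { methodName = "Paths" path = <psiFile:PSI_PATHS> }` asks for.
    @property
    def paths(self):
        return self.children_by_role(PSI_PATHS)

    # Roughly what `get { methodName = "Interfaces" path = <psiFile:PSI_INTERFACES> }` asks for.
    @property
    def interfaces(self):
        return self.children_by_role(PSI_INTERFACES)


def check_follows(token, follows):
    """Sketch of the errorhandling check: None if the token is allowed, else a message."""
    if token in follows:
        return None
    return f"Unexpected token: {token}"


if __name__ == "__main__":
    tree = PsiFile("psiFile", children=[
        Node("pathsDeclaration", PSI_PATHS),
        Node("interfacesDefinition", PSI_INTERFACES),
    ])
    print([node.name for node in tree.paths])
    print(check_follows("CLASS", {"RBRACE"}))
```

In this sketch, tagging a rule reference with <PSI_PATHS> corresponds to setting the role on the child node, and the "Paths" getter simply filters the children by that role.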


Sadly, using PsiGen to create custom file parsers is currently very tricky. I'm hoping we can improve things, at least in terms of documentation, in the short-to-medium term.

Thanks
Matt
