
Our company uses its own in-house scripting language for programming, but they would like to create an interpreter that translates these scripts to Java. The scripting language is quite substantial, so this is no small task.

I've been assigned this task, but it doesn't seem like a trivial challenge. Before I do anything stupid and start writing billions of lines of parsing code, what should I know? Where should I start to do this properly?

PS: I want to translate script files to .java sources, not directly to bytecode.

  • Any particular reason for converting directly to Java and not running via a Java interpreter (i.e. a Java-based script engine for your scripting language)? Commented Jul 28, 2011 at 7:29
  • @Charles yes, but it's hard to explain Commented Jul 28, 2011 at 7:32
  • Well then it's hard to answer... Commented Jul 28, 2011 at 7:54
  • Why do you expect "billions of lines"? A trivial source-to-source compiler is almost always simpler than an interpreter. Commented Jul 28, 2011 at 10:49

5 Answers


If you want to translate your script to Java, it's not an interpreter but a compiler. If you are thinking about executing the script directly while reading it, that's an interpreter.
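To make the distinction concrete, here is a minimal sketch (Java 17+; the tiny expression AST and the names `Expr`, `eval` and `toJava` are made up for illustration and are not part of any library). The same tree can either be interpreted, i.e. walked to compute a value right away, or compiled, i.e. walked to produce Java source text that is run later.

```java
import java.util.Map;

public class InterpretVsCompile {
    // Hypothetical mini-AST for arithmetic expressions.
    interface Expr {
        int eval(Map<String, Integer> env); // interpreter: compute a value now
        String toJava();                     // compiler: produce Java source text
    }

    record Num(int value) implements Expr {
        public int eval(Map<String, Integer> env) { return value; }
        public String toJava() { return Integer.toString(value); }
    }

    record Var(String name) implements Expr {
        public int eval(Map<String, Integer> env) {
            Integer v = env.get(name);
            if (v == null) throw new IllegalStateException("undefined variable: " + name);
            return v;
        }
        public String toJava() { return name; }
    }

    record BinOp(char op, Expr left, Expr right) implements Expr {
        public int eval(Map<String, Integer> env) {
            int l = left.eval(env), r = right.eval(env);
            return switch (op) {
                case '+' -> l + r;
                case '*' -> l * r;
                default -> throw new IllegalArgumentException("unknown operator: " + op);
            };
        }
        public String toJava() {
            // Fully parenthesize so the emitted code keeps the tree's structure.
            return "(" + left.toJava() + " " + op + " " + right.toJava() + ")";
        }
    }

    public static void main(String[] args) {
        // AST for: x + 2 * y
        Expr e = new BinOp('+', new Var("x"), new BinOp('*', new Num(2), new Var("y")));
        System.out.println(e.eval(Map.of("x", 1, "y", 3))); // interpreting prints 7
        System.out.println(e.toJava());                     // compiling prints (x + (2 * y))
    }
}
```

Only the tree walk differs; the parsing front end can be shared by both approaches.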

Either way, you should look at JavaCC or ANTLR. Both are suitable for compiler as well as interpreter tasks. You specify the language's syntax rules, and then write some additional logic in Java that implements the semantics of your script language. If you want a compiler, the Java code you write will generate further Java (or any other) code. If you want an interpreter, the Java code you write will directly execute the script.

One more concept that is good to know about is the Abstract Syntax Tree (AST).
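As a rough illustration of what an AST is, here is a minimal sketch (Java 16+; the generic `Node` record and the `dump` helper are hypothetical, and real compilers usually define one class per language construct). Keywords, parentheses and semicolons disappear; the nesting of the nodes alone encodes the structure, including operator precedence.

```java
import java.util.List;

public class AstDemo {
    // Hypothetical generic node: a label plus child nodes.
    record Node(String label, List<Node> children) {
        Node(String label, Node... children) { this(label, List.of(children)); }
    }

    // Render the tree with two spaces of indentation per level.
    static String dump(Node n, int depth) {
        StringBuilder sb = new StringBuilder("  ".repeat(depth)).append(n.label()).append('\n');
        for (Node c : n.children()) sb.append(dump(c, depth + 1));
        return sb.toString();
    }

    public static void main(String[] args) {
        // AST for the statement: while (i < 10) i = i + 1
        Node loop = new Node("while",
                new Node("<", new Node("i"), new Node("10")),
                new Node("assign", new Node("i"),
                        new Node("+", new Node("i"), new Node("1"))));
        System.out.print(dump(loop, 0));
    }
}
```

A parser produces such a tree once; the code generator (or interpreter) then only ever walks the tree, never the raw text.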

Here is a comprehensive list of other lexer and parser generators.


6 Comments

So converting, e.g., Groovy to Java is compiling? Even though you must then compile the Java to bytecode?
Yes, it is. From Wikipedia: 'A compiler is a computer program that transforms source code written in a programming language into another computer language.' '...an interpreter normally means a computer program that executes, i.e. performs, instructions written in a programming language.'
@Charles Well, the question is a bit academic. If you "convert" from one "high-level" language to another, it can also be called "source-to-source translation". That means it is a "kind of compiler", but usually not as complex as a true compiler. E.g. if you translate from C to Java, you can "assume" that the C source compiles in a C compiler. Based on that assumption you could omit various semantic checks (e.g. type checks) that a true compiler would need to do.
@Xorty: I doubt that! Compiler-building technology has improved enormously over the last decades! You should really check out ANTLR!
@Xorty, there are no languages that can't be represented as trees (with symbolic back references if you need cycles).

It sounds like an interesting task :-) Could you describe the scripting language a bit?

I would look at the javax.script package; possibly there is already a similar scripting language available for it (I know Scala has been used as a scripting language). I would also look at javax.tools.JavaCompiler. I'm building a Java source generator right now (to create and compile a class proxy at runtime). Generating Java source code is certainly a lot easier than generating bytecode.
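A minimal sketch of that compile-at-runtime step, which can also be handy for testing a translator's output: write the generated source to a temporary directory, compile it with `javax.tools.JavaCompiler`, and load the result with a `URLClassLoader`. It assumes a JDK (not just a JRE) is available; the class `Generated` and method `answer` are hypothetical stand-ins for whatever the translator emits, and the temporary files are not cleaned up.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class CompileGenerated {
    public static int compileAndRun() throws Exception {
        // Java source as a translator might emit it (hypothetical class/method names).
        String source =
                "public class Generated {\n"
              + "    public static int answer() { return 6 * 7; }\n"
              + "}\n";

        Path dir = Files.createTempDirectory("gen");
        Path file = dir.resolve("Generated.java");
        Files.writeString(file, source);

        // Returns null when running on a runtime without the compiler module.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) throw new IllegalStateException("no system Java compiler (need a JDK)");

        // run(in, out, err, args...) returns 0 on success; null streams mean System.in/out/err.
        int status = compiler.run(null, null, null, "-d", dir.toString(), file.toString());
        if (status != 0) throw new IllegalStateException("compilation failed");

        // Load the freshly compiled class from the temp directory and call it reflectively.
        try (URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
            Class<?> cls = loader.loadClass("Generated");
            Method m = cls.getMethod("answer");
            return (Integer) m.invoke(null);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(compileAndRun()); // prints 42
    }
}
```

If the goal is only to produce `.java` files, this step is optional; the generated sources can just as well be compiled with `javac` as part of a normal build.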

As for parsing, I would first create a good BNF grammar for your language. There is a tool to generate HTML railroad diagrams from it. You will make mistakes when writing the BNF, but you will find them if you look at the railroad diagrams. It also helps ensure you don't design something that can't be parsed.

I know most people will suggest using ANTLR or JavaCC, but I would write your own recursive-descent parser, because I think it's easier and more flexible (I have done both a few times and know what I'm talking about). One example is the Jackrabbit SQL-2 parser.
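A minimal sketch of a hand-written recursive-descent parser, for an assumed arithmetic-expression subset (the grammar is illustrative, not the asker's language, and the class name `RecursiveDescent` is hypothetical). Each BNF rule becomes one method and each `{ ... }` repetition becomes a `while` loop. Instead of building AST classes, it returns a fully parenthesized string so the parse structure is visible.

```java
public class RecursiveDescent {
    // Grammar (EBNF), one method per rule:
    //   expr   ::= term   { ("+" | "-") term }
    //   term   ::= factor { ("*" | "/") factor }
    //   factor ::= NUMBER | IDENT | "(" expr ")"
    private final String src;
    private int pos = 0;

    RecursiveDescent(String src) { this.src = src; }

    // Entry point: parse a whole expression, rejecting trailing garbage.
    static String parse(String src) {
        RecursiveDescent p = new RecursiveDescent(src);
        String result = p.expr();
        p.skipSpaces();
        if (p.pos != src.length()) throw p.error("unexpected input");
        return result;
    }

    private String expr() {
        String left = term();
        while (peek() == '+' || peek() == '-') {
            char op = src.charAt(pos++);
            left = "(" + left + " " + op + " " + term() + ")"; // left-associative
        }
        return left;
    }

    private String term() {
        String left = factor();
        while (peek() == '*' || peek() == '/') {
            char op = src.charAt(pos++);
            left = "(" + left + " " + op + " " + factor() + ")";
        }
        return left;
    }

    private String factor() {
        char c = peek();
        if (c == '(') {
            pos++;
            String inner = expr();
            if (peek() != ')') throw error("expected ')'");
            pos++;
            return inner;
        }
        int start = pos;
        if (Character.isDigit(c)) {
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        } else if (Character.isLetter(c)) {
            while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
        } else {
            throw error("expected number, identifier or '('");
        }
        return src.substring(start, pos);
    }

    // Skip whitespace and return the next character, or '\0' at end of input.
    private char peek() {
        skipSpaces();
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException(msg + " at position " + pos);
    }

    public static void main(String[] args) {
        System.out.println(parse("a + 2 * (b - 1)")); // prints (a + (2 * (b - 1)))
    }
}
```

Precedence falls out of the call structure: `*` binds tighter than `+` because `expr` calls `term`, which calls `factor`. Statements (loops, conditions, procedure calls) can be added the same way, one method per grammar rule.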

2 Comments

Hi, it's a procedural scripting language: basic language constructs (loops, structures, conditions) with a lot of functions and procedures to call. No methods, no classes... it's procedural, not OOP.
@Xorty, if there is no reflection in that language, no eval-like functionality, and no macros of any kind, then your task is rather trivial. If you've got a working interpreter already, chances are you can easily modify it to emit Java code instead of executing anything. Your biggest pain will be moving the runtime (libraries, FFI, etc.) into Java, not the language itself.
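A minimal sketch of that "emit instead of execute" idea for a procedural language like the one described: each script procedure becomes a `public static` Java method, loops and conditions map almost one-to-one, and calls to built-in functions are routed to a hand-written runtime class. All names here (`Procedure`, `Assign`, `While`, `If`, `Call`, `ScriptRuntime`) are hypothetical, expressions are assumed to already be strings in Java syntax, and every variable is assumed to be an `int` (Java 16+).

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ProcEmitter {
    // Hypothetical AST for a small procedural script language.
    interface Stmt {}
    record Assign(String var, String expr) implements Stmt {}
    record While(String cond, List<Stmt> body) implements Stmt {}
    record If(String cond, List<Stmt> then) implements Stmt {}
    record Call(String function, List<String> args) implements Stmt {}
    record Procedure(String name, List<String> params, List<Stmt> body) {}

    private final StringBuilder out = new StringBuilder();
    private int indent = 0;

    private void line(String text) {
        out.append("    ".repeat(indent)).append(text).append('\n');
    }

    // Emit one script procedure as a static Java method (use one emitter per procedure).
    // Assumption: every variable is an int; a real translator would map the script's
    // own type system (or a dynamic Value class) instead.
    String emit(Procedure p) {
        StringBuilder params = new StringBuilder();
        for (String param : p.params()) {
            if (params.length() > 0) params.append(", ");
            params.append("int ").append(param);
        }
        line("public static void " + p.name() + "(" + params + ") {");
        indent++;
        // Locals are declared up front so a variable first assigned inside a loop
        // is still in scope after it.
        Set<String> locals = new LinkedHashSet<>();
        collectLocals(p.body(), locals);
        locals.removeAll(p.params());
        for (String v : locals) line("int " + v + " = 0;");
        emitBlock(p.body());
        indent--;
        line("}");
        return out.toString();
    }

    private void collectLocals(List<Stmt> body, Set<String> locals) {
        for (Stmt s : body) {
            if (s instanceof Assign a) locals.add(a.var());
            else if (s instanceof While w) collectLocals(w.body(), locals);
            else if (s instanceof If i) collectLocals(i.then(), locals);
        }
    }

    private void emitBlock(List<Stmt> body) {
        for (Stmt s : body) {
            if (s instanceof Assign a) {
                line(a.var() + " = " + a.expr() + ";");
            } else if (s instanceof While w) {
                line("while (" + w.cond() + ") {");
                indent++; emitBlock(w.body()); indent--;
                line("}");
            } else if (s instanceof If i) {
                line("if (" + i.cond() + ") {");
                indent++; emitBlock(i.then()); indent--;
                line("}");
            } else if (s instanceof Call c) {
                // Built-ins go to a hand-written runtime class (hypothetical ScriptRuntime);
                // porting that library is where most of the effort usually goes.
                line("ScriptRuntime." + c.function() + "(" + String.join(", ", c.args()) + ");");
            }
        }
    }

    public static void main(String[] args) {
        // Script (hypothetical syntax): procedure sum(n) { total = 0; while n > 0 { total = total + n; n = n - 1 } print(total) }
        Procedure p = new Procedure("sum", List.of("n"), List.of(
                new Assign("total", "0"),
                new While("n > 0", List.of(
                        new Assign("total", "total + n"),
                        new Assign("n", "n - 1"))),
                new Call("print", List.of("total"))));
        System.out.print(new ProcEmitter().emit(p));
    }
}
```

The language constructs themselves translate mechanically; the hard part is writing `ScriptRuntime` so that each built-in behaves exactly like the original.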

You can try the JavaCC parser generator.


I recommend the ANTLR Java library (ANother Tool for Language Recognition). It's the same library used by many JVM languages. I have not used it personally, but I know that Groovy was built using it.


I'd recommend getting a book on writing compilers/interpreters in Java. There are quite a few, e.g. Writing Compilers and Interpreters.

It's better to see the big picture first before starting on the lexer, parser, etc.

Or, if you want to jump right in, try ANTLR.
