Using GraalVM to Implement Java Calling Python Scripts

Background Description

In many scenarios we need to call scripts from Java, most commonly Groovy, Python, and JavaScript scripts. In the previous article, we listed several ways for Java to call Python scripts and ultimately chose JEP. However, as the business grew, JEP showed a limitation: because of the GIL (global interpreter lock), it can execute only one script at a time. Under high concurrency, threads compete for the interpreter and JEP becomes very slow, so a new solution was needed.

Introduction to GraalVM

After some research, we found a feasible solution. First, let's look at what GraalVM is.

GraalVM is a full-stack virtual machine built on the Java Virtual Machine (JVM) and developed and maintained by Oracle. In addition to Java, it supports other programming languages such as JavaScript, Python, Ruby, and R.

The main features of GraalVM include:

  • High performance: GraalVM uses JIT (just-in-time) compilation, which generates native machine code at runtime and can significantly improve program performance
  • Multi-language support: GraalVM supports multiple programming languages, and code written in different languages can interoperate
  • Low memory usage: GraalVM Native Image can compile applications ahead of time into native executables, reducing memory usage and startup time
  • Extensibility: GraalVM provides multiple plugins and extension points, making it easy to extend the functionality and performance of the virtual machine
  • Tooling: GraalVM also provides tools and libraries, such as the GraalVM Compiler, the Truffle framework, and the Polyglot API, which help developers use and optimize GraalVM

Running Python scripts with GraalVM

GraalPython (also called GraalPy) is a Python interpreter built on the GraalVM platform, developed and maintained by Oracle and the community. It supports the Python 3.7 language specification and aims to be compatible with CPython, so it can run most Python code.

The main features of GraalPython include:

  • High performance: GraalPython uses the JIT compiler of the GraalVM platform, which can compile Python code into native machine code and improve execution efficiency
  • Multi-language interoperability: GraalPython supports the Polyglot API of the GraalVM platform and can interoperate with other programming languages
  • Execution security: GraalPython supports a sandbox mechanism that can restrict what Python code is allowed to access
  • Extensibility: GraalPython provides multiple extension points and plugin mechanisms for extending its functionality and performance

To use GraalPython, we first need to set up its environment.

Install GraalVM: Download the GraalVM distribution, extract it, and add its bin directory to the PATH environment variable. See the official documentation: https://www.graalvm.org/docs/getting-started/linux/.

Install GraalPython: Open the terminal and execute the following command to install GraalPython:

gu install python

This command installs GraalPython automatically. To verify the installation, run:

graalpy --version

If the GraalPython version information is printed, the installation succeeded.

Using GraalPython: Execute the following command in the terminal to start the GraalPython interpreter:

graalpython

Then you can run Python code in the GraalPython environment.

After installing GraalPython, we need to create a venv environment. venv is a module in the Python 3 standard library that creates virtual environments. A virtual environment isolates each project's dependencies and library versions, which avoids conflicts and version incompatibilities between projects.

Creating a virtual environment with venv is simple. Follow these steps:

Open the terminal, enter the root directory of the project, and execute the following command to create a virtual environment:

graalpy -m venv /home/appuser/venv

The above command creates a virtual environment in /home/appuser/venv.

Activate virtual environment: Execute the following command to activate the virtual environment:

source /home/appuser/venv/bin/activate

After executing the above command, the terminal prompt is prefixed with (venv), indicating that the virtual environment has been activated.

Install dependencies in virtual environment: Execute the following command to install the required dependencies:

pip install package_name

The dependency packages installed in the virtual environment are only valid for the current project and will not affect other projects in the system.

Exit Virtual Environment: Execute the following command to exit the virtual environment:

deactivate

After executing the above command, the terminal prompt will return to its original state.

Note that the virtual environment must be activated each time you start working on the project in a new shell; otherwise its dependencies and libraries are not available.

Once the environment is ready, we can use the Java environment to call Python scripts.

  1. First, add the SDK dependency

  <dependency>
      <groupId>org.graalvm.sdk</groupId>
      <artifactId>graal-sdk</artifactId>
      <version>22.3.1</version>
  </dependency>

  2. Then write the code

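import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Engine;
import org.graalvm.polyglot.Value;

// The class name is illustrative.
public class PythonCaller {
    private static final String PYTHON = "python";
    private static final String PYTHON_PYTHON_PATH = "python.PythonPath";
    private static final String PYTHON_EXECUTABLE = "python.Executable";
    private static final String PYTHON_FORCE_IMPORT_SITE = "python.ForceImportSite";

    // modulePath: directory containing the external .py files.
    // pythonExecutable: Python launcher inside the venv
    // (the original code read this value from a configuration object).
    private static Context createContext(String modulePath, String pythonExecutable) {
        Engine engine = Engine.create();
        return Context.newBuilder(PYTHON)
                .allowAllAccess(true)
                .engine(engine)
                .option(PYTHON_FORCE_IMPORT_SITE, "true")
                .option(PYTHON_PYTHON_PATH, modulePath)
                .option(PYTHON_EXECUTABLE, pythonExecutable)
                .build();
    }

    public static void main(String[] args) {
        // Both paths are passed on the command line.
        String modulePath = args[0];
        String pythonExecutable = args[1];
        // try-with-resources closes the context when we are done.
        try (Context context = createContext(modulePath, pythonExecutable)) {
            // Expose two Java values to Python as global variables.
            Value bindings = context.getBindings(PYTHON);
            bindings.putMember("a", 1);
            bindings.putMember("b", 2);
            int i = context.eval(PYTHON, "a+b").asInt();
            System.out.println(i); // prints 3
        }
    }
}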
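
The example above only evaluates an inline expression. As a minimal sketch of calling a Python function from Java, the snippet below defines a function inside the context and invokes it through the polyglot Value API. The class name PythonFunctionCall and the Python function add are made up for illustration, and the code assumes it runs on a GraalVM installation with the Python component installed.

```java
import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Value;

// Illustrative class; not part of the original project.
final class PythonFunctionCall {

    // Defines a Python function and calls it from Java with two integer arguments.
    static int addInPython(int a, int b) {
        try (Context context = Context.newBuilder("python").allowAllAccess(true).build()) {
            // In practice the source could be loaded from a .py file instead of a string.
            context.eval("python", "def add(a, b):\n    return a + b\n");
            // Top-level Python definitions are visible through the language bindings.
            Value add = context.getBindings("python").getMember("add");
            // Convert the result to a Java int before the context is closed.
            return add.execute(a, b).asInt();
        }
    }

    public static void main(String[] args) {
        System.out.println(addInPython(1, 2));
    }
}
```

Creating a context per call keeps the sketch short. A real service would probably create the context once and reuse it, since context creation is the expensive step.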

The key parameters are:

  • python.PythonPath: the directory containing the external .py files to import, for example the directory a for a/a.py
  • python.Executable: the path of the Python executable in the isolated environment, that is, inside the /home/appuser/venv created above
  • python.ForceImportSite: forces the site module to be imported at startup. This is mainly needed for third-party libraries such as requests, because importing site makes the site-packages directory available. Alternatively, we can run "import site" in the script itself
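
A wrong python.Executable path is easy to miss, so it may help to check it before building the context. The following is a small sketch under the assumption that the launcher inside the venv is bin/graalpy (check the bin directory of your own venv). VenvPaths and graalpyExecutable are hypothetical names used only for this example.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; the names are illustrative only.
final class VenvPaths {

    // Assumes the venv layout <venv>/bin/graalpy on Linux and macOS.
    static Path graalpyExecutable(String venvDir) {
        return Paths.get(venvDir, "bin", "graalpy");
    }

    public static void main(String[] args) {
        Path exe = graalpyExecutable("/home/appuser/venv");
        System.out.println("python.Executable = " + exe);
        if (!Files.isExecutable(exe)) {
            // Without a valid executable the venv's site-packages cannot be located.
            System.out.println("Warning: " + exe + " is missing or not executable");
        }
    }
}
```

The resulting path string can be passed directly as the value of the python.Executable option.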

So far, we have implemented calling Python scripts from Java through GraalPy. In our tests, performance was respectable when no third-party libraries were used, but it dropped significantly once ForceImportSite was enabled. Even so, overall performance was much better than with JEP.

Pitfalls
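
Comparisons like these can be reproduced with a simple timer. The sketch below is only a measurement helper built on the standard library; Timing and elapsedMillis are hypothetical names, and the loop in main is a placeholder workload that you would replace with context creation or a script call.

```java
import java.util.function.Supplier;

// Hypothetical measurement helper; not part of the original project.
final class Timing {

    // Runs the action once and returns the elapsed wall-clock time in milliseconds.
    static <T> long elapsedMillis(Supplier<T> action) {
        long start = System.nanoTime();
        action.get();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Placeholder workload; replace it with the code you want to measure.
        Supplier<Long> workload = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            return sum;
        };
        // Repeat the measurement: early runs include JIT warm-up and are usually slower.
        for (int run = 1; run <= 5; run++) {
            System.out.println("run " + run + ": " + elapsedMillis(workload) + " ms");
        }
    }
}
```

Measuring several runs rather than one matters here, because JIT-compiled code typically speeds up after the first few executions.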

During testing, we found that creating a Context was very time-consuming, taking more than 4 seconds each time. The cause turned out to be a CPU architecture mismatch: I was using an M1 Mac but had downloaded the amd64 build of GraalVM, whereas M1 and later Macs need the aarch64 build. After switching builds, context creation dropped to the millisecond level.
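
One way to catch this kind of mismatch early is to print the architecture the running JVM was built for. The following is a minimal sketch using only standard system properties; ArchCheck and isArm64 are hypothetical names. It relies on the assumption that os.arch reflects the JVM build (for example x86_64 for an amd64 build running under emulation) rather than the physical CPU.

```java
import java.util.Locale;

// Hypothetical helper; the names are illustrative only.
final class ArchCheck {

    // Returns true when the given os.arch value denotes a 64-bit ARM JVM build.
    static boolean isArm64(String osArch) {
        String arch = osArch.toLowerCase(Locale.ROOT);
        return arch.equals("aarch64") || arch.equals("arm64");
    }

    public static void main(String[] args) {
        String osArch = System.getProperty("os.arch");
        System.out.println("java.vm.name = " + System.getProperty("java.vm.name"));
        System.out.println("os.arch      = " + osArch);
        if (!isArm64(osArch)) {
            // On an Apple silicon Mac this would mean the JVM runs under emulation.
            System.out.println("This JVM is not an aarch64 build.");
        }
    }
}
```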

Python
Keras_Preprocessing
Markdown
Pillow
PyYAML
Werkzeug
absl_py
astor
atomicwrites
attrs
castle
certifi
chardet
cppy
cycler
dateutil
gast
h5py
hypothesis
idna
joblib
kiwisolver
lightfm
matplotlib
mock
more_itertools
numpy
packaging
pandas
pkgconfig
pluggy
protobuf
py
pybind11
pyparsing
pytest
pytest_parallel
python_dateutil
pythran
pytz
requests
scikit_learn
scipy
setuptools_scm
six
sortedcontainers
threadpoolctl
tox
urllib3
wcwidth
wheel
zipp

Summary

Whether to use GraalVM should be decided based on your specific usage scenario. For simple script calls it is a very good fit and the performance is respectable. However, heavy use of third-party and custom libraries can hurt performance, and you also need to consider compatibility issues.