Environment Variables
=====================

The following environment variables can be used to affect the behavior of Arrow C++ at runtime. Many of these variables are inspected only once per process (for example, when the Arrow C++ DLL is loaded), so you cannot assume that changing their value later will have an effect.
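
The "inspected only once" behavior can be pictured with a minimal sketch. This is not Arrow's actual code: ``CachedSetting`` and the ``DEMO_SETTING`` variable are hypothetical names, used only to show why a later change to the environment may go unnoticed.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not an Arrow API): reads DEMO_SETTING from the
// environment on the first call only, then returns the cached value.
const std::string& CachedSetting() {
  static const std::string value = [] {
    const char* raw = std::getenv("DEMO_SETTING");
    return std::string(raw != nullptr ? raw : "");
  }();
  return value;
}
```

Under this pattern, a value set after the first call is never observed, so the safe approach is to set these variables before the process starts.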

.. envvar:: ARROW_DEBUG_MEMORY_POOL

   Enable rudimentary memory checks to guard against buffer overflows.
   The value of this environment variable selects the behavior when a
   buffer overflow is detected:

   - ``abort`` exits the process with a non-zero return value;
   - ``trap`` issues a platform-specific debugger breakpoint / trap instruction;
   - ``warn`` prints a warning on stderr and continues execution.

   If this variable is not set, or has an empty value, memory checks are disabled.

   .. note::
      While this functionality can be useful and has little overhead, it
      is not a replacement for more sophisticated memory checking utilities
      such as `Valgrind <https://valgrind.org/>`_ or
      `Address Sanitizer <https://clang.llvm.org/docs/AddressSanitizer.html>`_.
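
The mapping from ``ARROW_DEBUG_MEMORY_POOL`` values to behaviors could be sketched as follows. ``OverflowAction`` and ``ParseDebugMemoryPool`` are hypothetical names, not Arrow APIs, and treating unrecognized values as "disabled" is an assumption made for this sketch only.

```cpp
#include <string>

// Hypothetical enum mirroring the documented behaviors.
enum class OverflowAction { kDisabled, kAbort, kTrap, kWarn };

// Hypothetical parser: an unset variable is represented by an empty string.
OverflowAction ParseDebugMemoryPool(const std::string& value) {
  if (value == "abort") return OverflowAction::kAbort;
  if (value == "trap") return OverflowAction::kTrap;
  if (value == "warn") return OverflowAction::kWarn;
  // Empty (documented) or unrecognized (assumption): checks stay disabled.
  return OverflowAction::kDisabled;
}
```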

.. envvar:: ARROW_DEFAULT_MEMORY_POOL

   Override the backend used for the default
   :ref:`memory pool <cpp_memory_pool>`. Possible values are ``jemalloc``,
   ``mimalloc`` and ``system``, depending on which backends were enabled when
   :ref:`building Arrow C++ <building-arrow-cpp>`.

.. envvar:: ARROW_IO_THREADS

   Override the default number of threads for the global IO thread pool.
   The value of this environment variable should be a positive integer.

.. envvar:: ARROW_LIBHDFS_DIR

   The directory containing the C HDFS library (``hdfs.dll`` on Windows,
   ``libhdfs.dylib`` on macOS, ``libhdfs.so`` on other platforms).
   Alternatively, one can set :envvar:`HADOOP_HOME`.

.. envvar:: ARROW_TRACING_BACKEND

   The backend to which `OpenTelemetry <https://opentelemetry.io/>`_-based
   execution traces are exported.  Possible values are:

   - ``ostream``: emit textual log messages to stdout;
   - ``otlp_http``: emit JSON traces to an HTTP server (by default, the endpoint
     URL is "http://localhost:4318/v1/traces");
   - ``arrow_otlp_stdout``: emit JSON traces to stdout;
   - ``arrow_otlp_stderr``: emit JSON traces to stderr.

   If this variable is not set, no traces are exported.

   This environment variable has no effect if Arrow C++ was not built with
   tracing enabled.

   .. seealso::

      `OpenTelemetry configuration for remote endpoints
      <https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/exporter.md>`__

.. envvar:: ARROW_USER_SIMD_LEVEL

   The SIMD optimization level to select.  By default, Arrow C++ detects
   the capabilities of the current CPU at runtime and chooses the best
   execution paths based on that information.  One can override the detection
   by setting this environment variable to a well-defined value.
   Supported values are:

   - ``NONE`` disables any runtime-selected SIMD optimization;
   - ``SSE4_2`` enables any SSE2-based optimizations up to and including SSE4.2;
   - ``AVX`` enables any AVX-based optimizations and earlier;
   - ``AVX2`` enables any AVX2-based optimizations and earlier;
   - ``AVX512`` enables any AVX512-based optimizations and earlier.

   This environment variable only has an effect on x86 platforms.  Other
   platforms currently do not implement any form of runtime dispatch.

   .. note::
      In addition to runtime dispatch, the compile-time SIMD level can
      be set using the ``ARROW_SIMD_LEVEL`` CMake configuration variable.
      Unlike runtime dispatch, compile-time SIMD optimizations cannot be
      changed at runtime (for example, if you compile Arrow C++ with AVX512
      enabled, the resulting binary will only run on AVX512-enabled CPUs).
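
One way to picture the ``ARROW_USER_SIMD_LEVEL`` override is as a cap on the detected level. The sketch below assumes that a requested level above what the CPU supports has no effect; ``SimdLevel``, ``ParseSimdLevel`` and ``EffectiveSimdLevel`` are hypothetical names, not Arrow APIs.

```cpp
#include <algorithm>
#include <string>

// Hypothetical enum, ordered from least to most capable.
enum class SimdLevel { kNone = 0, kSse42, kAvx, kAvx2, kAvx512 };

// Hypothetical parser for the documented values; returns false when the
// name is not one of them and leaves *out untouched.
bool ParseSimdLevel(const std::string& name, SimdLevel* out) {
  if (name == "NONE") { *out = SimdLevel::kNone; return true; }
  if (name == "SSE4_2") { *out = SimdLevel::kSse42; return true; }
  if (name == "AVX") { *out = SimdLevel::kAvx; return true; }
  if (name == "AVX2") { *out = SimdLevel::kAvx2; return true; }
  if (name == "AVX512") { *out = SimdLevel::kAvx512; return true; }
  return false;
}

// Assumption: the user-requested level can only lower the detected level.
SimdLevel EffectiveSimdLevel(SimdLevel detected, SimdLevel requested) {
  return std::min(detected, requested);
}
```

With this model, requesting ``AVX2`` on an AVX512-capable CPU yields AVX2 code paths, while requesting ``AVX512`` on an SSE4.2-only CPU leaves SSE4.2 in effect.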

.. envvar:: GANDIVA_CACHE_SIZE

   The number of entries to keep in the Gandiva JIT compilation cache.
   The cache is in-memory and does not persist across processes.

.. envvar:: HADOOP_HOME

   The path to the Hadoop installation.

.. envvar:: JAVA_HOME

   The path to the Java Runtime Environment installation. This may be
   required for HDFS support if Java is installed in a non-standard location.

.. envvar:: OMP_NUM_THREADS

   The number of worker threads in the global (process-wide) CPU thread pool.
   If this environment variable is not defined, the available hardware
   concurrency is determined using a platform-specific routine.

.. envvar:: OMP_THREAD_LIMIT

   An upper bound for the number of worker threads in the global
   (process-wide) CPU thread pool.

   For example, if the current machine has 4 hardware threads and
   ``OMP_THREAD_LIMIT`` is 8, the global CPU thread pool will have 4 worker
   threads.  But if ``OMP_THREAD_LIMIT`` is 2, the global CPU thread pool
   will have 2 worker threads.
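
The interaction between ``OMP_NUM_THREADS``, ``OMP_THREAD_LIMIT`` and the hardware concurrency can be sketched as a small function. ``EffectiveCpuThreads`` is a hypothetical helper, not an Arrow API; it uses ``0`` to mean "variable not set" and assumes the limit also applies when ``OMP_NUM_THREADS`` is given.

```cpp
#include <algorithm>

// Hypothetical helper: computes the CPU thread pool size described above.
// A value of 0 for either OMP argument means the variable is not set.
int EffectiveCpuThreads(int hardware_threads, int omp_num_threads,
                        int omp_thread_limit) {
  // OMP_NUM_THREADS takes precedence over the detected hardware concurrency.
  int capacity = omp_num_threads > 0 ? omp_num_threads : hardware_threads;
  // OMP_THREAD_LIMIT, when set, is an upper bound on the result.
  if (omp_thread_limit > 0) {
    capacity = std::min(capacity, omp_thread_limit);
  }
  return capacity;
}
```

For the example above, ``EffectiveCpuThreads(4, 0, 8)`` gives 4 and ``EffectiveCpuThreads(4, 0, 2)`` gives 2.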