Participants with local on-disk storage, but without OS page cache flush #774

@puzpuzpuz

According to the benchmark rules:

"if it's a database with local on-disk storage, the first query should be run after dropping the page cache"

The following local disk-based participants do not flush the OS page cache between query runs. This gives them an unfair advantage on repeated queries since data may be served from the OS cache rather than being read from disk.

The corresponding scripts should be fixed so that all participants run under the same conditions.

For reference, the correct way to flush the page cache is:

sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
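As a rough illustration of where that command could sit in a participant's run script, here is a minimal sketch. The names `TRIES`, `DROP_CACHES_FILE`, `flush_page_cache`, `run_query`, and `benchmark_query` are hypothetical and not taken from the benchmark's scripts, and the number of tries per query is an assumption. `run_query` is a placeholder for whatever command the participant actually uses.

```shell
#!/usr/bin/env bash
# Sketch only: all names below are hypothetical, not the benchmark's own.

TRIES="${TRIES:-3}"   # assumed number of runs per query
DROP_CACHES_FILE="${DROP_CACHES_FILE:-/proc/sys/vm/drop_caches}"

flush_page_cache() {
    # Write dirty pages to disk first; drop_caches only frees clean pages.
    sync
    # 3 = free the page cache plus dentries and inodes.
    echo 3 | sudo tee "$DROP_CACHES_FILE" > /dev/null
}

run_query() {
    # Placeholder: replace with the participant's real query command.
    echo "running: $1"
}

benchmark_query() {
    query="$1"
    # Flush once so the first run of this query starts with a cold cache.
    flush_page_cache
    i=1
    while [ "$i" -le "$TRIES" ]; do
        run_query "$query"
        i=$((i + 1))
    done
}
```

Calling `benchmark_query 'SELECT 1'` would flush the cache once and then run the query `TRIES` times, so only the first run is cold.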

List

Note that the list may be incomplete.

  • chdb-dataframe | Reads parquet locally via Python (see "chdb-dataframe: clear page cache between queries" #779)
  • clickhouse-datalake | Uses clickhouse local, no OS cache flush
  • clickhouse-datalake-partitioned | Uses clickhouse local, no OS cache flush
  • duckdb-dataframe | Reads parquet locally via Python
  • elasticsearch | Clears ES query cache only, not OS page cache
  • hydra | PostgreSQL-based, no cache flush
  • locustdb | Disk-based (RocksDB), no cache flush (benchmark broken)
  • mongodb | Local installation, no cache flush
  • pandas | Reads parquet locally via Python
  • polars | Reads parquet locally via Python
  • polars-dataframe | Reads parquet locally via Python
  • tembo-olap | PostgreSQL-based, no cache flush
