PostgreSQL implementation of the LDBC SNB BI benchmark.
The default configuration uses the ldbcsf1 database.
On a typical Ubuntu install, you might want to run:
sudo -u postgres psqlTo allow access from JDBC, you have to set a password. For example, to set the default password of user postgres to foo, issue the following command:
ALTER ROLE postgres PASSWORD 'foo';If you want to create a separate user usr with the password foo, use the following commands:
CREATE USER usr PASSWORD 'foo';
ALTER ROLE usr WITH login createdb superuser;Alternatively, you can allow any local users to access the database.
Edit the pg_hba.conf file and add the following line:
local all postgres trust
# local all postgres peer
The data set needs to be generated and preprocessed before loading it to the database. To generate it, use the CSVMergeForeign serializer classes of the Datagen project:
generator.scaleFactor:0.003
generator.mode:interactive
serializer.format:CsvMergeForeign
To run the load script, go the load-scripts directory, set the PG_ environment variables (optional) and issue the following command.
For special loading options, see below.
export PG_CSV_DIR=
export PG_DB_NAME=
export PG_USER=
export PG_PORT=
scripts/load.shThe load.sh script (re)generates PostgreSQL-specific CSV files for posts and comments, if any of the following holds:
- they don't exist
- the source CSV is newer than the generated one
- the user forces to do so by setting the environment variable
PG_FORCE_REGENERATE=yes(see below for special loading options.)
The load.sh has default options that loads the dataset in the generator's directory to the ldbcsf1 database with your current user. If these fit your needs, just run the script as scripts/load.sh.
If you get Permission denied errors, change the permissions of your home directory to 755 - but please make sure you understand its implications first:
chmod 755 ~To play around with the data, join PostgreSQL and switch to the database:
postgres=# \c ldbcsf1
You are now connected to database "ldbcsf1" as user "postgres".
ldbcsf1=# SELECT count(*) FROM person;
# ...In order to allow for easier experimenting, PostgreSQL converter (load.sh) accept a few special options using environment variables.
Most probably you won't need to touch them unless you are experimenting.
-
PG_FORCE_REGENERATE: you can force the loader to (re)generate PostgreSQL-specific CSV files for posts and comments. Possible values are:yes: (re)generate PostgreSQL-specific CSV files for posts and comments.no: (re)generate only PostgreSQL-specific CSV files if they don't exist of older than their source files. For performance reasons, this is the default.
-
PG_CREATE_MESSAGE_FILE: control creating a unifiedmessagedata file of posts and comments. Possible values:no: don't create message file, as we traditionally did. This is the default.create: create message file, with no guarantee on being sortedsort_by_date: create message file, sorted by creation date. Note: when you switch messages file to be sorted, you need to usePG_FORCE_REGENERATE=yes, orherwise regenerating depends whether the source files have changed.
-
PG_LOAD_TO_DB: controls whether we want to do or skip database loading phase. Possible values are:load: loads to the database, as one might expect. This is the default.skip: skip loading to the database. Use this to generate only CSV files.