|
| 1 | +<!-- |
| 2 | + ~ Licensed to the Apache Software Foundation (ASF) under one |
| 3 | + ~ or more contributor license agreements. See the NOTICE file |
| 4 | + ~ distributed with this work for additional information |
| 5 | + ~ regarding copyright ownership. The ASF licenses this file |
| 6 | + ~ to you under the Apache License, Version 2.0 (the |
| 7 | + ~ "License"); you may not use this file except in compliance |
| 8 | + ~ with the License. You may obtain a copy of the License at |
| 9 | + ~ |
| 10 | + ~ http://www.apache.org/licenses/LICENSE-2.0 |
| 11 | + ~ |
| 12 | + ~ Unless required by applicable law or agreed to in writing, |
| 13 | + ~ software distributed under the License is distributed on an |
| 14 | + ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| 15 | + ~ KIND, either express or implied. See the License for the |
| 16 | + ~ specific language governing permissions and limitations |
| 17 | + ~ under the License. |
| 18 | + --> |
| 19 | + |
| 20 | + |
| 21 | +> NOTE: For those deploying this database, Postgres does not by default use |
| 22 | +> UTF-8, however it is [required for the jsonb][pg-jsonb] format used in |
| 23 | +> some columns to always work. This [stackoverflow post][so-utf8] describes |
| 24 | +> how to do it for Amazon RDS. This [section of the docs][pg-charset] |
| 25 | +> states how to do it in general, i.e.: `initdb -E UTF8`. |
| 26 | +
|
| 27 | +# Benchmark database |
| 28 | + |
| 29 | +This directory contains files related to the benchmark database. |
| 30 | + |
| 31 | +- 'ddl/\*.sql' contains the database definition. |
| 32 | +- 'examples/' contain code to test the database and demonstrate its use. |
| 33 | +- 'Dockerfile' and 'docker-compose.yml' are for developing benchmarks |
| 34 | + against a testing database. |
| 35 | +- An auto-generated summary of views in the [Data model][./data_model.rst]. |
| 36 | + |
| 37 | +## Setup |
| 38 | + |
| 39 | +To create a 'machine.json' file that will uniquely identify a computer for |
| 40 | +benchmark submission, run the provided shell script and fill in the prompts |
| 41 | +to identify the GPU. |
| 42 | + |
| 43 | +> NOTE: this does not work on VMs or Windows. |
| 44 | +
|
| 45 | +```shell |
| 46 | +./make_machine_json.sh |
| 47 | +``` |
| 48 | + |
| 49 | +Submit the machine details via http using the command |
| 50 | + |
| 51 | +> NOTE: This will only work if we have selected graphql as a client |
| 52 | +> and have it running in production or if during development |
| 53 | +> you have run `docker-compose up` to create and run both a |
| 54 | +> database Docker container and graphql client Docker container. |
| 55 | +
|
| 56 | +```shell |
| 57 | +./graphql_submit.sh machine machine.json localhost:5000/graphql |
| 58 | +``` |
| 59 | + |
| 60 | +or submit after starting up the psql client from this directory, using |
| 61 | + |
| 62 | +``` |
| 63 | +\set content `cat machine.json` |
| 64 | +SELECT ingest_machine_view(:'content'::jsonb); |
| 65 | +``` |
| 66 | + |
| 67 | +> NOTE: If you don't have a "machine.json" file generated, |
| 68 | +> use the example file "examples/machine.json" instead. |
| 69 | +
|
| 70 | +## Local testing |
| 71 | + |
| 72 | +There is a file named "[.env][.env]" in this directory that is used by |
| 73 | +`docker-compose` to set up the postgres user and password for the |
| 74 | +local containers. Currentlty the name and password are both |
| 75 | +`benchmark`. This will be the password for the psql client as well. |
| 76 | + |
| 77 | +The Postgres Alpine image runs any added '\*.sql' and '\*.sh' scripts placed |
| 78 | +in '/docker-entrypoint-initdb.d/' during its startup script, so the local |
| 79 | +database will be set up automatically once the container is running. |
| 80 | + |
| 81 | +To start the containers, be sure to have [Docker installed][docker], |
| 82 | +and then run the following from this directory (arrow/dev/benchmarking). |
| 83 | + |
| 84 | + |
| 85 | +``` |
| 86 | +docker-compose up |
| 87 | +``` |
| 88 | + |
| 89 | +This will start a process that will show logs from both the running |
| 90 | +Postgres container and the running GraphQL container. |
| 91 | +To stop the running containers gracefully, background the process |
| 92 | +and run |
| 93 | + |
| 94 | +``` |
| 95 | +docker-compose down |
| 96 | +fg # To re-foreground the backgrounded process while it exits |
| 97 | +``` |
| 98 | + |
| 99 | +You will still have the container images "benchmarking_pg", |
| 100 | +"graphile/postgraphile", and "postgres:11-alpine" on your |
| 101 | +computer. You should keep them if you want to run this again. |
| 102 | +If you don't, then remove them with the command: |
| 103 | + |
| 104 | +``` |
| 105 | +docker rmi benchmarking_pg postgres:11-alpine graphile/postgraphile |
| 106 | +``` |
| 107 | + |
| 108 | +### Postgres client |
| 109 | + |
| 110 | +The `psql` shell client is bundled with the PostgreSQL core distribution |
| 111 | +available from the [Postgres download page][postgres-downloads]. |
| 112 | +Using the `PG_USER` defined in the `.env` file (currently "benchmark"), |
| 113 | +the command to connect to the container is: |
| 114 | +```shell |
| 115 | +psql -h localhost -p 5432 -U benchmark |
| 116 | +``` |
| 117 | +There is an example script in [examples/example.sql](examples/example.sql) that |
| 118 | +runs some queries against the database. To run it in the psql client, type |
| 119 | +the following in the psql command-line interface: |
| 120 | + |
| 121 | +``` |
| 122 | +\i examples/example.sql |
| 123 | +``` |
| 124 | + |
| 125 | +#### Bulk ingestion using CSV |
| 126 | + |
| 127 | +An example CSV file for bulk ingestion is in |
| 128 | +[examples/benchmark_run_example.csv](examples/benchmark_run_example.csv). |
| 129 | +The columns are listed in the same order as they are defined, to avoid having |
| 130 | +to explicitly name every column in ingestion. The "id" column is left empty |
| 131 | +and will be automatically assigned on insert. |
| 132 | + |
| 133 | +To ingest the example CSV file from the command line, |
| 134 | +use the command below: |
| 135 | + |
| 136 | +```shell |
| 137 | +CSV='examples/benchmark_run_example.csv' && \ |
| 138 | +psql -U benchmark -h localhost -p 5432 \ |
| 139 | + -c "\copy benchmark_run_view FROM '${CSV}' WITH (FORMAT csv, HEADER);" |
| 140 | +``` |
| 141 | + |
| 142 | +#### Bulk ingestion using JSON |
| 143 | + |
| 144 | +To ingest the example JSON file using the psql client, use the command below. |
| 145 | + |
| 146 | +``` |
| 147 | +\set content `cat examples/benchmark_example.json` |
| 148 | +SELECT ingest_benchmark_view(:'content'::jsonb); |
| 149 | +``` |
| 150 | + |
| 151 | +### HTTP client |
| 152 | + |
| 153 | +This section requires an actual HTTP client to be up, either |
| 154 | +for the production database or via the testing setup. |
| 155 | +(See the [local testing section](#local-testing) for how to set it up). |
| 156 | + |
| 157 | +The 'graphile/postgraphile' container provides an HTTP interface |
| 158 | +to the database via two url routes: |
| 159 | + |
| 160 | +- A GraphiQL page ([localhost:5000/graphiql][graphiql]) |
| 161 | + to aid visual exploration of the data model. |
| 162 | + (The `--watch` flag on the command line. Not recommended for production.) |
| 163 | +- An endpoint that receives POST requests only (localhost:5000/graphql). |
| 164 | + |
| 165 | +#### Ingestion |
| 166 | + |
| 167 | +The script [graphql_submit.sh](./graphql_submit.sh) simplifies submission |
| 168 | +to the database via curl. Examples: |
| 169 | + |
| 170 | +```shell |
| 171 | +./graphql_submit.sh benchmarks examples/benchmark_example.json |
| 172 | +./graphql_submit.sh runs examples/benchmark_run_example.json |
| 173 | +``` |
| 174 | + |
| 175 | +#### Querying |
| 176 | + |
| 177 | +The output of the query is a JSON object that is hard to read on the command line. |
| 178 | +Here is an example query in the shell: |
| 179 | +```shell |
| 180 | +curl -X POST \ |
| 181 | + -H "Content-Type: application/json" \ |
| 182 | + --data '{"query": "{projectDetails{ projectName }}"}' \ |
| 183 | + localhost:5000/graphql |
| 184 | +``` |
| 185 | + |
| 186 | +which (if you have previously run the "examples.sql" command) yields |
| 187 | + |
| 188 | +``` |
| 189 | +{"data":{"projectDetails":{"projectName":"Apache Arrow"}}} |
| 190 | +``` |
| 191 | + |
| 192 | +Here is an example query using Python: |
| 193 | +```python |
| 194 | +import json |
| 195 | +import requests |
| 196 | + |
| 197 | +uri = "http://localhost:5000/graphql" |
| 198 | +query = json.load(open("examples/graphql_query_environment_view.json")) |
| 199 | +response = requests.post(uri, json=query) |
| 200 | +message = "{benchmarkLanguage}: {languageImplementationVersion}, {dependencies}" |
| 201 | + |
| 202 | +for row in response.json()['data']['allEnvironmentViews']['edges']: |
| 203 | + print(message.format(**row['node'])) |
| 204 | + |
| 205 | +# result: |
| 206 | +# |
| 207 | +# Python: CPython 2.7, {"six":"","numpy":"1.14","other_lib":"1.0"} |
| 208 | +# Python: CPython 2.7, {"six":"","numpy":"1.15","other_lib":"1.0"} |
| 209 | +# Python: CPython 3.6, {"boost":"1.42","numpy":"1.15"} |
| 210 | +``` |
| 211 | + |
| 212 | +## Deployment |
| 213 | + |
| 214 | +(work in progress). |
| 215 | + |
| 216 | +> NOTE: For those deploying this database, Postgres does not by default use |
| 217 | +> UTF-8, however it is [required for the jsonb][pg-jsonb] format used in |
| 218 | +> some columns to always work. This [stackoverflow post][so-utf8] describes |
| 219 | +> how to do it for Amazon RDS. This [section of the docs][pg-charset] |
| 220 | +> states how to do it in general, i.e.: `initdb -E UTF8`. |
| 221 | +
|
| 222 | + |
| 223 | +## Quick reference |
| 224 | + |
| 225 | +- String variables `'have single quotes'` |
| 226 | +- Arrays `'{"have", "curly", "braces"}'::text[]` or `'{1, 2, 3}'::integer[]` |
| 227 | +- JSONb `'{"has":"this", "format":42}'::jsonb` |
| 228 | +- Elements inserted using JSON-formatted strings can use standard |
| 229 | + JSON-formatted arrays (`[1, 2, 3]`) and do not have to use the above |
| 230 | + string formats. |
| 231 | +- When comparing nullable values use `x IS NOT DISTINCT FROM y` rather than `x = y` |
| 232 | +- An auto-generated summary of the [Data model][./data_model.rst]. |
| 233 | + |
| 234 | +## Data model documentation |
| 235 | + |
| 236 | +To recreate the data model documentation, |
| 237 | +(1) install the [psql client][postgres-downloads] |
| 238 | +(sorry you need to download the whole thing), |
| 239 | +(2) start the docker container using `docker-compose up`, |
| 240 | +(3) and then run these scripts: |
| 241 | + |
| 242 | +``` |
| 243 | +./make_dotfile.sh |
| 244 | +./make_data_model_rst.sh |
| 245 | +``` |
| 246 | + |
| 247 | +[pg-jsonb]: https://www.postgresql.org/docs/11/datatype-json.html#id-1.5.7.22.3 |
| 248 | +[so-utf8]: https://stackoverflow.com/a/33557023 |
| 249 | +[pg-charset]: https://www.postgresql.org/docs/9.3/multibyte.html#AEN34424 |
| 250 | +[docker]: https://www.docker.com/get-started |
| 251 | +[citext-limitations]: https://www.postgresql.org/docs/11/citext.html#id-1.11.7.17.7 |
| 252 | +[postgres-downloads]: https://www.postgresql.org/download/ |
| 253 | +[graphiql]: http://localhost:5000/graphiql |
| 254 | +[postgraphile-lambda]: https://github.com/graphile/postgraphile-lambda-example |
| 255 | +[postgraphile-cli]: https://www.graphile.org/postgraphile/usage-cli/ |
0 commit comments