I'm starting a new project that requires a data warehouse, for which we'll be using Postgres. (Shameless plug: Swarm64 makes Postgres a great DW option for datasets up to the terabyte scale.)
I'm using Apache Airflow to orchestrate the workloads, but as an Airflow newbie I'm not sure what the best practice is for the application's DB needs. For some more context, I'm working from Airflow's official docker-compose.yml.
I noticed that the docker-compose file already defines a Postgres DB:
```yaml
...
postgres:
  image: postgres:13
  environment:
    POSTGRES_USER: airflow
    POSTGRES_PASSWORD: airflow
    POSTGRES_DB: airflow
  volumes:
    - postgres-db-volume:/var/lib/postgresql/data
  healthcheck:
    test: ["CMD", "pg_isready", "-U", "airflow"]
    interval: 5s
    retries: 5
  restart: always
...
```
I'm immediately wondering whether it would be better to add a second Postgres service for the warehouse, or to configure the existing one with two users and two databases (one pair for Airflow's metadata, one for the warehouse); I've sketched both options below.
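To make that concrete, here's a minimal sketch of the separate-service option. Everything in it (the `warehouse-db` service name, credentials, volume name, and host port) is a placeholder of my own, not something the Airflow compose file defines:

```yaml
# Option 1 (sketch): a second, dedicated Postgres service for the warehouse.
warehouse-db:
  image: postgres:13
  environment:
    POSTGRES_USER: warehouse
    POSTGRES_PASSWORD: warehouse
    POSTGRES_DB: warehouse
  volumes:
    - warehouse-db-volume:/var/lib/postgresql/data  # also declare this under the top-level volumes: key
  ports:
    - "5433:5432"  # optional: expose on a different host port than the metadata DB
  healthcheck:
    test: ["CMD", "pg_isready", "-U", "warehouse"]
    interval: 5s
    retries: 5
  restart: always
```

The shared-instance option, as far as I can tell, comes down to mounting an init script into the existing service, since the official postgres image runs any `*.sql`/`*.sh` files in `/docker-entrypoint-initdb.d` when the data volume is first initialized (so only on a fresh volume). The script name and contents, something like `CREATE USER warehouse WITH PASSWORD '...';` followed by `CREATE DATABASE warehouse OWNER warehouse;`, are again my own placeholders:

```yaml
# Option 2 (sketch): keep one Postgres service; create a second user and
# database via an init script on first initialization of the data volume.
postgres:
  image: postgres:13
  environment:
    POSTGRES_USER: airflow
    POSTGRES_PASSWORD: airflow
    POSTGRES_DB: airflow
  volumes:
    - postgres-db-volume:/var/lib/postgresql/data
    - ./init-warehouse.sql:/docker-entrypoint-initdb.d/init-warehouse.sql
  ...
```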
Eventually, I'll move this project to the cloud and will probably use an AWS RDS Postgres instance or similar.
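Whichever way it goes, my current thinking is to reach the warehouse from DAGs through an Airflow connection id rather than a hard-coded host, so the later move to RDS should be just a URI change. Airflow reads connections from `AIRFLOW_CONN_<CONN_ID>` environment variables, so a sketch on top of the compose file's shared env block could look like this (the `warehouse_db` connection id and the credentials are placeholders, and I'm assuming the `warehouse-db` service from option 1):

```yaml
x-airflow-common:
  &airflow-common
  environment:
    &airflow-common-env
    # ... existing AIRFLOW__* settings ...
    # Sketch: available in DAGs as conn_id "warehouse_db"; when moving to
    # RDS later, only this URI needs to change.
    AIRFLOW_CONN_WAREHOUSE_DB: postgres://warehouse:warehouse@warehouse-db:5432/warehouse
```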
My questions then are:
- What is the best practice here?
- If there isn't one, what are the tradeoffs between the different approaches?