| id | home |
|---|---|
| title | Bossphorus |
| sidebar_label | home |
bossphorus simplifies data-access patterns for data that do not fit into RAM. When you write a 100-gigabyte file, bossphorus automatically slices your dataset into bite-sized pieces on disk.
When you request small pieces of your data for analysis, bossphorus intelligently serves only the parts you need, leaving the rest on disk.
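The slicing idea can be sketched in a few lines of Python. This is a minimal illustration, not bossphorus's actual implementation: the function name `blocks_for_cutout` and the 256-voxel block edge are assumptions made for the example. It shows how a requested sub-volume maps onto the small set of fixed-size blocks that overlap it, so that everything else can stay on disk.

```python
from itertools import product


def blocks_for_cutout(start, stop, block_size=(256, 256, 256)):
    """Return the index of every block overlapping the half-open box [start, stop).

    Hypothetical helper for illustration; the block size is an assumed default.
    """
    ranges = []
    for lo, hi, size in zip(start, stop, block_size):
        if hi <= lo:
            raise ValueError("stop must be greater than start on every axis")
        first = lo // size          # block containing the first requested voxel
        last = (hi - 1) // size     # block containing the last requested voxel
        ranges.append(range(first, last + 1))
    return list(product(*ranges))


# A 100x100x100 cutout that straddles one block boundary along the first axis.
needed = blocks_for_cutout((200, 0, 0), (300, 100, 100))
print(needed)  # [(0, 0, 0), (1, 0, 0)]
```

Only two blocks are read for this request, no matter how large the full dataset is.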
You can either run bossphorus using Python on your host machine, or use the provided Dockerfile to run bossphorus in a Docker container.

To use Docker, build the image and create an uploads directory:

    docker build -t bossphorus .
    mkdir ./uploads

Sourcing the `alias` file exposes a simplified wrapper to run bossphorus in a container:

    source alias
    bossphorus $(pwd)/uploads

You can run bossphorus in demo mode by omitting the path to your uploads directory. Data saved to bossphorus in demo mode will be destroyed when you end the bossphorus process! Use it only when testing bossphorus out.

To run bossphorus with Python on your host machine:

    pipenv install
    mkdir ./uploads
    python3 ./run.py

You can modify the top-level variables in `bossphorus/config.py` to change where bossphorus stores its data by default, and what size each file is by default.
A word of warning: while larger values of `BLOCK_SIZE` reduce the number of parallel threads needed to read a small file, they also increase RAM usage per read. 256³ is probably a good default, unless you have a very good reason to change it.
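The trade-off can be made concrete with some arithmetic. The sketch below is illustrative only: the helper names are hypothetical, it assumes cubic blocks and one byte per voxel, and it assumes the cutout starts on a block boundary. It compares how many blocks a 512³ request touches against how much memory a single block needs.

```python
import math


def block_bytes(block_size, bytes_per_voxel=1):
    """Memory needed to hold one block, in bytes (assumes 1 byte per voxel by default)."""
    return math.prod(block_size) * bytes_per_voxel


def blocks_touched(cutout_shape, block_size):
    """Number of blocks read for a cutout that starts on a block boundary."""
    return math.prod(math.ceil(n / b) for n, b in zip(cutout_shape, block_size))


cutout = (512, 512, 512)
for edge in (64, 256, 1024):
    block = (edge, edge, edge)
    mib = block_bytes(block) / 2**20
    print(f"edge={edge}: {blocks_touched(cutout, block)} blocks, {mib:g} MiB per block")
```

With one byte per voxel, a 256³ block occupies 16 MiB, while a 1024³ block occupies 1 GiB, which must be loaded even when the request covers only a small part of it.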
Why use bossphorus? That's a great question! bossphorus is certainly not the most performant volumetric datastore, nor is it the most secure, and it is not versioned or distributed. If you need any of those properties, I would recommend looking below at the Alternatives section for some really well-engineered systems.
The primary advantage of bossphorus is that it uses an identical API to that of bossDB. If you anticipate your data growing from a few gigabytes now to a few terabytes later, you can get used to the bossDB ecosystem (intern, ingest, and many more tools) now, and then invest in real bossDB architecture later on with a seamless transition.
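In practice, a shared API means client code only needs a different host when you migrate. The sketch below is an illustration under assumptions, not documented bossphorus behavior: the `cutout_url` helper is hypothetical, the URL layout follows the bossDB v1 cutout convention as understood here, and both host names are placeholders you would replace with your own.

```python
def cutout_url(host, collection, experiment, channel, resolution, xs, ys, zs):
    """Build a cutout URL in the assumed bossDB v1 layout.

    xs, ys, zs are (start, stop) pairs. Hypothetical helper for illustration.
    """
    ranges = "/".join(f"{lo}:{hi}" for lo, hi in (xs, ys, zs))
    return (
        f"{host.rstrip('/')}/v1/cutout/"
        f"{collection}/{experiment}/{channel}/{resolution}/{ranges}/"
    )


# Placeholder hosts: a local bossphorus instance now, a bossDB deployment later.
request = ("my_collection", "my_experiment", "my_channel", 0, (0, 256), (0, 256), (0, 16))
local = cutout_url("http://localhost:5000", *request)
cloud = cutout_url("https://bossdb.example.org", *request)
print(local)
print(cloud)
```

Everything after the host is the same in both URLs, which is what makes a later transition straightforward.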
bossphorus borrows its indexing pattern from bossDB, a cloud-native database that can store way more data than bossphorus ever could. If your day-to-day routine includes multiple terabytes of volumetric data, bossDB may be for you.
| Project | Description |
|---|---|
| bossDB | Petabyte-scale, Cloud-Native Volumetric Database |
| DVID | Distributed, Versioned, Image-oriented Dataservice |