Showing posts with label databases. Show all posts
Showing posts with label databases. Show all posts

Sunday, 11 September 2011

Minimal Sqlite library size

The Sqlite website states that the library size should be around 300KB, and that by omitting optional features it was possible to get the file size down to 180KB.

SQLite is a compact library. With all features enabled, the library size can be less than 300KiB, depending on compiler optimization settings. (Some compiler optimizations such as aggressive function inlining and loop unrolling can cause the object code to be much larger.) If optional features are omitted, the size of the SQLite library can be reduced below 180KiB.
There are also disclaimers that omitting optional features is unsupported and not guaranteed to work.
Important Note: The SQLITE_OMIT_* compile-time options are unsupported.
And there are three out of the many which do not work. When they are specified, compilation errors due to other external code depending on parts of them.
  • SQLITE_OMIT_COMPLETE
  • SQLITE_OMIT_DISKIO
  • SQLITE_OMIT_GET_TABLE
Which leaves the following options that do work:
nmake "OPTS=-DSQLITE_OMIT_ALTERTABLE -DSQLITE_OMIT_ANALYZE -DSQLITE_OMIT_ATTACH -DSQLITE_OMIT_AUTHORIZATION -DSQLITE_OMIT_AUTOINCREMENT -DSQLITE_OMIT_AUTOINIT -DSQLITE_OMIT_AUTOMATIC_INDEX -DSQLITE_OMIT_AUTORESET -DSQLITE_OMIT_AUTOVACUUM -DSQLITE_OMIT_CAST -DSQLITE_OMIT_BETWEEN_OPTIMIZATION -DSQLITE_OMIT_BLOB_LITERAL -DSQLITE_OMIT_BUILTIN_TEST -DSQLITE_OMIT_BTREE_COUNT -DSQLITE_OMIT_CHECK -DSQLITE_OMIT_COMPILEOPTION_DIAGS -DSQLITE_OMIT_COMPOUND_SELECT -DSQLITE_OMIT_DATETIME_FUNCS -DSQLITE_OMIT_DECLTYPE -DSQLITE_OMIT_EXPLAIN -DSQLITE_OMIT_FLAG_PRAGMAS -DSQLITE_OMIT_FLOATING_POINT -DSQLITE_OMIT_FOREIGN_KEY -DSQLITE_OMIT_INCRBLOB -DSQLITE_OMIT_INTEGRITY_CHECK -DSQLITE_OMIT_LIKE_OPTIMIZATION -DSQLITE_OMIT_LOAD_EXTENSION -DSQLITE_OMIT_LOCALTIME -DSQLITE_OMIT_LOOKASIDE -DSQLITE_OMIT_MEMORYDB -DSQLITE_OMIT_OR_OPTIMIZATION -DSQLITE_OMIT_PAGER_PRAGMAS -DSQLITE_OMIT_PRAGMA -DSQLITE_OMIT_PROGRESS_CALLBACK -DSQLITE_OMIT_QUICKBALANCE -DSQLITE_OMIT_REINDEX -DSQLITE_OMIT_SCHEMA_PRAGMAS -DSQLITE_OMIT_SCHEMA_VERSION_PRAGMAS -DSQLITE_OMIT_SHARED_CACHE -DSQLITE_OMIT_SUBQUERY -DSQLITE_OMIT_TCL_VARIABLE -DSQLITE_OMIT_TEMPDB -DSQLITE_OMIT_TRACE -DSQLITE_OMIT_TRIGGER -DSQLITE_OMIT_TRUNCATE_OPTIMIZATION -DSQLITE_OMIT_VACUUM -DSQLITE_OMIT_UTF16 -DSQLITE_OMIT_VIEW -DSQLITE_OMIT_VIRTUALTABLE -DSQLITE_OMIT_WAL -DSQLITE_OMIT_XFER_OPT" -f makefile.msc all dll
Additionally there are some other features that can be disabled by modifying the makefile before compilation.
MAKEFILE_NAME = "Makefile.msc"
s = open(MAKEFILE_NAME, "r").read()
s = s.replace("OPT_FEATURE_FLAGS = $(OPT_FEATURE_FLAGS) -DSQLITE_ENABLE_FTS3=1",
    "# OPT_FEATURE_FLAGS = $(OPT_FEATURE_FLAGS) -DSQLITE_ENABLE_FTS3=1")
s = s.replace("OPT_FEATURE_FLAGS = $(OPT_FEATURE_FLAGS) -DSQLITE_ENABLE_RTREE=1",
    "# OPT_FEATURE_FLAGS = $(OPT_FEATURE_FLAGS) -DSQLITE_ENABLE_RTREE=1")
s = s.replace("OPT_FEATURE_FLAGS = $(OPT_FEATURE_FLAGS) -DSQLITE_ENABLE_COLUMN_METADATA=1",
    "# OPT_FEATURE_FLAGS = $(OPT_FEATURE_FLAGS) -DSQLITE_ENABLE_COLUMN_METADATA=1")
s = s.replace("-O2", "-Os")
open(MAKEFILE_NAME, "w").write(s)
And gives the minimal file sizes of:
381,162 libsqlite3.lib
373,588 sqlite3.lo
382,464 sqlite3.exe
327,680 sqlite3.dll
This is not quite as simple (or perhaps achievable) as advertised. My current assumption is that the Sqlite website is four or five years out of date in describing what is possible.

Friday, 3 June 2011

Wanted: Memory conservative key-value store

I would really like to find a key value store that is memory conservative. What we currently have is like a souped up version of the dumbdbm standard library module, but it has a cache budget and as it loads in new values flushes older ones to make room. However, as the amount of data managed increases so does the amount of key metadata indicating whereabouts the values lie on disk. So now the next step is to either add some form of caching for key metadata, or find a suitable free open source solution.

Does anyone know of a suitable one that is not constrained by the GPL? It doesn't have to be Python, but Python bindings are a bonus.

Considering Sqlite

When thinking of low memory database solutions, Sqlite is one that comes to mind, and even better it comes as part of the Python distribution these days. And even betterer, there's a custom port for my uncommon platform of choice. And even.. bettererer it has an IO abstraction layer that allows it to work with custom IO solutions with minimal additional work. Additionally, reading the spiel makes it sound appealing memory-wise:

SQLite is a compact library. With all features enabled, the library size can be less than 300KiB, depending on compiler optimization settings. (Some compiler optimizations such as aggressive function inlining and loop unrolling can cause the object code to be much larger.) If optional features are omitted, the size of the SQLite library can be reduced below 180KiB. SQLite can also be made to run in minimal stack space (4KiB) and very little heap (100KiB), making SQLite a popular database engine choice on memory constrained gadgets such as cellphones, PDAs, and MP3 players. There is a tradeoff between memory usage and speed. SQLite generally runs faster the more memory you give it. Nevertheless, performance is usually quite good even in low-memory environments.
But you know what? I am as yet unable to get it down to 180KiB no matter how many features I compile out of it using the handy SQLITE_OMIT... options. And not all options can be omitted if I want to use pysqlite, as it does not suit the maintainer to support them.

Here's a clipped table of the code sizes for various cross-compilations:

Overall
libsqlite3.a
libpysqlite3.a
Description
1
0
0
Without sqlite
2
487536
425060
49860
Sqlite with optimise for size
3
365212
308730
46740
Sqlite with optimise for size + code omissions
4
536608
472290
50560
Sqlite with full optimise
5
402492
344440
47430
Sqlite with full optimise + code omissions

In the Windows Python 2.7 installation _sqlite3.pyd is 48KB and sqlite3.dll is 417 KB. So the sizes above, are still comparatively above that expecting both to be done with no omissions and full optimisation. But more or less close enough.

Considering home grown

Any third party solution would need to be adapted to deal with the custom IO needs, unless it was written in pure Python. At this point, the simplest solution is just to extend what I already have.

Edit: Just a note, the key desired feature is memory management. It should be possible to put hard constraints on the amount of memory it uses, both for the cached records read from disk, and for the lookup information that maps keys to location of records on disk. Most key value stores I have looked at either claim to keep all keys in memory as a feature, or just keep them all in memory because it is the simple thing to do.

Wednesday, 28 January 2009

Databases: Information System Base Language (ISBL)

Link: Thread on reddit.com

A year ago, someone named weavejester on reddit.com posted an interesting reference to a query language called ISBL.

Some excerpts from the post to illustrate why it is interesting:

Compare this SQL query:
SELECT DISTINCT username, comment_body FROM stories
INNER JOIN comments ON stories.comment_id = comments.comment_id
INNER JOIN users ON comments.user_id = users.user_id
WHERE stories.story_id = 10
And compare it to the equivalent ISBL query:
stories * comments * users % (username, comment_body) : story_id = 10
..., ISBL manages to support the whole range of relational database theory with only 6 basic table operations:
  • union +
  • difference -
  • natural join *
  • intersection .
  • projection %
  • restriction :
Links of interest:
HOPL: The History of Programming Languages entry.
Paper: Optimization of Single Expressions in a Relational Database System.
Paper: The Peterlee Relational Test Vehicle - A system overview.
Powerpoint presentation: CS319 Theory of Databases / Relational Database Models.