Advance SQLite
Normal Form
Normalization is the process of removing data duplication, more clearly defining key
relationships,and generally moving towards a more idealized database form. It is possible for
different tables in the same database to be at different levels.
Most people recognize five normal forms simply referred to as the First Normal Form
through the Fifth Normal Form. These are often abbreviated 1NF through 5NF. There
are also a few named forms, such as the Boyce-Codd Normal Form (BCNF). Most of
these other forms are roughly equivalent to one of the numbered forms. For example,
BCNF is a slight extension to the Third Normal Form. Some folks also recognize higher
levels of normalization, such as a Sixth Normal Form and beyond, but these extreme
levels of normalization are well beyond the practical concerns of most database
designers.
Normalization
The normalization process is useful for two reasons. First, normalization specifies design
criteria that can act as a guide in the design process. If you have a set of tables that
are proving to be difficult to work with, that often points to a deeper design problem
or assumption. The normalization process provides a set of rules and conditions that
can help identify trouble spots, as well as provide possible solutions to reorganize the
data in a more consistent and clean fashion.
Denormalization
Normalizing a database and spreading the data out into different tables means that
queries usually involve joining several tables back together. This can occasionally lead
to performance concerns, especially for complex reports that require data from a large
number of tables. These concerns can sometimes lead to the process of denormalization,
where duplicate copies of the same data are intentionally introduced to reduce the
number of joins required for common queries. This is typically done on systems that
are primarily read-only, such as data-warehouse databases, and is often done by computing
temporary tables from properly normalized source data.
The First Normal Form
The First Normal Form, or 1NF, is the lowest level of normalization. It is primarily
concerned with making sure a table is in the proper format. There are three conditions
that must be met for a table to be in 1NF.
The first condition relates to ordering. To be in 1NF, the individual rows of a table
cannot have any meaningful or inherent order. Each row should be an isolated, standalone
record
The Second Normal Form
The Second Normal Form, or 2NF, deals with compound keys (multicolumn keys) and
how other columns relate to such keys. 2NF has only one condition: every column that
is not part of the primary key must be relevant to the primary key as a whole, and not
just a sub-part of the key.
The Third Normal Form
The Third Normal Form, or 3NF, extends the 2NF to eliminate transitive key dependencies.
A transitive dependency is when A depends on B, and B depends on C, and
therefore A depends on C. 3NF requires that each nonprimary key column has a direct
(nontransitive) dependency on the primary key
For example, consider an inventory database that is used to track laptops at a small
business. The laptop table will have a primary key that uniquely identifies each laptop,
such as an inventory control number. It is likely the table would have other columns
that include the make and model of the machine, the serial number, and perhaps a
purchase date. For our example, the laptop table will also include a responsible_
person_id column. When an employee is assigned a laptop, their employee ID number
is put in this column
Higher Normal Forms
We’re not going to get into the details of BCNF, or the Fourth or Fifth (or beyond)
Normal Forms, other than to mention that the Fourth and Fifth Normal Forms start
to deal with inter-table relationships and how different tables interact with each other.
Most database designers make a solid effort to get everything into 3NF and then stop
worrying about it. It turns out that if you get the hang of things and tend to turn out
table designs that are in 3NF, chances are pretty good that your tables will also meet
the conditions for 4NF and 5NF, if not higher. To a large extent, the higher Normal
Forms are formal ways of addressing some edge cases that are somewhat unusual,
especially in simpler designs.
Indexes
Indexes (or indices) are auxiliary data structures used by the database system to enforce
unique constraints, speed up sort operations, and provide faster access to specific records.
They are often created in hopes of making queries run faster by avoiding table
scans and providing a more direct lookup method.
How They Work
Each index is associated with a specific table. Indexes can be either single column or
multicolumn, but all the columns of a given index must belong to the same table. There
is no limit to the number of indexes a table can have, nor is there a limit on the number
of indexes a column can belong to. You cannot create an index on a view or on a virtual
table.
Internally, the rows of a normal table are stored in an indexed structure. SQLite uses
a B-Tree for this purpose, which is a specific type of multi-child, balanced tree. The
details are unimportant, other than understanding that as rows are inserted into the
tree, the rows are sorted, organized, and optimized, so that a row with a specific, known
ROWID can be retrieved relatively directly and quickly.
When you create an index, the database system creates another tree structure to hold
the index data. Rather than using the ROWID column as the sort key, the tree is sorted
and organized using the column or columns you’ve specified in the index definition.
The index entry consists of a copy of the values from each of the indexed columns, as
well as a copy of the corresponding ROWID value. This allows an indexed entry to be
found very quickly using the values from the indexed columns.
If we have a table like this:
CREATE TABLE tbl ( a, b, c, d );
And then create an index that looks like this:
CREATE INDEX idx_tbl_a_b ON tbl ( a, b );
The database generates an internal data structure that is conceptually similar to:
SELECT a, b, ROWID FROM tbl ORDER BY a, b;
If SQLite needs to quickly find all the rows where, for example, a = 45, it can use the sorted index
to quickly jump to that range of values and extract the relevant index entries. If it’s looking for
the value of b, it can simply extract that from the index and be done. If we need any other value
in the row, it needs to fetch the full row. This is done by looking up the ROWID. The last value of
any index entry is the ROWID of its corresponding table row. Once SQLite has a list of ROWID
values for all rows where a = 45, it can efficiently look up those rows in the original table and
retrieve the full row. If everything works correctly, the process of looking up a small set of index
entries, and then using those to look up a small set of table rows, will be much faster and more
efficient than doing a full table scan.
INTEGER PRIMARY KEYs
When you declare one or more columns to be a PRIMARY KEY, the database system
automatically creates a unique index over those columns. The fundamental purpose of
this index is to enforce the UNIQUE constraint that is implied with every PRIMARY KEY. It
also happens that many database operations typically involve the primary key, such as
natural joins, or conditional lookups in UPDATE or DELETE commands. Even if the index
wasn’t required to enforce the UNIQUE constraint, chances are good you would want an
index over those columns anyway.
Transferring Design Experience
If you have some experience designing application data structures or class hierarchies,
you may have noticed some similarities between designing runtime structures and
database tables. Many of the organization principles are the same and, thankfully, much
of the design knowledge and experience gained in the application development world
will transfer to the database world.
Tables Are Types
The most common misconception is to think of tables as instances of a compound data
structure. A table looks a whole lot like an array or a dynamic list, so it is easy to make
this mistake
Keys Are Backwards Pointers
Another stumbling block is the proper use of keys. Keys are very similar to pointers. A
primary key is used to identify a unique instance of a data structure. This is similar to
the address of a record. Anything that wants to reference that record needs to record
its address as a pointer or, in the cases of databases, as a foreign key. Foreign keys are
essentially database pointers.
Closing
As with application development, database design is part science and part art. It may
seem quite complex, with keys to set up, different relationships to define, Normal
Forms to follow, and indexes to create.
Thankfully, the basics usually fall into place fairly quickly. If you start to get into larger
or more complex designs, some reading on more formal methods of data modeling and
database design might be in order, but most developers can get pretty far by just leveraging
their knowledge and experience in designing application data structures. In the
end, you’re just defining data structures and hooking them together.

Advance Sqlite3

  • 1.
  • 2.
    Normal Form Normalization isthe process of removing data duplication, more clearly defining key relationships,and generally moving towards a more idealized database form. It is possible for different tables in the same database to be at different levels.
  • 3.
    Most people recognizefive normal forms simply referred to as the First Normal Form through the Fifth Normal Form. These are often abbreviated 1NF through 5NF. There are also a few named forms, such as the Boyce-Codd Normal Form (BCNF). Most of these other forms are roughly equivalent to one of the numbered forms. For example, BCNF is a slight extension to the Third Normal Form. Some folks also recognize higher levels of normalization, such as a Sixth Normal Form and beyond, but these extreme levels of normalization are well beyond the practical concerns of most database designers.
  • 4.
    Normalization The normalization processis useful for two reasons. First, normalization specifies design criteria that can act as a guide in the design process. If you have a set of tables that are proving to be difficult to work with, that often points to a deeper design problem or assumption. The normalization process provides a set of rules and conditions that can help identify trouble spots, as well as provide possible solutions to reorganize the data in a more consistent and clean fashion.
  • 5.
    Denormalization Normalizing a databaseand spreading the data out into different tables means that queries usually involve joining several tables back together. This can occasionally lead to performance concerns, especially for complex reports that require data from a large number of tables. These concerns can sometimes lead to the process of denormalization, where duplicate copies of the same data are intentionally introduced to reduce the number of joins required for common queries. This is typically done on systems that are primarily read-only, such as data-warehouse databases, and is often done by computing temporary tables from properly normalized source data.
  • 6.
    The First NormalForm The First Normal Form, or 1NF, is the lowest level of normalization. It is primarily concerned with making sure a table is in the proper format. There are three conditions that must be met for a table to be in 1NF. The first condition relates to ordering. To be in 1NF, the individual rows of a table cannot have any meaningful or inherent order. Each row should be an isolated, standalone record
  • 7.
    The Second NormalForm The Second Normal Form, or 2NF, deals with compound keys (multicolumn keys) and how other columns relate to such keys. 2NF has only one condition: every column that is not part of the primary key must be relevant to the primary key as a whole, and not just a sub-part of the key.
  • 8.
    The Third NormalForm The Third Normal Form, or 3NF, extends the 2NF to eliminate transitive key dependencies. A transitive dependency is when A depends on B, and B depends on C, and therefore A depends on C. 3NF requires that each nonprimary key column has a direct (nontransitive) dependency on the primary key
  • 9.
    For example, consideran inventory database that is used to track laptops at a small business. The laptop table will have a primary key that uniquely identifies each laptop, such as an inventory control number. It is likely the table would have other columns that include the make and model of the machine, the serial number, and perhaps a purchase date. For our example, the laptop table will also include a responsible_ person_id column. When an employee is assigned a laptop, their employee ID number is put in this column
  • 10.
    Higher Normal Forms We’renot going to get into the details of BCNF, or the Fourth or Fifth (or beyond) Normal Forms, other than to mention that the Fourth and Fifth Normal Forms start to deal with inter-table relationships and how different tables interact with each other. Most database designers make a solid effort to get everything into 3NF and then stop worrying about it. It turns out that if you get the hang of things and tend to turn out table designs that are in 3NF, chances are pretty good that your tables will also meet the conditions for 4NF and 5NF, if not higher. To a large extent, the higher Normal Forms are formal ways of addressing some edge cases that are somewhat unusual, especially in simpler designs.
  • 11.
    Indexes Indexes (or indices)are auxiliary data structures used by the database system to enforce unique constraints, speed up sort operations, and provide faster access to specific records. They are often created in hopes of making queries run faster by avoiding table scans and providing a more direct lookup method.
  • 12.
    How They Work Eachindex is associated with a specific table. Indexes can be either single column or multicolumn, but all the columns of a given index must belong to the same table. There is no limit to the number of indexes a table can have, nor is there a limit on the number of indexes a column can belong to. You cannot create an index on a view or on a virtual table.
  • 13.
    Internally, the rowsof a normal table are stored in an indexed structure. SQLite uses a B-Tree for this purpose, which is a specific type of multi-child, balanced tree. The details are unimportant, other than understanding that as rows are inserted into the tree, the rows are sorted, organized, and optimized, so that a row with a specific, known ROWID can be retrieved relatively directly and quickly.
  • 14.
    When you createan index, the database system creates another tree structure to hold the index data. Rather than using the ROWID column as the sort key, the tree is sorted and organized using the column or columns you’ve specified in the index definition. The index entry consists of a copy of the values from each of the indexed columns, as well as a copy of the corresponding ROWID value. This allows an indexed entry to be found very quickly using the values from the indexed columns. If we have a table like this: CREATE TABLE tbl ( a, b, c, d );
  • 15.
    And then createan index that looks like this: CREATE INDEX idx_tbl_a_b ON tbl ( a, b ); The database generates an internal data structure that is conceptually similar to: SELECT a, b, ROWID FROM tbl ORDER BY a, b; If SQLite needs to quickly find all the rows where, for example, a = 45, it can use the sorted index to quickly jump to that range of values and extract the relevant index entries. If it’s looking for the value of b, it can simply extract that from the index and be done. If we need any other value in the row, it needs to fetch the full row. This is done by looking up the ROWID. The last value of any index entry is the ROWID of its corresponding table row. Once SQLite has a list of ROWID values for all rows where a = 45, it can efficiently look up those rows in the original table and retrieve the full row. If everything works correctly, the process of looking up a small set of index entries, and then using those to look up a small set of table rows, will be much faster and more efficient than doing a full table scan.
  • 16.
    INTEGER PRIMARY KEYs Whenyou declare one or more columns to be a PRIMARY KEY, the database system automatically creates a unique index over those columns. The fundamental purpose of this index is to enforce the UNIQUE constraint that is implied with every PRIMARY KEY. It also happens that many database operations typically involve the primary key, such as natural joins, or conditional lookups in UPDATE or DELETE commands. Even if the index wasn’t required to enforce the UNIQUE constraint, chances are good you would want an index over those columns anyway.
  • 17.
    Transferring Design Experience Ifyou have some experience designing application data structures or class hierarchies, you may have noticed some similarities between designing runtime structures and database tables. Many of the organization principles are the same and, thankfully, much of the design knowledge and experience gained in the application development world will transfer to the database world.
  • 18.
    Tables Are Types Themost common misconception is to think of tables as instances of a compound data structure. A table looks a whole lot like an array or a dynamic list, so it is easy to make this mistake
  • 19.
    Keys Are BackwardsPointers Another stumbling block is the proper use of keys. Keys are very similar to pointers. A primary key is used to identify a unique instance of a data structure. This is similar to the address of a record. Anything that wants to reference that record needs to record its address as a pointer or, in the cases of databases, as a foreign key. Foreign keys are essentially database pointers.
  • 20.
    Closing As with applicationdevelopment, database design is part science and part art. It may seem quite complex, with keys to set up, different relationships to define, Normal Forms to follow, and indexes to create. Thankfully, the basics usually fall into place fairly quickly. If you start to get into larger or more complex designs, some reading on more formal methods of data modeling and database design might be in order, but most developers can get pretty far by just leveraging their knowledge and experience in designing application data structures. In the end, you’re just defining data structures and hooking them together.