summaryrefslogtreecommitdiff
path: root/src/content/blog/2021/04/29/relational-review.adoc
diff options
context:
space:
mode:
Diffstat (limited to 'src/content/blog/2021/04/29/relational-review.adoc')
-rw-r--r--src/content/blog/2021/04/29/relational-review.adoc144
1 files changed, 0 insertions, 144 deletions
diff --git a/src/content/blog/2021/04/29/relational-review.adoc b/src/content/blog/2021/04/29/relational-review.adoc
deleted file mode 100644
index 4b53737..0000000
--- a/src/content/blog/2021/04/29/relational-review.adoc
+++ /dev/null
@@ -1,144 +0,0 @@
-= A Relational Model of Data for Large Shared Data Banks - article-review
-
-:empty:
-:reviewed-article: https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf
-
-This is a review of the article "{reviewed-article}[A Relational Model of Data
-for Large Shared Data Banks]", by E. F. Codd.
-
-== Data Independence
-
-Codd brings the idea of _data independence_ as a better approach to use on
-databases. This is contrast with the existing approaches, namely hierarquical
-(tree-based) and network-based.
-
-His main argument is that queries in applications shouldn't depende and be
-coupled with how the data is represented internally by the database system.
-This key idea is very powerful, and something that we strive for in many other
-places: decoupling the interface from the implementation.
-
-If the database system has this separation, it can kep the querying interface
-stable, while having the freedom to change its internal representation at will,
-for better performance, less storage, etc.
-
-This is true for most modern database systems. They can change from B-Trees
-with leafs containing pointers to data, to B-Trees with leafs containing the raw
-data , to hash tables. All that without changing the query interface, only its
-performance.
-
-Codd mentions that, from an information representation standpoint, any index is
-a duplication, but useful for perfomance.
-
-This data independence also impacts ordering (a _relation_ doesn't rely on the
-insertion order).
-
-== Duplicates
-
-His definition of relational data is a bit differente from most modern database
-systems, namely *no duplicate rows*.
-
-I couldn't find a reason behind this restriction, though. For practical
-purposes, I find it useful to have it.
-
-== Relational Data
-
-:edn: https://github.com/edn-format/edn
-
-In the article, Codd doesn't try to define a language, and today's most popular
-one is SQL.
-
-However, there is no restriction that says that "SQL database" and "relational
-database" are synonyms. One could have a relational database without using SQL
-at all, and it would still be a relational one.
-
-The main one that I have in mind, and the reason that led me to reading this
-paper in the first place, is Datomic.
-
-Is uses an {edn}[edn]-based representation for datalog
-queries{empty}footnote:edn-queries[
- You can think of it as JSON, but with a Clojure taste.
-], and a particular schema used to represent data.
-
-Even though it looks very weird when coming from SQL, I'd argue that it ticks
-all the boxes (except for "no duplicates") that defines a relational database,
-since building relations and applying operations on them is possible.
-
-Compare and contrast a contrived example of possible representations of SQL and
-datalog of the same data:
-
-[source,sql]
-----
--- create schema
-CREATE TABLE people (
- id UUID PRIMARY KEY,
- name TEXT NOT NULL,
- manager_id UUID,
- FOREIGN KEY (manager_id) REFERENCES people (id)
-);
-
--- insert data
-INSERT INTO people (id, name, manager_id) VALUES
- ("d3f29960-ccf0-44e4-be66-1a1544677441", "Foo", "076356f4-1a0e-451c-b9c6-a6f56feec941"),
- ("076356f4-1a0e-451c-b9c6-a6f56feec941", "Bar");
-
--- query data, make a relation
-
-SELECT employees.name AS 'employee-name',
- managers.name AS 'manager-name'
-FROM people employees
-INNER JOIN people managers ON employees.manager_id = managers.id;
-----
-
-[source,clojure]
-----
-;; create schema
-#{{:db/ident :person/id
- :db/valueType :db.type/uuid
- :db/cardinality :db.cardinality/one
- :db/unique :db.unique/value}
- {:db/ident :person/name
- :db/valueType :db.type/string
- :db/cardinality :db.cardinality/one}
- {:db/ident :person/manager
- :db/valueType :db.type/ref
- :db/cardinality :db.cardinality/one}}
-
-;; insert data
-#{{:person/id #uuid "d3f29960-ccf0-44e4-be66-1a1544677441"
- :person/name "Foo"
- :person/manager [:person/id #uuid "076356f4-1a0e-451c-b9c6-a6f56feec941"]}
- {:person/id #uuid "076356f4-1a0e-451c-b9c6-a6f56feec941"
- :person/name "Bar"}}
-
-;; query data, make a relation
-{:find [?employee-name ?manager-name]
- :where [[?person :person/name ?employee-name]
- [?person :person/manager ?manager]
- [?manager :person/name ?manager-name]]}
-----
-
-(forgive any errors on the above SQL and datalog code, I didn't run them to
-check. Patches welcome!)
-
-This employee example comes from the paper, and both SQL and datalog
-representations match the paper definition of "relational".
-
-Both "Foo" and "Bar" are employees, and the data is normalized. SQL represents
-data as tables, and Datomic as datoms, but relations could be derived from both,
-which we could view as:
-
-[source,sql]
-----
-employee_name | manager_name
-----------------------------
-"Foo" | "Bar"
-----
-
-== Conclusion
-
-The article also talks about operators, consistency and normalization, which are
-now so widespread and well-known that it feels a bit weird seeing someone
-advocating for it.
-
-I also stablish that `relational != SQL`, and other databases such as Datomic
-are also relational, following Codd's original definition.