diff options
author | EuAndreh <eu@euandre.org> | 2025-04-18 02:17:12 -0300 |
---|---|---|
committer | EuAndreh <eu@euandre.org> | 2025-04-18 02:48:42 -0300 |
commit | 020c1e77489b772f854bb3288b9c8d2818a6bf9d (patch) | |
tree | 142aec725a52162a446ea7d947cb4347c9d573c9 /src/content/en/blog/2021/04/29 | |
parent | Makefile: Remove security.txt.gz (diff) | |
download | euandre.org-020c1e77489b772f854bb3288b9c8d2818a6bf9d.tar.gz euandre.org-020c1e77489b772f854bb3288b9c8d2818a6bf9d.tar.xz |
git mv src/content/* src/content/en/
Diffstat (limited to 'src/content/en/blog/2021/04/29')
-rw-r--r-- | src/content/en/blog/2021/04/29/relational-review.adoc | 144 |
1 files changed, 144 insertions, 0 deletions
diff --git a/src/content/en/blog/2021/04/29/relational-review.adoc b/src/content/en/blog/2021/04/29/relational-review.adoc new file mode 100644 index 0000000..4b53737 --- /dev/null +++ b/src/content/en/blog/2021/04/29/relational-review.adoc @@ -0,0 +1,144 @@ += A Relational Model of Data for Large Shared Data Banks - article-review + +:empty: +:reviewed-article: https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf + +This is a review of the article "{reviewed-article}[A Relational Model of Data +for Large Shared Data Banks]", by E. F. Codd. + +== Data Independence + +Codd brings the idea of _data independence_ as a better approach to use on +databases. This is contrast with the existing approaches, namely hierarquical +(tree-based) and network-based. + +His main argument is that queries in applications shouldn't depende and be +coupled with how the data is represented internally by the database system. +This key idea is very powerful, and something that we strive for in many other +places: decoupling the interface from the implementation. + +If the database system has this separation, it can kep the querying interface +stable, while having the freedom to change its internal representation at will, +for better performance, less storage, etc. + +This is true for most modern database systems. They can change from B-Trees +with leafs containing pointers to data, to B-Trees with leafs containing the raw +data , to hash tables. All that without changing the query interface, only its +performance. + +Codd mentions that, from an information representation standpoint, any index is +a duplication, but useful for perfomance. + +This data independence also impacts ordering (a _relation_ doesn't rely on the +insertion order). + +== Duplicates + +His definition of relational data is a bit differente from most modern database +systems, namely *no duplicate rows*. + +I couldn't find a reason behind this restriction, though. For practical +purposes, I find it useful to have it. + +== Relational Data + +:edn: https://github.com/edn-format/edn + +In the article, Codd doesn't try to define a language, and today's most popular +one is SQL. + +However, there is no restriction that says that "SQL database" and "relational +database" are synonyms. One could have a relational database without using SQL +at all, and it would still be a relational one. + +The main one that I have in mind, and the reason that led me to reading this +paper in the first place, is Datomic. + +Is uses an {edn}[edn]-based representation for datalog +queries{empty}footnote:edn-queries[ + You can think of it as JSON, but with a Clojure taste. +], and a particular schema used to represent data. + +Even though it looks very weird when coming from SQL, I'd argue that it ticks +all the boxes (except for "no duplicates") that defines a relational database, +since building relations and applying operations on them is possible. + +Compare and contrast a contrived example of possible representations of SQL and +datalog of the same data: + +[source,sql] +---- +-- create schema +CREATE TABLE people ( + id UUID PRIMARY KEY, + name TEXT NOT NULL, + manager_id UUID, + FOREIGN KEY (manager_id) REFERENCES people (id) +); + +-- insert data +INSERT INTO people (id, name, manager_id) VALUES + ("d3f29960-ccf0-44e4-be66-1a1544677441", "Foo", "076356f4-1a0e-451c-b9c6-a6f56feec941"), + ("076356f4-1a0e-451c-b9c6-a6f56feec941", "Bar"); + +-- query data, make a relation + +SELECT employees.name AS 'employee-name', + managers.name AS 'manager-name' +FROM people employees +INNER JOIN people managers ON employees.manager_id = managers.id; +---- + +[source,clojure] +---- +;; create schema +#{{:db/ident :person/id + :db/valueType :db.type/uuid + :db/cardinality :db.cardinality/one + :db/unique :db.unique/value} + {:db/ident :person/name + :db/valueType :db.type/string + :db/cardinality :db.cardinality/one} + {:db/ident :person/manager + :db/valueType :db.type/ref + :db/cardinality :db.cardinality/one}} + +;; insert data +#{{:person/id #uuid "d3f29960-ccf0-44e4-be66-1a1544677441" + :person/name "Foo" + :person/manager [:person/id #uuid "076356f4-1a0e-451c-b9c6-a6f56feec941"]} + {:person/id #uuid "076356f4-1a0e-451c-b9c6-a6f56feec941" + :person/name "Bar"}} + +;; query data, make a relation +{:find [?employee-name ?manager-name] + :where [[?person :person/name ?employee-name] + [?person :person/manager ?manager] + [?manager :person/name ?manager-name]]} +---- + +(forgive any errors on the above SQL and datalog code, I didn't run them to +check. Patches welcome!) + +This employee example comes from the paper, and both SQL and datalog +representations match the paper definition of "relational". + +Both "Foo" and "Bar" are employees, and the data is normalized. SQL represents +data as tables, and Datomic as datoms, but relations could be derived from both, +which we could view as: + +[source,sql] +---- +employee_name | manager_name +---------------------------- +"Foo" | "Bar" +---- + +== Conclusion + +The article also talks about operators, consistency and normalization, which are +now so widespread and well-known that it feels a bit weird seeing someone +advocating for it. + +I also stablish that `relational != SQL`, and other databases such as Datomic +are also relational, following Codd's original definition. |