author | EuAndreh <eu@euandre.org> | 2025-03-31 21:51:40 -0300 |
---|---|---|
committer | EuAndreh <eu@euandre.org> | 2025-03-31 21:51:40 -0300 |
commit | 570ec471d1605318aeefb030cd78682ae442235b (patch) | |
tree | 51e17eabe37c6689f8799b55e6875c3480329a2c /src/content/blog/2021/04/29/relational-review.adoc | |
parent | Makefile, mkdeps.sh: Derive index.html and feed.xml from more static "sortdat... | |
src/content/: Update all files left to asciidoc
Diffstat (limited to 'src/content/blog/2021/04/29/relational-review.adoc')
-rw-r--r-- | src/content/blog/2021/04/29/relational-review.adoc | 126 |
1 files changed, 69 insertions, 57 deletions
diff --git a/src/content/blog/2021/04/29/relational-review.adoc b/src/content/blog/2021/04/29/relational-review.adoc
index e15b478..cb552c3 100644
--- a/src/content/blog/2021/04/29/relational-review.adoc
+++ b/src/content/blog/2021/04/29/relational-review.adoc
@@ -1,62 +1,73 @@
----
+= A Relational Model of Data for Large Shared Data Banks - article-review
 
-title: A Relational Model of Data for Large Shared Data Banks - article-review
+:empty:
+:reviewed-article: https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf
 
-date: 2021-04-29
+This is a review of the article "{reviewed-article}[A Relational Model of Data
+for Large Shared Data Banks]", by E. F. Codd.
 
-layout: post
+== Data Independence
 
-lang: en
+Codd brings the idea of _data independence_ as a better approach to use on
+databases. This is contrast with the existing approaches, namely hierarquical
+(tree-based) and network-based.
 
-ref: a-relational-model-of-data-for-large-shared-data-banks-article-review
+His main argument is that queries in applications shouldn't depende and be
+coupled with how the data is represented internally by the database system.
+This key idea is very powerful, and something that we strive for in many other
+places: decoupling the interface from the implementation.
 
----
+If the database system has this separation, it can kep the querying interface
+stable, while having the freedom to change its internal representation at will,
+for better performance, less storage, etc.
 
-This is a review of the article "[A Relational Model of Data for Large Shared Data Banks][codd-article]", by E. F. Codd.
+This is true for most modern database systems. They can change from B-Trees
+with leafs containing pointers to data, to B-Trees with leafs containing the raw
+data , to hash tables. All that without changing the query interface, only its
+performance.
 
-[codd-article]: https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf
+Codd mentions that, from an information representation standpoint, any index is
+a duplication, but useful for perfomance.
 
-## Data Independence
+This data independence also impacts ordering (a _relation_ doesn't rely on the
+insertion order).
 
-Codd brings the idea of *data independence* as a better approach to use on databases.
-This is contrast with the existing approaches, namely hierarquical (tree-based) and network-based.
+== Duplicates
 
-His main argument is that queries in applications shouldn't depende and be coupled with how the data is represented internally by the database system.
-This key idea is very powerful, and something that we strive for in many other places: decoupling the interface from the implementation.
+His definition of relational data is a bit differente from most modern database
+systems, namely *no duplicate rows*.
 
-If the database system has this separation, it can kep the querying interface stable, while having the freedom to change its internal representation at will, for better performance, less storage, etc.
+I couldn't find a reason behind this restriction, though. For practical
+purposes, I find it useful to have it.
 
-This is true for most modern database systems.
-They can change from B-Trees with leafs containing pointers to data, to B-Trees with leafs containing the raw data , to hash tables.
-All that without changing the query interface, only its performance.
+== Relational Data
 
-Codd mentions that, from an information representation standpoint, any index is a duplication, but useful for perfomance.
+:edn: https://github.com/edn-format/edn
 
-This data independence also impacts ordering (a *relation* doesn't rely on the insertion order).
+In the article, Codd doesn't try to define a language, and today's most popular
+one is SQL.
 
-## Duplicates
+However, there is no restriction that says that "SQL database" and "relational
+database" are synonyms. One could have a relational database without using SQL
+at all, and it would still be a relational one.
 
-His definition of relational data is a bit differente from most modern database systems, namely **no duplicate rows**.
+The main one that I have in mind, and the reason that led me to reading this
+paper in the first place, is Datomic.
 
-I couldn't find a reason behind this restriction, though.
-For practical purposes, I find it useful to have it.
+Is uses an {edn}[edn]-based representation for datalog
+queries{empty}footnote:edn-queries[
+  You can think of it as JSON, but with a Clojure taste.
+], and a particular schema used to represent data.
 
-## Relational Data
+Even though it looks very weird when coming from SQL, I'd argue that it ticks
+all the boxes (except for "no duplicates") that defines a relational database,
+since building relations and applying operations on them is possible.
 
-In the article, Codd doesn't try to define a language, and today's most popular one is SQL.
+Compare and contrast a contrived example of possible representations of SQL and
+datalog of the same data:
 
-However, there is no restriction that says that "SQL database" and "relational database" are synonyms.
-One could have a relational database without using SQL at all, and it would still be a relational one.
-
-The main one that I have in mind, and the reason that led me to reading this paper in the first place, is Datomic.
-
-Is uses an [edn]-based representation for datalog queries[^edn-queries], and a particular schema used to represent data.
-
-Even though it looks very weird when coming from SQL, I'd argue that it ticks all the boxes (except for "no duplicates") that defines a relational database, since building relations and applying operations on them is possible.
-
-Compare and contrast a contrived example of possible representations of SQL and datalog of the same data:
-
-```sql
+[source,sql]
+----
 -- create schema
 CREATE TABLE people (
   id UUID PRIMARY KEY,
@@ -76,12 +87,11 @@ SELECT employees.name AS 'employee-name',
        managers.name AS 'manager-name'
 FROM people employees
 INNER JOIN people managers ON employees.manager_id = managers.id;
-```
+----
 
-{% raw %}
-```
+----
 ;; create schema
-#{ {:db/ident :person/id
+#{{:db/ident :person/id
    :db/valueType :db.type/uuid
    :db/cardinality :db.cardinality/one
    :db/unique :db.unique/value}
@@ -93,7 +103,7 @@ INNER JOIN people managers ON employees.manager_id = managers.id;
    :db/cardinality :db.cardinality/one}}
 
 ;; insert data
-#{ {:person/id #uuid "d3f29960-ccf0-44e4-be66-1a1544677441"
+#{{:person/id #uuid "d3f29960-ccf0-44e4-be66-1a1544677441"
    :person/name "Foo"
    :person/manager [:person/id #uuid "076356f4-1a0e-451c-b9c6-a6f56feec941"]}
   {:person/id #uuid "076356f4-1a0e-451c-b9c6-a6f56feec941"
@@ -104,27 +114,29 @@ INNER JOIN people managers ON employees.manager_id = managers.id;
  :where [[?person :person/name ?employee-name]
          [?person :person/manager ?manager]
          [?manager :person/name ?manager-name]]}
-```
-{% endraw %}
+----
 
-(forgive any errors on the above SQL and datalog code, I didn't run them to check. Patches welcome!)
+(forgive any errors on the above SQL and datalog code, I didn't run them to
+check. Patches welcome!)
 
-This employee example comes from the paper, and both SQL and datalog representations match the paper definition of "relational".
+This employee example comes from the paper, and both SQL and datalog
+representations match the paper definition of "relational".
 
-Both "Foo" and "Bar" are employees, and the data is normalized.
-SQL represents data as tables, and Datomic as datoms, but relations could be derived from both, which we could view as:
+Both "Foo" and "Bar" are employees, and the data is normalized. SQL represents
+data as tables, and Datomic as datoms, but relations could be derived from both,
+which we could view as:
 
-```
+....
 employee_name | manager_name
 ----------------------------
 "Foo"         | "Bar"
-```
-
-[^edn-queries]: You can think of it as JSON, but with a Clojure taste.
-[edn]: https://github.com/edn-format/edn
+....
 
-## Conclusion
+== Conclusion
 
-The article also talks about operators, consistency and normalization, which are now so widespread and well-known that it feels a bit weird seeing someone advocating for it.
+The article also talks about operators, consistency and normalization, which are
+now so widespread and well-known that it feels a bit weird seeing someone
+advocating for it.
 
-I also stablish that `relational != SQL`, and other databases such as Datomic are also relational, following Codd's original definition.
+I also stablish that `relational != SQL`, and other databases such as Datomic
+are also relational, following Codd's original definition.
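
The post says its SQL was never run, so here is a minimal sketch that runs the same self-join with Python's standard `sqlite3` module. Several things are assumptions rather than part of the commit: the `name` and `manager_id` columns are inferred from the `SELECT` (the diff cuts off the `CREATE TABLE` after `id`), the `INSERT` rows mirror the datalog data, "Bar" is given no manager, `TEXT` stands in for `UUID` because SQLite has no UUID type, and the aliases use double quotes as a portability choice. The function name `employee_manager_pairs`, its `with_index` flag, and the index name are hypothetical.

```python
import sqlite3

# UUIDs copied from the datalog "insert data" block in the post.
FOO_ID = "d3f29960-ccf0-44e4-be66-1a1544677441"
BAR_ID = "076356f4-1a0e-451c-b9c6-a6f56feec941"


def employee_manager_pairs(with_index=False):
    """Return (employee name, manager name) rows from an in-memory database.

    with_index only changes the physical layout (an extra index on
    manager_id); the query text and its result stay the same.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Assumed schema: only `id` is visible in the diff.
        conn.execute(
            """
            CREATE TABLE people (
              id         TEXT PRIMARY KEY,
              name       TEXT NOT NULL,
              manager_id TEXT REFERENCES people (id)
            )
            """
        )
        if with_index:
            conn.execute(
                "CREATE INDEX people_manager_idx ON people (manager_id)"
            )
        # Assumed data: "Foo" reports to "Bar"; "Bar" has no manager.
        conn.executemany(
            "INSERT INTO people (id, name, manager_id) VALUES (?, ?, ?)",
            [(BAR_ID, "Bar", None), (FOO_ID, "Foo", BAR_ID)],
        )
        # Same self-join as in the post, with double-quoted aliases.
        return conn.execute(
            """
            SELECT employees.name AS "employee-name",
                   managers.name  AS "manager-name"
            FROM people employees
            INNER JOIN people managers ON employees.manager_id = managers.id
            """
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for employee, manager in employee_manager_pairs():
        print(f"{employee} | {manager}")  # prints: Foo | Bar
```

Calling `employee_manager_pairs(with_index=True)` returns the same rows, which is a small instance of the data independence the post describes: the index is a redundant internal representation that the query never mentions.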