diff options
author | EuAndreh <eu@euandre.org> | 2024-11-18 08:21:58 -0300 |
---|---|---|
committer | EuAndreh <eu@euandre.org> | 2024-11-18 08:44:57 -0300 |
commit | 960e4410f76801356ebd42801c914b2910a302a7 (patch) | |
tree | 615d379416f72956d0c1666c63ce062859041fbe /_articles/2021-04-29-a-relational-model-of-data-for-large-shared-data-banks-article-review.md | |
parent | Remove jekyll infrastructure setup (diff) | |
download | euandre.org-960e4410f76801356ebd42801c914b2910a302a7.tar.gz euandre.org-960e4410f76801356ebd42801c914b2910a302a7.tar.xz |
Diffstat (limited to '_articles/2021-04-29-a-relational-model-of-data-for-large-shared-data-banks-article-review.md')
-rw-r--r-- | _articles/2021-04-29-a-relational-model-of-data-for-large-shared-data-banks-article-review.md | 130 |
1 files changed, 0 insertions, 130 deletions
diff --git a/_articles/2021-04-29-a-relational-model-of-data-for-large-shared-data-banks-article-review.md b/_articles/2021-04-29-a-relational-model-of-data-for-large-shared-data-banks-article-review.md deleted file mode 100644 index e15b478..0000000 --- a/_articles/2021-04-29-a-relational-model-of-data-for-large-shared-data-banks-article-review.md +++ /dev/null @@ -1,130 +0,0 @@ ---- - -title: A Relational Model of Data for Large Shared Data Banks - article-review - -date: 2021-04-29 - -layout: post - -lang: en - -ref: a-relational-model-of-data-for-large-shared-data-banks-article-review - ---- - -This is a review of the article "[A Relational Model of Data for Large Shared Data Banks][codd-article]", by E. F. Codd. - -[codd-article]: https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf - -## Data Independence - -Codd brings the idea of *data independence* as a better approach to use on databases. -This is contrast with the existing approaches, namely hierarquical (tree-based) and network-based. - -His main argument is that queries in applications shouldn't depende and be coupled with how the data is represented internally by the database system. -This key idea is very powerful, and something that we strive for in many other places: decoupling the interface from the implementation. - -If the database system has this separation, it can kep the querying interface stable, while having the freedom to change its internal representation at will, for better performance, less storage, etc. - -This is true for most modern database systems. -They can change from B-Trees with leafs containing pointers to data, to B-Trees with leafs containing the raw data , to hash tables. -All that without changing the query interface, only its performance. - -Codd mentions that, from an information representation standpoint, any index is a duplication, but useful for perfomance. - -This data independence also impacts ordering (a *relation* doesn't rely on the insertion order). - -## Duplicates - -His definition of relational data is a bit differente from most modern database systems, namely **no duplicate rows**. - -I couldn't find a reason behind this restriction, though. -For practical purposes, I find it useful to have it. - -## Relational Data - -In the article, Codd doesn't try to define a language, and today's most popular one is SQL. - -However, there is no restriction that says that "SQL database" and "relational database" are synonyms. -One could have a relational database without using SQL at all, and it would still be a relational one. - -The main one that I have in mind, and the reason that led me to reading this paper in the first place, is Datomic. - -Is uses an [edn]-based representation for datalog queries[^edn-queries], and a particular schema used to represent data. - -Even though it looks very weird when coming from SQL, I'd argue that it ticks all the boxes (except for "no duplicates") that defines a relational database, since building relations and applying operations on them is possible. - -Compare and contrast a contrived example of possible representations of SQL and datalog of the same data: - -```sql --- create schema -CREATE TABLE people ( - id UUID PRIMARY KEY, - name TEXT NOT NULL, - manager_id UUID, - FOREIGN KEY (manager_id) REFERENCES people (id) -); - --- insert data -INSERT INTO people (id, name, manager_id) VALUES - ("d3f29960-ccf0-44e4-be66-1a1544677441", "Foo", "076356f4-1a0e-451c-b9c6-a6f56feec941"), - ("076356f4-1a0e-451c-b9c6-a6f56feec941", "Bar"); - --- query data, make a relation - -SELECT employees.name AS 'employee-name', - managers.name AS 'manager-name' -FROM people employees -INNER JOIN people managers ON employees.manager_id = managers.id; -``` - -{% raw %} -``` -;; create schema -#{ {:db/ident :person/id - :db/valueType :db.type/uuid - :db/cardinality :db.cardinality/one - :db/unique :db.unique/value} - {:db/ident :person/name - :db/valueType :db.type/string - :db/cardinality :db.cardinality/one} - {:db/ident :person/manager - :db/valueType :db.type/ref - :db/cardinality :db.cardinality/one}} - -;; insert data -#{ {:person/id #uuid "d3f29960-ccf0-44e4-be66-1a1544677441" - :person/name "Foo" - :person/manager [:person/id #uuid "076356f4-1a0e-451c-b9c6-a6f56feec941"]} - {:person/id #uuid "076356f4-1a0e-451c-b9c6-a6f56feec941" - :person/name "Bar"}} - -;; query data, make a relation -{:find [?employee-name ?manager-name] - :where [[?person :person/name ?employee-name] - [?person :person/manager ?manager] - [?manager :person/name ?manager-name]]} -``` -{% endraw %} - -(forgive any errors on the above SQL and datalog code, I didn't run them to check. Patches welcome!) - -This employee example comes from the paper, and both SQL and datalog representations match the paper definition of "relational". - -Both "Foo" and "Bar" are employees, and the data is normalized. -SQL represents data as tables, and Datomic as datoms, but relations could be derived from both, which we could view as: - -``` -employee_name | manager_name ----------------------------- -"Foo" | "Bar" -``` - -[^edn-queries]: You can think of it as JSON, but with a Clojure taste. -[edn]: https://github.com/edn-format/edn - -## Conclusion - -The article also talks about operators, consistency and normalization, which are now so widespread and well-known that it feels a bit weird seeing someone advocating for it. - -I also stablish that `relational != SQL`, and other databases such as Datomic are also relational, following Codd's original definition. |