# msgid "" msgstr "" msgid "" "title: A Relational Model of Data for Large Shared Data Banks - article-" "review" msgstr "" msgid "date: 2021-04-29" msgstr "" msgid "layout: post" msgstr "" msgid "lang: en" msgstr "" msgid "" "ref: a-relational-model-of-data-for-large-shared-data-banks-article-review" msgstr "" msgid "" "This is a review of the article \"[A Relational Model of Data for Large " "Shared Data Banks](https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf)\"," " by E. F. Codd." msgstr "" msgid "Data Independence" msgstr "" msgid "" "Codd brings the idea of *data independence* as a better approach to use on " "databases. This is contrast with the existing approaches, namely " "hierarquical (tree-based) and network-based." msgstr "" msgid "" "His main argument is that queries in applications shouldn't depende and be " "coupled with how the data is represented internally by the database system. " "This key idea is very powerful, and something that we strive for in many " "other places: decoupling the interface from the implementation." msgstr "" msgid "" "If the database system has this separation, it can kep the querying " "interface stable, while having the freedom to change its internal " "representation at will, for better performance, less storage, etc." msgstr "" msgid "" "This is true for most modern database systems. They can change from B-Trees " "with leafs containing pointers to data, to B-Trees with leafs containing the" " raw data , to hash tables. All that without changing the query interface, " "only its performance." msgstr "" msgid "" "Codd mentions that, from an information representation standpoint, any index" " is a duplication, but useful for perfomance." msgstr "" msgid "" "This data independence also impacts ordering (a *relation* doesn't rely on " "the insertion order)." msgstr "" msgid "Duplicates" msgstr "" msgid "" "His definition of relational data is a bit differente from most modern " "database systems, namely **no duplicate rows**." msgstr "" msgid "" "I couldn't find a reason behind this restriction, though. For practical " "purposes, I find it useful to have it." msgstr "" msgid "Relational Data" msgstr "" msgid "" "In the article, Codd doesn't try to define a language, and today's most " "popular one is SQL." msgstr "" msgid "" "However, there is no restriction that says that \"SQL database\" and " "\"relational database\" are synonyms. One could have a relational database " "without using SQL at all, and it would still be a relational one." msgstr "" msgid "" "The main one that I have in mind, and the reason that led me to reading this" " paper in the first place, is Datomic." msgstr "" msgid "" "Is uses an [edn]-based representation for datalog queries[^edn-queries], and" " a particular schema used to represent data." msgstr "" msgid "" "Even though it looks very weird when coming from SQL, I'd argue that it " "ticks all the boxes (except for \"no duplicates\") that defines a relational" " database, since building relations and applying operations on them is " "possible." msgstr "" msgid "" "Compare and contrast a contrived example of possible representations of SQL " "and datalog of the same data:" msgstr "" msgid "" "-- create schema\n" "CREATE TABLE people (\n" " id UUID PRIMARY KEY,\n" " name TEXT NOT NULL,\n" " manager_id UUID,\n" " FOREIGN KEY (manager_id) REFERENCES people (id)\n" ");\n" "\n" "-- insert data\n" "INSERT INTO people (id, name, manager_id) VALUES\n" " (\"d3f29960-ccf0-44e4-be66-1a1544677441\", \"Foo\", \"076356f4-1a0e-451c-b9c6-a6f56feec941\"),\n" " (\"076356f4-1a0e-451c-b9c6-a6f56feec941\", \"Bar\");\n" "\n" "-- query data, make a relation\n" "\n" "SELECT employees.name AS 'employee-name',\n" " managers.name AS 'manager-name'\n" "FROM people employees\n" "INNER JOIN people managers ON employees.manager_id = managers.id;\n" msgstr "" msgid "{% raw %}" msgstr "" msgid "" ";; create schema\n" "#{ {:db/ident :person/id\n" " :db/valueType :db.type/uuid\n" " :db/cardinality :db.cardinality/one\n" " :db/unique :db.unique/value}\n" " {:db/ident :person/name\n" " :db/valueType :db.type/string\n" " :db/cardinality :db.cardinality/one}\n" " {:db/ident :person/manager\n" " :db/valueType :db.type/ref\n" " :db/cardinality :db.cardinality/one}}\n" "\n" ";; insert data\n" "#{ {:person/id #uuid \"d3f29960-ccf0-44e4-be66-1a1544677441\"\n" " :person/name \"Foo\"\n" " :person/manager [:person/id #uuid \"076356f4-1a0e-451c-b9c6-a6f56feec941\"]}\n" " {:person/id #uuid \"076356f4-1a0e-451c-b9c6-a6f56feec941\"\n" " :person/name \"Bar\"}}\n" "\n" ";; query data, make a relation\n" "{:find [?employee-name ?manager-name]\n" " :where [[?person :person/name ?employee-name]\n" " [?person :person/manager ?manager]\n" " [?manager :person/name ?manager-name]]}\n" msgstr "" msgid "{% endraw %}" msgstr "" msgid "" "(forgive any errors on the above SQL and datalog code, I didn't run them to " "check. Patches welcome!)" msgstr "" msgid "" "This employee example comes from the paper, and both SQL and datalog " "representations match the paper definition of \"relational\"." msgstr "" msgid "" "Both \"Foo\" and \"Bar\" are employees, and the data is normalized. SQL " "represents data as tables, and Datomic as datoms, but relations could be " "derived from both, which we could view as:" msgstr "" msgid "" "employee_name | manager_name\n" "----------------------------\n" "\"Foo\" | \"Bar\"\n" msgstr "" msgid "" "[^edn-queries]: You can think of it as JSON, but with a Clojure taste. " "[edn]: https://github.com/edn-format/edn" msgstr "" msgid "Conclusion" msgstr "" msgid "" "The article also talks about operators, consistency and normalization, which" " are now so widespread and well-known that it feels a bit weird seeing " "someone advocating for it." msgstr "" msgid "" "I also stablish that `relational != SQL`, and other databases such as " "Datomic are also relational, following Codd's original definition." msgstr ""