Diffstat:
 site/posts/2018-08-01-verifying-npm-ci-reproducibility.org                | 12 +++++++-----
 site/posts/2018-12-21-using-youtube-dl-to-manage-youtube-subscriptions.org | 14 ++++++++------
 2 files changed, 15 insertions(+), 11 deletions(-)
diff --git a/site/posts/2018-08-01-verifying-npm-ci-reproducibility.org b/site/posts/2018-08-01-verifying-npm-ci-reproducibility.org
index 4c01a62..da3947d 100644
--- a/site/posts/2018-08-01-verifying-npm-ci-reproducibility.org
+++ b/site/posts/2018-08-01-verifying-npm-ci-reproducibility.org
@@ -4,9 +4,9 @@ date: 2018-08-01
---
When [[https://blog.npmjs.org/post/161081169345/v500][npm@5]] came bringing [[https://docs.npmjs.com/files/package-locks][package-locks]] with it, I was confused about the benefits it provided, since running =npm install= more than once could resolve all the dependencies again and yield yet another fresh =package-lock.json= file. The message saying "you should add this file to version control" left me hesitant about what to do[fn:npm-install].
-However the [[https://blog.npmjs.org/post/171556855892/introducing-npm-ci-for-faster-more-reliable][addition of =npm ci=]] filled this gapped: it's a stricter variation of =npm install= which guarantees that "[[https://docs.npmjs.com/files/package-lock.json][subsequent installs are able to generate identical trees]]". But are they really identical? I could see that I didn't have the same problems of different installation outputs, but I didn't know for *sure* if it was really identical.
+However, the [[https://blog.npmjs.org/post/171556855892/introducing-npm-ci-for-faster-more-reliable][addition of =npm ci=]] filled this gap: it's a stricter variation of =npm install= which guarantees that "[[https://docs.npmjs.com/files/package-lock.json][subsequent installs are able to generate identical trees]]". But are they really identical? I could see that I no longer had the problem of differing installation outputs, but I didn't know for *sure* if they were really identical.
** Computing the hash of a directory's content
-I quickly searched for a way to check for the hash signature of an entire directory tree, but I couldn't find one. I've made a poor man's [[https://en.wikipedia.org/wiki/Merkle_tree][Merkle tree]] implementation using =sha256sum= and a few piped comands at the terminal:
+I quickly searched for a way to check for the hash signature of an entire directory tree, but I couldn't find one. I've made a poor man's [[https://en.wikipedia.org/wiki/Merkle_tree][Merkle tree]] implementation using =sha256sum= and a few piped commands at the terminal:
#+BEGIN_SRC bash -n
merkle-tree () {
dirname="${1-.}"
@@ -24,8 +24,8 @@ Going through it line by line:
- #2 it accepts a single argument: the directory to compute the merkle tree from. If nothing is given, it runs on the current directory (=.=);
- #3 we go to the directory, so we don't get different prefixes in =find='s output (like =../a/b=);
- #4 we get all files from the directory tree. Since we're using =sha256sum= to compute the hash of the file contents, we need to filter out folders from it;
-- #5 we need to sort the output, since different filesystems and =find= implementations may return files in different orders;
-- #6 we use =xargs= to compute the hash of each file individually through =sha256sum=. Since a file may contain spaces we need to scape it with quotes;
+- #5 we need to sort the output, since different file systems and =find= implementations may return files in different orders;
+- #6 we use =xargs= to compute the hash of each file individually through =sha256sum=. Since a file may contain spaces we need to escape it with quotes;
- #7 we compute the hash of the combined hashes. Since =sha256sum='s output is formatted like =<hash> <filename>=, it produces a different final hash if a file ever changes name without changing its content;
- #8 we get the final hash output, excluding the =<filename>= (which is =-= in this case, aka =stdin=). A reconstruction of the full function is sketched below.
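The hunk above cuts the function body off after line 2; here is a minimal reconstruction from the line-by-line description, where the =awk= call in step #8 is an assumption (=cut -d ' ' -f 1= would work equally well):
#+BEGIN_SRC bash -n
merkle-tree () {
  dirname="${1-.}"
  cd "$dirname"
  find . -type f |
    sort |
    xargs -I{} sha256sum "{}" |
    sha256sum |
    awk '{print $1}'
}
#+END_SRC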
*** Positive points:
@@ -73,11 +73,13 @@ In this test case I'll take the main repo of [[https://lernajs.io/][Lerna]][fn:j
#+END_SRC
Good job =npm ci= :)
-#6 and #9 take some time to run (21s in my machine), but this specific use case isn't performance sensitive. The slowest step is computing the hash of each individual file.
+#6 and #9 take some time to run (21 seconds on my machine), but this specific use case isn't performance sensitive. The slowest step is computing the hash of each individual file.
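To see that for yourself, the hashing can be timed in isolation (a quick sketch, assuming the reconstructed =merkle-tree= above and a checkout with =node_modules= already installed):
#+BEGIN_SRC shell
$ time (merkle-tree node_modules > /dev/null)
#+END_SRC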
** Conclusion
=npm ci= really "generates identical trees".
I'm not aware of any other existing solution for verifying the hash signature of a directory. If you know of any, I'd [[mailto:eu@euandre.org][like to know]].
+** /Edit/
+2019/05/22: Fix spelling.
[fn:npm-install] The [[https://docs.npmjs.com/cli/install#description][documentation]] claims =npm install= is driven by the existing =package-lock.json=, but that's actually [[https://github.com/npm/npm/issues/17979#issuecomment-332701215][a little bit tricky]].
[fn:js-repos] Finding a big known repo that actually committed the =package-lock.json= file was harder than I expected.
diff --git a/site/posts/2018-12-21-using-youtube-dl-to-manage-youtube-subscriptions.org b/site/posts/2018-12-21-using-youtube-dl-to-manage-youtube-subscriptions.org
index 824810c..594b892 100644
--- a/site/posts/2018-12-21-using-youtube-dl-to-manage-youtube-subscriptions.org
+++ b/site/posts/2018-12-21-using-youtube-dl-to-manage-youtube-subscriptions.org
@@ -10,7 +10,7 @@ I started with the basic premise that “I want to be in control of my data”.
(...)
Which leads us to YouTube. While I was able to find alternatives to Gmail (Fastmail), Calendar (Fastmail), Translate (Yandex Translate), etc, YouTube remains as the most indispensable Google-owned web service. It is really really hard to avoid consuming YouTube content. It was probably the smartest startup acquisition ever. My privacy-oriented alternative is to watch YouTube videos through Tor, which is technically feasible but not polite to use the Tor bandwidth for these purposes. I’m still scratching my head with this issue.
#+END_QUOTE
-Even though I don't use most alternative services he mentions, I do watch videos from YouTube. But I also feel uncomfortable logging in to YouTube with a Google account, watching videos, creating playlists and similar thigs.
+Even though I don't use most alternative services he mentions, I do watch videos from YouTube. But I also feel uncomfortable logging in to YouTube with a Google account, watching videos, creating playlists and similar things.
Using the mobile app is worse: you can't even block ads there. You have less control over what you share with YouTube and Google.
** youtube-dl
@@ -38,7 +38,7 @@ $ youtube-dl "https://www.youtube.com/channel/UClu474HMt895mVxZdlIHXEA" \
--write-description \
--output "~/Downloads/yt-dl/%(uploader)s/%(upload_date)s - %(title)s.%(ext)s"
#+END_SRC
-This will download the latest 20 videos from the selected channel, and write down the video IDs in the =youtube-dl-seen.conf= file. Running it immediatly after one more time won't have any effect.
+This will download the latest 20 videos from the selected channel, and write down the video IDs in the =youtube-dl-seen.conf= file. Running it again immediately afterwards won't have any effect.
If the channel posts one more video, running the same command again will download only the last video, since the other 19 were already downloaded.
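For reference, the full invocation might look like this (a sketch: the hunk above only shows the last two flags, so =--download-archive= — which records seen video IDs and skips them on later runs — and =--playlist-end= are my guesses for the elided options, though both are real youtube-dl flags):
#+BEGIN_SRC shell
$ youtube-dl "https://www.youtube.com/channel/UClu474HMt895mVxZdlIHXEA" \
    --download-archive youtube-dl-seen.conf \
    --playlist-end 20 \
    --write-description \
    --output "~/Downloads/yt-dl/%(uploader)s/%(upload_date)s - %(title)s.%(ext)s"
#+END_SRC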
@@ -88,7 +88,7 @@ Now, whenever you want to watch the latest videos, just run the above script and
** Tradeoffs
*** I've made it for myself, with my use case in mind
**** Offline
-My internet speed it somewhat reasonable[fn:reasonable-internet], but it is really unstable. Either at work or at home, it's not uncommom to loose internet access for 2 minutes 3~5 times every day, and stay completly offline for a couple of hours once every week.
+My internet speed is somewhat reasonable[fn:reasonable-internet], but it is really unstable. Either at work or at home, it's not uncommon to lose internet access for 2 minutes, 3~5 times every day, and to stay completely offline for a couple of hours once every week.
Working through the hassle of keeping a playlist on disk has paid off many, many times. Sometimes I don't even notice when the connection drops for some minutes, because I'm watching a video and working on some document, all on my local computer.
@@ -98,7 +98,7 @@ If the internet connection drops during the video download, youtube-dl will resu
This is an offline first benefit that I really like, and works well for me.
**** Sync the "seen" file
-I already have a running instance of Nextcloud, so just dumping the =youtube-dl-seen.conf= file inside Nextcloud was a no brainer.
+I already have a running instance of Nextcloud, so just dumping the =youtube-dl-seen.conf= file inside Nextcloud was a no-brainer.
You could try putting it in a dedicated git repository, and wrap the script with an autocommit after every run. If you ever had a merge conflict, you'd simply accept all changes and then run:
#+BEGIN_SRC shell
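# The original command is cut off by the hunk boundary; a minimal
# sketch of the dedup step, assuming the archive file holds one
# video ID per line:
sort youtube-dl-seen.conf | uniq > youtube-dl-seen.conf.tmp
mv youtube-dl-seen.conf.tmp youtube-dl-seen.conf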
@@ -117,14 +117,14 @@ We don't even have to configure the ad-blocker to keep ads and trackers away!
YouTube still has your IP address, so using a VPN is always a good idea. However, a timing analysis would be able to identify you (considering the current implementation).
**** No need to self-host
-There's no host that needs maintenence. Everything runs locally.
+There's no host that needs maintenance. Everything runs locally.
As long as you keep youtube-dl itself up to date and sync your "seen" file, there's little extra work to do.
**** Track your subscriptions with git
After creating a =subscriptions.sh= executable that downloads all the videos, you can add it to git and use it to track metadata about your subscriptions.
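A hypothetical shape for that script — one youtube-dl call per channel, reusing the flags from the earlier example (the channel list here is illustrative):
#+BEGIN_SRC shell
#!/bin/sh -eu
# subscriptions.sh: download the latest videos of every channel I follow.
for channel in \
  "https://www.youtube.com/channel/UClu474HMt895mVxZdlIHXEA"; do
  youtube-dl "$channel" \
    --download-archive youtube-dl-seen.conf \
    --playlist-end 20 \
    --output "~/Downloads/yt-dl/%(uploader)s/%(upload_date)s - %(title)s.%(ext)s"
done
#+END_SRC
With this in git, the commit history itself becomes the metadata: adding or removing a channel URL is recorded as a dated commit.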
*** The Bad
**** Maximum playlist size is your disk size
-This is a good thig for getting a realistic view on your actual "watch later" list. However I've run out of disk space many times, and now I need to be more aware of how much is left.
+This is a good thing for getting a realistic view of your actual "watch later" list. However, I've run out of disk space many times, and now I need to be more aware of how much is left.
*** The Ugly
We can only avoid all the bad parts of YouTube with youtube-dl as long as YouTube keeps the videos public and programmatically accessible. If YouTube ever blocks that, we'd lose the ability to consume content this way, but we'd also lose confidence in YouTube as a healthy repository of videos on the internet.
** Going beyond
@@ -139,5 +139,7 @@ The =download_playlist= function could be aware of the specific machine that it
youtube-dl is a great tool to keep at hand. It covers a really large range of video websites and works robustly.
Feel free to copy and modify this code, and [[mailto:eu@euandre.org][send me]] suggestions of improvements or related content.
+** /Edit/
+2019/05/22: Fix spelling.
[fn:reasonable-internet] Considering how expensive it is and the many ways it could be better, but also how much it has improved over the last years, I say it's reasonable.