---
title: Using "youtube-dl" to manage YouTube subscriptions
date: 2018-12-21
layout: post
lang: en
ref: using-youtube-dl-to-manage-youtube-subscriptions
---
I've recently read the
[announcement](https://www.reddit.com/r/DataHoarder/comments/9sg8q5/i_built_a_selfhosted_youtube_subscription_manager/)
of a very nice [self-hosted YouTube subscription
manager](https://github.com/chibicitiberiu/ytsm). I haven't used
YouTube's built-in subscriptions for a while now, and haven't missed
them at all. When I saw the announcement, I considered writing about the
solution I've built on top of [youtube-dl](https://youtube-dl.org/).

## Background: the problem with YouTube

In many ways, I agree with [André Staltz's view on data ownership and
privacy](https://staltz.com/what-happens-when-you-block-internet-giants.html):

> I started with the basic premise that "I want to be in control of my
> data". Sometimes that meant choosing when to interact with an internet
> giant and how much I feel like revealing to them. Most of times it
> meant not interacting with them at all. I don't want to let them be in
> full control of how much they can know about me. I don't want to be in
> autopilot mode. (...) Which leads us to YouTube. While I was able to
> find alternatives to Gmail (Fastmail), Calendar (Fastmail), Translate
> (Yandex Translate), *etc.* YouTube remains as the most indispensable
> Google-owned web service. It is really really hard to avoid consuming
> YouTube content. It was probably the smartest startup acquisition
> ever. My privacy-oriented alternative is to watch YouTube videos
> through Tor, which is technically feasible but not polite to use the
> Tor bandwidth for these purposes. I'm still scratching my head with
> this issue.

Even though I don't use most of the alternative services he mentions, I
do watch videos from YouTube. But I also feel uncomfortable logging in to
YouTube with a Google account, watching videos, creating playlists and
doing similar things.

Using the mobile app is worse: you can't even block ads there, and you
have even less control over what you share with YouTube and Google.

## youtube-dl

youtube-dl is a command-line tool for downloading videos from YouTube
and [many other sites](https://rg3.github.io/youtube-dl/supportedsites.html):

```shell
$ youtube-dl "https://www.youtube.com/watch?v=rnMYZnY3uLA"
[youtube] rnMYZnY3uLA: Downloading webpage
[youtube] rnMYZnY3uLA: Downloading video info webpage
[download] Destination: A Origem da Vida _ Nerdologia-rnMYZnY3uLA.mp4
[download] 100% of 32.11MiB in 00:12
```

It can be used to download individual videos as shown above, but it
also has some interesting flags that we can use:

-   `--output`: use a custom template to create the name of the
    downloaded file;
-   `--download-archive`: use a text file for recording and remembering
    which videos were already downloaded;
-   `--prefer-free-formats`: prefer free video formats, like `webm`,
    `ogv` and Matroska `mkv`;
-   `--playlist-end`: how many videos to download from a "playlist" (a
    channel, a user or an actual playlist);
-   `--write-description`: write the video description to a
    `.description` file, useful for accessing links and extra content.

Putting it all together:

```shell
$ youtube-dl "https://www.youtube.com/channel/UClu474HMt895mVxZdlIHXEA" \
             --download-archive ~/Nextcloud/cache/youtube-dl-seen.conf \
             --prefer-free-formats \
             --playlist-end 20 \
             --write-description \
             --output "~/Downloads/yt-dl/%(uploader)s/%(upload_date)s - %(title)s.%(ext)s"
```

This will download the latest 20 videos from the selected channel and
write their video IDs to the `youtube-dl-seen.conf` file. Running it
again right afterwards won't have any effect.

If the channel posts one more video, running the same command again will
download only the new video, since the other 19 were already
downloaded.
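The archive is a plain text file. As far as I can tell, youtube-dl records one line per downloaded video, made of the lowercase extractor name and the video ID (for example `youtube rnMYZnY3uLA`); treat that format as an assumption and check it against your own file. Under that assumption, the two hypothetical helpers below show what makes a video count as "seen", and how you could mark one as seen by hand so that future runs skip it:

```shell
#!/usr/bin/env bash
# ASSUMPTION: the --download-archive file holds one "<extractor> <video id>"
# line per downloaded video, e.g. "youtube rnMYZnY3uLA".

# is_seen ARCHIVE ID: succeed if the YouTube video ID is already recorded.
is_seen() {
  local archive="$1" id="$2"
  # -x: whole-line match, -F: fixed string, -q: no output.
  grep -qxF "youtube $id" "$archive" 2>/dev/null
}

# mark_seen ARCHIVE ID (hypothetical helper): record a video as seen without
# downloading it, so later runs using the same archive skip it.
mark_seen() {
  local archive="$1" id="$2"
  is_seen "$archive" "$id" || printf 'youtube %s\n' "$id" >> "$archive"
}
```

For example, `mark_seen ~/Nextcloud/cache/youtube-dl-seen.conf rnMYZnY3uLA` should make any later run with that archive skip the video.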

With this basic setup you already have a minimal subscription system at
work, and you can create a few functions to help you manage it:

```shell
#!/usr/bin/env bash
# Requires bash: `export -f` is not available in a POSIX /bin/sh.

export DEFAULT_PLAYLIST_END=15

# download URL N: fetch items 1..N of URL (a channel, user or playlist),
# skipping any video already recorded in the shared "seen" archive.
download() {
  youtube-dl "$1" \
             --download-archive ~/Nextcloud/cache/youtube-dl-seen.conf \
             --prefer-free-formats \
             --playlist-end "$2" \
             --write-description \
             --output "~/Downloads/yt-dl/%(uploader)s/%(upload_date)s - %(title)s.%(ext)s"
}
export -f download


# The optional second argument overrides DEFAULT_PLAYLIST_END.
download_user() {
  download "https://www.youtube.com/user/$1" "${2-$DEFAULT_PLAYLIST_END}"
}
export -f download_user


download_channel() {
  download "https://www.youtube.com/channel/$1" "${2-$DEFAULT_PLAYLIST_END}"
}
export -f download_channel


download_playlist() {
  download "https://www.youtube.com/playlist?list=$1" "${2-$DEFAULT_PLAYLIST_END}"
}
export -f download_playlist
```

With these functions, you can now write a subscription-fetching script
to download the latest videos from your favorite channels:

```shell
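#!/usr/bin/env bash
# Expects the download_* functions above to have been exported (export -f)
# from the parent bash shell, e.g. by sourcing them before running this.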

download_user     ClojureTV                            15
download_channel  "UCmEClzCBDx-vrt0GuSKBd9g"           100
download_playlist "PLqG7fA3EaMRPzL5jzd83tWcjCUH9ZUsbX" 15
```

Now, whenever you want to watch the latest videos, just run the above
script and you'll get all of them on your local machine.

## Tradeoffs

### I've made it for myself, with my use case in mind

1.  Offline

    My internet speed is somewhat reasonable[^internet-speed], but the
    connection is really unstable. Either at work or at home, it's not
    uncommon to lose internet access for 2 minutes 3 to 5 times a day,
    and to stay completely offline for a couple of hours once a week.

    Working through the hassle of keeping a playlist on disk has paid
    off many, many times. Sometimes I don't even notice when the
    connection drops for a few minutes, because I'm watching a video and
    working on some document, all on my local computer.

    There's also no automatic quality adjustment like in YouTube's web
    player: I always pick the highest quality, and it doesn't change
    during the video. For some types of content, like a podcast with a
    few small visuals, this doesn't change much. For other types of
    content, like a keynote presentation with text written on the
    slides, watching in 144p isn't really an option.

    If the internet connection drops during the video download,
    youtube-dl will resume from where it stopped.

    This is an offline-first benefit that I really like, and it works
    well for me.

2.  Sync the "seen" file

    I already have a running instance of Nextcloud, so just dumping the
    `youtube-dl-seen.conf` file inside Nextcloud was a no-brainer.

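    You could try putting it in a dedicated git repository, and wrap the
    script with an autocommit after every run. If you ever get a merge
    conflict, keep the lines from both sides, delete the conflict
    markers and then run:

    ```shell
    $ sort -u -o youtube-dl-seen.conf youtube-dl-seen.conf
    ```

    to remove the duplicate entries. Unlike a shell redirection back into
    the same file, which truncates it before it's read, `sort -o` can
    safely overwrite its own input.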

3.  Doesn't work on mobile

    My primary, everyday device is my laptop, not my phone. It works
    well for me this way.

    Also, it's harder to add ad-blockers to mobile phones, and most
    mobile software still depends on Google's and Apple's blessing.

    If you wish, you can sync the videos to the SD card periodically,
    but that's a bit of extra manual work.
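If you go the git route for the "seen" file, the merge-conflict cleanup could be wrapped in a function. This is a hypothetical sketch that assumes git's standard conflict markers (`<<<<<<<`, `=======`, `>>>>>>>`, plus `|||||||` in the diff3 style) and that every other line is a plain archive entry:

```shell
#!/usr/bin/env bash
# Hypothetical helper: resolve a git merge conflict in the "seen" file by
# keeping the lines from both sides, dropping the conflict markers and
# removing duplicate entries.
resolve_seen_conflict() {
  local file="$1" tmp
  tmp="$(mktemp)" || return 1
  # Drop marker lines: <<<<<<<, ||||||| (diff3 style), ======= and >>>>>>>.
  grep -v -E '^(<<<<<<<|[|]{7}|=======|>>>>>>>)' "$file" | sort -u > "$tmp" || {
    rm -f "$tmp"
    return 1
  }
  # Write back through the existing file so its permissions are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Keeping both sides is safe here because each line only records that a video was seen, so the union of the two versions is the result you want.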

### The Good

1.  Better privacy

    We don't even have to configure the ad-blocker to keep ads and
    trackers away!

    YouTube still has your IP address, so using a VPN is always a good
    idea. However, even then a timing analysis could identify you, since
    the current implementation requests the same channels in the same
    order on every run.

2.  No need to self-host

    There's no host that needs maintenance. Everything runs locally.

    As long as you keep youtube-dl itself up to date and sync your
    "seen" file, there's little extra work to do.

3.  Track your subscriptions with git

    After creating a `subscriptions.sh` executable that downloads all
    the videos, you can add it to git and use it to track metadata about
    your subscriptions.

### The Bad

1.  Maximum playlist size is your disk size

    This is a good thing for getting a realistic view of your actual
    "watch later" list. However, I've run out of disk space many
    times, and now I need to be more aware of how much is left.
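To make running out of space less of a surprise, the subscription script could be guarded by a free-space check. A minimal sketch, where `has_free_space` is a hypothetical name and the threshold is up to you; it relies on the POSIX `df -P` output format:

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed only if DIR has at least MIN_KB kilobytes free.
has_free_space() {
  local dir="$1" min_kb="$2" avail_kb
  # `df -Pk` prints a header line, then one line per filesystem whose 4th
  # column is the available space in 1024-byte blocks.
  avail_kb="$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')"
  # An empty value means df failed (e.g. DIR doesn't exist).
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$min_kb" ]
}
```

For example, `has_free_space ~/Downloads/yt-dl $((5 * 1024 * 1024)) && ./subscriptions.sh` would only fetch new videos while at least 5 GiB are free (the directory has to exist already).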

### The Ugly

We can only avoid all the bad parts of YouTube with youtube-dl as long
as YouTube keeps the videos public and programmatically accessible. If
YouTube ever blocks that, we'd lose the ability to consume content this
way, and we'd also lose confidence in YouTube as a healthy repository
of videos on the internet.

## Going beyond

Since you're running everything locally, here are some possibilities to
be explored:

### A playlist that is too long to download all at once

You can wrap the `download_playlist` function (let's call the wrapper
`inc_download`) so that, instead of passing a fixed number to the
`--playlist-end` parameter, it stores the current `$n` in a file
(something like `$HOME/.yt-db/$PLAYLIST_ID`) and increments it by
`$step` every time you run `inc_download`.

This way you can incrementally download videos from a huge playlist
without filling your disk with gigabytes of content all at once.
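A minimal sketch of `inc_download`, assuming the `download_playlist` function from earlier is loaded in the same bash shell; the default step of 10 is an arbitrary placeholder:

```shell
#!/usr/bin/env bash
# Sketch of the `inc_download` wrapper. Assumes `download_playlist PLAYLIST N`
# (defined earlier) is available in this shell.
# inc_download PLAYLIST_ID [STEP]
inc_download() {
  local playlist_id="$1"
  local step="${2:-10}"   # placeholder default step
  local db_dir="$HOME/.yt-db"
  local state_file="$db_dir/$playlist_id"
  local n=0

  mkdir -p "$db_dir" || return 1
  # The state file holds the last --playlist-end value used for this playlist.
  if [ -s "$state_file" ]; then
    n="$(cat "$state_file")"
  fi

  n=$((n + step))
  # Only store the new position if the download succeeded.
  download_playlist "$playlist_id" "$n" && echo "$n" > "$state_file"
}
```

Because the stored position only advances when youtube-dl exits successfully, a failed run retries the same range next time, and the `--download-archive` file keeps already downloaded videos from being fetched twice.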

### Multiple computer scenario

The `download_playlist` function could detect which machine it is
running on and apply a policy specific to that machine: always download
everything; only download videos that aren't present anywhere else;
*etc.*
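One way to sketch those two policies is to choose the `--download-archive` file per machine: the shared, synced archive skips videos already downloaded on any machine, while a machine-local archive makes that machine download everything. Applying it in `download` covers `download_playlist` too, since it calls `download`. The host name `media-server` and the local archive path are hypothetical placeholders; the rest of `download` matches the version above:

```shell
#!/usr/bin/env bash
# Hypothetical per-machine policy; host names are placeholders.
# archive_for_host HOST: print the archive file that HOST should use.
archive_for_host() {
  case "$1" in
    media-server)
      # "Always download everything": a "seen" file local to this host.
      echo "$HOME/.cache/youtube-dl-seen-$1.conf" ;;
    *)
      # Default: the shared "seen" file synced through Nextcloud, so videos
      # already downloaded on another machine are skipped.
      echo "$HOME/Nextcloud/cache/youtube-dl-seen.conf" ;;
  esac
}

download() {
  youtube-dl "$1" \
             --download-archive "$(archive_for_host "$HOSTNAME")" \
             --prefer-free-formats \
             --playlist-end "$2" \
             --write-description \
             --output "~/Downloads/yt-dl/%(uploader)s/%(upload_date)s - %(title)s.%(ext)s"
}
export -f archive_for_host download
```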

## Conclusion

youtube-dl is a great tool to keep at hand. It covers a really large
range of video websites and works robustly.

Feel free to copy and modify this code, and
[send me](mailto:{{ site.author.email }}) suggestions for improvements or related
content.

## *Edit*

2019-05-22: Fix spelling.

[^internet-speed]: Considering how expensive it is and the many ways it could be
    better, but also how much it has improved over the past few years, I say it's
    reasonable.