You-get: Dumb downloader that scrapes the web

198 points by Anon84 11 hours ago

andai 6 hours ago

For a while I had expensive internet and low bandwidth, but I loved listening to music and lectures on YouTube. At some point I realized that getting only the audio stream would save me 90% in bandwidth costs. [0]

youtube-dl (and yt-dlp) has a flag, -g (--get-url), which gives you the URL(s) for the requested format/quality. I used it on the command line on my computer and put the link in VLC. On my phone I had an elaborate workaround: download the file to my VPS over SSH first, then download it from there to my phone. Eventually I realized my phone browser could consume the URL directly, so I set up a PHP frontend for `youtube-dl -g -f bestaudio {url}`.

It's no longer online and I lost the code, but it was like one line of code.

I mention this because you-get seems to support the same use case (via --url / -u), so I wanted to let people know how useful this is!

(While it was online I shared it on some forums and got very positive feedback, people used it for audiobooks etc.)

[0] Also, playing with the screen off saves 80% battery life! YouTube knows these facts, and that's why they made background playback (which fetches only the audio stream) a paid feature...
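A minimal sketch of that one-line frontend, in Python rather than PHP. Assumptions: yt-dlp is installed and on PATH; the names `resolve_audio_urls` and `AudioRedirectHandler` and the `?v=` query parameter are made up for illustration. The handler only accepts http(s) URLs and passes the input after `--`, so it cannot be read as a yt-dlp option, which matters when a service feeds arbitrary user input to a downloader.

```python
import subprocess
from http.server import BaseHTTPRequestHandler
from typing import Callable, List
from urllib.parse import parse_qs, urlparse


def resolve_audio_urls(video_url: str, run: Callable = subprocess.run) -> List[str]:
    """Return the direct media URL(s) yt-dlp reports for the best audio-only format.

    Same as running: yt-dlp -g -f bestaudio -- <video_url>
    `run` is injectable so the function can be tested without yt-dlp or a network.
    """
    # "--" ends option parsing, so input starting with "-" is treated as a URL, not a flag.
    cmd = ["yt-dlp", "-g", "-f", "bestaudio", "--", video_url]
    result = run(cmd, capture_output=True, text=True, check=True)
    # -g prints one URL per line; keep the non-empty ones.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


class AudioRedirectHandler(BaseHTTPRequestHandler):
    """GET /?v=<video page URL> answers with a 302 to the direct audio stream."""

    def do_GET(self) -> None:
        query = parse_qs(urlparse(self.path).query)
        video_url = query.get("v", [""])[0]
        if not video_url.startswith(("http://", "https://")):
            self.send_error(400, "expected ?v=<http(s) URL>")
            return
        try:
            urls = resolve_audio_urls(video_url)
        except (OSError, subprocess.CalledProcessError):
            # OSError covers yt-dlp not being installed; CalledProcessError a failed extraction.
            self.send_error(502, "could not resolve an audio URL")
            return
        if not urls:
            self.send_error(502, "yt-dlp printed no URL")
            return
        # The phone's browser (or VLC) follows the redirect and streams the audio itself.
        self.send_response(302)
        self.send_header("Location", urls[0])
        self.end_headers()
```

To try it locally, run `HTTPServer(("127.0.0.1", 8000), AudioRedirectHandler).serve_forever()` (with `from http.server import HTTPServer`) and open `http://127.0.0.1:8000/?v=<video URL>`. The stream URLs YouTube hands out typically expire after a while, so resolving them on demand, as here, is safer than caching them.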

TechDebtDevin 5 hours ago

Brave's mobile browser lets you turn on background audio playback for videos, eliminating the need for YouTube Premium and similar subscriptions.
- cocok 35 minutes ago
  
  For Firefox:
  https://github.com/mozilla/video-bg-play
- l3x4ur1n 5 hours ago
  
  I don't know why your comment is downvoted because I use this feature of Brave very often and I also exclusively watch YT in Brave mobile (no ads).
  - gaudystead 4 hours ago
    
    For me, it was as easy as adding a shortcut to the YouTube homepage in Brave, so it basically acts like the YouTube app but with ad blocking built in. It's the only way I watch YT videos on mobile.
    
    icar 2 hours ago
    
    You might be interested in the Grayjay app.
    
    Sabinus 18 minutes ago
    
    It's a really cool idea for an app. Pity that Google takes down the videos where Rossmann talks about it.
  - TechDebtDevin 2 hours ago
    
    There are a lot of people that don't like Brave's business model. But I've never given Brave a dime and turn off their ad network stuff and they've saved me hundreds of dollars on Youtube Premium over the years.
cquintana92 an hour ago

One of my last weekend projects was something similar: converting YouTube playlists into podcast-compatible URLs:
https://github.com/cquintana92/yt2pc
dredmorbius an hour ago

mpv similarly has this option. I listen to far more videos than I watch.
<https://mpv.io/>
6yyyyyy 5 hours ago

NewPipe can do this very nicely, it even lets you build a playlist of videos.
wutwutwat 4 hours ago

A service that takes arbitrary user input and then attempts to download/proxy whatever is at the end of that input. Brave soul.
ww520 5 hours ago

That’s the -F option to list all the formats, including the audio streams. Pick the audio format with -f to download the audio. I usually pick the .m4a format and then run it through ffmpeg to convert to mp3.
- KMnO4 5 hours ago
  
  What’s the point of converting it to mp3? AAC inside an m4a container usually has better sound quality than similarly compressed mp3, and definitely better than re-encoding.
  - userbinator 2 hours ago
    
    MP3 is accepted by far more players.
- krick 5 hours ago
  
  That's a really unnecessarily complicated workflow. It's achievable with yt-dlp using just 3 flags:
  --extract-audio
  --format bestaudio
  --audio-format mp3
  - knowitnone 2 hours ago
    
    you're unnecessarily making huge assumptions. Some people don't want the bestaudio or mp3
    
    krick 28 minutes ago
    
    If I were making any assumptions, I would post another 30 options from my config that are nice to have when you download audio from YouTube. These 3 are exactly equivalent to what the GP does.
- andai 3 hours ago
  
  Same, but I converted to Opus, because I was trying to squeeze it into as little bandwidth as possible. It was mostly speech content, and Opus auto-detects and optimizes for speech at low bitrates.
  - lozf 18 minutes ago
    
    You can download the Opus streams directly with -f 249 / 250 / 251 (~48 kbps / ~80 kbps / ~128 kbps respectively), but YouTube doesn't always make them all available, whereas -f 140, the ~128 kbps AAC (.m4a), is always available, and often format code 139 (~48 kbps) as well. The lower bitrates are adequate for most speech-based content.
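The format juggling in this subthread can be sketched in Python. Assumption: `formats` is shaped like the `formats` list in the info dict yt-dlp builds (keys `format_id`, `vcodec`, `acodec`, `abr`; audio-only entries have `vcodec` set to `"none"`). The function names are hypothetical, and the sample numbers are illustrative, taken loosely from the bitrates quoted above rather than measured.

```python
from typing import Any, Dict, Iterable, List, Optional, Sequence

Format = Dict[str, Any]


def pick_lowest_bitrate_audio(formats: List[Format]) -> Optional[str]:
    """format_id of the audio-only format with the lowest average bitrate (abr).

    Entries without a known abr are skipped, since they cannot be compared.
    """
    candidates = [
        f for f in formats
        if f.get("vcodec") == "none"                 # no video track
        and f.get("acodec") not in (None, "none")    # but an audio track
        and f.get("abr")                             # with a known bitrate
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda f: f["abr"])["format_id"]


def first_available(preference: Sequence[str], available_ids: Iterable[str]) -> Optional[str]:
    """Mimic a yt-dlp fallback spec such as -f 249/250/251/140: the first ID that exists."""
    available = set(available_ids)
    for fmt_id in preference:
        if fmt_id in available:
            return fmt_id
    return None


# Illustrative data, not real measurements.
SAMPLE = [
    {"format_id": "18", "vcodec": "avc1.42001E", "acodec": "mp4a.40.2", "abr": 96},
    {"format_id": "140", "vcodec": "none", "acodec": "mp4a.40.2", "abr": 128},
    {"format_id": "250", "vcodec": "none", "acodec": "opus", "abr": 80},
    {"format_id": "251", "vcodec": "none", "acodec": "opus", "abr": 128},
]
```

With this sample, `pick_lowest_bitrate_audio(SAMPLE)` returns `"250"` (format 18 is skipped because it carries video), and `first_available(["249", "250", "251", "140"], [f["format_id"] for f in SAMPLE])` also returns `"250"` because 249 is missing. On the command line, yt-dlp covers this without code: `-f worstaudio` picks the lowest-quality audio-only stream, `-f 249/250/251/140` takes the first available ID, and `-x` (`--extract-audio`) with `--audio-format mp3` handles the conversion. Real format lists can be obtained with `yt_dlp.YoutubeDL({"quiet": True}).extract_info(url, download=False)["formats"]`.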
khimaros 3 hours ago

on Android YTDLnis solves this very nicely. simply share the video URL to the app and it can download whichever format you like https://github.com/deniscerri/ytdlnis
Synaesthesia 5 hours ago

BTW, if you browse YouTube with Firefox on Android, you can play YouTube videos with the screen locked using the background player fix extension.
01HNNWZ0MV43FF 6 hours ago

I think it's -x to just rip audio now

fnoobnar 2 hours ago

I’m not sure I understand why Bandcamp is on the list of supported sites: they allow you to just download the files on the condition you first pay the artist for them.

The fact you can download it with this tool is because the artist is letting you listen to it for free before buying it. Downloading it with this tool seems totally unnecessary and a bit of a jerk move. Bandcamp hosts mostly small and independent artists and labels.

lovethevoid 2 hours ago

Their list of supported sites isn't a declaration of where you should use this tool for moralistic reasons. It's just a list of popular sites it works on.
doublepg23 9 minutes ago

At what income level do pirates consider it immoral/moral to pirate something?
khaki54 2 hours ago

I presume you could subscribe and still use this tool? People use automation tools like this to download things that they already pay for because it saves them the effort of logging into 5 different apps depending on which walled garden it's in.
- hluska 2 hours ago
  
  Do artists get paid on Bandcamp if they bypass the login?

politelemon 9 hours ago

It seems they do not want you to report an issue without an accompanying fix for it.

> If you would like to report a problem you find when using you-get, please open a Pull Request, which should include [snip]

Can't say I've encountered this before.

wccrawford 9 hours ago
As the other commenter said, they want a failing test, not a fix.
```
    A detailed description of the encountered problem;
    At least one commit, addressing the problem through some unit test(s).
        Examples of good commits: #2675, #2680, #2685
```
"Addressing" is probably a bad word to use here. "Demonstrating" would have been better, IMO.
- tylerchilds 9 hours ago
  
  the most expensive piece of writing software is scoping work.
  i’m almost tempted to add a test suite just to give people more agency over my output because right now i’m only soliciting feedback in person to cut down on internet bullshit, like what happened to xz-utils
omoikane 6 hours ago
The Chinese version of the text has an extra header line that translates to "to prevent abuse via GitHub Issues, we are not accepting general issues". An earlier commit has this for the English text:
```
   `you-get` is currently experimenting with an aggressive approach to handling issues. Namely, a bug report must be addressed with some code via a pull request.
```
https://github.com/soimort/you-get/commit/75b44b83826b3c2d9a...
Maybe they got too much spam.
By the way, `tests/test.py` seems to just run the extractors against various websites directly. I can't find where it's mocking out network requests and replies. Maybe this is to simplify the process for people creating pull requests?
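For comparison, here is what mocking the network can look like with the standard library's `unittest.mock`. This is a generic sketch, not you-get's code: `fetch_title` is a hypothetical mini-extractor, and the canned HTML stands in for a saved copy of a real page. The trade-off is that a mocked test stays fast and deterministic but will not notice when the live site changes its markup, which is presumably what hitting real websites is meant to catch.

```python
import re
import unittest
import urllib.request
from unittest import mock


def fetch_title(url: str) -> str:
    """Hypothetical mini-extractor: download a page and pull out its <title>."""
    # Called through the module (urllib.request.urlopen) so mock.patch can replace it.
    with urllib.request.urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    match = re.search(r"<title>(.*?)</title>", html, re.S | re.I)
    if match is None:
        raise ValueError(f"no <title> found at {url}")
    return match.group(1).strip()


class FetchTitleOfflineTest(unittest.TestCase):
    @mock.patch("urllib.request.urlopen")
    def test_reads_title_from_canned_page(self, fake_urlopen):
        # The "network" returns a canned page; no request leaves the machine.
        fake_urlopen.return_value.__enter__.return_value.read.return_value = (
            b"<html><head><title> Demo video </title></head></html>"
        )
        self.assertEqual(fetch_title("https://example.com/watch?v=1"), "Demo video")
        fake_urlopen.assert_called_once_with("https://example.com/watch?v=1")

    @mock.patch("urllib.request.urlopen")
    def test_missing_title_is_reported(self, fake_urlopen):
        fake_urlopen.return_value.__enter__.return_value.read.return_value = b"<html></html>"
        with self.assertRaises(ValueError):
            fetch_title("https://example.com/empty")
```

Run it with `python -m unittest` against the file; both tests pass without network access.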
- godelski 6 hours ago
  
  I get this, but I aggressively report accounts and issues. I'm not sure how GitHub handles them, but they seem not to come back.
  Though what I'm unsure how to deal with is legitimate users being idiotic. For example, someone recently opened an issue asking where the source code was. Not only was there a directory named "src", but there were links in the readme to specific parts. While I do appreciate GitHub and places like Hugging Face [0], there are a lot of very aggressive and demanding noobs.
  I'd like ways to handle them better.... I'm tired of people yelling at me because 5-year-old research code no longer works out of the box, or because they've never touched code before.
  [0] Check any Hugging Face issue and you'll see far more spam. The same accounts will open multiple issues that just berate the owners, and Hugging Face makes it difficult to report these accounts.
  - throwaway314155 6 hours ago
    
    The solution is to ignore them and close their issue. Open source maintainers have enough to worry about and are unpaid, it's okay to be a little dictatorial when it comes to "bad questions".
    
    godelski 18 minutes ago
    
    That's not a solution.
    It addresses the specific issue but does nothing to prevent future similar issues. A solution to a cold is not handing someone a tissue.
    I like that these platforms are open to everyone but at the same time there are a lot of people who have no business participating. Being able to filter those people out is unfortunately a necessary tool to not get overloaded.
    Worse, I find that because of this, many open source maintainers end up being quick to close issues and say RTFM. I can't tell you how many times this has happened to me even when my opening issue quoted the FM and included a reproducible test. It's also common to just close and say "not our problem".
kylecazar 9 hours ago

They want you to just submit a PR with a test that, if passed, would indicate the problem for you is fixed.
- sigseg1v 9 hours ago
  
  I kind of like this. It's a more formal proof of concept. You prove the bug exists by writing a failing test. If they cannot construct a failing test then it's either too hard to mock or reproduce (and therefore maybe not even worth fixing, for a free tool), or it's impossible because it's not a bug. Frees up maintainer time from dealing with reports that aren't bugs.
  - latexr 7 hours ago
    
    > If they cannot construct a failing test then it's either too hard to mock or reproduce (…), or it's impossible because it's not a bug.
    Or, you know, the user is not a developer. Or is unfamiliar with Python, or their test suite, or git, or…
    It is perfectly possible to be good at reporting bugs but be incapable of submitting pull requests.
    
    newaccount74 an hour ago
    
    The problem with popular tools is that they have more bugs than can be fixed. So bug reports are pretty much worthless: you know that there are 1000 bugs out there, but you only have resources to fix 10 of them.
    By asking users to provide reproducible test cases, you can massively reduce the amount of work you have to do. Of course that means 90% of bugs will never be reported. But since you don't have the resources to fix them anyway, why not just focus on the bugs that can be reproduced and come with a test case...
- thangngoc89 9 hours ago
  
  What happens if you don’t know Python? Python is a relatively easy language to learn but no way I’m gonna learn Python just to report an issue
  - epcoa 8 hours ago
    
    Did you (or anyone) in this thread look to see exactly what they are looking for with their provided examples?
    https://github.com/soimort/you-get/pull/2680/commits/313b8d2...
    You do not need to know Python deeply to construct what they are expecting. They’re not actually looking for a unit test or something.
    
    latexr 7 hours ago
    
    > Did you (or anyone) in this thread look to see exactly what they are looking for with their provided examples?
    I did. And I looked at all examples of “good commits”, not just the trivial ones.
    https://github.com/soimort/you-get/pull/2685/files
    That’s already complex for someone unfamiliar with the software (who might nonetheless be able to open a competent bug report).
  - dartos 8 hours ago
    
    Then you don’t get to contribute bug reports.
    Perfectly fine rule for a maintainer to have.
  - Filligree 9 hours ago
    
    Good chance you wouldn't be writing good bug reports either, then. Github issues have enough noise that a first-pass filter like this feels like a good idea, even if it has some false positives.
    
    latexr 6 hours ago
    
    This in no way aligns with reality. I frequently interact with users who can’t code at all but make good bug reports. One of the best ways to ensure success is to have a form (GitHub allows creating those) that describes exactly what is necessary and guides people in the right direction.
    What you're saying is even worse, since you’re implying someone could be an expert computer programmer or power user, but because they’re unfamiliar with the specific language this project chose, they are incapable of making good bug reports. That makes no sense.
    
    papichulo2023 8 hours ago
    
    I fail to see the logic in your comment. Just another case of Goodhart's law.
    
    achierius 8 hours ago
    
    This isn't really a metric though. It's a formal existence proof that the bug exists. The key difference IMO is that you have to create a test which A) looks (to the maintainer) like it should pass, while simultaneously B) not passing. It's much harder to game.
    There are other cases where Goodhart's law fails as well: consider quant firms, where the "metric" used to judge a trader is basically how much money they pull in. Seems to be working fine for them.
    
    dartos 7 hours ago
    
    If you can’t describe your bug in a test, then you probably can’t describe it sufficiently in English either.
    Seems to make sense
  - dotancohen 8 hours ago
    
    If the bug is egregious enough, somebody else will find it. If the bug is important enough to you but esoteric, then ask on a forum or enlist the help of someone you know who does know Python.
    How do you currently submit bug reports on e.g. MS Word or Adobe Photoshop? This way is certainly more open than those commonly-deployed software.
  - js8 8 hours ago
    
    The same thing that happens if the author of the OSS you use doesn't know English.
  - nunez 7 hours ago
    
    That's exactly it. They put up a gate that blocks low-effort issues that only add busywork. I like it!
onionisafruit 9 hours ago

Interesting. I like the idea of encouraging people to try creating a test or even a whole fix, but saying that’s all you will accept is a bit much. On the other hand, I’m not doing the work to maintain you-get. I don’t know what they deal with. This may be an effective way to filter a flood of repetitive issues from people who don’t know how to run a command line program.
- probably_wrong 8 hours ago
  
  I believe there are two extremes. On one end you get a bunch of repetitive non-issues, while on the other end you only get issues about (say) bugs in FreeBSD 13.3 because only hard-core users have the skills and patience to follow THE PROCESS.
  I know how to make an isolated virtual environment, install the package, make a fork, create a test and make a PR. But I don't know whether I care enough about a random project to actually do it.
thih9 9 hours ago

It’s relatively easy to write a failing test and it massively cuts down the work related to moderating issues. Also, reduces the danger of github issues turning into a support forum.
If this results in the project being easier to maintain and being maintained longer, then I’m fine with this.
- seneca 9 hours ago
  
  > It’s relatively easy to write a failing test and it massively cuts down the work related to moderating issues.
  Relative to what? Learning someone else's code base well enough to write a useful test is not trivial.
  It's not a bad method, but the vast majority of users won't be capable of writing a test that encapsulates their issue.
  - chucksmash 8 hours ago
    
    In the case of this tool, adding a failing test case looks trivial if you've got the URL of a page it fails on.
    Provided the maintainer is willing to provide some minimal guidance to issue reporters who lack the necessary know-how, it even seems like a clever back door way of helping people learn to contribute to open source.
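What such a bug-report commit might boil down to, sketched with Python's `unittest`. Everything here is hypothetical: `extract_title` is a made-up extractor with a deliberate bug (it misses `<title>` tags that carry attributes), and the inline page stands in for the failing URL a reporter would use. The test states the expected behaviour, and its failure is the bug report. Marking it `@unittest.expectedFailure`, as done here, keeps the suite green while documenting the bug; that is a choice, since you-get's guidelines only ask for a test addressing the problem.

```python
import re
import unittest


def extract_title(html: str):
    """Hypothetical extractor with a deliberate bug: it only matches a bare
    <title> tag, so <title lang="en"> is missed and None is returned."""
    match = re.search(r"<title>(.*?)</title>", html)
    return match.group(1) if match else None


class TitleBugReport(unittest.TestCase):
    """A bug report written as a test: it asserts what *should* happen."""

    # Minimal page reproducing the problem (a real report would point at a failing URL).
    PAGE = '<html><head><title lang="en">Demo video</title></head></html>'

    @unittest.expectedFailure  # remove once the extractor is fixed
    def test_title_with_attributes(self):
        self.assertEqual(extract_title(self.PAGE), "Demo video")
```

A fix such as `re.search(r"<title[^>]*>(.*?)</title>", html, re.I | re.S)` would make the assertion pass; unittest then reports an "unexpected success", the cue to drop the decorator. That gives the maintainer an unambiguous definition of done.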
zufallsheld 8 hours ago

Serverspec does the same: https://github.com/mizzy/serverspec?tab=readme-ov-file#maint...

jdthedisciple 2 hours ago

Anybody else getting this error constantly?

    you-get: [error] oops, something went wrong.
    you-get: don't panic, c'est la vie. please try the following steps:
    you-get:   (1) Rule out any network problem.
    you-get:   (2) Make sure you-get is up-to-date.
    you-get:   (3) Check if the issue is already known, on
    you-get:         https://github.com/soimort/you-get/wiki/Known-Bugs
    you-get:         https://github.com/soimort/you-get/issues
    you-get:   (4) Run the command with '--debug' option,
    you-get:       and report this issue with the full output.

Tried with the debug flag, but it didn't really help:

    pattern = str(pattern, 'latin1')
              ^^^^^^^^^^^^^^^^^^^^^^
    TypeError: decoding to str: need a bytes-like object, NoneType found

I was curious to see if it can bypass age restriction (though I tried on non-age-restricted video too with the same error).
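The last line of that traceback is reproducible on its own: `str(None, 'latin1')` raises exactly this `TypeError`, so `pattern` was `None` rather than bytes at that point. A common cause in scrapers is a regex search or lookup on the fetched page that found nothing, often because the site changed, but that is a guess; the traceback alone doesn't show where the `None` came from. A hypothetical helper (`decode_capture` is a made-up name, not you-get code) that fails with a clearer message:

```python
import re


def decode_capture(page: bytes, regex: bytes) -> str:
    """Hypothetical helper: return the first capture group of `regex`, decoded as Latin-1.

    Without the None check, a missed match flows into str(None, "latin1") and
    surfaces as "TypeError: decoding to str: need a bytes-like object, NoneType found".
    """
    match = re.search(regex, page)
    if match is None:
        raise LookupError(f"pattern {regex!r} not found; the page layout may have changed")
    return str(match.group(1), "latin1")
```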

wanderingmind 35 minutes ago

Nice work. But as a consumer, why should I use you-get over yt-dlp? What are its strengths over yt-dlp, which works quite well on a huge range of websites [1]?

[1] https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites....

xg15 7 hours ago

I wouldn't exactly call a ytdl-style media downloader with a whole library of site-specific extractors and converters "dumb" but still cool that more projects like ytdl exist.

krick 5 hours ago

Given the title and the first few sentences of the description, I assumed it was some heuristic-based tool that tries to grab whatever there is on the page, which would be useful if no tool implemented support for a given site (which in most cases just means "yt-dlp doesn't support it"). But apparently it's also extractor-based, with a separate extractor for each somewhat-popular source. So basically it's just a less sophisticated clone of yt-dlp?

KTibow 9 hours ago

Can someone explain why this is better than yt-dlp?

grugagag 7 hours ago

How did you infer better than yt-dlp? I think the more the better when it comes to this space as google fights back.
- xg15 5 hours ago
  
  But some information on how it differs from yt-dlp, and why they started an entirely new project, would still be helpful.
  (Also, a multitude of tools isn't really all that helpful if they all stop working in the same instant because they all relied on the same APIs etc)
uniqueuid 9 hours ago

That's an interesting question. They only depend on a single library, but I wonder how much code is really their own. I found it curious, for example, that there is a dedicated mp4 joiner (I mean, if you already have ffmpeg, there is probably no way you can do it better yourself).
https://github.com/soimort/you-get/blob/develop/src/you_get/...
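For reference, the ffmpeg route uniqueuid alludes to is usually the concat demuxer, which joins parts without re-encoding. A sketch under assumptions: ffmpeg is on PATH, the parts share codecs and parameters (otherwise `-c copy` won't produce a valid file), and the function names and the `parts.txt` list file are made up. This is not how you-get's own joiner works.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def concat_list(parts: Sequence[str]) -> str:
    """Build the text file the ffmpeg concat demuxer reads: one `file '...'` line per part."""
    lines = []
    for part in parts:
        # Inside single quotes, a literal ' is written as '\'' (close, escaped quote, reopen).
        escaped = part.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_path: str, output: str) -> List[str]:
    """ffmpeg argv that joins the listed parts by stream copy (-c copy, no re-encode).

    -safe 0 lets the list contain absolute paths or unusual file names.
    """
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_path, "-c", "copy", output]


def join_mp4(parts: Sequence[str], output: str, workdir: str = ".") -> None:
    """Write the list file into `workdir` and run ffmpeg on it."""
    list_path = Path(workdir) / "parts.txt"
    list_path.write_text(concat_list(parts), encoding="utf-8")
    subprocess.run(concat_command(str(list_path), output), check=True)
```

Relative paths in the list file are resolved relative to the list file's own directory, which is why `join_mp4` writes it into the same `workdir` as the parts it assumes.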
billsunshine 9 hours ago

[flagged]
- Etheryte 9 hours ago
  
  Please don't litter HN with LLM generated slop, this is actively reducing the quality of discussion. No one wants a future HN where people just spam LLM responses at one another.

MattDaEskimo 5 hours ago

Another library released which lies about what it is to circumvent anti-bot security.

Let's just not act surprised when tighter attestation comes in effect.

troupo an hour ago

I used to "save" interesting links by emailing them to myself.
Now most of them are dead, twitter accounts removed, youtube videos deleted, facebook pages bought by media management companies, sites rebuilt etc.
Whatever the primary goal of this tool, it and other similar tools are invaluable for actually saving and preserving content.
ajsnigrutin 5 hours ago

This library/program solves problems that people have with pages like youtube... too many ads, no way to download videos for offline use (or archive for when they get removed), and better performance with a native player.
If I were forced to watch all the ads on YouTube, I wouldn't watch videos there at all.
therein 5 hours ago

A future in which YouTube will refuse to stream you data because you didn't pass client attestation is definitely coming and I wish we could stop it.
It is a dark future where some of us will accept it, and the rest of us will be constantly taking part in a cat-and-mouse chase in which we glitch out attestation tokens from vulnerable devices to get by.
- userbinator 2 hours ago
  
  We need laws against user-agent discrimination.

vanjajaja1 8 hours ago

> Search on Google Videos and download > $ you-get "Richard Stallman eats"

I don't often read instruction manuals, but this time I did and I found this gross easter egg

natch 2 hours ago

Is this just a fork of yt-dlp with the credits rewritten?

tcsenpai 7 hours ago

I like this. I am imagining a companion extension for Chrome/Firefox that uses you-get as a backend to make this seamless. A forward-thinking idea: imagine going to YouTube and having the you-get extension bypass the YouTube player and play the content directly, without ads. And not just YouTube; any other platform too.

mikojan 7 hours ago

Sounds like FastStream Video Player
https://addons.mozilla.org/en-US/firefox/addon/faststream/?u...

dotancohen 8 hours ago

Can it back up a text webpage? Can it remove popups for newsletters, subscriptions, logins, or cookie notifications? Can it read pages that require signing in?

demberto 7 hours ago

Is this different from JDownloader2?