July 08, 2004

wget to the rescue. Having trouble keeping up with all those mp3 blogs? This guy has figured out a way to scrape all the new audio files from your favourite sites.

If I knew anything about computers, I would totally do this.

  • Onanist, what kind of computer do you use? If it's a Mac, it's pretty easy to set up, and failing that, some quick work with Applescript could make it even easier.
  • No such luck, Sandspider; it's a PC (Windows XP), but thanks for the advice.
  • If you want a quick run down of doing it on Windows, see this.
  • Maybe this goes without saying, but using the recursive option will suck down a large number of mp3s the first time you run it, if you start with a blog that links to a lot of other blogs.
  • You can also get wget with the Cygwin toolkit for Windows.
  • Thank you mecurius!
  • Am I the ONLY person out there who has a problem with this? For me and my compatriots, our only tenuous hold on legality rests in our writing. By scraping like this, you undermine what we do. I understand it's the internet, and people will do what they will do, but doesn't ANYONE out there have the shred of decency to understand that by embracing tricks like this you may be KILLING A GOOD THING?
  • I was thinking that this would be very bad for MP3 Bloggers, but mainly for the bandwidth issues. How does the writing make it more legal?
  • I'm with Psyko on this one. I like to imagine that my writing is (at least) a little bit why people come to my blog. If you just get the mp3s... Hell, why not just listen to the radio? JB: The writing often serves as some kind of advertisement for the music on offer, and offers a link to actually buy the music. Like Captain Psyko said, it is tenuous.
  • Thanks - it also lets you rip all the MP3s off a site without thinking about their bandwidth. I use altavista music search, but I never download directly from the results - I always look at the page they are from, and read what's there. I've actually learned a lot that way, especially from university music course webpages. (Though that means I am pirating an education - that's very wrong...) Actually, my nickname is "jb" - I don't really mind it in caps (as many people render it), but it always feels like it's being shouted a bit. Unless you meant to shout. In which case I'm sorry for whatever I did. Sorry. Please don't hurt me.
  • The technologically advanced will always consume more than their "fair share" of technological resources. But they are only the first wave of what will soon be a trend. How long will it be before somebody writes a little app that takes a list of blogs and does the same thing? I'd give it less than a week. Life is nothing but problems and problem solving. This is the nature of progress and innovation. That's why the Bible says that "The Geek will inherit the earth." <hee hee> (monkeys roll eyes)
  • I like Blaise's weblog. (I've not really spent much time at Captain Psyko's - I will check it out)
  • Also, fair use law allows for the quotation of a creative work in the context of criticism. So by posting commentary for every track, I am engaging in criticism, and am therefore protected under fair use. At least, in the US. That's why the writing provides our tenuous hold on legality. That said, bandwidth isn't much of a concern. My MP3s are hosted on a .mac account, which means that I have effectively unlimited bandwidth. But my traffic has DOUBLED today (that USA Today piece'll do it...), and every person who reads my site is another step closer to the RIAA, and every person who steals via something like WGet is another person encouraging me towards trying to figure out a way to stop that. Either by changing the structure of my links, setting the music to stream, or some other bizarre scheme to keep robots off my page. So please, monkeys, for the love of god, don't scrape my page.
  • Captain, if you're serving whole MP3s as part of your criticism you are, legally speaking, fucked already. Quoting for criticism does not entitle you to serve a whole work, and doing so makes you as bad as anyone you're complaining about.
  • Either by changing the structure of my links, setting the music to stream, or some other bizarre scheme to keep robots off my page. Putting everything out as zip files would keep them away, wouldn't it?
  • Quoting for criticism does not entitle you to serve a whole work, and doing so makes you as bad as anyone you're complaining about. I'm serving up single tracks, and talking about albums. Technically speaking, I'm not quoting a whole work, merely a comprehensible portion thereof. While the legality is indeed very dubious, it hasn't been tested in court, and it's the only legal shield we MP3 bloggers have got. Beyond that, we rely on ethics and courtesy on the part of copyright holders, and our own sense of ethics and courtesy in taking down anything and everything that we are asked to. We don't have much else to stand on.
  • How is wget "stealing" when the file is in an http link on a page? Automating the process, yes. Doing it faster and more efficiently than one could "by hand", yes. But if using wget is stealing, then clicking the link is also, and putting the file in a link in the first place must be called into question. Now, watch this drive.
  • dng: It would, until someone sets their wget scrape to files of type .zip. I'd be looking for a more substantial/lasting fix, i.e. making the links point to an html page that autoredirects to the MP3 or somesuch.
  • How is wget "stealing" when the file is in a http link on a page? Because by making use of Wget, and by automating the process, you are taking away any chance in hell that you're reading the text on the page, as a rule. Which, to me, invalidates the entire MP3 Blog concept. Maybe I'm just being a luddite, I don't know. I'm p2p friendly as a rule, but the fact is that I find this wget stuff to be counter to the whole spirit of the mp3blog.
  • By the way, I'm sure others have seen the uses for this tool to scrape more (or other) than mp3s.
  • I'm p2p friendly as a rule, but the fact is that I find this wget stuff to be counter to the whole spirit of the mp3blog. Maybe it is. Sorta like speeding through the commercials (or the program itself) using your VCR is counter to the spirit of television. The consumer controls how (and how much) they consume. I'm sure that advertisers would like to think that people are paying attention and hanging on their every word, too. But all they can do is put the message out there for the percentage that will truly "consume" it. If you had a regular reader base before wget ever reached your blog, and you were happy with that (hopefully growing) number - then (bandwidth issues aside) how are the extra people dimming your bliss? Odds are that they had to read your site at one point (or, for that matter, were INTRODUCED to your site) thanks to the articles mentioning mp3 blogs and wget. It seems to me that the only bubble that wget has popped is your illusion regarding the ratio of "people reading your words to people downloading the mp3s you post". You were happy with that imaginary number before learning of a tool that made the reading OPTIONAL, and are now unhappy that the imaginary number has changed.
  • Mercurious, you're missing the point. It's not about the readership per se, but rather the veneer of tenuous legality that I lose when that ratio changes. Basically, it's the equivalent of me starting a P2P network for legal mp3s, and having it flooded by filetraders. They can do it, but damned if I'm not going to try to make it hard for them. This is MY ass on the line, not theirs.
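
For anyone wondering why zip files wouldn't slow a scraper down for long, here is a rough Python sketch of the general idea of picking links off a page by file extension. It is not how wget itself works, just an illustration of the filtering step; the names `LinkCollector` and `find_links`, the sample page, and the example.com address are all made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_links(html, base_url, extensions=(".mp3",)):
    """Return absolute URLs of links whose path ends in one of `extensions`."""
    collector = LinkCollector()
    collector.feed(html)
    wanted = tuple(ext.lower() for ext in extensions)
    found = []
    for href in collector.hrefs:
        # Resolve relative links against the page address.
        url = urljoin(base_url, href)
        # Compare only the path, so query strings don't hide the extension.
        if urlparse(url).path.lower().endswith(wanted):
            found.append(url)
    return found


# Made-up page standing in for a blog post; nothing is downloaded.
SAMPLE_PAGE = """
<p>Today's track: <a href="/audio/song.mp3">listen</a></p>
<p>Whole set: <a href="bundle.zip">download</a></p>
<p><a href="about.html">About this blog</a></p>
"""

print(find_links(SAMPLE_PAGE, "http://example.com/blog/"))
print(find_links(SAMPLE_PAGE, "http://example.com/blog/", (".mp3", ".zip")))
```

The only difference between the two calls is the extension list, which is the point made above about switching a scrape to .zip. A link that goes to an html page which then redirects to the audio would not match this kind of filter, since the link itself no longer ends in a recognisable extension.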