Array.splice: deleteCount not optional

JavaScript: The Definitive Guide

I recently ran across some curious JavaScript behaviour. Nothing new there, but I was slightly annoyed to see that my handy reference book hadn't given me any clue about the inconsistent way in which different browsers handle the Array.splice method. O'Reilly's JavaScript: The Definitive Guide (5th edition) is usually pretty good at pointing out any browser compatibility issues.

But after digging some more, I was even more confused to find that the book seemed to be wrong in its description of the method.
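As the title suggests, the safe habit is to treat deleteCount as required. Here is a minimal sketch of that habit, assuming the inconsistency shows up when splice is called with only a start index. The helper name removeFromIndex is my own invention for illustration; it is not from the book or from any library.

```javascript
// Hypothetical helper: remove everything from `start` to the end of the
// array, always passing an explicit deleteCount to Array.splice.
function removeFromIndex(arr, start) {
  var len = arr.length;
  // Normalise the start index the same way splice does: a negative value
  // counts back from the end, and the result is clamped to 0..len.
  var from = start < 0 ? Math.max(len + start, 0) : Math.min(start, len);
  // The explicit deleteCount means the result does not depend on how a
  // given browser treats a missing second argument.
  return arr.splice(from, len - from);
}

var letters = ['a', 'b', 'c', 'd'];
var removed = removeFromIndex(letters, 1);
// removed is ['b', 'c', 'd'] and letters is now ['a']
```

Passing both arguments costs a few extra characters, but it takes the guesswork out of what a call like letters.splice(1) will do.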

Yahoo! Pipes, Caching and robots.txt

Every time I create or use a pipe, I'm indirectly causing hits on some third party's website. So I was curious to learn how the Yahoo! Pipes backend behaves. What caching does it do? Is it a good, well-behaved web citizen in general?

There isn't much official documentation to go on. The Pipes Troubleshooting guide has some notes on how to stop Pipes from downloading a feed too frequently and how to stop Pipes from using feeds at all.

So I put myself in the shoes of a third-party website that Pipes hits, to find out more.

Yahoo! Pipes Tutorial - An example using the Fetch Page module to make a web scraper

Yahoo! recently released a new Fetch Page module which dramatically increases the number of useful things that Pipes can do. With this new "pipe input" module we're no longer restricted to working with well-organised data sets in supported formats such as CSV, RSS, Atom, XML, JSON, iCal or KML. Now we can grab any HTML page we like and use the power of the Regex module to slice and dice the raw text into shape.
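To give a feel for what "slice and dice" means, here is a rough sketch of the same idea in plain JavaScript rather than in Pipes itself. The html string is a made-up stand-in for a fetched page, and extractLinks is a hypothetical name of my own; in Pipes you would wire up the Fetch Page and Regex modules instead of writing any of this.

```javascript
// Made-up stand-in for the raw text of a fetched HTML page.
var html = '<ul>' +
  '<li><a href="/a">First</a></li>' +
  '<li><a href="/b">Second</a></li>' +
  '</ul>';

// Hypothetical helper: pull the href and the link text out of each anchor
// and return them as a list of simple items.
function extractLinks(text) {
  // Created inside the function so each call starts matching from index 0.
  var pattern = /<a\s+href="([^"]*)"[^>]*>([^<]*)<\/a>/g;
  var items = [];
  var match;
  while ((match = pattern.exec(text)) !== null) {
    items.push({ link: match[1], title: match[2] });
  }
  return items;
}

var items = extractLinks(html);
// items holds one { link, title } object per anchor in the page
```

Regular expressions are a blunt tool for HTML and break easily when the markup changes, which is a trade-off that any scraper built this way has to accept.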

In a nutshell, the Fetch Page module turns Yahoo! Pipes into a fully fledged web scraping IDE!

Yahoo! Pipes is a web scraping IDE in a nutshell

As it happens, I already have a web scraping project which has been broken for some time now. I don't have the energy to check out the hacky old PHP scrapers and debug the problem. But with Yahoo! Pipes and the Fetch Page module to hand, I can throw away my PHP scripts and their associated libraries, delete the cron jobs and free my overloaded webserver from the onerous responsibility. Time to get cracking.