YQL: Using Web Content For Non-Programmers

Building a beautiful design is a great experience. Seeing the design break apart when people start putting in real content, though, is painful. That’s why testing it as soon as possible with real information is so important. To this end, Web services provide us with a lot of information with which to fill our products. Until recently, using them was a specialist’s job, but the sheer amount of information available and the number of systems that consume it make Web services easier and easier to use, even for people with little development experience.

On Programmable Web, you can find (to date) 2580 different application programming interfaces (or APIs). An API allows you to get access to an information provider’s data in a raw format and reformat it to suit your needs.

The API listing site programmableweb.com lists 2580 different APIs to choose from.

The Trouble With APIs

The problem with APIs is that access to them varies in simplicity, from just having to load data from a URL all the way up to having to authenticate with the server and give all kinds of information about the application you want to build before getting your first chunk of information.

Each API is based on a different idea of what information you need to provide, what format it should be in, what data it will give back and in what format. All this makes using third-party APIs in your products very time-consuming, and the pain multiplies with each one you use. If you want to get photos from Flickr and updates from Twitter and then show the geographical information in Twitter on a map, then you have quite a trek ahead.

Simplifying API Access

Yahoo uses APIs for nearly all of its products. Instead of accessing a database and displaying the information live on the screen, the front end calls an API, which in turn gets the information from the back end, which talks to databases. This gives Yahoo the benefit of being able to scale to millions of users and being able to change either the front or back end without disrupting the other.

Because the APIs have been built over 10 years, they all vary in format and the way in which you access them. This cost Yahoo too much time, which is why it built Yahoo Pipes — to ease the process.

Yahoo Pipes

Pipes is amazing. It is a visual way to mix and match information from the Web. However, as people used Pipes more, they ran into limitations. Versioning pipes was hard: to change the functionality of a pipe even slightly, you had to go back to the system. Pipes also tended to slow down with very complex and large conversions. This is why Yahoo now offers a new system for people whose needs change a lot or get very complex.

YQL is both a service and a language (Yahoo Query Language). It makes consuming Web services and APIs dead simple, both in terms of access and format.

Retrieving Data With YQL

The easiest way to access YQL is to use the YQL console. This tool allows you to preview your YQL work and play with the system without having to know any programming at all. The interface is made up of several components:

The YQL console

  1. The YQL statement section is where you write your YQL query.
    YQL has a very simple syntax, and we’ll get into its details a bit later on. Now is the time to try it out. Enter your query, define the output format (XML or JSON), check whether to have diagnostics reporting, and then hit the “Test” button to see the information. There is also a permalink; click it to make sure you don’t lose your work in case you accidentally hit the “Back” button.
  2. The results section shows you the information returned from the Web service.
    You can either read it in XML or JSON format or click the “Tree view” to navigate the data in an Explorer-like interface.
  3. The REST query section gives you the URL of your YQL query.
    You can copy and paste this URL at any time to use it in a browser or program. Getting information from different sources with YQL is actually this easy.
  4. The queries section gives you access to queries that you previously entered.
    You can define query aliases for yourself (much as you would bookmark websites), get a history of the latest queries (very useful in case you mess up) and get some sample queries to get started.
  5. The data tables section lists all the Web services you can access using YQL.
    Clicking the name of a table will in most cases open a demo query in the console. If you hover over the link, you’ll get two more links — desc and src — which give you information about the parameters that the Web service allows and which show the source of the data table itself. In most cases, all you need to do is click the name. You can also filter the data table list by typing what you’re looking for.
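The URL in the REST query section is essentially your statement, URL-encoded and attached to the YQL endpoint as the q parameter. Here is a minimal sketch of how such a URL could be put together by hand. The endpoint and the parameter names (q, format, env, callback) are the ones that appear in the console-generated URLs used in this article; buildYqlUrl itself is a hypothetical helper, not something YQL provides.

```javascript
// Endpoint as it appears in the console's REST query field.
var YQL_ENDPOINT = 'http://query.yahooapis.com/v1/public/yql';

// Hypothetical helper: turns a YQL statement plus a few options into a URL.
function buildYqlUrl(statement, options) {
  options = options || {};
  // The statement always travels in the "q" parameter.
  var parts = ['q=' + encodeURIComponent(statement)];
  if (options.format) {
    parts.push('format=' + encodeURIComponent(options.format));
  }
  if (options.env) {
    parts.push('env=' + encodeURIComponent(options.env));
  }
  if (options.callback) {
    parts.push('callback=' + encodeURIComponent(options.callback));
  }
  return YQL_ENDPOINT + '?' + parts.join('&');
}

var url = buildYqlUrl('select * from flickr.photos.search where text="cat"', {
  format: 'json',
  env: 'store://datatables.org/alltableswithkeys',
  callback: 'myflickr'
});
```

In practice, copying the URL from the console is less error-prone. A helper like this only pays off when parts of the statement change at run time.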

Using YQL Data

By far the easiest way to use YQL data is to select JSON as the output format and define a callback function. If you do that, you can then copy and paste the URL from the console and write a very simple JavaScript to display the information in HTML. Let’s give that a go.

As a very simple example, let’s get some photos from Flickr for the search term “cat”:

select * from flickr.photos.search where text="cat"

Type that into the YQL console, and hit the “Test” button. You will get the results in XML — a lot of information about the photos:

Getting photos of cats using YQL and Flickr

Instead of XML, choose JSON as the output format, and enter myflickr as the callback function name. You will get the same information as a JSON object inside a call to the function myflickr.

Getting photos of cats using YQL and Flickr

You can then copy the URL created in the “REST query” field:

Getting photos of cats using YQL and Flickr

Write a JavaScript function called myflickr with a parameter data, and copy and paste the URL as the src of another script block:

<script>
  function myflickr(data){
    alert(data);
  }
</script>
<script src="http://query.yahooapis.com/v1/public/yql?
q=select%20*%20from%20flickr.photos.search%20where%20text%3D%22cat%22
&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&
callback=myflickr"></script>

If you run this inside a browser, the URL you copied will retrieve the data from the YQL server and send it to the myflickr function as the data parameter. The data parameter is an object that contains all the returned information from YQL. To make sure you have received the right information, test whether the data.query.results property exists; then you can loop over the result set:

<script>function myflickr(data){
  if(data.query.results){
    var photos = data.query.results.photo;
    for(var i=0,j=photos.length;i<j;i++){
      alert(photos[i].title);
    }
  }
}</script>
<script src="http://query.yahooapis.com/v1/public/yql?
q=select%20*%20from%20flickr.photos.search%20where%20text%3D%22cat%22
&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&
callback=myflickr"></script>

You can easily get the structure of the information and know what is loop-able by checking the tree view of the results field in the console:

The data structure in tree format
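To make the structure concrete, here is a hand-written sample in the shape the callback receives, together with a small helper that always hands back an array you can loop over. The property names (farm, server, id, secret, title) are the ones used in this article; the values are made up, and getPhotos is a hypothetical helper. It also assumes that a query with a single match may deliver one object instead of an array, which is worth checking in the console for your own query.

```javascript
// Made-up sample data in the shape shown by the tree view.
var sample = {
  query: {
    results: {
      photo: [
        { id: '1', farm: '5', server: '4001', secret: 'abc', title: 'Cat one' },
        { id: '2', farm: '5', server: '4002', secret: 'def', title: 'Cat two' }
      ]
    }
  }
};

// Hypothetical helper: returns the photos as an array, whatever came back.
function getPhotos(data) {
  // No data or no results: nothing to loop over.
  if (!data || !data.query || !data.query.results) {
    return [];
  }
  var photo = data.query.results.photo;
  if (!photo) {
    return [];
  }
  // Assumption: a single match may arrive as a lone object, so wrap it.
  return Array.isArray(photo) ? photo : [photo];
}

var titles = [];
var photos = getPhotos(sample);
for (var i = 0, j = photos.length; i < j; i++) {
  titles.push(photos[i].title);
}
```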

Right now, all this does is display the titles of the retrieved photos as alerts, which is nothing but annoying. To display the photos in the right format, we need a bit more — but no magic either:

<div id="flickr"></div>
<script>function myflickr(data){
  if(data.query.results){
    var out = '<ul>';
    var photos = data.query.results.photo;
    for(var i=0,j=photos.length;i<j;i++){
      out += '<li><img src="http://farm' + photos[i].farm +
             '.static.flickr.com/' + photos[i].server + '/' + photos[i].id +
             '_' + photos[i].secret + '_s.jpg" alt="' + photos[i].title + 
             '"></li>';
    }
    out += '</ul>';
    // Only write to the page when there are results; otherwise "out" is undefined.
    document.getElementById('flickr').innerHTML = out;
  }
}</script>
<script src="http://query.yahooapis.com/v1/public/yql?
q=select%20*%20from%20flickr.photos.search%20where%20text%3D%22cat%22&
format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&
callback=myflickr"></script>

Photos of cats on Flickr

Put this into action and you’ll get photos of cats, live from Flickr and without having to go through any painful authentication process.

The complexity of the resulting HTML for display differs from data set to data set, but in essence the main trick remains the same: define a callback function, write it, copy and paste the URL you created in the console, test that data has been returned, and then go nuts.
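One thing to watch when you build HTML from other people's data: a photo title can contain quotation marks or angle brackets, which would break the alt attribute. The following is a minimal sketch of the same list-building step with the title escaped first. The functions escapeHtml and photoListHtml are hypothetical helpers written for this example; the URL pattern is the one used in the callback above.

```javascript
// Replace the characters that have a special meaning in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build the list markup as a string; the caller decides where to put it.
function photoListHtml(photos) {
  var out = '<ul>';
  for (var i = 0, j = photos.length; i < j; i++) {
    var p = photos[i];
    out += '<li><img src="http://farm' + p.farm + '.static.flickr.com/' +
           p.server + '/' + p.id + '_' + p.secret + '_s.jpg" alt="' +
           escapeHtml(p.title) + '"></li>';
  }
  return out + '</ul>';
}

// Made-up photo record to show the effect of escaping.
var html = photoListHtml([
  { farm: '5', server: '4001', id: '1', secret: 'abc', title: 'A "big" cat' }
]);
```

Keeping the markup generation in a function that returns a string also makes it easy to test without a browser; the callback then only needs to assign the result to innerHTML.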

Using YQL To Reuse HTML Content

One other very powerful use of YQL is to access HTML content on the Web and filter it for reuse. This is usually called “scraping” and is a pretty painful process. YQL makes it easier because of two things: it cleans up the HTML retrieved from a website by running it through HTML Tidy, and it allows you to filter the result with XPath. As an example, let’s retrieve the list of my upcoming conferences and display it.

Go to http://icant.co.uk/ to see my upcoming speaking engagements:

Christian Heilmann's upcoming speaking engagements at icant.co.uk

You can then use Firebug in Firefox to inspect this section of the page. Simply open Firebug, click the box with the arrow icon next to the bug, and move the cursor around the page until the blue border is around the element you want to inspect:

Inspecting an element in Firebug

Right-click the selection, and select “Copy XPath” from the menu:

Inspecting an element in Firebug

Go to the YQL console, and type in the following:

select * from html where url="http://icant.co.uk" and xpath=''

Copy the XPath from Firebug into the query, and hit the “Test” button.

select * from html where url="http://icant.co.uk" and 
xpath='//*[@id="travels"]'

Retrieving HTML in YQL

As you can see, this gets the HTML of the section that we want inside some XML. The easiest way to reuse this in HTML is by requesting a format that YQL calls JSON-P-X. This will return a simple JSON object with the HTML as a string. To use this, do the following:

  1. Copy the URL from the REST field in the console.
  2. Add &format=xml&callback=travels to the end of the URL.
  3. Add this as the src to a script block, and write this terribly simple JavaScript function:
<div id="travels"></div>
<script>function travels(data){
  if(data.results){
    var travels = document.getElementById('travels');
    travels.innerHTML = data.results[0];
  }
}</script>
<script src="http://query.yahooapis.com/v1/public/yql?
q=select%20*%20from%20html%20where%20url%3D%22http%3A%2F%2Ficant.co.uk%22%20
and%20xpath%3D'%2F%2F*%5B%40id%3D%22travels%22%5D'&
diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&
format=xml&callback=travels"></script>

The result is an unordered list of my events on your website:

Scraping and displaying HTML with YQL
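The callback above only reads data.results[0]. An XPath can match more than one element, though, so here is a small sketch that joins everything that came back and falls back to an empty string when nothing did. It assumes that each match arrives as its own string in the data.results array, which you should verify in the console for your query; resultsToHtml is a hypothetical helper and the sample data is made up.

```javascript
// Made-up sample in the shape the travels() callback reads from.
var sample = { results: ['<ul><li>Event one</li></ul>'] };

// Hypothetical helper: joins all returned HTML strings into one.
function resultsToHtml(data) {
  // Missing or empty results: return an empty string instead of "undefined".
  if (!data || !data.results || !data.results.length) {
    return '';
  }
  return data.results.join('');
}

var html = resultsToHtml(sample);
```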

Debugging YQL Queries

Things will go wrong, and having no idea why is terribly frustrating. The good news with YQL is that you will get error messages that are actually human-readable. If something fails in the console, you will see a big box under the query telling you what the problem was:

YQL displaying an error message

Furthermore, you will see a diagnostics block in the data returned from YQL that tells you in detail what happened “under the hood.” If there are any problems accessing a certain service, it will show up there.

Diagnostics in data sets in YQL
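You can make the same distinction in your own callback, so that your page can tell a broken query from one that simply found nothing. This is a sketch under an assumption: it expects a failed query to return an object with an error property that carries a description. That shape is not shown in this article, so check what the console returns for a deliberately broken query before relying on it. The function describeResponse is a hypothetical helper.

```javascript
// Hypothetical helper: sums up what kind of answer came back.
function describeResponse(data) {
  if (!data) {
    return 'no data received';
  }
  // Assumption: failed queries carry an "error" object with a "description".
  if (data.error) {
    return 'error: ' + (data.error.description || 'unknown');
  }
  // The same check the callbacks in this article use.
  if (!data.query || !data.query.results) {
    return 'no results';
  }
  return 'ok';
}

// Made-up responses to show the three outcomes.
var broken = describeResponse({ error: { description: 'Query syntax error' } });
var empty = describeResponse({ query: { results: null } });
var fine = describeResponse({ query: { results: { photo: [] } } });
```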

YQL Syntax

The basic syntax of YQL is very easy:

select {what} from {source} where {conditions}

You can filter your results, cut the information down only to the bits you want, paginate the results and nest queries in others. For all the details of the syntax and its nuances, check the extensive YQL documentation.
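To see how the pieces of the pattern fit together, here is a small sketch that assembles statements from their parts. The function yqlSelect is a hypothetical helper, not part of YQL. The limit keyword and the nested query follow the YQL syntax as I know it; treat the flickr.photos.info table and its photo_id key as assumptions and check them with the desc link in the console.

```javascript
// Hypothetical helper: select {what} from {source} where {conditions}
function yqlSelect(what, source, conditions, limit) {
  var statement = 'select ' + what + ' from ' + source;
  if (conditions) {
    statement += ' where ' + conditions;
  }
  // Cut the result set down to the first few entries.
  if (limit) {
    statement += ' limit ' + limit;
  }
  return statement;
}

// The query used earlier in this article.
var all = yqlSelect('*', 'flickr.photos.search', 'text="cat"');

// Only the titles of the first five photos.
var fiveTitles = yqlSelect('title', 'flickr.photos.search', 'text="cat"', 5);

// A nested query: the inner statement goes in parentheses.
var inner = yqlSelect('id', 'flickr.photos.search', 'text="cat"');
var nested = yqlSelect('*', 'flickr.photos.info', 'photo_id in (' + inner + ')');
```

Paste any of the resulting strings into the console to try them out.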

YQL Examples

You can do quite amazing things with YQL. By nesting statements in parentheses and filtering the results, you can reach far and wide across the Web of data. Simply click the following examples to see the results as XML documents. Copy and paste them into the console to play with them.

This is just a taste of the power of YQL. Check out some of my presentations on the subject.

YQL’s Limits

YQL has a few (sensible) limits:

  • You can access the URL 10,000 times an hour; after that you will be blocked. It doesn’t matter in our case because the blocking occurs per user, and since we are using JavaScript, this affects our end users individually and not our website. If you use YQL on the back end, you should cache the results and also authenticate to the service via OAuth to be allowed more requests.
  • The language allows you to retrieve information; insert, update and delete from data sets; and limit the amount of data you get back. You can get paginated data (0 to 20, 20 to 40 and so on), and you can sort and find unique entries. What you can’t do in the YQL syntax is more complex queries, like “Get me all data sets in which the third character in the title attribute is x,” or something like that. You could, however, write a JavaScript that does this kind of transformation before YQL returns the data.
  • You can access all open data on the Web, but if a website chooses to block YQL using the robots.txt directive, you won’t be allowed to access it. The same applies to data sources that require authentication or are hosted behind a firewall.
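Caching, as recommended in the first point, can be very simple. The following is a minimal sketch of an in-memory cache that only asks for a URL again once the stored answer is older than a given time. Everything here is hypothetical: createCache is a helper written for this example, and fetchFn stands for whatever function you use to request the URL on your back end.

```javascript
// Hypothetical helper: wraps fetchFn so that repeated requests for the
// same URL within ttlMs milliseconds are answered from memory.
// "now" is optional and only there to make the clock replaceable in tests.
function createCache(fetchFn, ttlMs, now) {
  now = now || function () { return Date.now(); };
  var store = Object.create(null);
  return function (url) {
    var hit = store[url];
    // A stored answer that is still fresh: no new request needed.
    if (hit && now() - hit.time < ttlMs) {
      return hit.value;
    }
    var value = fetchFn(url);
    store[url] = { time: now(), value: value };
    return value;
  };
}

// Demonstration with a fake fetch function and a fake clock.
var calls = 0;
var clock = 0;
var cachedFetch = createCache(
  function (url) { calls++; return 'data for ' + url; },
  1000,
  function () { return clock; }
);

cachedFetch('a');   // first request: calls fetchFn
cachedFetch('a');   // answered from the cache
var callsWhileFresh = calls;

clock = 1500;       // the stored answer is now too old
cachedFetch('a');   // calls fetchFn again
var callsAfterExpiry = calls;
```

An in-memory cache like this is lost when the program restarts; for anything long-running you would store the results in a file or database instead.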

There Is More To YQL

This article covers how to use YQL to access information. If you have an interesting data set and want it to become part of the YQL infrastructure, you can easily do that, too. We’ll cover that in the next article.


An international Developer Evangelist working for Mozilla in the lovely town of London, England.

  1. 1

    Great article for this great resource for programmers… YQL is really a user-based language

    1
  2. 2

    Christian thanks for the article. I think both Yahoo Pipes and YQL are super useful technologies. I’m using Yahoo Pipes for years and recently I started to use YQL for me is specially interesting that you can parse any web content with JavaScript(JSONP) so we no longer need use PHP or other server side language just YQL + JavaScript. I build one experiment web app with this combination http://www.vcarrer.com/2010/11/hacker-news-mobile-front-page-reader.html. There is huge huge potential for building web apps with YQL technology.
    So we have super useful technology and I really hope for Yahoo permanent commitment to support this technology.

    4
  3. 3

    YQL is really fun to play around, there are some nice plugins for jQuery and Mootools as well:

    http://github.com/gabrielfalcao/jquery-yql/
    http://mootools.net/forge/p/request_yqml

    2
  4. 4

    I’m always playing with YQL and PIPES. Great article!

    -2
  5. 5

    That is awesome.

    I’d heard of YQL but never used it before. I’ve just tried it out on my website. There’s so many possibilities, the HTML scraping is especially handy.

    Thanks Christian!

    1
  6. 6

    Yahoo is closing down services that don’t make them richer.
    So beside YQL begin a really cool tech, while they do not offer YQL with paid options (eg above the limit range) I recommend everyone to use Google Appengine to do scrapping, proxy, tojson proxy, all these things that YQL is useful…

    3
  7. 7

    YQL is interesting to work with. Nice post giving detailed info about amazing YQL can do.

    1
  8. 8

    Thats awesome :)

    After Brazilian Yahoo Open Hack Day ,i’m using YQL to make some nice things in my work

    0
  9. 9

    I’ve been getting a 503 error on local.search for a few days now. cant get any info from the @yql account on twitter. anyone else know whats going on?

    You can see for yourself on their own example:
    http://developer.yahoo.com/yql/console/?q=select%20*%20from%20local.search%20where%20query%3D%22sushi%22%20and%20location%3D%22san%20francisco%2C%20ca%22

    4
  10. 10

    I started a thread on the YQL forum to request a fix. if this bug is effecting anyone else, please head over and leave a comment so they fix it faster.

    http://developer.yahoo.net/forum/?showtopic=7992

    0
  11. 11

    To fill in the JSON data, a simple templating engine is useful:
    Consider template to be the template string where the keys of the current Object called “obj” are represented as ${key}:

    template.replace(/\$\{([^}]+)\}/g, function(full, key) { return obj[key] || ''; });

    this will return the template with all the ${key} items either replaced or removed if not present in obj.

    1
  12. 12

    While I love YQL and mess about with it alot in my personal projects I’m still unsure about using it in my professional work. As has been seen recently, Yahoo could shut the service down which would leave you in a bad place if all your APIs were derived from YQL.

    0
  13. 13

    What a great article Christian. This has explained so much to me in such a brief period of time. I have also checked out your site and will watch your vidz on YQL. I had not heard of you before reading this, but will now definitely follow you as a resource for learning about db access. In fact, I’m sure that I’ll refer back to this article numerous. You’ve made it easier for us front-end guys to wrap our heads around this. Thx again. You live in London right?

    0
  14. 14

    Great article! I wonder why i didnt stumble upon YQL before.

    0
  15. 15

    Please also check out the YQL blog at http://yqlblog.net

    1
  16. 16

    Thanks Christian,

    After X-mas I’ll experiment with YQL for sure “…and then go nuts.” :)

    Freek, Amsterdam

    0
  17. 17

    Great article as I would expect from Christian Heilmann. But why now? This stuff has been around for a while! I have seen Heilmann present this stuff on Fronteers. Dazzling presentation. I think YQL is a real clever tool.

    0
  18. 18

    Hey everyone!

    This is a great presentation of super service YQL written by amazing guy Christian Heilmann. I’ve seen it live on HTML5CSS3 event in Slovenia and liked it immediately.

    I’ve created a ‘playground’ site, which retrieves your location and publish informations about your country: http://galjot.si/yql/
    Later on I used YQL to fetch my social networking sites, http://galjot.si/

    I’ve noticed yesterday that flickr response was null (“results”: null).
    Any idea why is that? It’s the same with every query I tested with flickr.

    0
  20. 20

    Luís Fernando Guedes

    April 11, 2013 9:43 am

    Cool! Thank you very much!

    0
  21. 21

    Well, YQL is the infrastructure of Yahoo internally for all services. So there is a much bigger stake in it. Delicious always had the issue of competing with the doomed Yahoo Bookmarks.

    I’ve been talking to the YQL team for a long time about an installable version and there is a pilot running on appengine now (YQL is written in Java).

    I found AppEngine also to be unreliable for high traffic scraping. YQL’s datatables.org was on AE and fell over a few times (granted that was a year or so ago).

    So while your advice is sound, AE is not a replacement at all as it needs you to know coding (beautiful soup and others make it easy but not by a far cry as easy as mixing and matching in YQL).

    There is – sadly enough – no other service that offers the same range of access to the web of data than YQL does (ScraperWiki is another interesting one) and I preached for years to monetise it by offering B2B services.

    So please, tell Yahoo that YQL is awesome and that people _want_ to use it professionally – then things can happen.

    I am not in Yahoo any more, so I will do the same from the outside.

    9
