Photomosaics and a Google Image Grabber

March 21, 2005


Category: Photo, Software. Posted by badsegue at 1:45 am

background

You’ve probably seen photomosaics before.

Mosaic (scaled way down)

Original image (full size)

These are images composed of other images. Several free or inexpensive programs can take a picture and build a mosaic from a set of pictures of your choosing. The one I’ve used, with great results, is AndreaMosaic. It’s free and easy to use.
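A common way such programs work is to split the source picture into a grid of cells, compute each cell’s average color, and place the feeder image whose own average color is closest. This is a generic sketch of that matching step only, not necessarily what AndreaMosaic does; the tile names and color values are made up, and colors are plain [R, G, B] averages rather than values read from real image files.

```perl
use strict;
use warnings;

# Squared Euclidean distance between two [R, G, B] colors.
sub color_dist2 {
    my ($x, $y) = @_;
    my $d = 0;
    $d += ($x->[$_] - $y->[$_]) ** 2 for 0 .. 2;
    return $d;
}

# Given a grid of cell colors (array of rows) and a hash of
# tile name => average color, return a grid of best-matching tile names.
sub match_tiles {
    my ($cells, $tiles) = @_;
    my @names = sort keys %$tiles;    # sorted so ties resolve predictably
    my @grid;
    for my $row (@$cells) {
        my @out;
        for my $cell (@$row) {
            my ($best, $best_d);
            for my $name (@names) {
                my $d = color_dist2($cell, $tiles->{$name});
                ($best, $best_d) = ($name, $d) if !defined $best_d || $d < $best_d;
            }
            push @out, $best;
        }
        push @grid, \@out;
    }
    return \@grid;
}

# Hypothetical feeder images and their average colors.
my %tiles = (
    'snow.jpg'  => [240, 240, 245],
    'dog.jpg'   => [150, 100,  60],
    'holly.jpg' => [ 30, 120,  40],
);

# A tiny 2x2 "source image", one average color per cell.
my @cells = (
    [[250, 250, 250], [140,  90,  50]],
    [[ 20, 130,  30], [200, 200, 210]],
);

my $grid = match_tiles(\@cells, \%tiles);
print join(' ', @$_), "\n" for @$grid;
# prints:
# snow.jpg dog.jpg
# holly.jpg snow.jpg
```

A real program also has to load and resize the images and usually avoids repeating the same tile too often; none of that is shown here.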

Here’s a sample mosaic I made. The mosaic above is a scaled-down copy of the full image, which is around 8 MB. The source image is only 70×70 pixels, and the component images are thumbnail-sized, around 100×100 pixels.

You don’t need high-resolution images to make a mosaic, yet the result can have enough detail for poster-sized prints. I’ve made 24×30-inch prints using nothing more than a low-resolution source image and a bunch of thumbnails.
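The reason thumbnails are enough is simple multiplication: the mosaic’s pixel size is the tile grid times the tile size, and dividing by the print size gives the print resolution. This sketch assumes every tile is rendered at its full thumbnail resolution; the 60 × 75 tile grid is a hypothetical layout chosen to match the 4:5 shape of a 24×30 print, not the layout of the sample above.

```perl
use strict;
use warnings;

# Pixel size of a mosaic built from a grid of equally sized square tiles.
sub mosaic_pixels {
    my ($tiles_across, $tiles_down, $tile_px) = @_;
    return ($tiles_across * $tile_px, $tiles_down * $tile_px);
}

# Dots per inch when a mosaic of the given pixel size is printed at the
# given size in inches. Returns the lower of the horizontal and vertical
# figures, a conservative estimate if the aspect ratios don't match.
sub print_dpi {
    my ($w_px, $h_px, $w_in, $h_in) = @_;
    my $dpi_w = $w_px / $w_in;
    my $dpi_h = $h_px / $h_in;
    return $dpi_w < $dpi_h ? $dpi_w : $dpi_h;
}

# Hypothetical layout: 60 x 75 tiles of 100 x 100 pixels, printed at 24 x 30 inches.
my ($w, $h) = mosaic_pixels(60, 75, 100);
printf "%d x %d pixels, %.0f dpi\n", $w, $h, print_dpi($w, $h, 24, 30);
# prints: 6000 x 7500 pixels, 250 dpi
```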

Once you figure out the basic approach and the dimensions needed for the final image, you can produce mosaics in a matter of minutes. The hardest part is coming up with enough feeder images to give the mosaic enough variety.

approach

If you already have hundreds or thousands of images you want to use in the mosaic, you may not need to find any more feeder images. I like to use images related to the original’s subject matter rather than just any image (although that can be interesting as well), so for the holiday dog picture I wanted holiday and dog pictures. The natural place to look was Google Images: you can search on anything, find any number of relevant images, and get them at thumbnail size straight from the results page. Since thumbnails are the perfect size for feeding into the mosaic, there is no need to visit the host page and download the full-size version.

implementation

This Perl program takes a search term and a range, then fetches the matching thumbnails from Google Images. It saves them into a folder in your current directory named after the (URL-escaped) search term. Each file is named after the image’s URL, so if you re-run a search it won’t fetch an image it has already stored.

Usage: get.pl <search term> [start range] [end range]
search term: This is the query string passed to Google Images.  It can be whatever you want, but if it is more than one word then you have to put the term in quotes.  You can use the Google query language, like "flower AND rose", "rose -wine", etc.

start range: The starting index to retrieve.  Google returns 20 images per page, so this will start retrieving the page that contains the start range image.

end range: The ending index to retrieve.  The program will stop once it retrieves the page that contains the end range image.
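The range arithmetic can be sketched as follows. This follows the range description above rather than reproducing get.pl line for line: it assumes 20 results per page, rounds the start down to the page that contains it, and treats the range as covering images start through end minus one, which is how the "0 100" and "100 500" examples below fit together without overlap. The URL shape is modeled on the one in get.pl; `url_escape` is a hypothetical stand-in for URI::Escape so the sketch runs with core Perl only, and it assumes an explicit end index.

```perl
use strict;
use warnings;

use constant PER_PAGE => 20;    # results per Google Images page, per the text

# Percent-encode everything except unreserved URL characters.
sub url_escape {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Offsets of the result pages covering images start .. end-1:
# start is rounded down to its page, pages are added while offset < end.
sub page_offsets {
    my ($start, $end) = @_;
    my @offsets;
    for (my $off = int($start / PER_PAGE) * PER_PAGE; $off < $end; $off += PER_PAGE) {
        push @offsets, $off;
    }
    return @offsets;
}

# Results-page URLs, modeled on the URL built in get.pl.
sub page_urls {
    my ($query, $start, $end) = @_;
    my $base = 'http://images.google.com/images?q=' . url_escape($query)
             . '+filetype:jpg&safe=off';
    return map { "$base&start=$_" } page_offsets($start, $end);
}

# prints three URLs, for start=20, 40 and 60
print "$_\n" for page_urls('flower AND rose', 30, 70);
```

With 0 and 100 this yields offsets 0 through 80, and with 100 and 500 it starts at 100, so the two runs never request the same page.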

Use the start and end range to control how many images to fetch. Usually you will just do something like

get.pl "flower" 0 100

If you later want more images of the same kind, you can run

get.pl "flower" 100 500

This will avoid the images you’ve already retrieved and save you some time.

Because you’re only downloading thumbnails, the program is usable even on a dial-up connection.

#!/usr/bin/perl
# get.pl - fetch Google Images thumbnails for a search term
use strict;
use warnings;

use HTML::Parser;
use HTTP::Request;
use LWP::UserAgent;
use URI::Escape;

$| = 1;    # unbuffered output so progress shows up immediately

my $query = shift;
die "Usage: get.pl <search term> [start range] [end range]\n"
  unless defined $query && length $query;
my $start_idx = shift;
my $end_idx   = shift;

my $client = LWP::UserAgent->new(agent => 'Mozilla', timeout => 60, keep_alive => 1);
my $in     = ".";
# Escape the query so multi-word terms and operators survive in the URL
my $url    = "http://images.google.com/images?q=" . uri_escape($query) . "+filetype:jpg&safe=off";
my $start  = $start_idx || 0;
my $stop   = $end_idx   || 0;    # 0 means "no end limit"
my $dest_dir = "$in/" . uri_escape($query);

my $count = 1;

my $p = HTML::Parser->new(
  api_version => 3,
  start_h     => [\&tag, "tagname, attr"],
);

print "Start = $start, Stop = $stop, Query = $query\n";
# "mkdir ... || die" never dies (|| binds to the argument); use "or"
-d $dest_dir or mkdir $dest_dir or die "Couldn't make $dest_dir ($!)\n";

while (1) {
  my $test = $start;

  # Get the search results page; tag() bumps $start if there is a next page
  my $response = $client->get("${url}&start=${start}");
  die "Request failed: ", $response->status_line, "\n" unless $response->is_success;

  $p->parse($response->content);
  $p->eof;    # finish this document so the parser can be reused

  # Stop if there was no next page, or we've reached the end of the range
  if ($test == $start || ($stop && $start >= $stop)) {
    print "Done.\n";
    exit 0;
  }
}

sub tag {
  my ($tagname, $attr) = @_;
  my $src = $attr->{'src'} or return;

  # Found the next page graphic, increment counter to continue grabbing
  if ($src eq "/nav_next.gif") {
    $start += 20;
    return;
  }

  # Only thumbnails, whose src looks like /images?q=tbn:ID:host/path.jpg
  return unless $tagname eq 'img' && $src =~ /images\?q=tbn:.*\.jpg/i;

  # Strip the "/images?q=tbn:ID:" prefix, leaving the original image URL,
  # then escape it so it is a safe filename
  my $filename = $src;
  $filename =~ s/\/images\?q=tbn:[^:]*://;
  $filename = uri_escape($filename);

  if (-e "$dest_dir/$filename") {
    print "Skipping ";
  } else {
    # The second argument to request() saves the body straight to that file
    my $request  = HTTP::Request->new('GET', "http://images.google.com$src");
    my $response = $client->request($request, "$dest_dir/$filename");
    print "Failed (", $response->status_line, ") " unless $response->is_success;
  }
  print "$filename (", $count++, ")\n";
}

badsegue.org