Some information that might be of use to someone, at discount prices.

April 6, 2005

Thumbnail Image Grabber

Category: — badsegue @ 9:25 am

background

Here is a .NET application that downloads thumbnail images from various online image search engines. I use it to collect fodder for photomosaics. It’s not the most sophisticated program, and there are programs out there that will crawl a website and grab all the images. I haven’t found one that quite does what I want it to do, which is why I put together a custom script to handle it. This project is an extension of that script. It basically does what the image grabbing Perl script does, but is easier to use for those uncomfortable with scripts.

features

Yahoo! and Google were supported initially. Version .51 adds support for MSN, Flickr, and PBase.

The application is multi-threaded. A separate thread fetches the result pages from each search engine, and the thumbnail URLs it finds are parsed and queued. A separate pool of threads dequeues the URLs and downloads the images into the specified directory, with each search stored in its own subdirectory. Images are added to a list view so they can be previewed while the downloads are in progress. The list view can be toggled to a detail view, which shows larger versions of the graphics. You can delete downloaded images by selecting them and hitting the DELETE key.
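The queue-and-thread-pool arrangement described above can be sketched roughly as follows. This is a minimal Python illustration, not the application's actual .NET code. The names `grab`, `parse_pages`, `download_worker`, `get_thumbnail_urls`, and `download` are all hypothetical, and the page fetching and image downloading are passed in as plain functions so the sketch runs without a network connection.

```python
import queue
import threading

SENTINEL = None  # tells one download worker that no more URLs are coming


def parse_pages(engine, get_thumbnail_urls, url_queue):
    """Producer: one thread per search engine queues the thumbnail URLs it finds."""
    for url in get_thumbnail_urls(engine):
        url_queue.put((engine, url))


def download_worker(url_queue, download, results, lock):
    """Consumer: dequeue URLs and download them until a sentinel arrives."""
    while True:
        item = url_queue.get()
        if item is SENTINEL:
            break
        engine, url = item
        data = download(url)
        with lock:  # the results list is shared between workers
            results.append((engine, url, data))


def grab(engines, get_thumbnail_urls, download, num_workers=4):
    """Run one producer per engine and a fixed pool of download workers."""
    url_queue = queue.Queue()
    results, lock = [], threading.Lock()

    workers = [
        threading.Thread(target=download_worker,
                         args=(url_queue, download, results, lock))
        for _ in range(num_workers)
    ]
    producers = [
        threading.Thread(target=parse_pages,
                         args=(engine, get_thumbnail_urls, url_queue))
        for engine in engines
    ]
    for t in workers + producers:
        t.start()
    for t in producers:
        t.join()                  # every URL is queued once the producers finish
    for _ in workers:
        url_queue.put(SENTINEL)   # one sentinel per worker
    for t in workers:
        t.join()
    return results
```

Calling `grab(["yahoo", "google"], my_page_parser, my_downloader)` returns a list of `(engine, url, data)` tuples in whatever order the workers finished. A real version would write each image to the search's subdirectory instead of collecting it in memory.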

Some planned features:

  • Additional sites (Xusenet, …)
  • Installer/Uninstaller
  • Display images with preserved dimensions
  • Configurable number of image threads

    get grabbing

    All you need to do is specify the search term and click Fetch!
    You can control the number of images grabbed by setting the Start and Stop values. The Start value is useful when you’ve previously fetched, say, 100 images and then decide you want some more. You can set Start to 101 and bypass the ones you’ve already retrieved.
    You can see the images as they are retrieved. You can also monitor the Log panel, which shows the actual HTML pages retrieved, and the images as they are processed.
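To illustrate how Start and Stop values could map onto paged search results, here is a small Python sketch. It is not taken from the application itself; the function name `page_offsets` and the page size of 20 results are assumptions made for the example, and real search engines use their own page sizes and URL parameters.

```python
def page_offsets(start, stop, page_size=20):
    """Yield (offset, count) pairs covering results start..stop.

    start and stop are 1-based and inclusive, matching the Start and
    Stop values in the UI. page_size is an assumed value.
    """
    if start < 1 or stop < start:
        raise ValueError("expected 1 <= start <= stop")
    offset = start - 1  # zero-based index of the first wanted result
    while offset < stop:
        count = min(page_size, stop - offset)
        yield offset, count
        offset += count
```

With Start set to 101 and Stop set to 150, `list(page_offsets(101, 150))` gives `[(100, 20), (120, 20), (140, 10)]`, so the first 100 results are never requested.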

    issues

    I don’t do much .NET work, so some things may not be done optimally (or at all!). This is something of a learning exercise and a work-in-progress.

    One thing I’ve noticed is a problem introduced by .NET 1.1 SP1. Yahoo! apparently returns an invalid HTTP response header, which the pre-SP1 1.1 framework accepted. I had to add a configuration item to get that working again.
    You need .NET 1.1 installed. It used to work with 1.0, but I needed the Directory Browser control that is part of 1.1.

    download

    Here it is. This is just a zip with the .exe and a configuration file. Just extract the archive to a folder and click on the .exe to run it. To uninstall just delete the folder–nothing is installed in the registry or start menu.

    You will need to have the .NET framework v. 1.1 installed. You probably have it already, but if not you can get it from Microsoft. The config file is used to get around a problem between Yahoo and .NET 1.1 SP1. Yahoo returns an invalid HTTP response and .NET doesn’t like it. If you don’t have SP1 installed and are having this type of problem then try deleting the config file and re-running the program.

    conclusion

    Once you have the pictures you want, you can feed them to a photo mosaic program. I’ve tried a few and ended up using AndreaMosaic. It’s free and fast, and I’ve had good results with it. There are several others out there as well; just do a search for “photo mosaic” and you’ll find a bunch of options.

    • • •

    13 Comments

    1. Post your comments with any problems or suggestions you might have.

      Comment by badsegue — April 15, 2005 @ 12:08 am
    2. Here’s an interesting article on a photo mosaic algorithm.

      Comment by badsegue — April 19, 2005 @ 8:04 am
    3. I downloaded the program but it won’t open up. It says the program failed to initialize properly. Help

      Comment by sandi — May 1, 2005 @ 2:37 am
    4. Log:

      Unable to get data from http://images.search.yahoo.com/search/images : An exception occurred during a WebClient request.
      Unable to get data from http://images.google.com/images : An exception occurred during a WebClient request.

      Comment by Slof — May 1, 2005 @ 4:36 am
    5. sandi – Without knowing anything else about the problem, I think it might have to do with the .NET framework. Can you verify what version of the framework you have? You can check under Control Panel/Add Remove Programs. Look for Microsoft .NET Framework 1.1. If it is not there Microsoft recommends you use Windows Update to get it. http://windowsupdate.microsoft.com/

      Comment by badsegue — May 2, 2005 @ 6:21 am
    6. Slof – I’ve seen those errors when my connection was down. Can you tell me if you have .NET Framework 1.1 SP 1 installed? You can find it under Control Panel/Add Remove Programs. One thing to try is to rename the ImageGrabber.exe.config to ImageGrabber.exe.config.bak, and then restart the program. That file is there to work around an issue with SP 1.

      Comment by badsegue — May 2, 2005 @ 6:55 pm
    7. Ok, changing the .config file made it work. I only briefly skimmed over that download paragraph.

      Comment by Slof — May 11, 2005 @ 10:19 pm
    8. Thanks, after make all the things, the program works very well, Thanks for sharing

      Comment by Carlos — May 22, 2005 @ 12:53 am
    9. Nice program, works very well. Suggested site to add: http://www.pbase.com – I’d be happy with *just* PBase.

      Jazz

      Comment by JazzLad — June 7, 2005 @ 5:17 am
    10. can I use your program to get thumbnails from http://www.pbase.com.

      Comment by jan — June 29, 2005 @ 3:18 pm
    11. Love the PBase option – I noticed something interesting. If you go to http://images.google.com you can search for images by site by typing the site in the searchbox. Cool feature. When I searched the NASA site, it gave me 1.9 million hits, but when I told ImageGrabber to grab the same search, it only found 196 . . . did I do something wrong?

      Love the software, thanks!

      Jazz

      Comment by JazzLad — July 14, 2005 @ 6:42 am
    12. Jazz – Google lies! I did a search for “site:nasa.gov”, and it reported “1,420,000 for site:nasa.gov”.
      The grabber only was able to find 338 unique images. From the browser I went to the last page I
      could, and it maxed out around 500. On the last page there is a link to include some omitted
      results, and going down that path made a max of 1000 images available. The grabber is hitting
      that first limit of 500. I suspect that the number in the “un-omitted” initial search varies
      depending on the search terms and number of results, and that’s the number that the grabber is
      going to encounter. I could modify the grabber to get the un-omitted result set, but I doubt if
      it’s possible to get to the rest of the 1.42M. I generally don’t set the max to more than 500
      since the quality (and relevance) of results tends to be pretty poor at that point.

      Comment by badsegue — July 14, 2005 @ 7:34 am
    13. Actually, I would really like an option to search omitted results (if it’s not too much trouble :)), perhaps have 2 checkboxes for Google (I’d hate to upset anyone that prefers it the way it is now ;)).

      In the case of Nasa, I think all pictures will be relevant because I had it searching specifically that domain’s results & I doubt someone will use nasa.gov or jpl.nasa.gov in the filename of say, a bird :) I don’t know what algorithm Google uses to omit, but a lot of Nasa photos have similar filenames, sizes and even subject matter (like all the Mars photos).

      Thanks!
      Jazz

      Comment by JazzLad — July 15, 2005 @ 10:11 pm
