TITLE: Content Grabber

 

AUTHOR: Benjamin Mullin

 

ABSTRACT:

This program will automate grabbing information from multiple pages that don't have RSS feeds. It will allow the user to pick the content that will go on their personal page.

 

WHAT:

This application will work as a web page. From this web page, you will submit the URL of a webpage that you want to extract content from. The webpage content, both text and images, will be displayed broken down into items. The items will be displayed alongside checkboxes. Each checked box marks content that will be extracted from the page. The grabbed content will then be displayed on one page.
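The item-selection step above could be sketched as a small helper that renders extracted items as an HTML checkbox form. This is only a sketch: the class name, form field names, and the `grab` action URL are hypothetical, not part of the proposal.

```java
import java.util.List;

public class ItemForm {
    // Hypothetical sketch: render each extracted content item next to a
    // checkbox so the user can pick which items go on their personal page.
    public static String renderForm(List<String> items) {
        StringBuilder html = new StringBuilder("<form method='post' action='grab'>\n");
        for (int i = 0; i < items.size(); i++) {
            html.append("<input type='checkbox' name='item").append(i)
                .append("'/> ").append(items.get(i)).append("<br/>\n");
        }
        html.append("<input type='submit' value='Grab'/>\n</form>");
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.println(renderForm(List.of("Surf report: 4-6 ft", "Top headline")));
    }
}
```

On submit, a servlet would read back the checked `itemN` parameters to decide which items to keep.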

 

WHY:

Every morning, there are a handful of daily bulletin sites that I read. Specifically, I grab content from my.yahoo.com, www.sfexaminer.com, www.surfpulse.com, and www.blakestah.com. I want to have most of that information on one page.

 

HOW:

In order to extract the content down to its basics, just text and images, I will make an XSLT sheet that will attempt to filter out all markup tags. Then there will be a form, built with servlets, for picking the content the user wants. Then a DTD will be created for each page that content is grabbed from. There will be ways to change your personal page. I may expand from the Wahoo! project.
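The markup-filtering step might look something like the following sketch, which applies a minimal XSLT stylesheet through Java's built-in `javax.xml.transform` API. The class name, stylesheet, and sample input are assumptions for illustration; it also assumes the fetched page is well-formed XHTML, since real HTML would first need to be tidied before an XML parser could read it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StripMarkup {
    // A minimal stylesheet: keep text, copy <img> elements through
    // unchanged, and let every other tag fall away (the built-in XSLT
    // rules recurse into elements and copy text nodes).
    static final String XSLT =
        "<xsl:stylesheet version='1.0' "
      + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'><content>"
      + "<xsl:apply-templates/></content></xsl:template>"
      + "<xsl:template match='img'><xsl:copy-of select='.'/></xsl:template>"
      + "</xsl:stylesheet>";

    // Run the stylesheet over one page of (well-formed) XHTML.
    public static String strip(String xhtml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xhtml)),
                    new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><b>Surf report:</b> 4-6 ft"
                    + "<img src='swell.gif'/></body></html>";
        // Prints the page reduced to bare text plus its images.
        System.out.println(strip(page));
    }
}
```

The result would then be split into items for the selection form.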

 

QUESTIONS:

Do you think this will be challenging enough? What advice do you have?