Goose was originally an article extractor written in Java that was most recently (August 2011) converted to a Scala project. This is a complete rewrite in Python. The aim of the software is to take any news article or article-type web page and extract not only the main body of the article but also all metadata and the most probable image candidate.
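
To make the three outputs concrete (main body, metadata, image candidate), here is a minimal sketch using only the Python standard library. It is not Goose's algorithm or API: `ToyArticleParser`, `extract_article`, and the `min_words` threshold are hypothetical names chosen for illustration, and treating the `og:image` meta tag as the image candidate is an assumption.

```python
from html.parser import HTMLParser


class ToyArticleParser(HTMLParser):
    """Hypothetical helper: collects <title>, <meta> tags, and <p> text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.paragraphs = []
        self._in_title = False
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Metadata may be keyed by either "property" or "name".
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content"):
                self.meta[key.lower()] = attrs["content"]
        elif tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            # Collapse runs of whitespace inside the paragraph.
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p:
            self._buf.append(data)


def extract_article(html, min_words=5):
    """Return title, description, image candidate, and body text.

    Assumption: paragraphs shorter than `min_words` are navigation or
    other page chrome. A real extractor uses far richer scoring.
    """
    parser = ToyArticleParser()
    parser.feed(html)
    parser.close()
    body = [p for p in parser.paragraphs if len(p.split()) >= min_words]
    return {
        "title": parser.title.strip(),
        "description": parser.meta.get("description", ""),
        "top_image": parser.meta.get("og:image", ""),
        "text": "\n\n".join(body),
    }


SAMPLE = """<html><head><title>Example headline</title>
<meta name="description" content="A short summary.">
<meta property="og:image" content="http://example.com/lead.jpg">
</head><body>
<p>Home | About</p>
<p>This is the first real paragraph of the example article body.</p>
<p>Here is a second paragraph with enough words to keep.</p>
</body></html>"""

article = extract_article(SAMPLE)
print(article["title"])
print(article["top_image"])
print(article["text"])
```

The word-count filter is only a stand-in for the kind of content scoring an article extractor needs; it drops the short navigation line in the sample while keeping the two body paragraphs.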