How to easily crawl article content

A simple way to crawl an article from the internet is to use the Python library newspaper (published on PyPI as newspaper3k) and its Article class, imported with “from newspaper import Article”.

You pass in a link and get back the content you want: the title, publish date, top image, summary and keywords. The library's documentation covers the full API.

Code sample

from newspaper import Article

url = "https://fivethirtyeight.com/features/what-were-watching-in-the-nhls-playoff-races/"
article = Article(url)

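# Download the page's HTML
article.download()

# Parse the HTML to extract the title, publish date, top image and text
article.parse()

# Run NLP to fill in summary and keywords (needs NLTK tokenizer data, e.g. "punkt")
article.nlp()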

print("title", article.title, "\n")
print("publish_date", article.publish_date, "\n")
print("top_image", article.top_image, "\n")
print("summary", article.summary, "\n")
print("keywords", article.keywords, "\n")
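
If you want to crawl more than one link, the same steps could be wrapped in a small helper that returns the fields as a dictionary instead of printing them. This is only a sketch: the names FIELDS, article_to_record and crawl are made up for this example and are not part of the newspaper library, and crawl assumes the library and its NLTK data are installed.

```python
from types import SimpleNamespace

# The same attributes the sample above prints.
FIELDS = ("title", "publish_date", "top_image", "summary", "keywords")


def article_to_record(article):
    """Copy the fields from an Article-like object into a plain dict.

    Attributes that are missing on the object are stored as None.
    """
    return {name: getattr(article, name, None) for name in FIELDS}


def crawl(url):
    """Download, parse and analyse one URL, then return its fields as a dict.

    Hypothetical helper: assumes newspaper is installed and the URL is reachable.
    """
    # Imported here so article_to_record can be used without the library.
    from newspaper import Article

    article = Article(url)
    article.download()
    article.parse()
    article.nlp()
    return article_to_record(article)


if __name__ == "__main__":
    # Stand-in object, used only to show the shape of the result offline.
    fake = SimpleNamespace(title="Example title", keywords=["nhl", "playoffs"])
    print(article_to_record(fake))
```

With the library installed, something like [crawl(u) for u in urls] would then give one dictionary per link, which is easy to write to JSON or load into a table.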
