BeautifulSoup's get_text method can be used to strip HTML tags and return the text content of a page.
The html_content.py file looks like this:
# -*- coding: utf-8 -*-
import sys
import os
from bs4 import BeautifulSoup
import requests
if sys.stdout.encoding is None:
    os.putenv("PYTHONIOENCODING", 'UTF-8')
    os.execv(sys.executable, ['python']+sys.argv)
url = sys.argv[1]
page_content = requests.get(url)
text = BeautifulSoup(page_content.text, "html.parser").get_text()
print(text)
This Python script takes the URL as a command-line argument and can be run like this:
# python html_content.py http://kadirsert.blogspot.com
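
To see what get_text does without fetching a real page, here is a minimal sketch that runs it on an inline HTML string. The SAMPLE_HTML markup and the extract_text helper are made up for illustration and are not part of the script above. The sketch also removes script and style tags before extracting text, which is an optional extra step, and passes the separator and strip arguments of get_text so the pieces of text are trimmed and joined with single spaces.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used here instead of a downloaded page.
SAMPLE_HTML = """
<html>
  <head><title>Sample page</title><style>p { color: red; }</style></head>
  <body>
    <h1>Hello</h1>
    <p>First <b>bold</b> paragraph.</p>
    <script>console.log("not page text");</script>
  </body>
</html>
"""


def extract_text(html):
    """Return the visible text of an HTML string (illustrative helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Drop script and style elements so their contents never reach the output.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # separator=" " joins the text pieces with a space;
    # strip=True trims whitespace around each piece and skips empty ones.
    return soup.get_text(separator=" ", strip=True)


if __name__ == "__main__":
    print(extract_text(SAMPLE_HTML))
```

Calling get_text() with no arguments, as the script above does, keeps the original whitespace and line breaks of the page, so the output of a real page usually contains many blank lines. The separator and strip arguments are one way to tidy that up.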