html2text
html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).
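To give a feel for the kind of conversion described above, here is a minimal sketch built only on Python 3's standard `html.parser` module. It is not html2text's implementation: `TinyMarkdown` and `tiny_html2text` are hypothetical names made up for this illustration, and the sketch handles only `<p>`, `<em>`, `<strong>` and `<a>`.

```python
from html.parser import HTMLParser


class TinyMarkdown(HTMLParser):
    """Hypothetical toy converter; NOT html2text's actual code."""

    def __init__(self):
        super().__init__()
        self.out = []      # output fragments, joined at the end
        self.href = None   # target of the link currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "em":
            self.out.append("_")
        elif tag == "strong":
            self.out.append("**")
        elif tag == "a":
            # Remember the target so it can be emitted when the link closes.
            self.href = dict(attrs).get("href") or ""
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag == "em":
            self.out.append("_")
        elif tag == "strong":
            self.out.append("**")
        elif tag == "a":
            self.out.append("](%s)" % self.href)
            self.href = None
        elif tag == "p":
            # A blank line separates paragraphs in Markdown.
            self.out.append("\n\n")

    def handle_data(self, data):
        self.out.append(data)


def tiny_html2text(html):
    """Convert a small HTML fragment to Markdown-like text."""
    parser = TinyMarkdown()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip() + "\n"


print(tiny_html2text("<p>Hello, <a href='http://earth.google.com/'>world</a>!</p>"))
```

The real script covers far more of HTML (lists, headings, images, line wrapping, Google Docs exports), so treat this only as an illustration of the idea.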
Usage: html2text.py [(filename|url) [encoding]]

    Options:
      --version             show program's version number and exit
      -h, --help            show this help message and exit
      --ignore-links        don't include any formatting for links
      --ignore-images       don't include any formatting for images
      -g, --google-doc      convert an html-exported Google Document
      -d, --dash-unordered-list
                            use a dash rather than a star for unordered list items
      -b BODY_WIDTH, --body-width=BODY_WIDTH
                            number of characters per output line, 0 for no wrap
      -i LIST_INDENT, --google-list-indent=LIST_INDENT
                            number of pixels Google indents nested lists
      -s, --hide-strikethrough
                            hide strike-through text. only relevant when -g is
                            specified as well
Or you can use it from within Python:
import html2text
print(html2text.html2text("<p>Hello, world.</p>"))
Or with some configuration options:
import html2text
h = html2text.HTML2Text()
h.ignore_links = True
print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!"))
Originally written by Aaron Swartz. This code is distributed under the GPLv3.
How to do a release
- Update the version in html2text.py
- Update the version in setup.py
- Run python setup.py sdist upload
How to run unit tests
cd test/
python run_tests.py