Aaron Swartz 8ddc844b03 Merge pull request #64 from adhiraj/import_testcase 10 years ago
test Forgot to remove ; after testing 10 years ago
.editorconfig .editorconfig added to maintain correct linebreaks (mostly for the test data). 11 years ago
.gitignore Markdown-safe characters will not be escaped in urls. Test case updated. 11 years ago
.travis.yml remove python3 from travis, make tests work on 2.5 11 years ago
COPYING add COPYING (fix #31) 11 years ago
MANIFEST.in Added a MANIFEST.in so that COPYING and README.md is included in source 11 years ago
README.md add travis to readme 11 years ago
html2text.py Merge pull request #62 from adhiraj/import_sentence 10 years ago
setup.py release 3.200.3 (with COPYING) 11 years ago

README.md

html2text

html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).

Usage: html2text.py [(filename|url) [encoding]]

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  --ignore-links        don't include any formatting for links
  --ignore-images       don't include any formatting for images
  -g, --google-doc      convert an html-exported Google Document
  -d, --dash-unordered-list
                        use a dash rather than a star for unordered list items
  -b BODY_WIDTH, --body-width=BODY_WIDTH
                        number of characters per output line, 0 for no wrap
  -i LIST_INDENT, --google-list-indent=LIST_INDENT
                        number of pixels Google indents nested lists
  -s, --hide-strikethrough
                        hide strike-through text. only relevant when -g is
                        specified as well
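
When calling the script from another program, it can help to assemble the command line in one place. The following is a minimal sketch, not part of html2text itself: `build_html2text_command` is a hypothetical helper, and the `python` and `script` defaults are assumptions about how the interpreter and the script are reachable on your system. It only uses flags listed in the help text above.

```python
def build_html2text_command(source=None, encoding=None, body_width=None,
                            ignore_links=False, ignore_images=False,
                            dash_unordered_list=False,
                            python="python", script="html2text.py"):
    """Return an argv list for html2text.py (hypothetical helper).

    Follows the usage line: html2text.py [(filename|url) [encoding]]
    """
    cmd = [python, script]
    if ignore_links:
        cmd.append("--ignore-links")
    if ignore_images:
        cmd.append("--ignore-images")
    if dash_unordered_list:
        cmd.append("--dash-unordered-list")
    if body_width is not None:
        # 0 disables wrapping, as described in the help text
        cmd.append("--body-width=%d" % body_width)
    if source is not None:
        cmd.append(source)
        # the encoding is only meaningful after a filename or URL
        if encoding is not None:
            cmd.append(encoding)
    return cmd
```

The resulting list can be handed to `subprocess.check_output` or a similar call; building a list rather than a single string avoids shell quoting problems with URLs.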

Or you can use it from within Python:

import html2text
print(html2text.html2text("<p>Hello, world.</p>"))

Or with some configuration options:

import html2text
h = html2text.HTML2Text()
h.ignore_links = True
print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!"))

Originally written by Aaron Swartz. This code is distributed under the GPLv3.

How to do a release

  1. Update the version in html2text.py
  2. Update the version in setup.py
  3. Run python setup.py sdist upload
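
Since steps 1 and 2 update the same version string in two files, a quick check before step 3 can catch a mismatch. This is a minimal sketch under stated assumptions: it assumes html2text.py declares the version as `__version__ = "..."` and setup.py passes `version="..."`, which may not match the actual files exactly. `find_version` and `versions_match` are hypothetical helper names.

```python
import re

# Assumed forms: __version__ = "x.y.z" (module) and version="x.y.z" (setup.py)
VERSION_RE = re.compile(r"""version_*\s*=\s*["']([^"']+)["']""")


def find_version(text):
    """Return the first version string found in text, or None."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


def versions_match(module_source, setup_source):
    """True only if both sources declare a version and the two agree."""
    module_version = find_version(module_source)
    setup_version = find_version(setup_source)
    return module_version is not None and module_version == setup_version
```

To use it, read both files and pass their contents, for example `versions_match(open("html2text.py").read(), open("setup.py").read())`, and stop the release if it returns `False`.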

How to run unit tests

cd test/
python run_tests.py
