‹› markdown.htmlparser

This module imports a copy of html.parser.HTMLParser and modifies it heavily through monkey-patches. Importing a copy, rather than the module itself, ensures that users can still import and use the unmodified library for their own needs.
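The copy-then-patch idea can be sketched with the standard library alone. This is a minimal illustration, not the module's actual code: the name `parser_copy` and the `patched_for_demo` attribute are hypothetical, and the real patches are more extensive.

```python
import html.parser
import importlib.util

# Load a second, independent copy of the stdlib module. It is not
# registered in sys.modules, so `import html.parser` still returns
# the original.
spec = importlib.util.find_spec("html.parser")
parser_copy = importlib.util.module_from_spec(spec)
spec.loader.exec_module(parser_copy)

# Hypothetical patch, for illustration only: mark the copied class.
parser_copy.HTMLParser.patched_for_demo = True

# The original module and its class are left untouched.
print(parser_copy is html.parser)
print(hasattr(html.parser.HTMLParser, "patched_for_demo"))
```

Because the copy is a separate module object with its own class objects, any attribute or method replaced on it stays invisible to code that uses html.parser directly.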

‹› markdown.htmlparser.HTMLExtractor(md: Markdown, *args, **kwargs)

Bases: HTMLParser

Extract raw HTML from text.

The raw HTML is stored in the htmlStash of the Markdown instance passed to md, and the remaining text is stored in cleandoc as a list of strings.
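The split between stashed HTML and remaining text can be illustrated with a toy parser built on the standard library. This is only a sketch of the idea, not HTMLExtractor itself: the class `ToyExtractor`, its `stash` list (standing in for htmlStash), and the `[stash:N]` placeholder format are all assumptions, and void elements such as `<br>` are not handled.

```python
from html.parser import HTMLParser


class ToyExtractor(HTMLParser):
    """Toy stand-in: stashes top-level elements, keeps the other text."""

    def __init__(self):
        super().__init__()
        self.stash = []      # plays the role of md.htmlStash
        self.cleandoc = []   # remaining text, as a list of strings
        self._raw = []       # pieces of the element being collected
        self._depth = 0      # number of currently open tags

    def handle_starttag(self, tag, attrs):
        self._raw.append(self.get_starttag_text())
        self._depth += 1

    def handle_endtag(self, tag):
        if self._depth == 0:
            # Stray end tag: keep it as ordinary text.
            self.cleandoc.append(f"</{tag}>")
            return
        self._raw.append(f"</{tag}>")
        self._depth -= 1
        if self._depth == 0:
            # Element complete: stash it and leave a placeholder behind.
            self.stash.append("".join(self._raw))
            self.cleandoc.append(f"[stash:{len(self.stash) - 1}]")
            self._raw = []

    def handle_data(self, data):
        # Text inside an open element belongs to the raw HTML.
        (self._raw if self._depth else self.cleandoc).append(data)


parser = ToyExtractor()
parser.feed("Intro\n\n<div>raw <b>html</b></div>\n\nOutro")
parser.close()
print(parser.stash)
print("".join(parser.cleandoc))
```

The calling pattern is the same as for any HTMLParser subclass: feed the text, call close(), then read the collected results from the instance.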

‹› markdown.htmlparser.HTMLExtractor.reset()

Reset this instance. Loses all unprocessed data.

‹› markdown.htmlparser.HTMLExtractor.close()

Handle any buffered data.
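The difference between reset() and close() can be seen on the standard library base class, which HTMLExtractor extends. The sketch below assumes only html.parser.HTMLParser; the `Collector` class is hypothetical, and HTMLExtractor's own overrides do additional work that is not shown here.

```python
from html.parser import HTMLParser


class Collector(HTMLParser):
    """Records every chunk of text the parser hands over."""

    def __init__(self):
        super().__init__()
        self.data = []

    def handle_data(self, data):
        self.data.append(data)


# An unfinished character reference keeps the text in the buffer.
flushed = Collector()
flushed.feed("tail &am")
before_close = list(flushed.data)   # nothing delivered yet
flushed.close()                     # buffered text is now handled

# reset() throws the buffered text away instead.
dropped = Collector()
dropped.feed("tail &am")
dropped.reset()
dropped.close()

print(before_close, flushed.data, dropped.data)
```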

‹› markdown.htmlparser.HTMLExtractor.line_offset: int property

Returns the character index in self.rawdata of the start of the current line.
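As a rough illustration of what this index means, the following standalone function computes the same kind of value from a string and a 1-based line number. It is a sketch under the assumption that lines are separated by `\n`; the function and its parameters are hypothetical and do not reproduce the property's implementation.

```python
def line_offset(rawdata: str, lineno: int) -> int:
    """Index in rawdata where 1-based line `lineno` starts."""
    offset = 0
    for _ in range(lineno - 1):
        newline = rawdata.find("\n", offset)
        if newline == -1:
            break  # fewer lines than requested: stay on the last line
        offset = newline + 1
    return offset


rawdata = "<p>one</p>\n<p>two</p>\n<p>three</p>"
start = line_offset(rawdata, 3)
print(start, rawdata[start:])
```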

‹› markdown.htmlparser.HTMLExtractor.at_line_start() -> bool

Returns True if the current position is at the start of a line.

Allows for up to three blank spaces at start of line.
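The three-space allowance can be sketched as a standalone check. The function below is a hypothetical illustration that takes the line text and a column offset as arguments, whereas the method reads both from the parser's state.

```python
def at_line_start(line: str, offset: int) -> bool:
    """True if `offset` is at the start of `line`, allowing up to 3 spaces."""
    if offset == 0:
        return True
    if offset > 3:
        return False
    # Everything before the position must be whitespace.
    return line[:offset].strip() == ""


print(at_line_start("   <div>", 3))   # three leading spaces
print(at_line_start("    <div>", 4))  # four leading spaces
print(at_line_start("a <div>", 2))    # preceded by other text
```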

‹› markdown.htmlparser.HTMLExtractor.get_endtag_text(tag: str) -> str

Returns the text of the end tag.

If the actual text cannot be extracted from the raw data, a closing tag is built from tag instead.
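The extract-or-rebuild behaviour might look roughly like the sketch below. It assumes the end tag begins at a known index `start` and ends at the next `>`; the function name, its parameters, and the `END_OF_TAG` pattern are hypothetical.

```python
import re

END_OF_TAG = re.compile(">")


def endtag_text(rawdata: str, start: int, tag: str) -> str:
    """Return the end tag as written in rawdata, or a rebuilt one."""
    m = END_OF_TAG.search(rawdata, start)
    if m:
        # Preserve the original casing and whitespace.
        return rawdata[start:m.end()]
    # No closing bracket found: fall back to a tag built from the name.
    return f"</{tag}>"


print(endtag_text("text</DIV >more", 4, "div"))
print(endtag_text("text</div", 4, "div"))
```

Returning the source text where possible keeps the stashed HTML identical to what the author wrote; the rebuilt tag is only a fallback.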

‹› markdown.htmlparser.HTMLExtractor.handle_empty_tag(data: str, is_block: bool)

Handle empty tags (<data>).

‹› markdown.htmlparser.HTMLExtractor.get_starttag_text() -> str

Return full source of start tag: <...>.
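The standard library's HTMLParser.get_starttag_text(), which this method overrides, shows what "full source" means: the tag exactly as written, not a normalized form. The sketch below uses only the base class, so details of the override may differ; `StartTagRecorder` is a hypothetical helper.

```python
from html.parser import HTMLParser


class StartTagRecorder(HTMLParser):
    """Records each start tag's parsed name and its original source text."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, self.get_starttag_text()))


recorder = StartTagRecorder()
recorder.feed('<A  HREF="x" >link</A>')
recorder.close()
print(recorder.seen)
```

The parsed tag name is lowercased, while the text from get_starttag_text() keeps the original casing and spacing.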