‹› `markdown.htmlparser` ¶

This module imports a copy of `html.parser.HTMLParser` and modifies it heavily through monkey-patches. A copy is imported, rather than the module itself, so that users can still import and use the unmodified library for their own needs.
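The copy-import idea can be sketched with the standard library's `importlib`. This is a minimal illustration under assumptions, not necessarily the module's exact code: the name `parser_copy` and the `PATCHED` attribute are invented for the example.

```python
import importlib.util
import html.parser

# Build a second, independent module object from the same source file.
spec = importlib.util.find_spec("html.parser")
parser_copy = importlib.util.module_from_spec(spec)
spec.loader.exec_module(parser_copy)

# A stand-in for a monkey-patch (hypothetical attribute, for illustration only).
parser_copy.PATCHED = True

# The copy and the original are separate objects with separate classes.
print(parser_copy is html.parser)
print(parser_copy.HTMLParser is html.parser.HTMLParser)
print(hasattr(html.parser, "PATCHED"))
```

Because the copy is never registered in `sys.modules`, a later `import html.parser` elsewhere still returns the unpatched module.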

Classes:

HTMLExtractor –

Extract raw HTML from text.

‹› `markdown.htmlparser.HTMLExtractor(md: Markdown, *args, **kwargs)` ¶

Bases: HTMLParser

Extract raw HTML from text.

The raw HTML is stored in the `htmlStash` of the `Markdown` instance passed to `md`, and the remaining text is stored in `cleandoc` as a list of strings.
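The split between stashed HTML and remaining text can be illustrated with a standard-library sketch. Everything here is an assumption made for illustration: `SketchExtractor`, the plain-list `stash`, and the `[[html:N]]` placeholder format are hypothetical, and unlike the real class this version stashes every tag it sees rather than applying any block-level rules.

```python
from html.parser import HTMLParser


class SketchExtractor(HTMLParser):
    """Hypothetical, simplified stand-in for HTMLExtractor."""

    def __init__(self, stash):
        super().__init__(convert_charrefs=False)
        self.stash = stash    # plays the role of md.htmlStash
        self.cleandoc = []    # remaining text, as a list of strings

    def _store(self, raw):
        # Save the raw HTML and leave a placeholder in the clean text.
        self.stash.append(raw)
        self.cleandoc.append(f"[[html:{len(self.stash) - 1}]]")

    def handle_starttag(self, tag, attrs):
        self._store(self.get_starttag_text())

    def handle_endtag(self, tag):
        self._store(f"</{tag}>")

    def handle_data(self, data):
        self.cleandoc.append(data)


stash = []
parser = SketchExtractor(stash)
parser.feed("Some *text* <span class='x'>inline</span> here.\n")
parser.close()

print(stash)
print("".join(parser.cleandoc))
```

Joining `cleandoc` gives the text with placeholders where the raw HTML used to be, while the raw HTML itself can be recovered from the stash by index.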

Methods:

reset –

Reset this instance. Loses all unprocessed data.

close –

Handle any buffered data.

at_line_start –

Returns True if current position is at start of line.

get_endtag_text –

Returns the text of the end tag.

handle_empty_tag –

Handle empty tags (<data>).

get_starttag_text –

Return full source of start tag: <...>.

Attributes:

line_offset (int) –

Returns the character index in `self.rawdata` of the start of the current line.

‹› `markdown.htmlparser.HTMLExtractor.reset()` ¶

Reset this instance. Loses all unprocessed data.

‹› `markdown.htmlparser.HTMLExtractor.close()` ¶

Handle any buffered data.

‹› `markdown.htmlparser.HTMLExtractor.line_offset: int` `property` ¶

Returns the character index in `self.rawdata` of the start of the current line.
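As a rough sketch of the idea, the start-of-line index can be found by skipping past the newlines that precede the current line. The standalone function below and its parameters `rawdata` and `lineno` (1-based) are assumptions for illustration; the actual property reads the parser's own state and may be implemented differently.

```python
def line_offset(rawdata: str, lineno: int) -> int:
    """Return the index in rawdata where line number lineno (1-based) starts."""
    offset = 0
    for _ in range(lineno - 1):
        newline = rawdata.find("\n", offset)
        if newline == -1:
            # Fewer lines than requested: stay at the last line found.
            break
        offset = newline + 1
    return offset


rawdata = "one\ntwo\nthree"
print(line_offset(rawdata, 1))
print(line_offset(rawdata, 2))
print(rawdata[line_offset(rawdata, 3):])
```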

‹› `markdown.htmlparser.HTMLExtractor.at_line_start() -> bool` ¶

Returns `True` if the current position is at the start of a line.

Up to three leading spaces are allowed at the start of the line.
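The rule can be sketched as a small standalone check. The function signature is hypothetical: `line` is the text of the current line and `offset` is the column of the current position within it, whereas the real method takes no arguments and uses the parser's internal state.

```python
def at_line_start(line: str, offset: int) -> bool:
    """Return True if offset is at the start of line, allowing up to 3 spaces."""
    if offset == 0:
        return True
    if offset > 3:
        return False
    # Everything before the position must be whitespace.
    return line[:offset].strip() == ""


print(at_line_start("<div>", 0))
print(at_line_start("   <div>", 3))
print(at_line_start("    <div>", 4))
print(at_line_start("ab <div>", 3))
```

The three-space limit mirrors the usual Markdown convention that four or more leading spaces start an indented code block.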

‹› `markdown.htmlparser.HTMLExtractor.get_endtag_text(tag: str) -> str` ¶

Returns the text of the end tag.

If the actual text cannot be extracted from the raw data, a closing tag is built from `tag` instead.
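The extract-or-rebuild behavior might look roughly like the sketch below. The standalone function, its `rawdata` and `start` parameters, and the `END_OF_TAG` pattern are assumptions for illustration; the real method takes only `tag` and reads the position from the parser.

```python
import re

# Hypothetical pattern: the end of a tag is the next ">" character.
END_OF_TAG = re.compile(">")


def endtag_text(rawdata: str, start: int, tag: str) -> str:
    """Return the raw end tag beginning at start, or a rebuilt one."""
    match = END_OF_TAG.search(rawdata, start)
    if match:
        # Preserve the source exactly, including case and stray whitespace.
        return rawdata[start:match.end()]
    # Nothing to extract: fall back to a normalized closing tag.
    return f"</{tag}>"


print(endtag_text("<p>hi</P >", 5, "p"))
print(endtag_text("<p>hi</p", 5, "p"))
```

Preferring the raw text keeps the stashed HTML identical to what the author wrote, and the fallback guarantees that a usable closing tag is always returned.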

‹› `markdown.htmlparser.HTMLExtractor.handle_empty_tag(data: str, is_block: bool)` ¶

Handle empty tags (<data>).

‹› `markdown.htmlparser.HTMLExtractor.get_starttag_text() -> str` ¶

Return full source of start tag: <...>.
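For context, the base-class method of the same name in the standard library's `html.parser.HTMLParser` returns the start tag exactly as written, while the `tag` argument passed to the handler is lowercased. The sketch below demonstrates only that base-class behavior; `StartTagRecorder` is a hypothetical helper, and the details of `HTMLExtractor`'s own version are not shown here.

```python
from html.parser import HTMLParser


class StartTagRecorder(HTMLParser):
    """Record each start tag's normalized name next to its raw source."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, self.get_starttag_text()))


recorder = StartTagRecorder()
recorder.feed('<DIV Class="note">text</DIV>')
recorder.close()

print(recorder.seen)
```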
