@yozora/tokenizer-html-block
An HTML block is a group of lines that is treated as raw HTML (and will not be escaped in HTML output).
There are seven kinds of HTML block, which can be defined by their start and end conditions. The block begins with a line that meets a start condition (after up to three spaces of optional indentation). It ends with the first subsequent line that meets a matching end condition, or the last line of the document, or the last line of the container block containing the current HTML block, if no line is encountered that meets the end condition. If the first line meets both the start condition and the end condition, the block will contain just that line.
- Start condition: line begins with the string `<script`, `<pre`, or `<style` (case-insensitive), followed by whitespace, the string `>`, or the end of the line.
  End condition: line contains an end tag `</script>`, `</pre>`, or `</style>` (case-insensitive; it need not match the start tag).
- Start condition: line begins with the string `<!--`.
  End condition: line contains the string `-->`.
- Start condition: line begins with the string `<?`.
  End condition: line contains the string `?>`.
- Start condition: line begins with the string `<!` followed by an uppercase ASCII letter.
  End condition: line contains the character `>`.
- Start condition: line begins with the string `<![CDATA[`.
  End condition: line contains the string `]]>`.
- Start condition: line begins with the string `<` or `</` followed by one of the strings (case-insensitive) `address`, `article`, `aside`, `base`, `basefont`, `blockquote`, `body`, `caption`, `center`, `col`, `colgroup`, `dd`, `details`, `dialog`, `dir`, `div`, `dl`, `dt`, `fieldset`, `figcaption`, `figure`, `footer`, `form`, `frame`, `frameset`, `h1`, `h2`, `h3`, `h4`, `h5`, `h6`, `head`, `header`, `hr`, `html`, `iframe`, `legend`, `li`, `link`, `main`, `menu`, `menuitem`, `nav`, `noframes`, `ol`, `optgroup`, `option`, `p`, `param`, `section`, `source`, `summary`, `table`, `tbody`, `td`, `tfoot`, `th`, `thead`, `title`, `tr`, `track`, `ul`, followed by whitespace, the end of the line, the string `>`, or the string `/>`.
  End condition: line is followed by a blank line.
- Start condition: line begins with a complete open tag (with any tag name other than `script`, `style`, or `pre`) or a complete closing tag, followed only by whitespace or the end of the line.
  End condition: line is followed by a blank line.
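As a rough illustration of the start conditions listed above, the following TypeScript sketch classifies a single line. The names `detectStartCondition`, `BLOCK_TAG_NAMES`, and `BLOCK_TAG_PATTERN` are hypothetical and invented for this example; they are not part of the tokenizer's API. Condition 7 is left out because it requires matching a complete open or closing tag.

```typescript
// Hypothetical sketch, not the implementation used by @yozora/tokenizer-html-block.
// Tag names that trigger start condition 6 (taken from the list above).
const BLOCK_TAG_NAMES: string[] = [
  'address article aside base basefont blockquote body caption center col',
  'colgroup dd details dialog dir div dl dt fieldset figcaption figure footer',
  'form frame frameset h1 h2 h3 h4 h5 h6 head header hr html iframe legend li',
  'link main menu menuitem nav noframes ol optgroup option p param section',
  'source summary table tbody td tfoot th thead title tr track ul',
].join(' ').split(' ')

// `<` or `</`, one of the tag names, then whitespace, `>`, `/>`, or end of line.
const BLOCK_TAG_PATTERN = new RegExp(
  `^</?(?:${BLOCK_TAG_NAMES.join('|')})(?:\\s|/?>|$)`,
  'i',
)

// Returns the number (1-6) of the first start condition the line meets,
// or null when none of conditions 1-6 applies.
function detectStartCondition(line: string): number | null {
  // Strip up to three spaces of optional indentation.
  const rest = line.replace(/^ {0,3}/, '')

  if (/^<(?:script|pre|style)(?:\s|>|$)/i.test(rest)) return 1
  if (rest.startsWith('<!--')) return 2
  if (rest.startsWith('<?')) return 3
  if (/^<![A-Z]/.test(rest)) return 4
  if (rest.startsWith('<![CDATA[')) return 5
  if (BLOCK_TAG_PATTERN.test(rest)) return 6
  return null
}
```

A line indented by four or more spaces still begins with a space after the strip, so none of the patterns match and the function returns `null`.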
HTML blocks continue until they are closed by their appropriate end condition, or the last line of the document or other container block. This means any HTML within an HTML block that might otherwise be recognised as a start condition will be ignored by the parser and passed through as-is, without changing the parser’s state.
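The continuation rule can be sketched as a scan over the remaining lines of the container. `findHtmlBlockEnd` and `END_PATTERNS` are hypothetical names written for this illustration only; the sketch assumes the start condition number (1-7) is already known and that `lines` holds only the lines of the enclosing container.

```typescript
// Hypothetical sketch, not the implementation used by @yozora/tokenizer-html-block.
// End patterns for conditions 1-5; conditions 6 and 7 end before a blank line.
const END_PATTERNS = new Map<number, RegExp>([
  [1, /<\/(?:script|pre|style)>/i],
  [2, /-->/],
  [3, /\?>/],
  [4, />/],
  [5, /\]\]>/],
])

// Returns the index of the last line that belongs to the HTML block which
// started at `start` under the given start condition.
function findHtmlBlockEnd(
  lines: readonly string[],
  start: number,
  condition: number,
): number {
  const pattern = END_PATTERNS.get(condition)
  for (let i = start; i < lines.length; i += 1) {
    if (pattern !== undefined) {
      // Conditions 1-5: the block ends on the first line containing the end marker.
      // The scan starts at `start`, so the first line may end the block itself.
      if (pattern.test(lines[i] ?? '')) return i
    } else {
      // Conditions 6-7: the block ends on a line that is followed by a blank line.
      const next = lines[i + 1]
      if (next !== undefined && next.trim() === '') return i
    }
  }
  // No end condition met: the block runs to the last line of the container.
  return lines.length - 1
}
```

Note that the loop never re-checks start conditions: a `<div>` line inside an open comment block is simply passed over until `-->` is found.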
- See the GitHub Flavored Markdown spec for details.
- See Live Examples for an intuitive impression.
Install
- npm: `npm install --save @yozora/tokenizer-html-block`
- Yarn: `yarn add @yozora/tokenizer-html-block`
- pnpm: `pnpm add @yozora/tokenizer-html-block`
Usage
@yozora/tokenizer-html-block has been integrated into @yozora/parser / @yozora/parser-gfm-ex / @yozora/parser-gfm, so you can use YozoraParser / GfmExParser / GfmParser directly.
Basic Usage

@yozora/tokenizer-html-block cannot be used alone; it must be registered in a YastParser as a plugin before it can be used.

import { DefaultYastParser } from '@yozora/core-parser'
import ParagraphTokenizer from '@yozora/tokenizer-paragraph'
import TextTokenizer from '@yozora/tokenizer-text'
import HtmlBlockTokenizer from '@yozora/tokenizer-html-block'
const parser = new DefaultYastParser()
  .useBlockFallbackTokenizer(new ParagraphTokenizer())
  .useInlineFallbackTokenizer(new TextTokenizer())
  .useTokenizer(new HtmlBlockTokenizer())
// parse source markdown content
parser.parse(`
<pre language="haskell"><code>
import Text.HTML.TagSoup
main :: IO ()
main = print $ parseTags tags
</code></pre>
okay
`)

YozoraParser

import YozoraParser from '@yozora/parser'
const parser = new YozoraParser()
// parse source markdown content
parser.parse(`
<pre language="haskell"><code>
import Text.HTML.TagSoup
main :: IO ()
main = print $ parseTags tags
</code></pre>
okay
`)

GfmParser

import GfmParser from '@yozora/parser-gfm'
const parser = new GfmParser()
// parse source markdown content
parser.parse(`
<pre language="haskell"><code>
import Text.HTML.TagSoup
main :: IO ()
main = print $ parseTags tags
</code></pre>
okay
`)

GfmExParser

import GfmExParser from '@yozora/parser-gfm-ex'
const parser = new GfmExParser()
// parse source markdown content
parser.parse(`
<pre language="haskell"><code>
import Text.HTML.TagSoup
main :: IO ()
main = print $ parseTags tags
</code></pre>
okay
`)
Options
Name | Type | Required | Default |
---|---|---|---|
name | string | false | "@yozora/tokenizer-html-block" |
priority | number | false | TokenizerPriority.ATOMIC |
- name: The unique name of the tokenizer. It is used to bind the tokens the tokenizer generates, so that the right tokenizer can be called in each life cycle of a token throughout the matching / parsing phase.
- priority: Priority of the tokenizer, which determines the processing order; higher-priority tokenizers run first. In addition, in the match-block stage, a high-priority tokenizer can interrupt the matching process of a low-priority tokenizer.
Types
@yozora/tokenizer-html-block produces Html type nodes. See @yozora/ast for full base types.
import type { YastLiteral } from '@yozora/ast'
export const HtmlType = 'html'
export type HtmlType = typeof HtmlType
/**
* HTML (Literal) represents a fragment of raw HTML.
* @see https://github.com/syntax-tree/mdast#html
* @see https://github.github.com/gfm/#html-blocks
* @see https://github.github.com/gfm/#raw-html
*/
export type Html = YastLiteral<HtmlType>
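To show how such nodes might be consumed, here is a minimal sketch that walks a tree and collects every node whose type is 'html'. It deliberately avoids importing anything: `NodeLike`, `HtmlLike`, `isHtml`, and `collectHtml` are hypothetical stand-ins defined only for this example, and the sample tree is hand-written under the assumption that parsed output has a root node with `children` and that Html nodes carry their raw text in `value`. In real code the node types would come from @yozora/ast.

```typescript
// Local stand-ins for this sketch only (assumed shapes, not the @yozora/ast types).
interface NodeLike {
  type: string
  children?: NodeLike[]
}

interface HtmlLike extends NodeLike {
  type: 'html'
  value: string
}

// Type guard: narrows a generic node to an Html-like node.
function isHtml(node: NodeLike): node is HtmlLike {
  return node.type === 'html'
}

// Depth-first traversal that gathers Html-like nodes in document order.
function collectHtml(node: NodeLike, found: HtmlLike[] = []): HtmlLike[] {
  if (isHtml(node)) found.push(node)
  for (const child of node.children ?? []) collectHtml(child, found)
  return found
}

// Hand-written sample tree (an assumption, not captured parser output).
const sampleHtml: HtmlLike = { type: 'html', value: '<div>raw</div>' }
const sampleRoot: NodeLike = {
  type: 'root',
  children: [sampleHtml, { type: 'paragraph', children: [{ type: 'text' }] }],
}

const htmlNodes = collectHtml(sampleRoot)
console.log(htmlNodes.map(node => node.value))
```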
Live Examples
- (Condition 1)
- Comment (Condition 2)
- Processing instruction (Condition 3)
- Declaration (Condition 4)
- CDATA (Condition 5)
- (Condition 6)
- (Condition 7)