Version: 3.x.x 🚧

@yozora/tokenizer-list

Module formats: cjs, esm. Tested with Jest. Code style: prettier.

GitHub Flavored Markdown spec

A list is a sequence of one or more list items of the same type. The list items may be separated by any number of blank lines.

Two list items are of the same type if they begin with a list marker of the same type. Two list markers are of the same type if

  • (a) they are bullet list markers using the same character (-, +, or *) or

  • (b) they are ordered list numbers with the same delimiter (either . or )).
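The same-type rule can be expressed as a small predicate. This is a minimal sketch, not part of the @yozora/tokenizer-list API: the `ListMarker` shape and the `isSameListType` name are hypothetical.

```typescript
// Hypothetical, simplified representation of a list marker (not yozora's API).
type ListMarker =
  | { kind: 'bullet'; char: '-' | '+' | '*' }
  | { kind: 'ordered'; delimiter: '.' | ')' }

// Two list items belong to the same list only if their markers are of the same type.
function isSameListType(a: ListMarker, b: ListMarker): boolean {
  if (a.kind === 'bullet' && b.kind === 'bullet') {
    // (a) bullet markers must use the same character
    return a.char === b.char
  }
  if (a.kind === 'ordered' && b.kind === 'ordered') {
    // (b) ordered markers must use the same delimiter; the numbers do not matter
    return a.delimiter === b.delimiter
  }
  // A bullet marker and an ordered marker are never of the same type
  return false
}
```

For example, a line starting with - followed by a line starting with + produces two separate lists, while 1. and 2. continue the same ordered list.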

A list is an ordered list if its constituent list items begin with ordered list markers, and a bullet list if its constituent list items begin with bullet list markers.

The start number of an ordered list is determined by the list number of its initial list item. The numbers of subsequent list items are disregarded.

A list is loose if any of its constituent list items are separated by blank lines, or if any of its constituent list items directly contain two block-level elements with a blank line between them. Otherwise a list is tight. (The difference in HTML output is that paragraphs in a loose list are wrapped in <p> tags, while paragraphs in a tight list are not.)
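To make the loose/tight rule concrete, here is a minimal sketch that decides looseness from two per-item facts. The `ItemSketch` interface and `isLooseList` function are hypothetical; a real tokenizer records this information while matching lines. The result corresponds to the spread field of the List node described under Types.

```typescript
// Hypothetical, simplified view of a parsed list item.
interface ItemSketch {
  // true if a blank line sits between two blocks that are direct children of the item
  blankLineBetweenChildren: boolean
  // true if a blank line follows this item
  followedByBlankLine: boolean
}

// A list is loose if any two items are separated by a blank line, or if any item
// directly contains two block-level elements with a blank line between them.
function isLooseList(items: ItemSketch[]): boolean {
  return items.some(
    (item, i) =>
      // a blank line after the last item does not separate it from another item
      (item.followedByBlankLine && i < items.length - 1) ||
      item.blankLineBetweenChildren,
  )
}
```

Blank lines inside a sublist, block quote, or code block are not gaps between the item's direct children, so they do not make the outer list loose (compare live examples #298 to #300).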


A list marker is a bullet list marker or an ordered list marker.

A bullet list marker is a -, +, or * character.

An ordered list marker is a sequence of 1-9 arabic digits (0-9), followed by either a . character or a ) character. (The reason for the length limit is that with 10 digits we start seeing integer overflows in some browsers.)
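Both kinds of marker can be recognized with one regular expression. This is a minimal sketch under stated assumptions: `parseListMarker` and `ParsedMarker` are hypothetical names, up to 3 spaces of leading indentation are allowed, and the marker must be followed by a space, a tab, or the end of the line. It does not rule out thematic breaks such as * * *, which must be checked first.

```typescript
// Hypothetical result shape; not the tokenizer's real internal types.
type ParsedMarker =
  | { kind: 'bullet'; char: string; width: number }
  | { kind: 'ordered'; start: number; delimiter: string; width: number }

function parseListMarker(line: string): ParsedMarker | null {
  // 0-3 leading spaces, then a bullet character or 1-9 digits plus '.' or ')',
  // followed by a space, a tab, or the end of the line.
  const m = /^ {0,3}(?:([-+*])|(\d{1,9})([.)]))(?=[ \t]|$)/.exec(line)
  if (m === null) return null
  const [, bullet, digits, delimiter] = m
  if (bullet !== undefined) return { kind: 'bullet', char: bullet, width: 1 }
  if (digits === undefined || delimiter === undefined) return null
  return {
    kind: 'ordered',
    start: Number(digits), // leading zeros are allowed: "003)" starts at 3
    delimiter,
    width: digits.length + 1, // the digits plus the delimiter character
  }
}
```

A ten-digit number such as 1234567890. is rejected, which is the length limit described above.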

The following rules define list items:

  1. Basic case. If a sequence of lines $L_s$ constitute a sequence of blocks $B_s$ starting with a non-whitespace character, and $M$ is a list marker of width $W$ followed by $1 \leqslant N \leqslant 4$ spaces, then the result of prepending $M$ and the following spaces to the first line of $L_s$, and indenting subsequent lines of $L_s$ by $W + N$ spaces, is a list item with $B_s$ as its contents. The type of the list item (bullet or ordered) is determined by the type of its list marker. If the list item is ordered, then it is also assigned a start number, based on the ordered list marker.

     Exceptions:

       1. When the first list item in a list interrupts a paragraph (that is, when it starts on a line that would otherwise count as paragraph continuation text), then

         • (a) the lines $L_s$ must not begin with a blank line, and
         • (b) if the list item is ordered, the start number must be 1.
       2. If any line is a thematic break then that line is not a list item.

  2. Item starting with indented code. If a sequence of lines $L_s$ constitute a sequence of blocks $B_s$ starting with an indented code block, and $M$ is a list marker of width $W$ followed by one space, then the result of prepending $M$ and the following space to the first line of $L_s$, and indenting subsequent lines of $L_s$ by $W + 1$ spaces, is a list item with $B_s$ as its contents. If a line is empty, then it need not be indented. The type of the list item (bullet or ordered) is determined by the type of its list marker. If the list item is ordered, then it is also assigned a start number, based on the ordered list marker.

  3. Item starting with a blank line. If a sequence of lines $L_s$ starting with a single blank line constitute a (possibly empty) sequence of blocks $B_s$, not separated from each other by more than one blank line, and $M$ is a list marker of width $W$, then the result of prepending $M$ to the first line of $L_s$, and indenting subsequent lines of $L_s$ by $W + 1$ spaces, is a list item with $B_s$ as its contents. If a line is empty, then it need not be indented. The type of the list item (bullet or ordered) is determined by the type of its list marker. If the list item is ordered, then it is also assigned a start number, based on the ordered list marker.

  4. Indentation. If a sequence of lines $L_s$ constitutes a list item according to rule #1, #2, or #3, then the result of indenting each line of $L_s$ by 1-3 spaces (the same for each line) also constitutes a list item with the same contents and attributes. If a line is empty, then it need not be indented.

  5. Laziness. If a string of lines $L_s$ constitute a list item with contents $B_s$, then the result of deleting some or all of the indentation from one or more lines in which the next non-whitespace character after the indentation is paragraph continuation text is a list item with the same contents and attributes. The unindented lines are called lazy continuation lines.
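The rules for items starting with ordinary content, with indented code, and with a blank line differ mainly in where the item's content begins. The following is a minimal sketch that computes that content indentation, in columns measured from the start of the marker, from the marker width $W$ and the text following the marker. `contentIndent` is a hypothetical helper; it assumes spaces only (no tabs) and that `afterMarker` is either blank or starts with at least one space.

```typescript
// Returns how many columns, counted from the start of the list marker,
// continuation lines must be indented to belong to this list item.
function contentIndent(markerWidth: number, afterMarker: string): number {
  // Item starting with a blank line: nothing but whitespace after the marker
  if (afterMarker.trim() === '') return markerWidth + 1
  // Count the spaces between the marker and the first non-space character
  const spaces = afterMarker.search(/[^ ]/)
  // Item starting with indented code: five or more spaces, so one space belongs
  // to the marker and the rest begins an indented code block inside the item
  if (spaces > 4) return markerWidth + 1
  // Basic case: W + N, where 1 <= N <= 4
  return markerWidth + spaces
}
```

Because the offset is measured from the marker itself, the 1-3 spaces of leading indentation allowed by the indentation rule shift every line by the same amount and do not change the result.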

Install

npm install --save @yozora/tokenizer-list

Usage

Tip: @yozora/tokenizer-list has been integrated into @yozora/parser, @yozora/parser-gfm-ex and @yozora/parser-gfm, so you can use YozoraParser, GfmExParser or GfmParser directly.

import YozoraParser from '@yozora/parser'

const parser = new YozoraParser()

// parse source markdown content
parser.parse(`
- a
- b
- c
- d
- e
- f
- g

---

- [ ] This is a TODO item.
- [-] This is a processing TODO item.
- [x] This is a finished TODO item.

---

1. This is an ordered list item

a. This is another type of ordered list item
`)

Options

| Name | Type | Required | Default |
| --- | --- | --- | --- |
| name | string | false | "@yozora/tokenizer-list" |
| priority | number | false | TokenizerPriority.CONTAINING_BLOCK |
| emptyItemCouldNotInterruptedTypes | string[] | false | [ParagraphType, PhrasingContentType] |
| enableTaskListItem | boolean | false | false |

  • name: The unique name of the tokenizer. It is bound to the tokens the tokenizer generates and is used to determine which tokenizer should be called at each stage of a token's life cycle during the matching / parsing phases.

  • priority: Priority of the tokenizer, which determines the order of processing: tokenizers with higher priority are executed first. In addition, in the match-block stage, a high-priority tokenizer can interrupt the matching process of a low-priority tokenizer.

  • emptyItemCouldNotInterruptedTypes: An array of node types that this tokenizer cannot interrupt when the current list item is empty.

    See https://github.github.com/gfm/#example-263.

  • enableTaskListItem: Whether to enable task list items (extension).
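How emptyItemCouldNotInterruptedTypes combines with the GFM interruption rules can be sketched as a predicate. This is not the tokenizer's actual code: `InterruptCandidate`, `canInterrupt`, and the node type strings used with it are hypothetical placeholders. The ordered-start check follows the GFM rule that only an ordered list starting with 1 may interrupt a paragraph.

```typescript
// Hypothetical description of a list item that would interrupt the current block.
interface InterruptCandidate {
  ordered: boolean
  start?: number // only meaningful for ordered items
  isEmpty: boolean // true if nothing but whitespace follows the marker
}

function canInterrupt(
  candidate: InterruptCandidate,
  currentNodeType: string, // type of the block being matched (placeholder string)
  emptyItemCouldNotInterruptedTypes: string[],
  paragraphType: string, // placeholder for the paragraph node type
): boolean {
  // An empty item may not interrupt any node type listed in the option
  if (candidate.isEmpty && emptyItemCouldNotInterruptedTypes.includes(currentNodeType)) {
    return false
  }
  // GFM: an ordered item may interrupt a paragraph only if it starts with 1
  if (currentNodeType === paragraphType && candidate.ordered && candidate.start !== 1) {
    return false
  }
  return true
}
```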

Types

@yozora/tokenizer-list produces List and ListItem type nodes. See @yozora/ast for full base types.

import type { ListItem, Parent } from '@yozora/ast'

export const ListType = 'list'
export type ListType = typeof ListType

/**
 * List represents a list of items.
 * @see https://github.com/syntax-tree/mdast#list
 * @see https://github.github.com/gfm/#list
 */
export interface List extends Parent<ListType> {
  /**
   * Whether it is an ordered list.
   */
  ordered: boolean
  /**
   * The starting number of an ordered list.
   */
  start?: number
  /**
   * Marker of an unordered list item, or delimiter of an ordered list item.
   */
  marker: number
  /**
   * Whether the list is loose.
   * @see https://github.github.com/gfm/#loose
   */
  spread: boolean
  /**
   * Lists are container blocks.
   */
  children: ListItem[]
}
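For illustration, the object below is what a List node for the tight ordered list "1. a" / "2. b" could look like. It uses simplified local copies of the types so it runs without @yozora/ast, and it assumes that marker stores the character code of the bullet or delimiter, which the number type suggests but this page does not state.

```typescript
// Simplified, local stand-ins for the List / ListItem types (assumption: the
// real types in @yozora/ast carry more fields, such as positions, and the
// list item type string is assumed to be 'listItem').
interface ListItemSketch {
  type: 'listItem'
  children: unknown[]
}

interface ListSketch {
  type: 'list'
  ordered: boolean
  start?: number
  marker: number
  spread: boolean
  children: ListItemSketch[]
}

const example: ListSketch = {
  type: 'list',
  ordered: true,
  start: 1, // taken from the first item's number
  marker: '.'.charCodeAt(0), // assumed character code of the delimiter, i.e. 46
  spread: false, // no blank lines between the items, so the list is tight
  children: [
    { type: 'listItem', children: [] }, // paragraph "a" omitted for brevity
    { type: 'listItem', children: [] }, // paragraph "b" omitted for brevity
  ],
}
```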

Live Examples

  • Basic.

  • Item Starting with indented code.

  • Item Starting with a blank line.

  • Indentation.

  • Laziness.

  • Indentation in sublist.

  • Task list item (extension).

  • In order to solve the problem of unwanted lists in paragraphs with hard-wrapped numerals, we allow only lists starting with 1 to interrupt paragraphs.

  • There can be any number of blank lines between items.

  • To separate consecutive lists of the same type, or to separate a list from an indented code block that would otherwise be parsed as a subparagraph of the final list item, you can insert a blank HTML comment.

  • List items need not be indented to the same level. The following list items will be treated as items at the same list level, since none is indented enough to belong to the previous list item.

  • Note, however, that list items may not be indented more than three spaces. Here - e is treated as a paragraph continuation line, because it is indented more than three spaces.

    #292
  • And here, 3. c is treated as an indented code block, because it is indented four spaces and preceded by a blank line.

    #293
  • This is a loose list, because there is a blank line between two of the list items.

    #294
  • So is this, with an empty second item.

    #295
  • These are loose lists, even though there is no space between the items, because one of the items directly contains two block-level elements with a blank line between them.

  • This is a tight list, because the blank lines are in a code block.

    #298
  • This is a tight list, because the blank line is between two paragraphs of a sublist. So the sublist is loose while the outer list is tight.

    #299
  • This is a tight list, because the blank line is inside the block quote.

    #300
  • This list is tight, because the consecutive block elements are not separated by blank lines.

    #301
  • A single-paragraph list is tight.

  • This list is loose, because of the blank line between the two block elements in the list item.

    #304
  • Here the outer list is loose, the inner list tight.
