@yozora/tokenizer-link
A link contains link text (the visible text), a link destination (the URI that is the link destination), and optionally a link title. There are two basic kinds of links in Markdown. In inline links the destination and title are given immediately after the link text. In reference links the destination and title are defined elsewhere in the document.
A link text consists of a sequence of zero or more inline elements enclosed by square brackets ([ and ]). The following rules apply:
- Links may not contain other links, at any level of nesting. If multiple otherwise valid link definitions appear nested inside each other, the inner-most definition is used.
- Brackets are allowed in the link text only if (a) they are backslash-escaped or (b) they appear as a matched pair of brackets, with an open bracket [, a sequence of zero or more inlines, and a close bracket ].
- Backtick code spans, autolinks, and raw HTML tags bind more tightly than the brackets in link text. Thus, for example, [foo`]` could not be a link text, since the second ] is part of a code span.
- The brackets in link text bind more tightly than markers for emphasis and strong emphasis. Thus, for example, *[foo*](url) is a link.
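The bracket rule above can be sketched as a small scanner. This is a simplified illustration, not the tokenizer's actual implementation: the name `findLinkTextEnd` is hypothetical, and the sketch deliberately ignores the precedence of code spans, autolinks, and raw HTML as well as the no-nested-links rule.

```typescript
// Hypothetical helper, not part of the @yozora/tokenizer-link API.
// Given the index of an opening `[`, return the index of its matching `]`,
// or -1 if the brackets are unbalanced. Backslash-escaped characters are
// skipped, so `\[` and `\]` never count as brackets.
function findLinkTextEnd(source: string, start: number): number {
  if (source.charAt(start) !== '[') return -1
  let depth = 0
  for (let i = start; i < source.length; i += 1) {
    const c = source.charAt(i)
    if (c === '\\') {
      i += 1 // skip the character after the backslash
      continue
    }
    if (c === '[') {
      depth += 1
    } else if (c === ']') {
      depth -= 1
      if (depth === 0) return i
    }
  }
  return -1
}
```

For instance, `findLinkTextEnd('[foo [bar] baz](/uri)', 0)` returns 14, the index of the outer closing bracket, while `findLinkTextEnd('[foo', 0)` returns -1.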
A link destination consists of either
- a sequence of zero or more characters between an opening < and a closing > that contains no line breaks or unescaped < or > characters, or
- a nonempty sequence of characters that does not start with <, does not include ASCII space or control characters, and includes parentheses only if (a) they are backslash-escaped or (b) they are part of a balanced pair of unescaped parentheses. (Implementations may impose limits on parentheses nesting to avoid performance issues, but at least three levels of nesting should be supported.)
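The two destination forms can be illustrated with a minimal scanner. The function name `scanLinkDestination` and its return convention (index just past the destination, or -1) are assumptions made for this sketch; it does not mirror the tokenizer's internals and imposes no nesting limit.

```typescript
// Hypothetical helper, not part of the @yozora/tokenizer-link API.
// Scans a link destination beginning at `start` and returns the index just
// past it, or -1 if no valid destination begins there.
function scanLinkDestination(source: string, start: number): number {
  // For this sketch, a backslash escapes the next character only if it is
  // one of the characters that matter here: < > ( ) or another backslash.
  const isEscape = (i: number): boolean =>
    source.charAt(i) === '\\' &&
    i + 1 < source.length &&
    '<>()\\'.indexOf(source.charAt(i + 1)) >= 0

  // Form 1: <...> with no line breaks and no unescaped `<` or `>`.
  if (source.charAt(start) === '<') {
    for (let i = start + 1; i < source.length; i += 1) {
      if (isEscape(i)) {
        i += 1
        continue
      }
      const c = source.charAt(i)
      if (c === '\n' || c === '\r' || c === '<') return -1
      if (c === '>') return i + 1
    }
    return -1
  }

  // Form 2: no spaces or control characters, balanced parentheses.
  let depth = 0
  let i = start
  for (; i < source.length; i += 1) {
    if (isEscape(i)) {
      i += 1
      continue
    }
    const code = source.charCodeAt(i)
    if (code <= 0x20 || code === 0x7f) break // space or control character
    const c = source.charAt(i)
    if (c === '(') {
      depth += 1
    } else if (c === ')') {
      if (depth === 0) break // belongs to the enclosing inline link
      depth -= 1
    }
  }
  return depth === 0 && i > start ? i : -1
}
```

With this sketch, `scanLinkDestination('</my uri>)', 0)` returns 9 and `scanLinkDestination('foo(and(bar)))', 0)` returns 13, leaving the final `)` for the enclosing inline link.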
A link title consists of either
- a sequence of zero or more characters between straight double-quote characters ("), including a " character only if it is backslash-escaped, or
- a sequence of zero or more characters between straight single-quote characters ('), including a ' character only if it is backslash-escaped, or
- a sequence of zero or more characters between matching parentheses ((...)), including a ( or ) character only if it is backslash-escaped.
Although link titles may span multiple lines, they may not contain a blank line.
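The three title forms and the blank-line restriction can be sketched as follows. The name `scanLinkTitle` and its return convention are hypothetical and chosen only for illustration; this is not the code the tokenizer uses.

```typescript
// Hypothetical helper, not part of the @yozora/tokenizer-link API.
// Scans a link title beginning at `start` and returns the index just past
// its closing delimiter, or -1 if no valid title begins there.
function scanLinkTitle(source: string, start: number): number {
  const open = source.charAt(start)
  let close: string
  if (open === '"' || open === "'") close = open
  else if (open === '(') close = ')'
  else return -1

  for (let i = start + 1; i < source.length; i += 1) {
    const c = source.charAt(i)
    if (c === '\\' && i + 1 < source.length) {
      const next = source.charAt(i + 1)
      if (next === open || next === close || next === '\\') {
        i += 1 // skip the escaped delimiter or backslash
        continue
      }
    }
    if (c === close) return i + 1
    if (open === '(' && c === '(') return -1 // unescaped `(` inside (...)
    if (c === '\n') {
      // A following line that holds only spaces or tabs is a blank line.
      let j = i + 1
      while (source.charAt(j) === ' ' || source.charAt(j) === '\t') j += 1
      if (source.charAt(j) === '\n') return -1
    }
  }
  return -1
}
```

Under these assumptions, `scanLinkTitle('"title")', 0)` returns 7, and a title containing a blank line such as `'"a\n\nb"'` is rejected with -1.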
An inline link consists of a link text followed immediately by a left parenthesis (, optional whitespace, an optional link destination, an optional link title separated from the link destination by whitespace, optional whitespace, and a right parenthesis ). The link’s text consists of the inlines contained in the link text (excluding the enclosing square brackets). The link’s URI consists of the link destination, excluding enclosing <...> if present, with backslash-escapes in effect as described above. The link’s title consists of the link title, excluding its enclosing delimiters, with backslash-escapes in effect as described above.
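As a rough illustration of how the URI and title are derived from their raw source text, the sketch below strips the enclosing delimiters and applies backslash-escapes. The names `resolveDestination` and `resolveTitle` are hypothetical, and the sketch assumes that a backslash escapes only ASCII punctuation; entity references and URL encoding are left out.

```typescript
// Hypothetical helpers, not part of the @yozora/tokenizer-link API.

// A backslash followed by an ASCII punctuation character is an escape.
const ESCAPED_PUNCTUATION = /\\([!-\/:-@\[-`{-~])/g

function unescapeBackslashes(raw: string): string {
  return raw.replace(ESCAPED_PUNCTUATION, '$1')
}

// Drops the enclosing <...> if present, then applies backslash-escapes.
function resolveDestination(raw: string): string {
  const pointy =
    raw.length >= 2 && raw.charAt(0) === '<' && raw.charAt(raw.length - 1) === '>'
  return unescapeBackslashes(pointy ? raw.slice(1, -1) : raw)
}

// Drops the enclosing quotes or parentheses, then applies backslash-escapes.
function resolveTitle(raw: string): string {
  return unescapeBackslashes(raw.slice(1, -1))
}
```

For example, `resolveDestination('</my uri>')` yields `/my uri`, and a backslash before a non-punctuation character, as in `foo\bar`, is kept as a literal backslash.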
- See the GitHub Flavored Markdown spec for details.
- See Live Examples for an intuitive impression.
Install
- npm: npm install --save @yozora/tokenizer-link
- Yarn: yarn add @yozora/tokenizer-link
- pnpm: pnpm add @yozora/tokenizer-link
Usage
@yozora/tokenizer-link has been integrated into @yozora/parser, @yozora/parser-gfm-ex, and @yozora/parser-gfm, so you can use YozoraParser, GfmExParser, or GfmParser directly.
Basic Usage

@yozora/tokenizer-link cannot be used on its own; it must be registered with a parser as a plugin before it can be used.

import { DefaultParser } from '@yozora/core-parser'
import ParagraphTokenizer from '@yozora/tokenizer-paragraph'
import TextTokenizer from '@yozora/tokenizer-text'
import LinkTokenizer from '@yozora/tokenizer-link'

// register the fallback tokenizers first, then the link tokenizer
const parser = new DefaultParser()
  .useFallbackTokenizer(new ParagraphTokenizer())
  .useFallbackTokenizer(new TextTokenizer())
  .useTokenizer(new LinkTokenizer())

// parse source markdown content
parser.parse(`
[link](/uri "title")
[link](/uri)
`)

YozoraParser

import YozoraParser from '@yozora/parser'

const parser = new YozoraParser()

// parse source markdown content
parser.parse(`
[link](/uri "title")
[link](/uri)
`)

GfmParser

import GfmParser from '@yozora/parser-gfm'

const parser = new GfmParser()

// parse source markdown content
parser.parse(`
[link](/uri "title")
[link](/uri)
`)

GfmExParser

import GfmExParser from '@yozora/parser-gfm-ex'

const parser = new GfmExParser()

// parse source markdown content
parser.parse(`
[link](/uri "title")
[link](/uri)
`)
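Once a document has been parsed, the link nodes can be pulled out of the resulting tree. The sketch below is a minimal illustration under stated assumptions: it assumes the tree is made of nodes with a `type` field and an optional `children` array, and that link nodes carry a `url` string. The `AstNode` interface and the `collectLinkUrls` helper are hypothetical stand-ins, not types or functions exported by yozora.

```typescript
// Hypothetical minimal node shape, assumed for this sketch only.
interface AstNode {
  type: string
  children?: AstNode[]
  url?: string
  title?: string
}

// Walks the tree depth-first and gathers the url of every `link` node.
function collectLinkUrls(node: AstNode, urls: string[] = []): string[] {
  if (node.type === 'link' && typeof node.url === 'string') {
    urls.push(node.url)
  }
  for (const child of node.children ?? []) {
    collectLinkUrls(child, urls)
  }
  return urls
}

// A hand-written tree resembling what two inline links might produce.
const sampleTree: AstNode = {
  type: 'root',
  children: [
    {
      type: 'paragraph',
      children: [
        { type: 'link', url: '/uri', title: 'title', children: [{ type: 'text' }] },
        { type: 'text' },
        { type: 'link', url: '/other', children: [{ type: 'text' }] },
      ],
    },
  ],
}

const sampleUrls = collectLinkUrls(sampleTree)
```

If the tree returned by `parser.parse` matches these assumptions, it could be passed to `collectLinkUrls` in place of `sampleTree`.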
Options
Name | Type | Required | Default |
---|---|---|---|
name | string | false | "@yozora/tokenizer-link" |
priority | number | false | TokenizerPriority.LINKS |
- name: The unique name of the tokenizer. It is used to bind the tokens the tokenizer generates, so that the right tokenizer can be called at each life-cycle stage of a token throughout the matching and parsing phases.
- priority: The priority of the tokenizer, which determines the processing order: higher-priority tokenizers are executed first. In addition, in the match-block stage, a high-priority tokenizer can interrupt the matching process of a low-priority tokenizer. Exception: delimiters of type full are always processed before delimiters of other types.
Types
@yozora/tokenizer-link produces Link type nodes. See @yozora/ast for the full base types.
import type { Parent, Resource } from '@yozora/ast'

export const LinkType = 'link'
export type LinkType = typeof LinkType

/**
 * Link represents a hyperlink.
 * @see https://github.com/syntax-tree/mdast#link
 * @see https://github.github.com/gfm/#inline-link
 */
export interface Link extends Parent<LinkType>, Resource {}
Live Examples
- Basic.
- The title may be omitted.
- Both the title and the destination may be omitted.
- The destination can only contain spaces if it is enclosed in pointy brackets.
- The destination cannot contain line breaks, even if enclosed in pointy brackets.
- The destination can contain ) if it is enclosed in pointy brackets.
- Pointy brackets that enclose links must be unescaped.
- These are not links, because the opening pointy bracket is not matched properly.
- Parentheses inside the link destination may be escaped.
- Any number of parentheses are allowed without escaping, as long as they are balanced.
- However, if you have unbalanced parentheses, you need to escape or use the <...> form.
- Parentheses and other symbols can also be escaped, as usual in Markdown.
- A link can contain fragment identifiers and queries.
- Note that a backslash before a non-escapable character is just a backslash.
- Note that, because titles can often be parsed as destinations, if you try to omit the destination and keep the title, you’ll get unexpected results.
- Titles may be in single quotes, double quotes, or parentheses.
- Backslash escapes and entity and numeric character references may be used in titles.
- Titles must be separated from the link using whitespace. Other Unicode whitespace like non-breaking space doesn’t work.
- Nested balanced quotes are not allowed without escaping.
- But it is easy to work around this by using a different quote type.
- Whitespace is allowed around the destination and title.
- But it is not allowed between the link text and the following parenthesis.
- The link text may contain balanced brackets, but not unbalanced ones, unless they are escaped.
- The link text may contain inline content.
- However, links may not contain other links, at any level of nesting.
- These cases illustrate the precedence of link text grouping over emphasis grouping.
- Note that brackets that aren’t part of links do not take precedence.
- These cases illustrate the precedence of HTML tags, code spans, and autolinks over link grouping.