D★Mark

Denis Defreyne <denis@stoneship.org>

CAUTION: D★Mark is experimental — use at your own risk!

_D★Mark_ is a language for marking up prose. It facilitates writing semantically meaningful text, without limiting itself to the semantics provided by HTML or Markdown.

Here’s an example of D★Mark:

source

h2. Patterns

para. Patterns are used to find items and layouts based on their identifier. They come in three varieties:

list.
  item. glob patterns
  item. regular expression patterns
  item. legacy patterns

para. A glob pattern that matches every item is %pattern{/**/*}. A glob pattern that matches every item/layout with the extension %filename{md} is %glob{/**/*.md}.


Samples

The `samples/` directory contains some sample D★Mark files. Each one can be processed by running the Ruby script with the same base name. For example:

source

% bundle exec ruby samples/trivial.rb
<p>I’m a trivial example!</p>

Structure of a D★Mark document

_D★Mark_ knows two constructs:

Block-level elements

Every non-blank line of a D★Mark document corresponds to a block. A block can be a paragraph, a list, a header, a source code listing, and so on. Each block starts with the name of the element, followed by a period, a space character, and the content. For example:


source

para. Patterns are used to find items and layouts based on their identifier. They come in three varieties.
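The block syntax is regular enough to capture with a single regular expression. The following Ruby sketch splits one line into an element name and its content. The `parse_block_line` helper and the assumed character set for element names are illustrative only and are not part of the d-mark gem. Attributes and nesting are ignored.

```ruby
# Hypothetical helper (not part of the d-mark gem): splits one block-level
# line into its element name and content. Assumes element names are made of
# lowercase letters, digits, and hyphens; attributes are not supported.
BLOCK_LINE = /\A(?<name>[a-z][a-z0-9-]*)\.(?: (?<content>.*))?\z/

def parse_block_line(line)
  match = BLOCK_LINE.match(line.chomp)
  return nil unless match

  # A block such as "list." has no inline content of its own.
  { name: match[:name], content: match[:content].to_s }
end

block = parse_block_line('para. Patterns are used to find items and layouts.')
puts block[:name]    # prints para
puts block[:content] # prints Patterns are used to find items and layouts.
```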


Inline elements

Inside a block, text can be marked up using inline elements, which start with a percentage sign, the name of the element, and the content within braces. For example, `%emph{crazy}` is an `emph` element with the content `crazy`.
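Inline elements can be located in a similar way. This Ruby sketch assumes that the content contains no braces, so it deliberately ignores nested inline elements and escaping, whose rules are not yet described. The `inline_elements` name is hypothetical.

```ruby
# Hypothetical sketch: finds inline elements of the form %name{content}.
# Assumes the content contains no braces, so nested elements and escaping
# are not handled.
INLINE_ELEMENT = /%(?<name>[a-z][a-z0-9-]*)\{(?<content>[^{}]*)\}/

def inline_elements(text)
  # String#scan returns one [name, content] pair per match.
  text.scan(INLINE_ELEMENT)
end

p inline_elements('This is %emph{crazy} and so is %emph{this}.')
# prints [["emph", "crazy"], ["emph", "this"]]
```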

Block-level elements can be nested. To do so, indent the nested block two spaces deeper than the enclosing block. For example, the following defines a `list` element with three `item` elements inside it:

source

list.
  item. glob patterns
  item. regular expression patterns
  item. legacy patterns
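One way to turn two-space indentation into a tree is to keep a stack of the currently open blocks. The sketch below is an assumption about how such a parser could work, not the d-mark gem's implementation; `Block`, `build_tree`, and the error messages are hypothetical.

```ruby
# Hypothetical sketch (not the d-mark gem's parser): builds a tree of blocks,
# where each additional two spaces of indentation nests a block inside the
# closest preceding block that is one level shallower.
Block = Struct.new(:name, :content, :children)

BLOCK_LINE = /\A(?<name>[a-z][a-z0-9-]*)\.(?: (?<content>.*))?\z/

def build_tree(source)
  root = Block.new('root', '', [])
  stack = [[-1, root]] # [depth, block] pairs for the currently open blocks

  source.each_line do |line|
    next if line.strip.empty?

    indent = line[/\A */].size
    raise ArgumentError, "indentation must be a multiple of two: #{line.inspect}" if indent.odd?

    match = BLOCK_LINE.match(line.strip)
    raise ArgumentError, "not a block: #{line.inspect}" unless match

    depth = indent / 2
    # Close every open block at the same depth or deeper.
    stack.pop while stack.last[0] >= depth
    parent_depth, parent = stack.last
    raise ArgumentError, "indented too deeply: #{line.inspect}" if depth > parent_depth + 1

    block = Block.new(match[:name], match[:content].to_s, [])
    parent.children << block
    stack.push([depth, block])
  end

  root.children
end

tree = build_tree(<<~DMARK)
  list.
    item. glob patterns
    item. regular expression patterns
    item. legacy patterns
DMARK

list = tree.first
puts list.name # prints list
p list.children.map(&:content)
# prints ["glob patterns", "regular expression patterns", "legacy patterns"]
```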

Block-level elements can also contain plain text. In this case, the content is not wrapped inside a nested block-level element. This is particularly useful for source code listings. For example:

source

listing.
  identifier = Nanoc::Identifier.new('/about.md')

  identifier.without_ext
  # => "/about"

  identifier.ext
  # => "md"
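A block with plain-text content can be read by taking every following line that is blank or indented deeper than the element line. This Ruby sketch assumes a `listing.` element at column 0 whose text is indented by two spaces; `listing_text` is a hypothetical helper, not part of the gem.

```ruby
# Hypothetical sketch: extracts the plain-text content of a listing. block.
# Assumes the element line is at column 0, its text is indented two spaces,
# and blank lines inside the text belong to the listing.
def listing_text(source)
  header, *rest = source.lines
  raise ArgumentError, 'expected a listing. block' unless header&.strip == 'listing.'

  body = rest.take_while { |line| line.strip.empty? || line.start_with?('  ') }
  body.pop while body.last&.strip&.empty? # drop trailing blank lines

  # Remove the two-space indentation; the text itself is not parsed further.
  body.map { |line| line.chomp.delete_prefix('  ') }.join("\n")
end

source = <<~DMARK
  listing.
    identifier.without_ext
    # => "/about"

    identifier.ext
    # => "md"

  para. Back to prose.
DMARK

puts listing_text(source)
```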

Block-level elements and inline elements are identical in the tree representation of D★Mark. This means that any inline element can be rewritten as a block-level element.
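Because the tree does not distinguish between the two kinds of element, the same node can be written out either way. In this Ruby sketch, the `Node` struct and the render functions are hypothetical illustrations of that claim, not the gem's data model; attributes and escaping are left out.

```ruby
# Hypothetical node type: a single struct represents both block-level and
# inline elements. Attributes and escaping are left out for brevity.
Node = Struct.new(:name, :children)

def render_content(children)
  children.map { |child| child.is_a?(String) ? child : render_inline(child) }.join
end

# Writes a node using inline syntax: %name{content}
def render_inline(node)
  "%#{node.name}{#{render_content(node.children)}}"
end

# Writes the same node using block-level syntax: name. content
def render_block(node)
  "#{node.name}. #{render_content(node.children)}"
end

emph = Node.new('emph', ['crazy'])
puts render_inline(emph) # prints %emph{crazy}
puts render_block(emph)  # prints emph. crazy

para = Node.new('para', ['This is ', emph, '.'])
puts render_block(para)  # prints para. This is %emph{crazy}.
```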

NOTE: To do: Elaborate on the distinction and similarity of block-level and inline elements.

NOTE: To do: Describe escaping rules.

Attributes

Both block-level and inline elements can have attributes. Attributes are enclosed in square brackets after the element name, as a comma-separated list of key-value pairs in which each key is separated from its value by an equals sign. The value, along with the equals sign, can be omitted, in which case the value is equal to the key name.

For example:

  • `%code[lang=ruby]{Nanoc::VERSION}` is an inline `code` element with the `lang` attribute set to `ruby`.

  • `%only[web]{Refer to the release notes for details.}` is an inline `only` element with the `web` attribute set to `web`.

  • `h2[id=donkey]. All about donkeys` is a block-level `h2` element with the `id` attribute set to `donkey`.

  • `para[print]. This is a paragraph that only readers of the book will see.` is a block-level `para` element with the `print` attribute set to `print`.

NOTE: The behavior of keys with missing values might change to default to booleans rather than to the key name.
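The attribute rules can be sketched in Ruby as follows. Only inline elements are handled, a key without a value defaults to the key name (the current behavior, per the note above), and the regular expression assumes that attribute values and content contain no brackets, braces, or commas. All names here are hypothetical.

```ruby
# Hypothetical sketch: parses an inline element with an optional attribute
# list, such as %code[lang=ruby]{Nanoc::VERSION}. Assumes that neither the
# attribute values nor the content contain brackets, braces, or commas.
INLINE_WITH_ATTRIBUTES =
  /\A%(?<name>[a-z][a-z0-9-]*)(?:\[(?<attributes>[^\]]*)\])?\{(?<content>[^{}]*)\}\z/

# Turns "id=donkey,web" into a hash mapping "id" to "donkey" and "web" to
# "web": a key without a value gets the key name as its value.
def parse_attributes(text)
  text.split(',').each_with_object({}) do |pair, attributes|
    key, value = pair.strip.split('=', 2)
    attributes[key] = value || key
  end
end

def parse_inline(text)
  match = INLINE_WITH_ATTRIBUTES.match(text)
  return nil unless match

  {
    name: match[:name],
    attributes: parse_attributes(match[:attributes].to_s),
    content: match[:content]
  }
end

code = parse_inline('%code[lang=ruby]{Nanoc::VERSION}')
puts code[:attributes]['lang'] # prints ruby

only = parse_inline('%only[web]{Refer to the release notes for details.}')
puts only[:attributes]['web']  # prints web
```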

Goals

Be extensible

D★Mark defines only the syntax of the markup language, and doesn’t bother with semantics. It does not prescribe which element names are valid in the context of a vocabulary, because it does not come with a vocabulary.

Be simple

Simplicity implies being easy to write and easy to parse. D★Mark eschews ambiguity and aims to have a short formal syntactical definition. This also means that it is easy to syntax highlight.

Be compact

Introduce as little extra syntax as possible.

Comparison with other languages

D★Mark takes inspiration from a variety of other languages.

HTML

HTML is syntactically unambiguous, but comparatively more verbose than other languages. It also prescribes only a small set of elements, which makes it awkward to use for prose that requires more thorough markup. It is possible to use `span` or `div` elements with custom classes, but this approach makes an already verbose language even more verbose.


source,html

<p>A glob pattern that matches every item is <span class="pattern attr-kind-glob">/**/*</span>.</p>



source,d-mark

para. A glob pattern that matches every item is %pattern{/**/*}.


XML

Similar to HTML, with the major difference that XML does not prescribe a set of elements.


source,xml

<para>A glob pattern that matches every item is <pattern kind="glob">/**/*</pattern>.</para>



source,d-mark

para. A glob pattern that matches every item is %pattern{/**/*}.


Markdown

Markdown has a compact syntax, but is complex and ambiguous, as evidenced by the many different mutually incompatible implementations. It prescribes a small set of elements (smaller even than HTML). It supports embedding raw HTML, which in theory makes it possible to combine the best of both worlds, but in practice leads to markup that is harder to read than either Markdown or HTML separately, and occasionally trips up the parser and syntax highlighter.


source

A glob pattern that matches every item is <span class="pattern attr-kind-glob">/**/*</span>.



source,d-mark

para. A glob pattern that matches every item is %pattern{/**/*}.


AsciiDoc

AsciiDoc and its Asciidoctor variant are syntactically unambiguous but complex languages. They prescribe a comparatively large set of elements, which translates well to DocBook and HTML. They do not support custom markup or embedding raw HTML, which makes them harder to use for prose that requires more complex markup.

_(No example, as this example cannot be represented with AsciiDoc.)_

TeX, LaTeX

TeX is a Turing-complete programming language intended for typesetting, rather than a markup language. This makes it impractical as a source format for conversion into other formats. Its syntax is simple and compact, and served as an inspiration for D★Mark.


source,latex

A glob pattern that matches every item is \pattern[glob]{/**/*}.



source,d-mark

para. A glob pattern that matches every item is %pattern{/**/*}.


JSON, YAML

JSON and YAML are data interchange formats rather than markup languages, and thus are not well-suited for marking up prose.


source,json

[
  "A glob pattern that matches every item is ",
  ["pattern", {"kind": "glob"}, ["/**/*"]],
  "."
]



source,d-mark

para. A glob pattern that matches every item is %pattern{/**/*}.


Specification

NOTE: To do: write this section.

Programmatic usage

Handling a D★Mark file consists of two stages: parsing and translating.

The parsing stage converts text into a list of nodes. Construct a parser with the source text as input, and call `#run` to get the list of nodes.

source,ruby

# Read the D★Mark file named by the first command-line argument and parse it.
content = File.read(ARGV[0])
nodes = DMark::Parser.new(content).run


The translating stage is not the responsibility of D★Mark. A translator is part of the domain of the source text, and D★Mark only deals with syntax rather than semantics. A translator will run over the tree and convert it into something else (usually another string). To do so, handle each node type (`DMark::ElementNode` or `String`). For example, the following translator will convert the tree into something that resembles XML:

source,ruby

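class MyXMLLikeTranslator < DMark::Translator
  # Strings are copied verbatim; elements are wrapped in an opening and a
  # closing tag named after the element, with their children in between.
  def handle(node)
    case node
    when String
      out << node
    when DMark::Parser::ElementNode
      out << "<#{node.name}>"
      handle_children(node)
      out << "</#{node.name}>"
    end
  end
end

result = MyXMLLikeTranslator.new(nodes).run
puts result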
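The overall shape of such a translator is a recursive walk over the tree. The following gem-independent sketch makes that walk explicit. The `Element` struct stands in for the parser's element node class, and both it and the `translate` function are assumptions rather than the gem's API. Like the translator above, it does not escape text for XML.

```ruby
# Hypothetical, gem-independent translator: walks a tree of strings and
# element nodes and emits XML-like output. Element stands in for the
# parser's element node class.
Element = Struct.new(:name, :children)

def translate(node, out = +'')
  case node
  when String
    out << node # text is copied verbatim (no XML escaping)
  when Element
    out << "<#{node.name}>"
    node.children.each { |child| translate(child, out) }
    out << "</#{node.name}>"
  else
    raise ArgumentError, "unexpected node: #{node.inspect}"
  end
  out
end

tree = [
  Element.new('para', [
    'A glob pattern that matches every item is ',
    Element.new('pattern', ['/**/*']),
    '.'
  ])
]

puts tree.map { |node| translate(node) }.join
# prints <para>A glob pattern that matches every item is <pattern>/**/*</pattern>.</para>
```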