geekswg

毕少侠


Pick on your Markdown

Many people can write Markdown, but few can write it well. Is there a tool that can act as a "secretary," checking the Markdown syntax and style in documents, suggesting solutions, automatically fixing issues, and even automatically adding the "Pangu spacing" between Chinese and English? The Markdown syntax checker introduced in this article can do just that.

Introduction#

Many people can write Markdown, but few can write it well. This is partly due to issues within the Markdown ecosystem itself: syntax variations and implementations are diverse, incompatible, and even contradictory.

On the other hand, few people are willing to read Markdown's technical specifications carefully. Most read one or two "quick-start" guides, consider themselves proficient, and overlook the details; when they run into problems while writing, they fall back on imagination and intuition.

As a result, a large number of Markdown files with erratic syntax and chaotic formatting have emerged, confusing both readers and formatting software.

JavaScript has ESLint and Python has Pylint, so does Markdown have a linter of its own? The answer is yes: markdownlint!

Example#

The source code of this blog already uses markdownlint; you can download it to see the configuration.

Lruihao/hugo-blog: http://github.com/Lruihao/hugo-blog

Introducing markdownlint#

markdownlint is a Markdown linter: it checks Markdown files for syntax errors and for non-standard writing styles, keeping your Markdown clean and consistent.
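To make the idea of a style check concrete, here is a minimal sketch of one such check in plain JavaScript: it flags unordered-list items that use `*` or `+` as the marker instead of `-`. The function name `checkListMarkers` is hypothetical, and this is not how markdownlint is implemented; it is a line-based approximation that ignores code blocks and would misreport a thematic break written as `* * *`.

```javascript
// Hypothetical, line-based sketch of a single style check.
// Not markdownlint's implementation: it does not parse Markdown, so it
// ignores code blocks and would misreport a thematic break like "* * *".
function checkListMarkers(markdown) {
  const problems = [];
  markdown.split("\n").forEach((line, index) => {
    // Optional indentation, then "*" or "+", then whitespace and some content.
    const match = /^\s*([*+])\s+\S/.exec(line);
    if (match) {
      problems.push({
        line: index + 1, // 1-based line number, as linters usually report
        message: `Expected "-" as list marker but found "${match[1]}"`,
      });
    }
  });
  return problems;
}

const sample = "- ok\n* bad\n  + nested\n**bold** text";
console.log(checkListMarkers(sample));
```

A real linter also has to decide what to do about each finding; markdownlint reports it and, for many rules, can fix it automatically.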

markdownlint has two versions: the original, written in Ruby by Mark Harrison, and a Node.js port by David Anson. The Node.js version is now the more popular and more actively maintained of the two, so this article uses it as the example.

markdownlint can be used in various scenarios. This article focuses on markdownlint-cli2, because it can be integrated into a project, which makes team collaboration easier.

History of markdownlint cli#

According to David's blog [1], around 2015 Igor Shubovych discussed with him the idea of developing a CLI tool. David was not ready at the time, so Igor developed the markdownlint-cli tool on his own.

Over the next two years, more and more people began using markdownlint-cli, which prompted David to start contributing to the project. He added new features and became a major maintainer over the following three years. By 2020, David felt that certain things were difficult to change in someone else's project (possibly because of backward-compatibility concerns), so he started a new project, markdownlint-cli2, which improves on markdownlint-cli with faster execution, more flexible configuration, and fewer dependencies.

Both tools are still updated alongside markdownlint. If a project already uses the older markdownlint-cli, you can keep using it to avoid unexpected issues. For new projects, consider the more powerful markdownlint-cli2.

Installing markdownlint-cli2#

npm install markdownlint-cli2 --save-dev

Configure shortcut commands:

{
  "scripts": {
    "lint:md": "markdownlint-cli2 \"content/**/*.md\"",
    "fix:md": "npm run lint:md -- --fix"
  }
}

Install the markdownlint-rule-search-replace plugin [2]:

npm install markdownlint-rule-search-replace --save-dev

Create a .markdownlint.jsonc file in the project root directory to configure rules:

// This file defines our configuration for Markdownlint. See
// https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md
// for more details on each rule.
{
  "default": true,
  "ul-style": {
    "style": "dash"
  },
  "ul-indent": {
    "indent": 2
  },
  "no-hard-tabs": {
    "spaces_per_tab": 2
  },
  "line-length": false,
  "no-duplicate-header": {
    "allow_different_nesting": true
  },
  "single-title": {
    "front_matter_title": "^\\s*title\\s*[:=]"
  },
  "no-trailing-punctuation": {
    "punctuation": ".,;:"
  },
  // Consecutive Notes/Callouts currently don't conform with this rule
  "no-blanks-blockquote": false,
  // Force ordered numbering to catch accidental list ending from indenting
  "ol-prefix": {
    "style": "ordered"
  },
  "no-inline-html": {
    "allowed_elements": [
      "br",
      "code",
      "details",
      "div",
      "img",
      "kbd",
      "p",
      "pre",
      "sub",
      "summary",
      "sup",
      "table",
      "tbody",
      "td",
      "tfoot",
      "th",
      "thead",
      "tr",
      "ul",
      "ol",
      "var",
      "ruby",
      "rp",
      "rt",
      "i"
    ]
  },
  "no-bare-urls": false,
  // Produces too many false positives
  "fenced-code-language": false,
  "code-block-style": {
    "style": "fenced"
  },
  "no-space-in-code": false,
  "emphasis-style": {
    "style": "underscore"
  },
  "strong-style": {
    "style": "asterisk"
  },
  // https://github.com/OnkarRuikar/markdownlint-rule-search-replace
  "search-replace": {
    "rules": [
      {
        "name": "nbsp",
        "message": "Don't use no-break spaces",
        "searchPattern": "/\\u00A0/g",
        "replace": " ",
        "searchScope": "all"
      },
      {
        // zh-cn/zh-tw prefers double em-dash instead
        "name": "em-dash",
        "message": "Don't use '--'. Use em-dash (—) instead",
        "search": " -- ",
        "replace": " — ",
        "searchScope": "text"
      },
      {
        "name": "trailing-spaces",
        "message": "Avoid trailing spaces",
        "searchPattern": "/  +$/gm",
        "replace": "",
        "searchScope": "all"
      },
      {
        "name": "double-spaces",
        "message": "Avoid double spaces",
        "searchPattern": "/([^\\s>])  ([^\\s|])/g",
        "replace": "$1 $2",
        "searchScope": "text"
      },
      {
        "name": "stuck-definition",
        "message": "Character is stuck to definition description marker",
        "searchPattern": "/- :(\\w)/g",
        "replace": "- : $1",
        "searchScope": "text"
      },
      {
        "name": "localhost-links",
        "message": "Don't use localhost for links",
        "searchPattern": "/\\]\\(https?:\\/\\/localhost:\\d+\\//g",
        "replace": "](/",
        "searchScope": "text"
      },
      // zh-cn prefers rules
      {
        "name": "double-em-dash",
        "message": "Don't use '--'. Use double em-dash (——) instead",
        "search": " -- ",
        "replace": "——",
        "searchScope": "text"
      },
      {
        "name": "force-pronoun",
        "message": "Consider using '你' instead of '您'",
        "searchPattern": "/您/g",
        "searchScope": "text"
      }
    ]
  }
}
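To see what the `searchPattern` and `replace` pairs above actually do to a line of text, the following minimal sketch applies three of them with plain `String.prototype.replace`. The helper names `parsePattern` and `applyRules` are hypothetical, and this is not the plugin's own code (the plugin also honors `searchScope`, which is ignored here); it only shows the effect of the regular expressions.

```javascript
// Hypothetical helper: turn a "/pattern/flags" string, as written in the
// config above, into a RegExp object.
function parsePattern(text) {
  const lastSlash = text.lastIndexOf("/");
  return new RegExp(text.slice(1, lastSlash), text.slice(lastSlash + 1));
}

// Three rules copied from the config, as plain data.
const rules = [
  { name: "double-spaces", searchPattern: "/([^\\s>])  ([^\\s|])/g", replace: "$1 $2" },
  { name: "stuck-definition", searchPattern: "/- :(\\w)/g", replace: "- : $1" },
  {
    name: "localhost-links",
    searchPattern: "/\\]\\(https?:\\/\\/localhost:\\d+\\//g",
    replace: "](/",
  },
];

// Apply every rule in order; "searchScope" is deliberately not modeled.
function applyRules(line) {
  return rules.reduce(
    (current, rule) => current.replace(parsePattern(rule.searchPattern), rule.replace),
    line
  );
}

console.log(applyRules("see  [docs](http://localhost:1313/posts/a)"));
console.log(applyRules("- :term"));
```

Trying a pattern on a sample line like this before adding it to the config is a quick way to catch an over-eager regular expression.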

Then create a .markdownlint-cli2.jsonc file in the project root directory to configure the CLI itself. It loads the rule file above, registers the custom rule, and lists the paths to ignore:

{
  "config": {
    "extends": "./.markdownlint.jsonc"
  },
  "customRules": ["markdownlint-rule-search-replace"],
  "ignores": [
    "node_modules",
    ".git",
    ".github",
    "**/conflicting/**",
    "**/orphaned/**"
  ]
}

Installing lint-staged#

npm install lint-staged --save-dev

Configure .lintstagedrc.json:

{
  "content/**/*.md": "markdownlint-cli2 --fix"
}

Installing husky#

npx husky-init && npm install

Configure .husky/pre-commit:

#!/usr/bin/env sh
. "$(dirname -- "$0")/_/husky.sh"

npx lint-staged

With this in place, every commit automatically checks the staged Markdown files under the content directory and fixes whatever can be fixed automatically.
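The key point is that lint-staged passes only the staged files that match each glob to the configured command as arguments. The sketch below models that selection step under simplifying assumptions: the function names are hypothetical, the glob `content/**/*.md` is replaced by a prefix and suffix test, and paths are kept relative for readability.

```javascript
// Stand-in for the glob "content/**/*.md". lint-staged uses a real glob
// matcher; this prefix/suffix test is only an approximation for the sketch.
function matchesContentMarkdown(file) {
  return file.startsWith("content/") && file.endsWith(".md");
}

// Build the command line that would run for a given set of staged files,
// or null when nothing matches and the command is skipped.
function buildCommand(stagedFiles) {
  const targets = stagedFiles.filter(matchesContentMarkdown);
  return targets.length > 0 ? `markdownlint-cli2 --fix ${targets.join(" ")}` : null;
}

const staged = ["content/posts/a.md", "README.md", "static/app.js"];
console.log(buildCommand(staged));
```

Because only staged files are passed along, a commit stays fast even in a repository with thousands of Markdown files.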

Introducing AutoCorrect#

Pangu Spacing#

In many Chinese communities, it is a common unwritten style requirement to manually add spaces between Chinese and English, known as "Pangu spacing." Whether this requirement is reasonable and how to meet it is a worthwhile topic, but it is beyond the scope of this article [3].

Here, I will simply summarize: adding space between Chinese and English is to achieve visual separation, making it more aesthetically pleasing and readable. Ideally, this "gap" should be automatically added by the typesetting engine, with a width of 1/4 of a full-width space (em). However, due to the complexity and variability of digital typesetting environments, we cannot expect typesetting engines to have this capability most of the time (including the most common web environments). Therefore, we can only resort to manually inserting a half-width space (as its width is usually close to 1/4 em) to achieve a similar effect.
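The core rule is simple enough to sketch in a few lines. The following is a minimal illustration, not the algorithm used by any particular tool: the function name `addPanguSpacing` is hypothetical, and it only handles the basic CJK Unified Ideographs block next to ASCII letters and digits, whereas real tools cover many more character ranges, punctuation, and edge cases.

```javascript
// Basic CJK Unified Ideographs only; an assumption made to keep the sketch short.
const CJK = "\\u4e00-\\u9fff";
const cjkThenLatin = new RegExp(`([${CJK}])([A-Za-z0-9])`, "g");
const latinThenCjk = new RegExp(`([A-Za-z0-9])([${CJK}])`, "g");

// Insert a half-width space wherever a CJK character directly touches an
// ASCII letter or digit, in either order.
function addPanguSpacing(text) {
  return text.replace(cjkThenLatin, "$1 $2").replace(latinThenCjk, "$1 $2");
}

console.log(addPanguSpacing("使用Markdown写作"));
```

Text that is already spaced is left unchanged, so running the function repeatedly is safe.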

If you would rather not add these spaces by hand, is there a way to check for them and insert them automatically?

The answer is certainly yes, and there is more than one option.

pangu.js#

One of the most famous options is the pangu.js project. If you have used the browser plugin called "Why can't you just add a space?", then you have already used pangu.js: the plugin is developed by the same author and is built on it. The Hugo FixIt theme also includes pangu.js to automatically optimize mixed Chinese and English content in blog articles.

AutoCorrect#

Another option is AutoCorrect. While pangu.js focuses mainly on plain text content, AutoCorrect was born in the Chinese community of the Ruby language, so it has handled mixed Chinese and English inside program code from the beginning (see the project's test files for reference), which makes it more versatile.

Comparison of pangu.js and AutoCorrect:

| Project | Online Version | VSCode Extension | Command Line Tool |
| --- | --- | --- | --- |
| pangu.js | | | |
| AutoCorrect | AutoCorrect Editor | AutoCorrect | |
  • pangu.js does not have an official VSCode plugin; the most commonly used is the third-party port Pangu-Markdown developed by xlthu.
  • The command line tool for pangu.js is limited to Node.js and needs to be installed via npm: npm i pangu.
  • The command line tool for AutoCorrect can be installed independently and also has more language versions such as Rust and Node.js.

I have used pangu.js in blogs, VSCode, and browser plugins for a long time and have run into many issues. Its convenience comes with a certain heavy-handedness: its handling rules cannot be controlled, which has always troubled me. This article therefore tries AutoCorrect as a replacement for pangu.js, and in practice AutoCorrect's results are indeed better.

Use AutoCorrect in NPM#

Install autocorrect-node:

npm install autocorrect-node --save-dev

Modify shortcut commands:

{
  "scripts": {
    "fix:md": "autocorrect content --fix && markdownlint-cli2 \"content/**/*.md\" --fix",
    "lint:md": "autocorrect content --lint && markdownlint-cli2 \"content/**/*.md\""
  }
}

Modify .lintstagedrc.json:

{
  "content/**/*.md": [
    "autocorrect --fix",
    "markdownlint-cli2 --fix"
  ]
}

Add a .autocorrectignore:

# AutoCorrect Link ignore rules.
# https://github.com/huacnlee/autocorrect
#
# Like `.gitignore`, this file tells AutoCorrect which files need to check, some need to ignore.
node_modules/
build/
public/
resources/

Run npx autocorrect init to generate the default .autocorrectrc configuration, then add a text rule. In AutoCorrect's configuration, a value of 0 turns checking off for the matching text:

textRules:
  # sorted by `LC_ALL=C sort` command
  一二三,四五六.七八九: 0

Conclusion#

This article mainly introduced two tools: markdownlint-cli2 and AutoCorrect. The former is used to check Markdown syntax and style, while the latter is used to automatically fill in the "Pangu spacing" between Chinese and English. Both tools can be integrated into projects for unified standards and team collaboration.

Footnotes#

  1. David Anson's blog post "If one is good, two must be better [markdownlint-cli2 is a new kind of command-line interface for markdownlint]".

  2. markdownlint-rule-search-replace is a custom markdownlint rule for searching and replacing patterns.

  3. If you are interested, see the Zhihu discussion "Should there be spaces between Chinese and English in mixed typesetting?", the W3C draft standard "Chinese Typesetting Requirements" (§3.2.2), and Episode 14 of the podcast "Type Talk."
