path: root/compiler/lexer_test.go
Commit message | Author | Age | Files | Lines
* Use new parser and DFA compiler | Ryo Nihei | 2021-12-10 | 1 | -514/+0
* Add fragment expression | Ryo Nihei | 2021-05-25 | 1 | -0/+28
    A fragment entry is defined by an entry whose `fragment` field is `true`, and is referenced by a fragment expression (`\f{...}`).
* Add character property expression (Meet RL1.2 of UTS #18 partially) | Ryo Nihei | 2021-04-30 | 1 | -1/+45
    \p{property name=property value} matches a character that has the property. When the property name is General_Category, it can be omitted; that is, \p{Letter} equals \p{General_Category=Letter}. Currently, only General_Category is supported.
    This feature partially meets RL1.2 of UTS #18.
    RL1.2 Properties: https://unicode.org/reports/tr18/#RL1.2
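The omission rule for General_Category can be illustrated with a minimal Go sketch. It is not the project's code: the names `normalizeProperty`, `matchesProperty`, and `generalCategories` are hypothetical, and the value table covers only two categories, backed by the standard `unicode` package.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// generalCategories maps a few General_Category values to range tables from
// the standard library. The selection is illustrative, not exhaustive.
var generalCategories = map[string]*unicode.RangeTable{
	"Letter": unicode.Letter,
	"Number": unicode.Number,
}

// normalizeProperty splits the contents of \p{...} into a property name and
// value. When no name is given, it defaults to General_Category.
func normalizeProperty(expr string) (name, value string) {
	if i := strings.Index(expr, "="); i >= 0 {
		return expr[:i], expr[i+1:]
	}
	return "General_Category", expr
}

// matchesProperty reports whether r has the property described by expr.
// Only General_Category is handled, mirroring the commit message.
func matchesProperty(expr string, r rune) (bool, error) {
	name, value := normalizeProperty(expr)
	if name != "General_Category" {
		return false, fmt.Errorf("unsupported property name: %s", name)
	}
	table, ok := generalCategories[value]
	if !ok {
		return false, fmt.Errorf("unsupported property value: %s", value)
	}
	return unicode.Is(table, r), nil
}

func main() {
	short, _ := matchesProperty("Letter", 'a')
	long, _ := matchesProperty("General_Category=Letter", 'a')
	fmt.Println(short, long)
}
```

Both spellings resolve to the same name/value pair before lookup, so they cannot diverge in behavior.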
* Add code point expression (Meet RL1.1 of UTS #18) | Ryo Nihei | 2021-04-24 | 1 | -6/+174
    \u{hex string} matches a character that has the code point represented by the hex string. For instance, \u{3042} matches hiragana あ (U+3042). The hex string must have 4 or 6 digits.
    This feature meets RL1.1 of UTS #18.
    RL1.1 Hex Notation: https://unicode.org/reports/tr18/#RL1.1
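As a rough illustration of the 4-or-6-digit rule, the following Go sketch converts the hex string to a code point. The function name `parseCodePoint` and the error wording are hypothetical, and the upper-bound check against U+10FFFF is an added assumption that the commit message does not state.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseCodePoint converts the hex string inside \u{...} to a rune.
// The digit-count rule follows the commit message; everything else here
// is an assumption made for illustration.
func parseCodePoint(hex string) (rune, error) {
	if len(hex) != 4 && len(hex) != 6 {
		return 0, fmt.Errorf("hex string must have 4 or 6 digits: %q", hex)
	}
	n, err := strconv.ParseUint(hex, 16, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid hex string %q: %w", hex, err)
	}
	if n > 0x10FFFF {
		return 0, fmt.Errorf("code point out of range: U+%X", n)
	}
	return rune(n), nil
}

func main() {
	r, err := parseCodePoint("3042")
	fmt.Println(string(r), err) // prints "あ <nil>"
}
```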
* Change the lexical specs of regexp and define concrete syntax error values | Ryo Nihei | 2021-04-17 | 1 | -29/+58
    * Make the lexer treat ']' as an ordinary character in default mode
    * Define values of the syntax error type that represents error information concretely
* Fix grammar the parser accepts | Ryo Nihei | 2021-04-11 | 1 | -1/+72
    * Add cases that test the parse method.
    * Fix the parser to pass the cases.
* Add logical inverse expression | Ryo Nihei | 2021-04-01 | 1 | -2/+30
    [^a-z] matches any character that is not in the range a-z.
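One way to picture the inverse expression is as the complement of a set of code point ranges. The sketch below is an assumption about how such a complement could be computed, not the project's implementation. The `runeRange` type and `invert` function are hypothetical, and surrogate code points are ignored for brevity.

```go
package main

import "fmt"

// maxCodePoint is the largest Unicode code point.
const maxCodePoint = 0x10FFFF

// runeRange is an inclusive range of code points.
type runeRange struct{ from, to rune }

// invert returns the ranges covering every code point in [0, maxCodePoint]
// that is not covered by rs. rs must be sorted and non-overlapping.
func invert(rs []runeRange) []runeRange {
	var out []runeRange
	next := rune(0)
	for _, r := range rs {
		if r.from > next {
			out = append(out, runeRange{next, r.from - 1})
		}
		next = r.to + 1
	}
	if next <= maxCodePoint {
		out = append(out, runeRange{next, maxCodePoint})
	}
	return out
}

func main() {
	// [^a-z] corresponds to the complement of the single range a-z.
	for _, r := range invert([]runeRange{{'a', 'z'}}) {
		fmt.Printf("U+%04X..U+%04X\n", r.from, r.to)
	}
}
```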
* Add range expression | Ryo Nihei | 2021-02-24 | 1 | -3/+8
    [a-z] matches any one character from a to z. The order of the characters depends on Unicode code points.
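Because the ordering is by code point, a range such as [Z-a] is well-ordered ('Z' is U+005A, 'a' is U+0061) while [a-Z] is not. The Go sketch below illustrates this. The `runeRange` type and `newRange` constructor are hypothetical, and rejecting a reversed range is an assumption rather than documented behavior.

```go
package main

import "fmt"

// runeRange is an inclusive range of code points, as in [from-to].
type runeRange struct{ from, to rune }

// newRange builds a range and rejects one whose endpoints are reversed
// in code point order (an assumed policy, see the note above).
func newRange(from, to rune) (runeRange, error) {
	if from > to {
		return runeRange{}, fmt.Errorf("range out of order: U+%04X > U+%04X", from, to)
	}
	return runeRange{from, to}, nil
}

// contains reports whether c lies within the range.
func (r runeRange) contains(c rune) bool {
	return r.from <= c && c <= r.to
}

func main() {
	az, _ := newRange('a', 'z')
	fmt.Println(az.contains('m'), az.contains('A'))

	_, err := newRange('a', 'Z')
	fmt.Println(err != nil)
}
```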
* Add + and ? operators | Ryo Nihei | 2021-02-20 | 1 | -3/+9
    * a+ matches 'a' one or more times. This is equivalent to aa*.
    * a? matches 'a' zero or one time.
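The stated equivalence between a+ and aa* can be checked on a few inputs with Go's standard `regexp` package. This uses the standard library's engine, not the compiler from this repository, and the helper name `fullMatch` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// fullMatch reports whether pattern matches the whole of s.
func fullMatch(pattern, s string) bool {
	return regexp.MustCompile(`^(?:` + pattern + `)$`).MatchString(s)
}

func main() {
	for _, s := range []string{"", "a", "aa", "aaa", "b"} {
		fmt.Printf("%q: a+=%v aa*=%v a?=%v\n",
			s, fullMatch("a+", s), fullMatch("aa*", s), fullMatch("a?", s))
	}
}
```

For every sample input the `a+` and `aa*` columns agree, while `a?` accepts only the empty string and a single 'a'.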
* Add bracket expression matching specified character | Ryo Nihei | 2021-02-14 | 1 | -2/+30
    The bracket expression matches any single character specified in it. Inside a bracket expression, special characters such as . and * are handled as ordinary characters.
* Add dot symbol matching any single character | Ryo Nihei | 2021-02-14 | 1 | -2/+4
    The dot symbol matches any single character. When the dot symbol appears, the parser generates an AST matching all of the well-formed UTF-8 byte sequences.
    References:
    * https://www.unicode.org/versions/Unicode13.0.0/ch03.pdf#G7404
    * Table 3-6. UTF-8 Bit Distribution
    * Table 3-7. Well-Formed UTF-8 Byte Sequences
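The byte-level view of the dot symbol can be sketched by transcribing Table 3-7 into data and matching against it. This is a plain table lookup written for illustration, not the AST the parser actually builds. The names `byteRange`, `wellFormedUTF8`, and `matchesDot` are hypothetical.

```go
package main

import "fmt"

// byteRange is an inclusive range of byte values.
type byteRange struct{ lo, hi byte }

// wellFormedUTF8 transcribes Table 3-7 of the Unicode Standard. Each row is
// one sequence shape; each element constrains one byte of the sequence.
var wellFormedUTF8 = [][]byteRange{
	{{0x00, 0x7F}},
	{{0xC2, 0xDF}, {0x80, 0xBF}},
	{{0xE0, 0xE0}, {0xA0, 0xBF}, {0x80, 0xBF}},
	{{0xE1, 0xEC}, {0x80, 0xBF}, {0x80, 0xBF}},
	{{0xED, 0xED}, {0x80, 0x9F}, {0x80, 0xBF}},
	{{0xEE, 0xEF}, {0x80, 0xBF}, {0x80, 0xBF}},
	{{0xF0, 0xF0}, {0x90, 0xBF}, {0x80, 0xBF}, {0x80, 0xBF}},
	{{0xF1, 0xF3}, {0x80, 0xBF}, {0x80, 0xBF}, {0x80, 0xBF}},
	{{0xF4, 0xF4}, {0x80, 0x8F}, {0x80, 0xBF}, {0x80, 0xBF}},
}

// matchesDot reports whether b is exactly one well-formed UTF-8 sequence.
func matchesDot(b []byte) bool {
	for _, seq := range wellFormedUTF8 {
		if len(seq) != len(b) {
			continue
		}
		ok := true
		for i, r := range seq {
			if b[i] < r.lo || b[i] > r.hi {
				ok = false
				break
			}
		}
		if ok {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(matchesDot([]byte("あ")))       // well-formed three-byte sequence
	fmt.Println(matchesDot([]byte{0xC0, 0x80})) // overlong encoding, ill-formed
}
```

Each row of the table corresponds to one alternative, and each row is a concatenation of byte ranges, which is why the dot can be expressed with only alternation, concatenation, and byte ranges.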
* Add compiler | Ryo Nihei | 2021-02-14 | 1 | -0/+105
    The compiler takes a lexical specification expressed by regular expressions and generates a DFA accepting the tokens. Operators that you can use in the regular expressions are concatenation, alternation, repetition, and grouping.