path: root/driver/lexer_test.go

Commit history, newest first. Each entry lists the commit message, author, date, number of files changed, and lines removed/added (-/+).

* Use golangci-lint  (Ryo Nihei, 2021-12-15, 1 file, -2/+18)

* Use new parser and DFA compiler  (Ryo Nihei, 2021-12-10, 1 file, -2/+5)

* Make character properties available in an inverse expression (Make [^\p{...}] available)  (Ryo Nihei, 2021-11-25, 1 file, -0/+16)
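
  For illustration, a minimal sketch of a pattern in this form; the property name Letter is a hypothetical choice, and only the [^\p{...}] shape is taken from the commit message above:

      package main

      import "fmt"

      func main() {
          // Hypothetical pattern: matches any single character that does not
          // have the Letter property. Only the [^\p{...}] shape is from the
          // log; the property name is an assumption.
          pattern := `[^\p{Letter}]`
          fmt.Println(pattern)
      }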

* Remove the ModeName and KindName fields from the driver.Token struct  (Ryo Nihei, 2021-10-03, 1 file, -213/+205)

* Disallow upper cases in an identifier  (Ryo Nihei, 2021-09-24, 1 file, -84/+84)

* Add name field to the lexical specification  (Ryo Nihei, 2021-09-18, 1 file, -0/+27)

* Generate constant values representing mode IDs, mode names, kind IDs, and kind names  (Ryo Nihei, 2021-09-18, 1 file, -77/+77)

* Define a lexical specification interface  (Ryo Nihei, 2021-09-11, 1 file, -208/+215)

* Add lexeme positions to tokens  (Ryo Nihei, 2021-08-07, 1 file, -2/+139)
  Close #1.

* Change APIs  (Ryo Nihei, 2021-08-01, 1 file, -9/+9)
  Change fields of tokens (the results of lexical analysis) as follows:
  - Rename: mode -> mode_id
  - Rename: kind_id -> mode_kind_id
  - Add: kind_id
  The kind ID is unique across all modes, but the mode kind ID is unique only within a mode.
  Change fields of a transition table as follows:
  - Rename: initial_mode -> initial_mode_id
  - Rename: modes -> mode_names
  - Rename: kinds -> kind_names
  - Rename: specs[].kinds -> specs[].kind_names
  - Rename: specs[].dfa.initial_state -> specs[].dfa.initial_state_id
  Change public types defined in the spec package as follows:
  - Rename: LexModeNum -> LexModeID
  - Rename: LexKind -> LexKindName
  - Add: LexKindID
  - Add: StateID
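
  For illustration, a minimal Go sketch of decoding a token that uses the renamed keys above; treating them as keys in the lex command's JSON output and adding a text key are assumptions, only mode_id, kind_id, and mode_kind_id are named in this log:

      package main

      import (
          "encoding/json"
          "fmt"
      )

      // token mirrors only the keys named in this change; the real output has
      // more fields, and the "text" key is an assumption.
      type token struct {
          ModeID     int    `json:"mode_id"`
          KindID     int    `json:"kind_id"`      // unique across all modes
          ModeKindID int    `json:"mode_kind_id"` // unique only within a mode
          Text       string `json:"text"`         // assumed field name
      }

      func main() {
          src := []byte(`{"mode_id":1,"kind_id":3,"mode_kind_id":1,"text":"foo"}`)
          var t token
          if err := json.Unmarshal(src, &t); err != nil {
              panic(err)
          }
          fmt.Printf("%+v\n", t)
      }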

* Add unique kind IDs to tokens  (Ryo Nihei, 2021-08-01, 1 file, -173/+174)

* Add spec.EscapePattern function  (Ryo Nihei, 2021-07-22, 1 file, -0/+28)

* Support passive mode transition  (Ryo Nihei, 2021-06-10, 1 file, -6/+106)

* Add fragment expression  (Ryo Nihei, 2021-05-25, 1 file, -0/+52)
  A fragment entry is defined by an entry whose `fragment` field is `true`, and is referenced by a fragment expression (`\f{...}`).
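
  For illustration, a hedged sketch of how a fragment could look in a spec, embedded as a Go raw string the way a test might hold it; the JSON layout (entries, kind, pattern keys) is an assumption, and only the fragment field and the \f{...} expression appear in this log:

      package main

      import (
          "encoding/json"
          "fmt"
      )

      func main() {
          // Assumed spec layout: a fragment entry ("fragment": true) defines a
          // reusable sub-pattern named "digit", and \f{digit} references it
          // from the "int" pattern.
          spec := []byte(`{
              "entries": [
                  {"fragment": true, "kind": "digit", "pattern": "[0-9]"},
                  {"kind": "int", "pattern": "\\f{digit}+"}
              ]
          }`)
          var v map[string]interface{}
          if err := json.Unmarshal(spec, &v); err != nil {
              panic(err)
          }
          fmt.Println("entries:", v["entries"])
      }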

* Rename fields of driver.Token  (Ryo Nihei, 2021-05-13, 1 file, -1/+1)

* Add --compression-level option to compile command  (Ryo Nihei, 2021-05-11, 1 file, -19/+21)
  --compression-level specifies a compression level. The default value is 2.

* Fix a text representation of an error token  (Ryo Nihei, 2021-05-11, 1 file, -3/+3)
  This commit fixes a bug that caused the second and subsequent characters of the text representation of an error token to be missing.

* Remove Peek* functions  (Ryo Nihei, 2021-05-05, 1 file, -60/+0)

* Add lex mode  (Ryo Nihei, 2021-05-04, 1 file, -170/+251)
  Lex mode is a feature that separates transition tables per mode. The lexer starts from the initial state indicated by the `initial_state` field and transitions between modes according to the `push` and `pop` fields. The initial state will always be `default`; currently, maleeni doesn't provide the ability to change the initial state. You can specify the modes of each lex entry using the `modes` field. When a mode isn't indicated explicitly, the entry belongs to the `default` mode.
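
  For illustration, a hedged sketch of a spec that uses modes; the JSON layout, the value types of push and pop, and the mode name "string" are assumptions, while the modes, push, and pop fields and the default mode are described above:

      package main

      import (
          "encoding/json"
          "fmt"
      )

      func main() {
          // Assumed spec layout: the first entry lives in the default mode and
          // pushes the "string" mode on an opening quote; the entries with
          // "modes": ["string"] apply only in that mode, and the closing quote
          // pops back to the previous mode.
          spec := []byte(`{
              "entries": [
                  {"kind": "string_open", "pattern": "\"", "push": "string"},
                  {"kind": "char", "modes": ["string"], "pattern": "[^\"]"},
                  {"kind": "string_close", "modes": ["string"], "pattern": "\"", "pop": true}
              ]
          }`)
          var v map[string]interface{}
          if err := json.Unmarshal(spec, &v); err != nil {
              panic(err)
          }
          fmt.Println("entries:", v["entries"])
      }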

* Add code point expression (Meet RL1.1 of UTS #18)  (Ryo Nihei, 2021-04-24, 1 file, -2/+35)
  \u{hex string} matches a character that has the code point represented by the hex string. For instance, \u{3042} matches hiragana あ (U+3042). The hex string must have 4 or 6 digits. This feature meets RL1.1 of UTS #18.
  RL1.1 Hex Notation: https://unicode.org/reports/tr18/#RL1.1
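
  For illustration, a 4-digit form and a 6-digit form side by side; padding the 6-digit form with a leading zero is an assumed reading of the 4-or-6-digit rule, not something stated in this log:

      package main

      import "fmt"

      func main() {
          // \u{3042} uses the 4-digit form; \u{01F600} writes U+1F600 with
          // 6 digits (assumed leading-zero padding).
          patterns := []string{`\u{3042}`, `\u{01F600}`}
          fmt.Println(patterns)
      }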

* Add validation of lexical specs and improve error messages  (Ryo Nihei, 2021-04-17, 1 file, -32/+31)

* Change the lexical specs of regexp and define concrete syntax error values  (Ryo Nihei, 2021-04-17, 1 file, -18/+21)
  - Make the lexer treat ']' as an ordinary character in default mode.
  - Define values of the syntax error type that represent error information concretely.

* Print the result of the lex command in JSON format  (Ryo Nihei, 2021-04-06, 1 file, -114/+114)
  - Print the result of the lex command in JSON format.
  - Print the EOF token.

* Add logical inverse expression  (Ryo Nihei, 2021-04-01, 1 file, -3/+20)
  [^a-z] matches any character that is not in the range a-z.

* Add range expression  (Ryo Nihei, 2021-02-24, 1 file, -1/+260)
  [a-z] matches any one character from a to z. The order of the characters depends on Unicode code points.

* Add + and ? operators  (Ryo Nihei, 2021-02-20, 1 file, -5/+35)
  - a+ matches 'a' one or more times. This is equivalent to aa*.
  - a? matches 'a' zero or one time.

* Add types of lexical specifications  (Ryo Nihei, 2021-02-16, 1 file, -65/+62)
  APIs of the compiler and driver packages use these types. Because the CompiledLexSpec struct that a lexer takes holds the kind names of the lexical specification entries, the lexer can set them on tokens.

* Add bracket expression matching specified character  (Ryo Nihei, 2021-02-14, 1 file, -0/+18)
  The bracket expression matches any single character specified in it. Within a bracket expression, special characters like . and * are handled as ordinary characters.

* Add dot symbol matching any single character  (Ryo Nihei, 2021-02-14, 1 file, -1/+43)
  The dot symbol matches any single character. When the dot symbol appears, the parser generates an AST matching all of the well-formed UTF-8 byte sequences.
  References:
  - https://www.unicode.org/versions/Unicode13.0.0/ch03.pdf#G7404
  - Table 3-6. UTF-8 Bit Distribution
  - Table 3-7. Well-Formed UTF-8 Byte Sequences
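
  For reference, these are the well-formed byte ranges from Table 3-7 of the Unicode specification cited above, i.e. the ranges such an AST has to cover:

      U+0000..U+007F      00..7F
      U+0080..U+07FF      C2..DF  80..BF
      U+0800..U+0FFF      E0      A0..BF  80..BF
      U+1000..U+CFFF      E1..EC  80..BF  80..BF
      U+D000..U+D7FF      ED      80..9F  80..BF
      U+E000..U+FFFF      EE..EF  80..BF  80..BF
      U+10000..U+3FFFF    F0      90..BF  80..BF  80..BF
      U+40000..U+FFFFF    F1..F3  80..BF  80..BF  80..BF
      U+100000..U+10FFFF  F4      80..8F  80..BF  80..BF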

* Add driver  (Ryo Nihei, 2021-02-14, 1 file, -0/+147)
  The driver takes a DFA and an input text and generates a lexer. The lexer tokenizes the input text according to the lexical specification that the DFA expresses.
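
  For illustration, a minimal sketch of the token-pulling loop a caller would write against such a lexer; the type and method names here (Next, an eof flag, kind names on tokens) are hypothetical stand-ins, and a fake in-memory lexer takes the place of the real driver:

      package main

      import "fmt"

      // token and fakeLexer are hypothetical stand-ins for whatever the driver
      // package exports; they are not taken from this log.
      type token struct {
          kindName string
          lexeme   string
          eof      bool
      }

      // fakeLexer plays the role of the generated lexer: it yields tokens one
      // by one and finishes with an EOF token, the way a lexer built from a
      // compiled DFA would tokenize an input text.
      type fakeLexer struct {
          toks []token
          pos  int
      }

      func (l *fakeLexer) Next() (token, error) {
          if l.pos >= len(l.toks) {
              return token{eof: true}, nil
          }
          t := l.toks[l.pos]
          l.pos++
          return t, nil
      }

      func main() {
          lexer := &fakeLexer{toks: []token{
              {kindName: "word", lexeme: "foo"},
              {kindName: "word", lexeme: "bar"},
          }}
          for {
              tok, err := lexer.Next()
              if err != nil {
                  panic(err)
              }
              if tok.eof {
                  break
              }
              fmt.Printf("%s %q\n", tok.kindName, tok.lexeme)
          }
      }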