path: root/driver

Commit message | Author | Age | Files | Lines
* Use golangci-lint | Ryo Nihei | 2021-12-15 | 1 | -2/+18
* Use new parser and DFA compiler | Ryo Nihei | 2021-12-10 | 1 | -2/+5
* Make character properties available in an inverse expression (Make [^\p{...}]... | Ryo Nihei | 2021-11-25 | 1 | -0/+16
* Update godoc | Ryo Nihei | 2021-10-05 | 2 | -8/+16
* Remove the ModeName and KindName fields from the driver.Token struct | Ryo Nihei | 2021-10-03 | 3 | -259/+272
* Format the source code of a lexer maleeni-go generates | Ryo Nihei | 2021-10-02 | 1 | -40/+140
* Disallow uppercase letters in an identifier | Ryo Nihei | 2021-09-24 | 1 | -84/+84
* Add name field to the lexical specification | Ryo Nihei | 2021-09-18 | 2 | -10/+43
* Generate constant values representing mode IDs, mode names, kind IDs, and kin... | Ryo Nihei | 2021-09-18 | 2 | -91/+165
* Add maleeni-go command | Ryo Nihei | 2021-09-14 | 1 | -0/+517
  maleeni-go generates a lexer that recognizes a specific lexical specification.
* Define a lexical specification interface | Ryo Nihei | 2021-09-11 | 3 | -335/+353
* Remove --debug option from the lex command | Ryo Nihei | 2021-09-08 | 1 | -35/+0
* Add lexeme positions to tokens | Ryo Nihei | 2021-08-07 | 2 | -39/+224
  close #1
* Change APIs | Ryo Nihei | 2021-08-01 | 2 | -77/+78
  Change the fields of tokens (the results of lexical analysis) as follows:
  - Rename: mode -> mode_id
  - Rename: kind_id -> mode_kind_id
  - Add: kind_id
  The kind ID is unique across all modes, but the mode kind ID is unique only within a mode.
  Change the fields of a transition table as follows:
  - Rename: initial_mode -> initial_mode_id
  - Rename: modes -> mode_names
  - Rename: kinds -> kind_names
  - Rename: specs[].kinds -> specs[].kind_names
  - Rename: specs[].dfa.initial_state -> specs[].dfa.initial_state_id
  Change the public types defined in the spec package as follows:
  - Rename: LexModeNum -> LexModeID
  - Rename: LexKind -> LexKindName
  - Add: LexKindID
  - Add: StateID
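After these renames, the top level of a compiled transition table would look roughly like the sketch below. Only the field names come from the commit message; the nesting and the placeholder values are assumptions made for illustration.

```json
{
  "initial_mode_id": 1,
  "mode_names": ["default"],
  "kind_names": ["word"],
  "specs": [
    {
      "kind_names": ["word"],
      "dfa": { "initial_state_id": 1 }
    }
  ]
}
```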
* Add unique kind IDs to tokens | Ryo Nihei | 2021-08-01 | 2 | -178/+186
* Add spec.EscapePattern function | Ryo Nihei | 2021-07-22 | 1 | -0/+28
* Support passive mode transition | Ryo Nihei | 2021-06-10 | 2 | -24/+136
* Add fragment expression | Ryo Nihei | 2021-05-25 | 1 | -0/+52
  A fragment entry is defined by an entry whose `fragment` field is `true`, and it is referenced by a fragment expression (`\f{...}`).
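A fragment declaration and its reuse might look like the sketch below. Only the `fragment` field and the `\f{...}` syntax are taken from the commit message; the `entries`, `kind`, and `pattern` field names are assumptions made for illustration.

```json
{
  "entries": [
    { "fragment": true, "kind": "digit", "pattern": "[0-9]" },
    { "kind": "number", "pattern": "\\f{digit}+" }
  ]
}
```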
* Rename fields of driver.Token | Ryo Nihei | 2021-05-13 | 2 | -24/+23
* Add --compression-level option to compile command | Ryo Nihei | 2021-05-11 | 2 | -25/+43
  --compression-level specifies a compression level. The default value is 2.
* Fix a text representation of an error token | Ryo Nihei | 2021-05-11 | 2 | -22/+51
  This commit fixes a bug that caused the second and subsequent characters of the text representation of an error token to be missing.
* Update README and godoc | Ryo Nihei | 2021-05-10 | 1 | -7/+23
* Add --break-on-error option to lex command | Ryo Nihei | 2021-05-08 | 1 | -0/+1
  With the --break-on-error option, lexical analysis stops immediately with exit status 1 when an error token appears.
* Add CLI options | Ryo Nihei | 2021-05-08 | 1 | -15/+15
* Change type of accepting_states to slice | Ryo Nihei | 2021-05-07 | 1 | -2/+2
* Add transition table compressor | Ryo Nihei | 2021-05-07 | 1 | -6/+12
* Remove Peek* functions | Ryo Nihei | 2021-05-05 | 2 | -86/+0
* Add lex mode | Ryo Nihei | 2021-05-04 | 2 | -206/+350
  lex mode is a feature that separates transition tables per mode. The lexer starts from the initial state indicated by the `initial_state` field and transitions between modes according to the `push` and `pop` fields. The initial state will always be `default`; currently, maleeni doesn't provide the ability to change it. You can specify the modes of each lex entry using the `modes` field. When a mode isn't indicated explicitly, an entry has `default` mode.
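The mode fields described above might be used as in the following sketch, which drives a string-literal mode. The `modes`, `push`, and `pop` field names come from the commit message; the surrounding structure, the `entries`, `kind`, and `pattern` names, and the value types are assumptions.

```json
{
  "entries": [
    { "kind": "string_open", "pattern": "\"", "push": "string" },
    { "modes": ["string"], "kind": "char", "pattern": "[^\"]" },
    { "modes": ["string"], "kind": "string_close", "pattern": "\"", "pop": true }
  ]
}
```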
* Generate an invalid token from incomplete input | Ryo Nihei | 2021-05-02 | 1 | -0/+5
  When the lexer's buffer has unaccepted data and reads the EOF, the lexer treats the buffered data as an invalid token.
* Add code point expression (meets RL1.1 of UTS #18) | Ryo Nihei | 2021-04-24 | 1 | -2/+35
  \u{hex string} matches a character that has the code point represented by the hex string. For instance, \u{3042} matches hiragana あ (U+3042). The hex string must have 4 or 6 digits. This feature meets RL1.1 (Hex Notation) of UTS #18: https://unicode.org/reports/tr18/#RL1.1
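The mapping from a hex string to a character that `\u{...}` relies on can be illustrated with plain Go (a generic illustration, not maleeni code):

```go
package main

import "fmt"

func main() {
	// \u{3042} designates the character with code point U+3042.
	r := rune(0x3042)
	fmt.Printf("U+%04X = %c\n", r, r) // U+3042 = あ

	// Code points above U+FFFF use the 6-digit form, e.g. \u{01F600}.
	r = rune(0x1F600)
	fmt.Printf("U+%06X = %c\n", r, r) // U+01F600 = 😀
}
```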
* Add validation of lexical specs and improve error messages | Ryo Nihei | 2021-04-17 | 2 | -33/+32
* Change the lexical specs of regexp and define concrete syntax error values | Ryo Nihei | 2021-04-17 | 1 | -18/+21
  - Make the lexer treat ']' as an ordinary character in default mode.
  - Define values of the syntax error type that represent error information concretely.
* Print the result of the lex command in JSON format | Ryo Nihei | 2021-04-06 | 2 | -134/+180
  - Print the result of the lex command in JSON format.
  - Print the EOF token.
* Add logical inverse expression | Ryo Nihei | 2021-04-01 | 1 | -3/+20
  [^a-z] matches any character that is not in the range a-z.
* Add range expression | Ryo Nihei | 2021-02-24 | 1 | -1/+260
  [a-z] matches any one character from a to z. The order of the characters depends on Unicode code points.
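Because the order of characters in a range depends on Unicode code points, range membership reduces to a numeric comparison (a generic illustration, not maleeni code):

```go
package main

import "fmt"

// inRange reports whether r falls between lo and hi by Unicode code point,
// which is how a range expression like [a-z] decides membership.
func inRange(r, lo, hi rune) bool {
	return lo <= r && r <= hi
}

func main() {
	fmt.Println(inRange('m', 'a', 'z')) // true
	fmt.Println(inRange('M', 'a', 'z')) // false: 'M' (U+004D) < 'a' (U+0061)
	fmt.Println(inRange('う', 'あ', 'ん')) // true: any code points work
}
```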
* Add + and ? operators | Ryo Nihei | 2021-02-20 | 1 | -5/+35
  - a+ matches 'a' one or more times. This is equivalent to aa*.
  - a? matches 'a' zero or one time.
* Add logging to lex command | Ryo Nihei | 2021-02-16 | 1 | -3/+58
  The lex command writes logs out to the maleeni-lex.log file. When you generate a lexer using driver.NewLexer(), you can choose whether the lexer writes logs or not.
* Add types of lexical specifications | Ryo Nihei | 2021-02-16 | 2 | -79/+78
  APIs of the compiler and driver packages use these types. Because the CompiledLexSpec struct a lexer takes holds the kind names of the lexical specification entries, the lexer can set them on tokens.
* Add bracket expression matching specified character | Ryo Nihei | 2021-02-14 | 1 | -0/+18
  The bracket expression matches any single character specified in it. Inside a bracket expression, special characters such as . and * are handled as ordinary characters.
* Add dot symbol matching any single character | Ryo Nihei | 2021-02-14 | 1 | -1/+43
  The dot symbol matches any single character. When the dot symbol appears, the parser generates an AST matching all of the well-formed UTF-8 byte sequences.
  References:
  - https://www.unicode.org/versions/Unicode13.0.0/ch03.pdf#G7404 (Table 3-6. UTF-8 Bit Distribution; Table 3-7. Well-Formed UTF-8 Byte Sequences)
* Add driver | Ryo Nihei | 2021-02-14 | 2 | -0/+309
  The driver takes a DFA and an input text and generates a lexer. The lexer tokenizes the input text according to the lexical specification that the DFA expresses.
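The idea of a driver that walks a DFA over input text can be sketched with a hard-coded transition function (a toy illustration of the technique only; maleeni's actual driver API and table layout differ):

```go
package main

import "fmt"

// A toy DFA: state 0 is initial; states 1 ("word") and 2 ("number") accept.
// next returns the successor state, or -1 when there is no transition.
func next(state int, b byte) int {
	isLetter := b >= 'a' && b <= 'z'
	isDigit := b >= '0' && b <= '9'
	switch {
	case (state == 0 || state == 1) && isLetter:
		return 1
	case (state == 0 || state == 2) && isDigit:
		return 2
	}
	return -1
}

var kind = map[int]string{1: "word", 2: "number"}

// tokenize repeatedly runs the DFA from its initial state, taking the
// longest match (maximal munch) starting at each position.
func tokenize(src string) []string {
	var toks []string
	for i := 0; i < len(src); {
		if src[i] == ' ' { // skip spaces
			i++
			continue
		}
		state, end, lastAccept := 0, i, -1
		for j := i; j < len(src); j++ {
			state = next(state, src[j])
			if state < 0 {
				break
			}
			if _, ok := kind[state]; ok {
				end, lastAccept = j+1, state
			}
		}
		if lastAccept < 0 { // no accepting state reached: invalid token
			toks = append(toks, fmt.Sprintf("invalid(%q)", src[i]))
			i++
			continue
		}
		toks = append(toks, fmt.Sprintf("%s(%q)", kind[lastAccept], src[i:end]))
		i = end
	}
	return toks
}

func main() {
	fmt.Println(tokenize("abc 42 x9"))
}
```

Running it on "abc 42 x9" produces word and number tokens under maximal munch, splitting "x9" at the letter/digit boundary because this toy DFA has no mixed-alphanumeric state.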