| Commit message | Author | Age | Files | Lines |
| * | Use golangci-lint | Ryo Nihei | 2021-12-15 | 1 | -2/+18 |
| * | Use new parser and DFA compiler | Ryo Nihei | 2021-12-10 | 1 | -2/+5 |
| * | Make character properties available in an inverse expression (Make [^\p{...}]... | Ryo Nihei | 2021-11-25 | 1 | -0/+16 |
| * | Update godoc | Ryo Nihei | 2021-10-05 | 2 | -8/+16 |
| * | Remove the ModeName and KindName fields from the driver.Token struct | Ryo Nihei | 2021-10-03 | 3 | -259/+272 |
| * | Format the source code of a lexer maleeni-go generates | Ryo Nihei | 2021-10-02 | 1 | -40/+140 |
| * | Disallow upper cases in an identifier | Ryo Nihei | 2021-09-24 | 1 | -84/+84 |
| * | Add name field to the lexical specification | Ryo Nihei | 2021-09-18 | 2 | -10/+43 |
| * | Generate constant values representing mode IDs, mode names, kind IDs, and kin... | Ryo Nihei | 2021-09-18 | 2 | -91/+165 |
| * | Add maleeni-go command•••maleeni-go generates a lexer that recognizes a specific lexical specification.
| Ryo Nihei | 2021-09-14 | 1 | -0/+517 |
| * | Define a lexical specification interface | Ryo Nihei | 2021-09-11 | 3 | -335/+353 |
| * | Remove --debug option from the lex command | Ryo Nihei | 2021-09-08 | 1 | -35/+0 |
| * | Add lexeme positions to tokens•••close #1
| Ryo Nihei | 2021-08-07 | 2 | -39/+224 |
| * | Change APIs•••Change the fields of tokens (the results of lexical analysis) as follows:
- Rename: mode -> mode_id
- Rename: kind_id -> mode_kind_id
- Add: kind_id
The kind ID is unique across all modes, but the mode kind ID is unique only within a mode.
Change fields of a transition table as follows:
- Rename: initial_mode -> initial_mode_id
- Rename: modes -> mode_names
- Rename: kinds -> kind_names
- Rename: specs[].kinds -> specs[].kind_names
- Rename: specs[].dfa.initial_state -> specs[].dfa.initial_state_id
Change public types defined in the spec package as follows:
- Rename: LexModeNum -> LexModeID
- Rename: LexKind -> LexKindName
- Add: LexKindID
- Add: StateID
| Ryo Nihei | 2021-08-01 | 2 | -77/+78 |
| * | Add unique kind IDs to tokens | Ryo Nihei | 2021-08-01 | 2 | -178/+186 |
| * | Add spec.EscapePattern function | Ryo Nihei | 2021-07-22 | 1 | -0/+28 |
| * | Support passive mode transition | Ryo Nihei | 2021-06-10 | 2 | -24/+136 |
| * | Add fragment expression•••A fragment entry is an entry whose `fragment` field is `true`; it is referenced by a fragment expression (`\f{...}`).
| Ryo Nihei | 2021-05-25 | 1 | -0/+52 |
| * | Rename fields of driver.Token | Ryo Nihei | 2021-05-13 | 2 | -24/+23 |
| * | Add --compression-level option to compile command•••--compression-level specifies a compression level. The default value is 2.
| Ryo Nihei | 2021-05-11 | 2 | -25/+43 |
| * | Fix a text representation of an error token•••This commit fixes a bug that caused the second and subsequent characters of the text representation of an error token to be missing.
| Ryo Nihei | 2021-05-11 | 2 | -22/+51 |
| * | Update README and godoc | Ryo Nihei | 2021-05-10 | 1 | -7/+23 |
| * | Add --break-on-error option to lex command•••When the --break-on-error option is used, lexical analysis stops immediately with exit status 1 as soon as an error token appears.
| Ryo Nihei | 2021-05-08 | 1 | -0/+1 |
| * | Add CLI options | Ryo Nihei | 2021-05-08 | 1 | -15/+15 |
| * | Change type of accepting_states to slice | Ryo Nihei | 2021-05-07 | 1 | -2/+2 |
| * | Add transition table compressor | Ryo Nihei | 2021-05-07 | 1 | -6/+12 |
| * | Remove Peek* functions | Ryo Nihei | 2021-05-05 | 2 | -86/+0 |
| * | Add lex mode•••Lex mode is a feature that separates transition tables per mode.
The lexer starts from the initial state indicated by the `initial_state` field and
transitions between modes according to the `push` and `pop` fields.
The initial state will always be `default`.
Currently, maleeni doesn't provide the ability to change the initial state.
You can specify the modes of each lex entry using the `modes` field.
When a mode isn't indicated explicitly, an entry has the `default` mode.
| Ryo Nihei | 2021-05-04 | 2 | -206/+350 |
| * | Generate an invalid token from incomplete input•••When the lexer's buffer contains unaccepted data and the lexer reaches EOF, it treats the buffered data as an invalid token.
| Ryo Nihei | 2021-05-02 | 1 | -0/+5 |
| * | Add code point expression (Meet RL1.1 of UTS #18)•••\u{hex string} matches the character that has the code point represented by the hex string.
For instance, \u{3042} matches hiragana あ (U+3042). The hex string must have 4 or 6 digits.
This feature meets RL1.1 of UTS #18.
RL1.1 Hex Notation: https://unicode.org/reports/tr18/#RL1.1
| Ryo Nihei | 2021-04-24 | 1 | -2/+35 |
| * | Add validation of lexical specs and improve error messages | Ryo Nihei | 2021-04-17 | 2 | -33/+32 |
| * | Change the lexical specs of regexp and define concrete syntax error values•••* Make the lexer treat ']' as an ordinary character in default mode
* Define values of the syntax error type that represents error information concretely
| Ryo Nihei | 2021-04-17 | 1 | -18/+21 |
| * | Print the result of the lex command in JSON format•••* Print the result of the lex command in JSON format.
* Print the EOF token.
| Ryo Nihei | 2021-04-06 | 2 | -134/+180 |
| * | Add logical inverse expression•••[^a-z] matches any character that is not in the range a-z.
| Ryo Nihei | 2021-04-01 | 1 | -3/+20 |
| * | Add range expression•••[a-z] matches any one character from a to z. The order of the characters depends on Unicode code points.
| Ryo Nihei | 2021-02-24 | 1 | -1/+260 |
| * | Add + and ? operators•••* a+ matches 'a' one or more times. This is equivalent to aa*.
* a? matches 'a' zero or one time.
| Ryo Nihei | 2021-02-20 | 1 | -5/+35 |
| * | Add logging to lex command•••The lex command writes logs to the maleeni-lex.log file.
When you generate a lexer using driver.NewLexer(), you can choose whether the lexer writes logs or not.
| Ryo Nihei | 2021-02-16 | 1 | -3/+58 |
| * | Add types of lexical specifications•••The APIs of the compiler and driver packages use these types. Because the CompiledLexSpec struct a lexer takes contains the kind names of the lexical specification entries, the lexer sets them on tokens.
| Ryo Nihei | 2021-02-16 | 2 | -79/+78 |
| * | Add bracket expression matching specified character•••The bracket expression matches any single character specified in it. Within a bracket expression, special characters such as . and * are handled as ordinary characters.
| Ryo Nihei | 2021-02-14 | 1 | -0/+18 |
| * | Add dot symbol matching any single character•••The dot symbol matches any single character. When the dot symbol appears, the parser generates an AST matching all of the well-formed UTF-8 byte sequences.
References:
* https://www.unicode.org/versions/Unicode13.0.0/ch03.pdf#G7404
* Table 3-6. UTF-8 Bit Distribution
* Table 3-7. Well-Formed UTF-8 Byte Sequences
| Ryo Nihei | 2021-02-14 | 1 | -1/+43 |
| * | Add driver•••The driver takes a DFA and an input text and generates a lexer. The lexer tokenizes the input text according to the lexical specification that the DFA expresses.
| Ryo Nihei | 2021-02-14 | 2 | -0/+309 |