Commit message | Author | Date | Files | Lines
...
* Use Go 1.16 | Ryo Nihei | 2021-08-07 | 2 files | -2/+2
* Update CHANGELOG | Ryo Nihei | 2021-08-02 | 1 file | -0/+7
* Change APIs | Ryo Nihei | 2021-08-01 | 11 files | -231/+289
  Change fields of tokens, results of lexical analysis, as follows:
  - Rename: mode -> mode_id
  - Rename: kind_id -> mode_kind_id
  - Add: kind_id
  The kind ID is unique across all modes, but the mode kind ID is unique only within a mode.
  Change fields of a transition table as follows:
  - Rename: initial_mode -> initial_mode_id
  - Rename: modes -> mode_names
  - Rename: kinds -> kind_names
  - Rename: specs[].kinds -> specs[].kind_names
  - Rename: specs[].dfa.initial_state -> specs[].dfa.initial_state_id
  Change public types defined in the spec package as follows:
  - Rename: LexModeNum -> LexModeID
  - Rename: LexKind -> LexKindName
  - Add: LexKindID
  - Add: StateID
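To make the distinction concrete, here is a minimal Go sketch of the ID fields a decoded token carries under the new names. The struct, its field set, and the JSON tags are assumptions derived from the field names above, not the actual driver.Token definition.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tokenIDs is a hypothetical struct holding only the ID fields described
// above; it is not the actual driver.Token type.
type tokenIDs struct {
	ModeID     int `json:"mode_id"`      // mode the token was lexed in
	ModeKindID int `json:"mode_kind_id"` // kind ID unique only within that mode
	KindID     int `json:"kind_id"`      // kind ID unique across all modes
}

func main() {
	// Two tokens lexed in different modes may share a mode_kind_id,
	// but their kind_id values never collide.
	src := `[{"mode_id":1,"mode_kind_id":1,"kind_id":1},
	         {"mode_id":2,"mode_kind_id":1,"kind_id":5}]`
	var toks []tokenIDs
	if err := json.Unmarshal([]byte(src), &toks); err != nil {
		panic(err)
	}
	for _, t := range toks {
		fmt.Printf("mode_id=%d mode_kind_id=%d kind_id=%d\n", t.ModeID, t.ModeKindID, t.KindID)
	}
}
```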
* Add unique kind IDs to tokens | Ryo Nihei | 2021-08-01 | 5 files | -178/+239
* Fix CHANGELOG | Ryo Nihei | 2021-07-29 | 1 file | -1/+1
* Update CHANGELOG | Ryo Nihei | 2021-07-22 | 1 file | -0/+7
* Add CHANGELOG | Ryo Nihei | 2021-07-22 | 1 file | -0/+17
* Add spec.EscapePattern function | Ryo Nihei | 2021-07-22 | 2 files | -0/+49
* Support passive mode transition | Ryo Nihei | 2021-06-10 | 3 files | -25/+140
* Update README | Ryo Nihei | 2021-06-08 | 1 file | -1/+1
* Update README | Ryo Nihei | 2021-06-04 | 1 file | -2/+30
* Add status badge | Ryo Nihei | 2021-06-03 | 1 file | -0/+2
* Set up CI | Ryo Nihei | 2021-06-03 | 1 file | -0/+22
* Update README | Ryo Nihei | 2021-06-02 | 1 file | -0/+13
* Update README | Ryo Nihei | 2021-05-28 | 1 file | -0/+59
* Add example lexical specification | Ryo Nihei | 2021-05-27 | 3 files | -0/+604
* Allow duplicate names between fragments and non-fragments | Ryo Nihei | 2021-05-27 | 3 files | -11/+123
* Add fragment expression | Ryo Nihei | 2021-05-25 | 11 files | -61/+540
  A fragment entry is defined by an entry whose `fragment` field is `true`, and is referenced by a fragment expression (`\f{...}`).
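To show how a fragment is declared and then referenced, here is a minimal Go sketch. The entry type, its field names other than `fragment`, and the example patterns are hypothetical; only the `fragment` flag and the `\f{...}` syntax come from the commit message above.

```go
package main

import "fmt"

// entry is a hypothetical, trimmed-down lexical-spec entry used only for
// illustration; the real entry type lives in maleeni's spec package and may
// differ. Only the fragment flag and the \f{...} syntax come from the
// commit message above.
type entry struct {
	Kind     string
	Pattern  string
	Fragment bool
}

func main() {
	spec := []entry{
		// A fragment entry defines a reusable sub-pattern; it never emits tokens.
		{Kind: "digit", Pattern: `[0-9]`, Fragment: true},
		// A non-fragment entry references the fragment via \f{digit}.
		{Kind: "int", Pattern: `\f{digit}+`},
	}
	for _, e := range spec {
		fmt.Printf("kind=%-6s fragment=%-5v pattern=%s\n", e.Kind, e.Fragment, e.Pattern)
	}
}
```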
* Fix the initial state number | Ryo Nihei | 2021-05-19 | 1 file | -1/+5
  Since 0 represents an invalid value in a transition table, assign a number greater than or equal to 1 to states.
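A minimal sketch of that convention, assuming a per-state transition row indexed by input byte; the row layout and state numbers are illustrative, not maleeni's actual table encoding.

```go
package main

import "fmt"

func main() {
	// Illustrative only: one DFA state's transition row indexed by input byte.
	// 0 is reserved as the "no transition / invalid" marker, so real state
	// numbers start at 1.
	row := make([]int, 256)
	row['a'] = 2 // on 'a', move to state 2
	row['b'] = 3 // on 'b', move to state 3

	for _, b := range []byte("abc") {
		if next := row[b]; next == 0 {
			fmt.Printf("%q: no transition\n", b)
		} else {
			fmt.Printf("%q: -> state %d\n", b, next)
		}
	}
}
```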
* Remove the shorthand for --compression-level option from the compile command | Ryo Nihei | 2021-05-13 | 1 file | -1/+1
* Rename fields of driver.Token | Ryo Nihei | 2021-05-13 | 3 files | -26/+25
* Use go fmt instead of gofmt | Ryo Nihei | 2021-05-12 | 1 file | -1/+1
* Add --compression-level option to compile command | Ryo Nihei | 2021-05-11 | 6 files | -45/+119
  --compression-level specifies a compression level. The default value is 2.
* Fix a text representation of an error token | Ryo Nihei | 2021-05-11 | 2 files | -22/+51
  This commit fixes a bug that caused the second and subsequent characters of the text representation of an error token to be missing.
* Update README and godoc | Ryo Nihei | 2021-05-10 | 2 files | -8/+227
* Change package structure | Ryo Nihei | 2021-05-08 | 6 files | -7/+5
  The executable can be installed using `go install ./cmd/maleeni`.
* Add --break-on-error option to lex command | Ryo Nihei | 2021-05-08 | 2 files | -3/+9
  When the --break-on-error option is used, lexical analysis stops immediately with exit status 1 as soon as an error token appears.
* Add CLI options | Ryo Nihei | 2021-05-08 | 4 files | -56/+117
* Change type of accepting_states to slice | Ryo Nihei | 2021-05-07 | 3 files | -5/+9
* Add transition table compressor | Ryo Nihei | 2021-05-07 | 6 files | -18/+431
* Remove Peek* functions | Ryo Nihei | 2021-05-05 | 2 files | -86/+0
* Improve performance of the symbolPositionSet | Ryo Nihei | 2021-05-04 | 4 files | -63/+98
  When using a map to represent a set, performance degrades due to the increased number of calls of runtime.mapassign. Especially when the number of symbols is large, as in compiling a pattern that contains character properties like \p{Letter}, adding elements to the set alone may take several tens of seconds of CPU time. Therefore, this commit solves this problem by changing the representation of the set from map to array.
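A minimal sketch of the general map-to-array idea, assuming an append-then-deduplicate strategy; it is not the actual symbolPositionSet implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// positionSet stores symbol positions in a slice instead of a map, so adding
// an element is a plain append rather than a map assignment. This is only a
// sketch of the map-to-array idea, not maleeni's symbolPositionSet.
type positionSet struct {
	items []int
}

// add appends without checking for duplicates; no hashing, no runtime.mapassign.
func (s *positionSet) add(p int) {
	s.items = append(s.items, p)
}

// sortedUnique sorts once and drops duplicates, paying the deduplication cost
// a single time instead of on every insertion.
func (s *positionSet) sortedUnique() []int {
	sort.Ints(s.items)
	uniq := make([]int, 0, len(s.items))
	for i, p := range s.items {
		if i == 0 || p != s.items[i-1] {
			uniq = append(uniq, p)
		}
	}
	s.items = uniq
	return uniq
}

func main() {
	var s positionSet
	for _, p := range []int{3, 1, 3, 2, 1} {
		s.add(p)
	}
	fmt.Println(s.sortedUnique()) // [1 2 3]
}
```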
* Add lex mode | Ryo Nihei | 2021-05-04 | 4 files | -211/+504
  Lex mode is a feature that separates transition tables per mode. The lexer starts from the initial state indicated by the `initial_state` field and transitions between modes according to the `push` and `pop` fields. The initial state will always be `default`; currently, maleeni doesn't provide the ability to change the initial state. You can specify the modes of each lex entry using the `modes` field. When the mode isn't indicated explicitly, the entries have the `default` mode.
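A minimal sketch of how the mode-related fields might be used to lex double-quoted strings in a separate mode. The entry type and its shape are assumptions; only the `modes`, `push`, and `pop` field names come from the commit message above.

```go
package main

import "fmt"

// modeEntry is a hypothetical, simplified lex entry used to illustrate the
// mode fields named above (modes, push, pop); the actual spec type in maleeni
// may differ, and the patterns are made up.
type modeEntry struct {
	Kind    string
	Pattern string
	Modes   []string // modes in which the entry is active; empty means "default"
	Push    string   // mode to push when the entry matches
	Pop     bool     // pop the current mode when the entry matches
}

func main() {
	// Sketch: lexing double-quoted strings in a dedicated "string" mode.
	spec := []modeEntry{
		{Kind: "open_quote", Pattern: `"`, Push: "string"},                        // default -> string
		{Kind: "char_seq", Pattern: `[^"]+`, Modes: []string{"string"}},           // active only in string mode
		{Kind: "close_quote", Pattern: `"`, Modes: []string{"string"}, Pop: true}, // string -> default
	}
	for _, e := range spec {
		fmt.Printf("%-11s modes=%v push=%q pop=%v\n", e.Kind, e.Modes, e.Push, e.Pop)
	}
}
```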
* Generate an invalid token from incomplete input | Ryo Nihei | 2021-05-02 | 1 file | -0/+5
  When the lexer's buffer contains unaccepted data and the lexer reads EOF, it treats the buffered data as an invalid token.
* Fix parser to recognize property expressions in bracket expressions | Ryo Nihei | 2021-05-02 | 2 files | -0/+14
* Improve compilation time a little | Ryo Nihei | 2021-05-02 | 3 files | -174/+269
  A pattern like \p{Letter} generates an AST with many symbols concatenated by alt operators, which results in a large number of symbol positions in one state of the DFA. Such a pattern increases the compilation time. This commit improves the compilation time somewhat:
  - To avoid calling astNode#first and astNode#last recursively, memoize their results (a sketch of the idea follows below).
  - Use the byte sequence that symbol positions are encoded into as a hash value, to avoid using the fmt.Fprintf function.
  - Implement a sort function for symbol positions instead of using the sort.Slice function.
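The first item above is plain memoization. A minimal Go sketch of that idea on a hypothetical AST node follows; it is not maleeni's astNode, and the computation is a stand-in for the real first/last-position calculation.

```go
package main

import "fmt"

// node is a hypothetical AST node used only to illustrate the memoization
// technique mentioned above; it is not maleeni's astNode.
type node struct {
	children []*node
	pos      int   // symbol position (leaves only)
	first    []int // memoized result; nil means "not computed yet"
}

// firstPositions collects the leaf positions of the subtree, caching the
// result on the node so repeated calls don't re-walk the same subtree.
func (n *node) firstPositions() []int {
	if n.first != nil {
		return n.first // served from the memo
	}
	if len(n.children) == 0 {
		n.first = []int{n.pos}
		return n.first
	}
	var res []int
	for _, c := range n.children {
		res = append(res, c.firstPositions()...)
	}
	n.first = res
	return n.first
}

func main() {
	leaf1, leaf2 := &node{pos: 1}, &node{pos: 2}
	root := &node{children: []*node{leaf1, leaf2}}
	fmt.Println(root.firstPositions()) // computed by walking the tree: [1 2]
	fmt.Println(root.firstPositions()) // answered from the cached result
}
```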
* Add character property expression (Meet RL1.2 of UTS #18 partially) | Ryo Nihei | 2021-04-30 | 10 files | -27/+4748
  \p{property name=property value} matches a character that has the property. When the property name is General_Category, it can be omitted; that is, \p{Letter} equals \p{General_Category=Letter}. Currently, only General_Category is supported. This feature partially meets RL1.2 of UTS #18.
  RL1.2 Properties: https://unicode.org/reports/tr18/#RL1.2
* Add code point expression (Meet RL1.1 of UTS #18) | Ryo Nihei | 2021-04-24 | 6 files | -18/+512
  \u{hex string} matches a character that has the code point represented by the hex string. For instance, \u{3042} matches hiragana あ (U+3042). The hex string must have 4 or 6 digits. This feature meets RL1.1 of UTS #18.
  RL1.1 Hex Notation: https://unicode.org/reports/tr18/#RL1.1
* Add validation of lexical specs and improve error messages | Ryo Nihei | 2021-04-17 | 6 files | -75/+174
* Change the lexical specs of regexp and define concrete syntax error values | Ryo Nihei | 2021-04-17 | 7 files | -446/+603
  - Make the lexer treat ']' as an ordinary character in default mode.
  - Define values of the syntax error type that represents error information concretely.
* Increase the maximum number of symbol positions per pattern | Ryo Nihei | 2021-04-12 | 5 files | -29/+139
  This commit increases the maximum number of symbol positions per pattern to 2^15 (= 32,768). When the limit is exceeded, the parse method returns an error.
* Fix grammar the parser accepts | Ryo Nihei | 2021-04-11 | 6 files | -99/+1193
  - Add cases that test the parse method.
  - Fix the parser to pass the cases.
* Add logging to compile command | Ryo Nihei | 2021-04-08 | 4 files | -49/+133
  The compile command writes logs out to the maleeni-compile.log file. When you use compiler.Compile(), you can choose whether the lexer writes logs or not.
* Print the result of the lex command in JSON format | Ryo Nihei | 2021-04-06 | 3 files | -140/+185
  - Print the result of the lex command in JSON format.
  - Print the EOF token.
* Add logical inverse expression | Ryo Nihei | 2021-04-01 | 7 files | -37/+786
  [^a-z] matches any character that is not in the range a-z.
* Pass values in error type to panic() | Ryo Nihei | 2021-03-07 | 1 file | -2/+2
  Apply this change because parser.parse() expects recover() to return a value of the error type.
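A minimal Go sketch of the idiom the commit refers to: panicking with an error value and recovering it as an error via a type assertion. The function names and the toy check are hypothetical; only the panic/recover pattern comes from the commit message.

```go
package main

import "fmt"

// mustDigits is a toy stand-in for an internal routine that reports failures
// by panicking with an error value, as described above; it is not maleeni's parser.
func mustDigits(s string) {
	for _, r := range s {
		if r < '0' || r > '9' {
			panic(fmt.Errorf("unexpected character %q", r))
		}
	}
}

// parse converts a panic carrying an error back into an ordinary return value,
// mirroring the recover() handling that parser.parse() relies on.
func parse(s string) (err error) {
	defer func() {
		if v := recover(); v != nil {
			e, ok := v.(error)
			if !ok {
				panic(v) // not an error value; re-panic
			}
			err = e
		}
	}()
	mustDigits(s)
	return nil
}

func main() {
	fmt.Println(parse("123")) // <nil>
	fmt.Println(parse("12a")) // unexpected character 'a'
}
```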
* Refactoring | Ryo Nihei | 2021-02-25 | 5 files | -502/+351
  - Remove token field from symbolNode.
  - Simplify notation of nested nodes.
  - Simplify arguments of newSymbolNode().
* Add range expression | Ryo Nihei | 2021-02-24 | 4 files | -9/+977
  [a-z] matches any one character from a to z. The order of the characters depends on Unicode code points.
* Add + and ? operators | Ryo Nihei | 2021-02-20 | 6 files | -21/+117
  - a+ matches 'a' one or more times. This is equivalent to aa*.
  - a? matches 'a' zero or one time.
* Fix computation of last positions | Ryo Nihei | 2021-02-17 | 2 files | -0/+122