path: root/compiler/syntax_error.go
Commit message | Author | Files | Lines
2021-05-25 Add fragment expression | Ryo Nihei | 11 | -61/+540
A fragment entry is defined by an entry whose `fragment` field is `true`, and is referenced by a fragment expression (`\f{...}`).
2021-05-19 Fix the initial state number | Ryo Nihei | 1 | -1/+5
Since 0 represents an invalid value in a transition table, assign a number greater than or equal to 1 to states.
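The numbering rule can be sketched as follows. This is a minimal illustration rather than maleeni's actual code: `stateID`, `stateNil`, `stateMin`, and `assignStateIDs` are hypothetical names, and the flat table with 256 columns per state is an assumed layout.

```go
package main

import "fmt"

// stateID is a hypothetical state number type.
type stateID int

const (
	stateNil stateID = 0 // reserved: "no transition" in the table
	stateMin stateID = 1 // the first number handed out to a real state
)

// assignStateIDs numbers the given state names starting from stateMin,
// so that no real state ever collides with stateNil.
func assignStateIDs(names []string) map[string]stateID {
	ids := make(map[string]stateID, len(names))
	next := stateMin
	for _, n := range names {
		if _, ok := ids[n]; ok {
			continue // already numbered
		}
		ids[n] = next
		next++
	}
	return ids
}

func main() {
	ids := assignStateIDs([]string{"s0", "s1", "s2"})

	// Assumed layout: one row of 256 entries per state. The zero value of
	// every entry is stateNil, so unset transitions are invalid by default.
	table := make([]stateID, (len(ids)+1)*256)
	table[int(ids["s0"])*256+'a'] = ids["s1"]

	fmt.Println(table[int(ids["s0"])*256+'a'] != stateNil) // true
	fmt.Println(table[int(ids["s0"])*256+'b'] == stateNil) // true
}
```

Because the zero value doubles as the invalid marker, a freshly allocated table needs no explicit initialization.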
2021-05-13 Remove the shorthand for --compression-level option from the compile command | Ryo Nihei | 1 | -1/+1
2021-05-13 Rename fields of driver.Token | Ryo Nihei | 3 | -26/+25
2021-05-12 Use go fmt instead of gofmt | Ryo Nihei | 1 | -1/+1
2021-05-11 Add --compression-level option to compile command | Ryo Nihei | 6 | -45/+119
--compression-level specifies a compression level. The default value is 2.
2021-05-11 Fix a text representation of an error token | Ryo Nihei | 2 | -22/+51
This commit fixes a bug that caused the second and subsequent characters of the text representation of an error token to be missing.
2021-05-10 Update README and godoc | Ryo Nihei | 2 | -8/+227
2021-05-08 Change package structure | Ryo Nihei | 6 | -7/+5
The executable can be installed using `go install ./cmd/maleeni`.
2021-05-08 Add --break-on-error option to lex command | Ryo Nihei | 2 | -3/+9
When the --break-on-error option is specified, the lex command stops lexical analysis and exits with status 1 as soon as an error token appears.
2021-05-08 Add CLI options | Ryo Nihei | 4 | -56/+117
2021-05-07 Change type of accepting_states to slice | Ryo Nihei | 3 | -5/+9
2021-05-07 Add transition table compressor | Ryo Nihei | 6 | -18/+431
2021-05-05 Remove Peek* functions | Ryo Nihei | 2 | -86/+0
2021-05-04 Improve performance of the symbolPositionSet | Ryo Nihei | 4 | -63/+98
When using a map to represent a set, performance degrades due to the increased number of calls of runtime.mapassign. Especially when the number of symbols is large, as in compiling a pattern that contains character properties like \p{Letter}, adding elements to the set alone may take several tens of seconds of CPU time. Therefore, this commit solves this problem by changing the representation of the set from map to array.
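The idea of a slice-backed set can be sketched like this. It is an illustration under assumptions, not the actual symbolPositionSet: the names `symbolPosition`, `positionSet`, `add`, and `normalize` are hypothetical, and the sketch uses sort.Slice for brevity. Additions are plain appends, and duplicates are removed lazily in one sort-and-compact pass.

```go
package main

import (
	"fmt"
	"sort"
)

// symbolPosition is a hypothetical position type.
type symbolPosition uint16

// positionSet stores elements in a slice instead of a map.
type positionSet struct {
	items  []symbolPosition
	sorted bool
}

// add appends without checking for duplicates, so it never touches a map.
func (s *positionSet) add(p symbolPosition) {
	s.items = append(s.items, p)
	s.sorted = false
}

// normalize sorts the slice and removes duplicates in place.
func (s *positionSet) normalize() {
	if s.sorted {
		return
	}
	sort.Slice(s.items, func(i, j int) bool { return s.items[i] < s.items[j] })
	out := s.items[:0]
	for _, p := range s.items {
		if len(out) == 0 || out[len(out)-1] != p {
			out = append(out, p)
		}
	}
	s.items = out
	s.sorted = true
}

func main() {
	s := &positionSet{}
	for _, p := range []symbolPosition{3, 1, 3, 2} {
		s.add(p)
	}
	s.normalize()
	fmt.Println(s.items) // [1 2 3]
}
```

The trade-off is that membership is only well defined after normalize has run, which suits a workload that adds many elements and then reads the set once.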
2021-05-04 Add lex mode | Ryo Nihei | 4 | -211/+504
lex mode is a feature that provides a separate transition table for each mode. The lexer starts from an initial state indicated by `initial_state` field and transitions between modes according to `push` and `pop` fields. The initial state will always be `default`. Currently, maleeni doesn't provide the ability to change the initial state. You can specify the modes of each lex entry using `modes` field. When the mode isn't indicated explicitly, the entries have `default` mode.
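Mode switching of this kind is commonly modeled as a stack. The sketch below only illustrates that idea: the `entry` and `modeStack` types are hypothetical, the `pop` field is assumed to be a boolean, and applying `pop` before `push` is an assumption, not something the commit message states.

```go
package main

import "fmt"

// entry is a hypothetical, simplified lex entry.
type entry struct {
	kind string
	push string // mode to push after a match; "" means none
	pop  bool   // whether to pop the current mode after a match
}

// modeStack tracks the active mode; the bottom is always "default".
type modeStack struct {
	modes []string
}

func newModeStack() *modeStack {
	return &modeStack{modes: []string{"default"}}
}

// current returns the mode whose transition table the lexer would use.
func (s *modeStack) current() string {
	return s.modes[len(s.modes)-1]
}

// apply updates the stack after an entry matched.
func (s *modeStack) apply(e entry) {
	if e.pop && len(s.modes) > 1 {
		s.modes = s.modes[:len(s.modes)-1]
	}
	if e.push != "" {
		s.modes = append(s.modes, e.push)
	}
}

func main() {
	s := newModeStack()
	fmt.Println(s.current()) // default
	s.apply(entry{kind: "string_open", push: "string"})
	fmt.Println(s.current()) // string
	s.apply(entry{kind: "string_close", pop: true})
	fmt.Println(s.current()) // default
}
```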
2021-05-02 Generate an invalid token from incomplete input | Ryo Nihei | 1 | -0/+5
When the lexer's buffer has unaccepted data and reads the EOF, the lexer treats the buffered data as an invalid token.
2021-05-02 Fix parser to recognize property expressions in bracket expressions | Ryo Nihei | 2 | -0/+14
2021-05-02 Improve compilation time a little | Ryo Nihei | 3 | -174/+269
A pattern like \p{Letter} generates an AST with many symbols concatenated by alt operators, which results in a large number of symbol positions in one state of the DFA. Such a pattern increases the compilation time. This commit improves the compilation time a little.
- To avoid calling astNode#first and astNode#last recursively, memoize their results.
- Use the byte sequence that symbol positions are encoded into as a hash value, to avoid using the fmt.Fprintf function.
- Implement a sort function for symbol positions instead of using the sort.Slice function.
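The byte-sequence key from the second item might look roughly like this. The function name `positionsKey`, the 16-bit position width, and the big-endian encoding are assumptions chosen for the sketch.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// positionsKey encodes an already sorted slice of symbol positions into a
// string that can be used directly as a map key. No formatting routine
// such as fmt.Fprintf is involved.
func positionsKey(sorted []uint16) string {
	buf := make([]byte, 2*len(sorted))
	for i, p := range sorted {
		binary.BigEndian.PutUint16(buf[2*i:], p)
	}
	return string(buf)
}

func main() {
	// Map from a set of positions (one DFA state) to its state number.
	states := map[string]int{}
	states[positionsKey([]uint16{1, 5, 9})] = 1

	_, seen := states[positionsKey([]uint16{1, 5, 9})]
	fmt.Println(seen) // true
	_, seen = states[positionsKey([]uint16{1, 5})]
	fmt.Println(seen) // false
}
```

The key is only canonical if the positions are sorted first, which is why the encoding and the sort function go together.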
2021-04-30 Add character property expression (Meet RL1.2 of UTS #18 partially) | Ryo Nihei | 10 | -27/+4748
\p{property name=property value} matches a character that has the property. When the property name is General_Category, it can be omitted. That is, \p{Letter} equals \p{General_Category=Letter}. Currently, only General_Category is supported. This feature meets RL1.2 of UTS #18 partially. RL1.2 Properties: https://unicode.org/reports/tr18/#RL1.2
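The defaulting rule for the property name can be sketched as below. `parseProperty` is a hypothetical helper that handles only the text between the braces. Go's standard unicode package stands in for maleeni's own property tables to show what belonging to the Letter category means.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// parseProperty splits the contents of \p{...} into a property name and a
// property value. When no name is given, General_Category is assumed.
func parseProperty(expr string) (name, value string) {
	if n, v, ok := strings.Cut(expr, "="); ok {
		return n, v
	}
	return "General_Category", expr
}

func main() {
	n1, v1 := parseProperty("Letter")
	n2, v2 := parseProperty("General_Category=Letter")
	fmt.Println(n1 == n2 && v1 == v2) // true

	// Hiragana あ (U+3042) has General_Category Lo, which is a Letter.
	fmt.Println(unicode.Is(unicode.Letter, 'あ')) // true
	fmt.Println(unicode.Is(unicode.Letter, '1'))  // false
}
```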
2021-04-24 Add code point expression (Meet RL1.1 of UTS #18) | Ryo Nihei | 6 | -18/+512
\u{hex string} matches a character that has the code point represented by the hex string. For instance, \u{3042} matches hiragana あ (U+3042). The hex string must have 4 or 6 digits. This feature meets RL1.1 of UTS #18. RL1.1 Hex Notation: https://unicode.org/reports/tr18/#RL1.1
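A minimal sketch of decoding the hex string, assuming a hypothetical helper named `parseCodePoint`. The 4-or-6-digit rule comes from the commit message. Rejecting surrogates and out-of-range values via utf8.ValidRune is an extra check added for the sketch.

```go
package main

import (
	"fmt"
	"strconv"
	"unicode/utf8"
)

// parseCodePoint converts the hex string inside \u{...} into a rune.
func parseCodePoint(hex string) (rune, error) {
	if len(hex) != 4 && len(hex) != 6 {
		return 0, fmt.Errorf("hex string must have 4 or 6 digits: %q", hex)
	}
	v, err := strconv.ParseUint(hex, 16, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid hex string %q: %w", hex, err)
	}
	r := rune(v)
	if !utf8.ValidRune(r) {
		return 0, fmt.Errorf("not a valid code point: U+%04X", v)
	}
	return r, nil
}

func main() {
	r, err := parseCodePoint("3042")
	fmt.Println(string(r), err) // あ <nil>

	_, err = parseCodePoint("304")
	fmt.Println(err != nil) // true
}
```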
2021-04-17 Add validation of lexical specs and improve error messages | Ryo Nihei | 6 | -75/+174
2021-04-17 Change the lexical specs of regexp and define concrete syntax error values | Ryo Nihei | 7 | -446/+603
* Make the lexer treat ']' as an ordinary character in default mode
* Define values of the syntax error type that represents error information concretely
2021-04-12 Increase the maximum number of symbol positions per pattern | Ryo Nihei | 5 | -29/+139
This commit increases the maximum number of symbol positions per pattern to 2^15 (= 32,768). When the limit is exceeded, the parse method returns an error.
2021-04-11 Fix the grammar the parser accepts | Ryo Nihei | 6 | -99/+1193
* Add test cases for the parse method.
* Fix the parser to pass the cases.
2021-04-08 Add logging to compile command | Ryo Nihei | 4 | -49/+133
compile command writes logs out to the maleeni-compile.log file. When you use compiler.Compile(), you can choose whether the lexer writes logs or not.
2021-04-06 Print the result of the lex command in JSON format | Ryo Nihei | 3 | -140/+185
* Print the result of the lex command in JSON format.
* Print the EOF token.
2021-04-01 Add logical inverse expression | Ryo Nihei | 7 | -37/+786
[^a-z] matches any character that is not in the range a-z.
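One way to picture the inversion is as the complement of a code point range within the whole Unicode code space. The sketch below assumes hypothetical names (`runeRange`, `invert`) and ignores surrogates for simplicity. It is not a description of how the commit implements the feature.

```go
package main

import (
	"fmt"
	"unicode"
)

// runeRange is an inclusive range of code points.
type runeRange struct {
	lo, hi rune
}

// invert returns the ranges covering every code point from U+0000 to
// U+10FFFF that is NOT in r. The result has at most two ranges.
func invert(r runeRange) []runeRange {
	var out []runeRange
	if r.lo > 0 {
		out = append(out, runeRange{0, r.lo - 1})
	}
	if r.hi < unicode.MaxRune {
		out = append(out, runeRange{r.hi + 1, unicode.MaxRune})
	}
	return out
}

func main() {
	for _, x := range invert(runeRange{'a', 'z'}) {
		fmt.Printf("%U..%U\n", x.lo, x.hi)
	}
	// U+0000..U+0060
	// U+007B..U+10FFFF
}
```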
2021-03-07 Pass values in error type to panic() | Ryo Nihei | 1 | -2/+2
Because parser.parse() expects that recover() returns a value in error type, apply this change.
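The contract between panic() and recover() described here can be sketched as follows. The `parser` type, `raiseSyntaxError`, and `parseRegexp` are simplified stand-ins invented for the example. Only the pattern of panicking with an `error` value and asserting that type after `recover()` reflects the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// parser is a simplified stand-in for the real parser.
type parser struct {
	src string
}

// raiseSyntaxError aborts parsing by panicking with a value of type error.
func raiseSyntaxError(msg string) {
	panic(errors.New(msg))
}

// parseRegexp is a hypothetical internal routine that may raise an error.
func (p *parser) parseRegexp() {
	if p.src == "" {
		raiseSyntaxError("empty pattern")
	}
}

// parse converts a panic carrying an error into an ordinary return value.
func (p *parser) parse() (err error) {
	defer func() {
		r := recover()
		if r == nil {
			return
		}
		e, ok := r.(error)
		if !ok {
			panic(r) // not ours: propagate unexpected panics unchanged
		}
		err = e
	}()
	p.parseRegexp()
	return nil
}

func main() {
	fmt.Println((&parser{src: ""}).parse())  // empty pattern
	fmt.Println((&parser{src: "a"}).parse()) // <nil>
}
```

If a string were passed to panic() instead, the type assertion would fail and the panic would propagate rather than being reported as a parse error.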
2021-02-25 Refactoring | Ryo Nihei | 5 | -502/+351
* Remove token field from symbolNode
* Simplify notation of nested nodes
* Simplify arguments of newSymbolNode()
2021-02-24 Add range expression | Ryo Nihei | 4 | -9/+977
[a-z] matches any one character from a to z. The order of the characters depends on Unicode code points.
2021-02-20 Add + and ? operators | Ryo Nihei | 6 | -21/+117
* a+ matches 'a' one or more times. This is equivalent to aa*.
* a? matches 'a' zero or one time.
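The equivalence of a+ and aa* can be checked with any regular expression engine. The sketch below uses Go's standard regexp package as a stand-in, not maleeni itself. The anchors ^ and $ are added so that the whole input has to match.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	plus     = regexp.MustCompile(`^a+$`)  // one or more
	expanded = regexp.MustCompile(`^aa*$`) // the equivalent expansion
	optional = regexp.MustCompile(`^a?$`)  // zero or one
)

func main() {
	for _, s := range []string{"", "a", "aa", "aaa", "b"} {
		fmt.Printf("%q: a+=%v aa*=%v a?=%v\n",
			s, plus.MatchString(s), expanded.MatchString(s), optional.MatchString(s))
	}
}
```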
2021-02-17 Fix computation of last positions | Ryo Nihei | 2 | -0/+122
2021-02-16 Add logging to lex command | Ryo Nihei | 3 | -5/+126
lex command writes logs out to the maleeni-lex.log file. When you generate a lexer using driver.NewLexer(), you can choose whether the lexer writes logs or not.
2021-02-16 Add CLI | Ryo Nihei | 6 | -0/+433
2021-02-16 Add types of lexical specifications | Ryo Nihei | 5 | -90/+133
APIs of compiler and driver packages use these types. Because the CompiledLexSpec struct that a lexer takes contains the kind names of the lexical specification entries, the lexer sets them on tokens.
2021-02-14 Add bracket expression matching specified character | Ryo Nihei | 4 | -9/+127
The bracket expression matches any single character specified in it. In the bracket expression, the special characters like ., *, and so on are also handled as normal characters.
2021-02-14 Add dot symbol matching any single character | Ryo Nihei | 7 | -21/+201
The dot symbol matches any single character. When the dot symbol appears, the parser generates an AST matching all of the well-formed UTF-8 byte sequences.
References:
* https://www.unicode.org/versions/Unicode13.0.0/ch03.pdf#G7404
* Table 3-6. UTF-8 Bit Distribution
* Table 3-7. Well-Formed UTF-8 Byte Sequences
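The byte ranges in Table 3-7 can be written down as data and matched directly, as in the sketch below. The `byteRange` type and the `matchOne` function are hypothetical. The parser builds an AST from these ranges, whereas this example only matches a byte slice against them.

```go
package main

import "fmt"

// byteRange is an inclusive range of byte values.
type byteRange struct {
	lo, hi byte
}

// wellFormed lists the rows of Table 3-7 (Well-Formed UTF-8 Byte
// Sequences): one byte range per position in the sequence.
var wellFormed = [][]byteRange{
	{{0x00, 0x7F}},
	{{0xC2, 0xDF}, {0x80, 0xBF}},
	{{0xE0, 0xE0}, {0xA0, 0xBF}, {0x80, 0xBF}},
	{{0xE1, 0xEC}, {0x80, 0xBF}, {0x80, 0xBF}},
	{{0xED, 0xED}, {0x80, 0x9F}, {0x80, 0xBF}},
	{{0xEE, 0xEF}, {0x80, 0xBF}, {0x80, 0xBF}},
	{{0xF0, 0xF0}, {0x90, 0xBF}, {0x80, 0xBF}, {0x80, 0xBF}},
	{{0xF1, 0xF3}, {0x80, 0xBF}, {0x80, 0xBF}, {0x80, 0xBF}},
	{{0xF4, 0xF4}, {0x80, 0x8F}, {0x80, 0xBF}, {0x80, 0xBF}},
}

// matchOne returns the length of the well-formed UTF-8 sequence at the
// start of b, or 0 if b does not start with one.
func matchOne(b []byte) int {
	for _, seq := range wellFormed {
		if len(b) < len(seq) {
			continue
		}
		ok := true
		for i, r := range seq {
			if b[i] < r.lo || b[i] > r.hi {
				ok = false
				break
			}
		}
		if ok {
			return len(seq)
		}
	}
	return 0
}

func main() {
	fmt.Println(matchOne([]byte("a")))              // 1
	fmt.Println(matchOne([]byte("あ")))              // 3
	fmt.Println(matchOne([]byte{0xC0, 0x80}))       // 0 (overlong encoding)
	fmt.Println(matchOne([]byte{0xED, 0xA0, 0x80})) // 0 (surrogate)
}
```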
2021-02-14 Add driver | Ryo Nihei | 2 | -0/+309
The driver takes a DFA and an input text and generates a lexer. The lexer tokenizes the input text according to the lexical specification that the DFA expresses.
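A DFA-driven tokenizer of this kind can be sketched as below. Everything here is illustrative: the `dfa` and `token` types, the hand-built automaton, the longest-match rule, and the choice to emit one invalid token per unmatched byte are all assumptions, not the driver's actual API or behavior.

```go
package main

import "fmt"

// dfa is a hypothetical, simplified automaton.
type dfa struct {
	initial   int
	trans     map[int]map[byte]int // a missing entry means "no transition"
	accepting map[int]string       // accepting state -> token kind
}

// token is a hypothetical token type.
type token struct {
	kind    string
	text    string
	invalid bool
}

// tokenize repeatedly takes the longest prefix the DFA accepts.
func tokenize(d *dfa, src []byte) []token {
	var toks []token
	for pos := 0; pos < len(src); {
		state, lastEnd, lastKind := d.initial, -1, ""
		for i := pos; i < len(src); i++ {
			next, ok := d.trans[state][src[i]]
			if !ok {
				break
			}
			state = next
			if kind, ok := d.accepting[state]; ok {
				lastEnd, lastKind = i+1, kind // remember the longest match
			}
		}
		if lastEnd < 0 {
			// Nothing accepted: emit one byte as an invalid token.
			toks = append(toks, token{text: string(src[pos : pos+1]), invalid: true})
			pos++
			continue
		}
		toks = append(toks, token{kind: lastKind, text: string(src[pos:lastEnd])})
		pos = lastEnd
	}
	return toks
}

// wordAndSpaceDFA accepts [a-z]+ as "word" and runs of spaces as "space".
func wordAndSpaceDFA() *dfa {
	d := &dfa{
		initial:   1,
		trans:     map[int]map[byte]int{1: {}, 2: {}, 3: {}},
		accepting: map[int]string{2: "word", 3: "space"},
	}
	for c := byte('a'); c <= 'z'; c++ {
		d.trans[1][c] = 2
		d.trans[2][c] = 2
	}
	d.trans[1][' '] = 3
	d.trans[3][' '] = 3
	return d
}

func main() {
	for _, t := range tokenize(wordAndSpaceDFA(), []byte("ab c!")) {
		fmt.Printf("%q kind=%q invalid=%v\n", t.text, t.kind, t.invalid)
	}
}
```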
2021-02-14 Add compiler | Ryo Nihei | 9 | -0/+1268
The compiler takes a lexical specification expressed by regular expressions and generates a DFA accepting the tokens. Operators that you can use in the regular expressions are concatenation, alternation, repeat, and grouping.