path: root/driver/lexer_test.go
Date | Commit message | Author | Files | Lines

2021-04-30 | Add character property expression (Meet RL1.2 of UTS #18 partially) | Ryo Nihei | 10 | -27/+4748
    \p{property name=property value} matches a character that has the property. When the property name is General_Category, it can be omitted. That is, \p{Letter} equals \p{General_Category=Letter}. Currently, only General_Category is supported. This feature partially meets RL1.2 of UTS #18.
    RL1.2 Properties: https://unicode.org/reports/tr18/#RL1.2

2021-04-24 | Add code point expression (Meet RL1.1 of UTS #18) | Ryo Nihei | 6 | -18/+512
    \u{hex string} matches the character whose code point is represented by the hex string. For instance, \u{3042} matches hiragana あ (U+3042). The hex string must have 4 or 6 digits. This feature meets RL1.1 of UTS #18.
    RL1.1 Hex Notation: https://unicode.org/reports/tr18/#RL1.1

2021-04-17 | Add validation of lexical specs and improve error messages | Ryo Nihei | 6 | -75/+174

2021-04-17 | Change the lexical specs of regexp and define concrete syntax error values | Ryo Nihei | 7 | -446/+603
    * Make the lexer treat ']' as an ordinary character in default mode
    * Define values of the syntax error type that represent error information concretely

2021-04-12 | Increase the maximum number of symbol positions per pattern | Ryo Nihei | 5 | -29/+139
    This commit increases the maximum number of symbol positions per pattern to 2^15 (= 32,768). When the limit is exceeded, the parse method returns an error.

2021-04-11 | Fix grammar the parser accepts | Ryo Nihei | 6 | -98/+1192
    * Add cases that test the parse method.
    * Fix the parser to pass the cases.

2021-04-08 | Add logging to compile command | Ryo Nihei | 4 | -49/+133
    The compile command writes logs to the maleeni-compile.log file. When you use compiler.Compile(), you can choose whether the lexer writes logs or not.

2021-04-06 | Print the result of the lex command in JSON format | Ryo Nihei | 3 | -140/+185
    * Print the result of the lex command in JSON format.
    * Print the EOF token.

2021-04-01 | Add logical inverse expression | Ryo Nihei | 7 | -37/+786
    [^a-z] matches any character that is not in the range a-z.

2021-03-07 | Pass values in error type to panic() | Ryo Nihei | 1 | -2/+2
    parser.parse() expects recover() to return a value of the error type, so this change passes error values to panic().

2021-02-25 | Refactoring | Ryo Nihei | 5 | -502/+351
    * Remove token field from symbolNode
    * Simplify notation of nested nodes
    * Simplify arguments of newSymbolNode()

2021-02-24 | Add range expression | Ryo Nihei | 4 | -9/+977
    [a-z] matches any one character from a to z. The order of the characters depends on Unicode code points.

2021-02-20 | Add + and ? operators | Ryo Nihei | 6 | -21/+117
    * a+ matches 'a' one or more times. This is equivalent to aa*.
    * a? matches 'a' zero or one time.

2021-02-17 | Fix computation of last positions | Ryo Nihei | 2 | -0/+122

2021-02-16 | Add logging to lex command | Ryo Nihei | 3 | -5/+126
    The lex command writes logs to the maleeni-lex.log file. When you generate a lexer using driver.NewLexer(), you can choose whether the lexer writes logs or not.

2021-02-16 | Add CLI | Ryo Nihei | 6 | -0/+433

2021-02-16 | Add types of lexical specifications | Ryo Nihei | 5 | -90/+133
    APIs of the compiler and driver packages use these types. Because the CompiledLexSpec struct that a lexer takes has the kind names of the lexical specification entries, the lexer sets them on tokens.

2021-02-14 | Add bracket expression matching specified character | Ryo Nihei | 4 | -9/+127
    The bracket expression matches any single character specified in it. In the bracket expression, special characters such as . and * are also handled as ordinary characters.

2021-02-14 | Add dot symbol matching any single character | Ryo Nihei | 7 | -21/+201
    The dot symbol matches any single character. When the dot symbol appears, the parser generates an AST matching all of the well-formed UTF-8 byte sequences.
    References:
    * https://www.unicode.org/versions/Unicode13.0.0/ch03.pdf#G7404
    * Table 3-6. UTF-8 Bit Distribution
    * Table 3-7. Well-Formed UTF-8 Byte Sequences

2021-02-14 | Add driver | Ryo Nihei | 2 | -0/+309
    The driver takes a DFA and an input text and generates a lexer. The lexer tokenizes the input text according to the lexical specification that the DFA expresses.

2021-02-14 | Add compiler | Ryo Nihei | 9 | -0/+1268
    The compiler takes a lexical specification expressed by regular expressions and generates a DFA accepting the tokens. Operators that you can use in the regular expressions are concatenation, alternation, repeat, and grouping.
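
Several entries in the log above describe operator semantics: a+ is equivalent to aa*, a? matches zero or one time, [a-z] and [^a-z] are a range and its inverse, and there are code point and General_Category expressions. As a rough illustration only, the following sketch checks those statements with Go's standard regexp package. It is a stand-in, not maleeni's engine or API: the helper fullMatch is hypothetical, and Go's regexp writes the code point expression as \x{3042} and the Letter category as \p{L} rather than the \u{3042} and \p{Letter} forms named in the log.

```go
package main

import (
	"fmt"
	"regexp"
)

// fullMatch reports whether pattern matches the whole input.
// Hypothetical helper built on Go's standard regexp package;
// it does not use maleeni's compiler or driver.
func fullMatch(pattern, input string) bool {
	re := regexp.MustCompile(`^(?:` + pattern + `)$`)
	return re.MatchString(input)
}

func main() {
	cases := []struct {
		pattern string
		input   string
	}{
		{`a+`, "aaa"},       // one or more times
		{`aa*`, "aaa"},      // the equivalent form of a+
		{`a+`, ""},          // a+ needs at least one 'a'
		{`a?`, ""},          // zero times
		{`a?`, "aa"},        // more than one time is rejected
		{`[a-z]`, "m"},      // range by code point order
		{`[^a-z]`, "m"},     // logical inverse of the range
		{`[^a-z]`, "M"},     // outside a-z
		{`\x{3042}`, "あ"},   // Go's spelling of a code point expression
		{`\p{L}`, "あ"},      // Go's spelling of General_Category=Letter
		{`\p{L}`, "1"},      // a digit is not a letter
	}
	for _, c := range cases {
		fmt.Printf("%-10s %-6q %v\n", c.pattern, c.input, fullMatch(c.pattern, c.input))
	}
}
```

Anchoring the pattern with ^(?:...)$ makes each check a whole-input match, which is closer to how a lexer decides whether a lexeme belongs to a token kind than an unanchored search would be.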