| Commit message | Author | Age | Files | Lines |
| * | Absorb compiler/parser/ | EuAndreh | 2024-11-29 | 7 | -3605/+0 |
| * | Fix the calculation of inverse bracket expressions•••Close #7
| Ryo Nihei | 2022-04-19 | 2 | -10/+402 |
| * | Avoid panic on spelling inconsistency errors•••close #5
| Ryo Nihei | 2022-03-21 | 1 | -0/+112 |
| * | Use golangci-lint | Ryo Nihei | 2021-12-15 | 7 | -48/+23 |
| * | Add tests of compiler/parser package | Ryo Nihei | 2021-12-11 | 3 | -4/+13 |
| * | Make character properties unavailable in bracket expressions | Ryo Nihei | 2021-12-11 | 5 | -33/+105 |
| * | Simplify process that generates UTF-8 byte sequences from a code point range | Ryo Nihei | 2021-12-11 | 1 | -1/+1 |
| * | Use new parser and DFA compiler | Ryo Nihei | 2021-12-10 | 15 | -5140/+154 |
| * | Add a new DFA compiler that generates DFA from a set of CPTree | Ryo Nihei | 2021-12-10 | 6 | -0/+1402 |
| * | Add a new parser that constructs a tree representing characters as code point... | Ryo Nihei | 2021-12-10 | 7 | -0/+3134 |
| * | Move UTF8-related processes to utf8 package | Ryo Nihei | 2021-12-01 | 2 | -702/+128 |
| * | Make contributory properties unavailable except for internal use•••This change follows [UAX #44 5.13 Property APIs].
> The following subtypes of Unicode character properties should generally not be exposed in APIs,
> except in limited circumstances. They may not be useful, particularly in public API collections,
> and may instead prove misleading to the users of such API collections.
>
> * Contributory properties are not recommended for public APIs.
> ...
https://unicode.org/reports/tr44/#Property_APIs
| Ryo Nihei | 2021-11-28 | 2 | -1/+62 |
| * | Move all UCD-related processes to ucd package | Ryo Nihei | 2021-11-27 | 4 | -4777/+5 |
| * | Support Alphabetic property (Meet RL1.2 of UTS #18 partially) | Ryo Nihei | 2021-11-26 | 3 | -1/+420 |
| * | Make character properties available in an inverse expression (Make [^\p{...}]... | Ryo Nihei | 2021-11-25 | 1 | -0/+4 |
| * | Support Lowercase and Uppercase property (Meet RL1.2 of UTS #18 partially) | Ryo Nihei | 2021-11-25 | 4 | -21/+153 |
| * | Support White_Space property (Meet RL1.2 of UTS #18 partially) | Ryo Nihei | 2021-11-24 | 4 | -25/+110 |
| * | Fix key of generalCategoryCodePoints map•••Use the abbreviation `cn` of the general category value `unassigned` as a key of the `generalCategoryCodePoints` map.
| Ryo Nihei | 2021-11-23 | 1 | -696/+696 |
| * | Remove --debug option from compile command | Ryo Nihei | 2021-09-23 | 1 | -36/+1 |
| * | Keep the order of AST nodes constant | Ryo Nihei | 2021-09-22 | 4 | -20/+50 |
| * | Add name field to the lexical specification | Ryo Nihei | 2021-09-18 | 2 | -0/+4 |
| * | Change APIs•••Change fields of tokens, results of lexical analysis, as follows:
- Rename: mode -> mode_id
- Rename: kind_id -> mode_kind_id
- Add: kind_id
The kind ID is unique across all modes, but the mode kind ID is unique only within a mode.
Change fields of a transition table as follows:
- Rename: initial_mode -> initial_mode_id
- Rename: modes -> mode_names
- Rename: kinds -> kind_names
- Rename: specs[].kinds -> specs[].kind_names
- Rename: specs[].dfa.initial_state -> specs[].dfa.initial_state_id
Change public types defined in the spec package as follows:
- Rename: LexModeNum -> LexModeID
- Rename: LexKind -> LexKindName
- Add: LexKindID
- Add: StateID
| Ryo Nihei | 2021-08-01 | 7 | -70/+96 |
| * | Add unique kind IDs to tokens | Ryo Nihei | 2021-08-01 | 1 | -0/+38 |
| * | Allow duplicate names between fragments and non-fragments | Ryo Nihei | 2021-05-27 | 1 | -0/+103 |
| * | Add fragment expression•••A fragment entry is defined by an entry whose `fragment` field is `true`, and is referenced by a fragment expression (`\f{...}`).
| Ryo Nihei | 2021-05-25 | 8 | -49/+440 |
| * | Fix the initial state number•••Since 0 represents an invalid value in a transition table, assign a number greater than or equal to 1 to states.
| Ryo Nihei | 2021-05-19 | 1 | -1/+5 |
| * | Use go fmt instead of gofmt | Ryo Nihei | 2021-05-12 | 1 | -1/+1 |
| * | Add --compression-level option to compile command•••--compression-level specifies a compression level. The default value is 2.
| Ryo Nihei | 2021-05-11 | 1 | -7/+57 |
| * | Change package structure•••The executable can be installed using `go install ./cmd/maleeni`.
| Ryo Nihei | 2021-05-08 | 1 | -1/+1 |
| * | Add CLI options | Ryo Nihei | 2021-05-08 | 1 | -3/+3 |
| * | Change type of accepting_states to slice | Ryo Nihei | 2021-05-07 | 1 | -2/+6 |
| * | Add transition table compressor | Ryo Nihei | 2021-05-07 | 2 | -9/+60 |
| * | Improve performance of the symbolPositionSet•••When using a map to represent a set, performance degrades due to
the increased number of calls of runtime.mapassign.
Especially when the number of symbols is large, as in compiling a pattern that
contains character properties like \p{Letter}, adding elements to the set
alone may take several tens of seconds of CPU time.
Therefore, this commit solves this problem by changing the representation of
the set from map to array.
| Ryo Nihei | 2021-05-04 | 4 | -63/+98 |
| * | Add lex mode•••Lex mode is a feature that separates transition tables by mode.
The lexer starts from an initial state indicated by the `initial_state` field and
transitions between modes according to the `push` and `pop` fields.
The initial state will always be `default`.
Currently, maleeni doesn't provide the ability to change the initial state.
You can specify the modes of each lex entry using `modes` field.
When the mode isn't indicated explicitly, the entries have `default` mode.
| Ryo Nihei | 2021-05-04 | 1 | -2/+82 |
| * | Fix parser to recognize property expressions in bracket expressions | Ryo Nihei | 2021-05-02 | 2 | -0/+14 |
| * | Improve compilation time a little•••A pattern like \p{Letter} generates an AST with many symbols concatenated by alt operators,
which results in a large number of symbol positions in one state of the DFA.
Such a pattern increases the compilation time. This commit reduces the compilation time a little.
- To avoid calling astNode#first and astNode#last recursively, memoize their results.
- Use the byte sequence that the symbol positions are encoded to as a hash value, to avoid using the fmt.Fprintf function.
- Implement a sort function for symbol positions instead of using sort.Slice function.
| Ryo Nihei | 2021-05-02 | 3 | -174/+269 |
| * | Add character property expression (Meet RL1.2 of UTS #18 partially)•••\p{property name=property value} matches a character that has the property.
When the property name is General_Category, it can be omitted.
That is, \p{Letter} equals \p{General_Category=Letter}.
Currently, only General_Category is supported.
This feature meets RL1.2 of UTS #18 partially.
RL1.2 Properties: https://unicode.org/reports/tr18/#RL1.2
| Ryo Nihei | 2021-04-30 | 8 | -27/+4440 |
| * | Add code point expression (Meet RL1.1 of UTS #18)•••\u{hex string} matches a character that has the code point represented by the hex string.
For instance, \u{3042} matches hiragana あ (U+3042). The hex string must have 4 or 6 digits.
This feature meets RL1.1 of UTS #18.
RL1.1 Hex Notation: https://unicode.org/reports/tr18/#RL1.1
| Ryo Nihei | 2021-04-24 | 5 | -16/+477 |
| * | Add validation of lexical specs and improve error messages | Ryo Nihei | 2021-04-17 | 1 | -2/+8 |
| * | Change the lexical specs of regexp and define concrete syntax error values•••* Make the lexer treat ']' as an ordinary character in default mode
* Define values of the syntax error type that represents error information concretely
| Ryo Nihei | 2021-04-17 | 5 | -425/+568 |
| * | Increase the maximum number of symbol positions per pattern•••This commit increases the maximum number of symbol positions per pattern to 2^15 (= 32,768).
When the limit is exceeded, the parse method returns an error.
| Ryo Nihei | 2021-04-12 | 5 | -29/+139 |
| * | Fix the grammar the parser accepts•••* Add test cases for the parse method.
* Fix the parser to pass the cases.
| Ryo Nihei | 2021-04-11 | 6 | -98/+1192 |
| * | Add logging to compile command•••compile command writes logs out to the maleeni-compile.log file.
When you use compiler.Compile(), you can choose whether the lexer writes logs or not.
| Ryo Nihei | 2021-04-08 | 3 | -47/+108 |
| * | Add logical inverse expression•••[^a-z] matches any character that is not in the range a-z.
| Ryo Nihei | 2021-04-01 | 6 | -34/+766 |
| * | Pass values of error type to panic()•••Because parser.parse() expects recover() to return a value of error type, apply this change.
| Ryo Nihei | 2021-03-07 | 1 | -2/+2 |
| * | Refactoring•••* Remove token field from symbolNode
* Simplify notation of nested nodes
* Simplify arguments of newSymbolNode()
| Ryo Nihei | 2021-02-25 | 5 | -502/+351 |
| * | Add range expression•••[a-z] matches any one character from a to z. The order of the characters depends on Unicode code points.
| Ryo Nihei | 2021-02-24 | 3 | -8/+717 |
| * | Add + and ? operators•••* a+ matches 'a' one or more times. This is equivalent to aa*.
* a? matches 'a' zero or one time.
| Ryo Nihei | 2021-02-20 | 5 | -16/+82 |
| * | Fix computation of last positions | Ryo Nihei | 2021-02-17 | 2 | -0/+122 |
| * | Add types of lexical specifications•••APIs of compiler and driver packages use these types. Because the CompiledLexSpec struct that a lexer takes holds the kind names of lexical specification entries, the lexer sets them on tokens.
| Ryo Nihei | 2021-02-16 | 2 | -11/+27 |
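
The 2021-04-24 entry above describes the code point expression `\u{hex string}`, whose hex string must have 4 or 6 digits. The following is a minimal sketch of how such a hex string could be turned into a code point and then into the UTF-8 bytes a byte-oriented DFA would match. The names `parseCodePoint` and `utf8Bytes` are hypothetical and are not taken from the repository; the surrogate and range check is an assumption, since the commit message only states the digit-count rule.

```go
package main

import (
	"fmt"
	"strconv"
	"unicode/utf8"
)

// parseCodePoint is a hypothetical helper (not the project's actual API).
// It converts the hex string inside \u{...} into a rune, accepting only
// 4 or 6 hex digits as the commit message states.
func parseCodePoint(hex string) (rune, error) {
	if len(hex) != 4 && len(hex) != 6 {
		return 0, fmt.Errorf("hex string must have 4 or 6 digits: %q", hex)
	}
	n, err := strconv.ParseUint(hex, 16, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid hex string %q: %w", hex, err)
	}
	r := rune(n)
	// Assumption: reject surrogates and values above U+10FFFF.
	if !utf8.ValidRune(r) {
		return 0, fmt.Errorf("U+%04X is not a valid Unicode scalar value", n)
	}
	return r, nil
}

// utf8Bytes returns the UTF-8 encoding of a valid rune.
func utf8Bytes(r rune) []byte {
	buf := make([]byte, utf8.UTFMax)
	n := utf8.EncodeRune(buf, r)
	return buf[:n]
}

func main() {
	r, err := parseCodePoint("3042")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Prints the code point, the character, and its UTF-8 bytes.
	fmt.Printf("U+%04X %c % X\n", r, r, utf8Bytes(r))
}
```

For `3042` this yields hiragana あ (U+3042), matching the example in the commit message; a 5-digit string such as `1F600` would be rejected under the stated rule and would have to be written as `01F600`.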