| Commit message (Collapse) | Author | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Instead of being an increment over "end" that is carried along on NFA
transformation, now the id is computed directly as an increment on
"end". During this refactor, I even saw that "end" and "nextID" of
`concat()` are computed differently, despite arriving at the same
result: "end" is rhs.end, while "nextID" is the max of the nextID from
lhs and rhs.
|
|
* src/paca.mjs
(searchNFAStep): Now instead of just checking if the node has a
transition via a character literal directly, we also check (via the
`performTransition()` function) if a metacharacter interpretation
allows a transition to happen.
(intepretMetacharacter): Add function that "executes" the action
representation in the "meta" attribute of the object, when present.
It is somewhat ad-hoc now, doing checks that implicitly only exist
for "." or "class" metacharacters, but OK for now, given the
possibilities.
(performTransition): Do the fallback to `interpretMetacharacter()`,
giving it an empty object when the node doesn't have the "meta"
attribute.
* tests/paca.mjs (test_{interpretMetacharacter,performTransition): Add
routine test implementation.
|
|
Do not change any observable behaviour outside of `characterClass()`, as
the new output is 100% semantically compatible, but faster and most
importantly, much smaller.
* src/paca.mjs
(characterClass): Leverage `compressCharacterRanges()` when processing
the given set. Also use the same compressed range to filter out set
matches when they are already within the range.
(compareRange): Add function that sorts based on the first numeric
element of the range. When they're equal, use the second element as
the tie breaker.
(compressRangeStep): Add function that takes ranges in sorted order
tries to mush them together when the start of the second (`from`) is
contained by the first (`curr[1]`). Since the ranges are given to
us sorted, we already know that the start of the second is greater
than or equal to the start of the first. When this is the case, we
pick the largest ending to merge the ranges, otherwise we just place
them one next to the other, sequentially.
(compressCharacterRanges): Add function that sorts the ranges and
reduces them with `compressRangeStep()`. Here we have to give an
empty array as the initial value to prevent `compressRangeStep()` to
be given ranges as `[m, n]`, instead of either `[]` or `[[m, n]]`.
This is also why there's a check for `!curr` in
`compressRangeStep()` - to plug the other side of this adaptation.
(inRange): Add function that checks if the given character is
contained by any of the ranges of the given object. At the end,
instead of looking through every `from`, all we need is a single
match, so we use `.some()` instead.
* tests/paca.mjs
(test_characterClass): Add test case for a range that collapses many
literal character matches and many ranges into a single range.
(test_{compareRange,compressRangeStep,compressCharacterRanges,inRange):
Add routine sample-based test cases.
|
|
* src/paca.mjs
(characterClass): Add function that builds the NFA node for
`{ meta: "class" }`. This node leaves the "direct" and
"transitions" keys empty, and add its data under the "meta" key.
One option was to use an inline function that could simply be
called directly during the search to check for a match, but instead
I chose a data representation instead, in order to keep the NFA
literal as obvious and self-representing as possible. Later, the
searching part will have to properly interpret the data of "meta"
properly, instead of blindly executing an opaque function. This
does separate the compilation from execution logic, but keep the NFA
clean of opaque closures.
(wildcard): Add function that buildl the NFA node for `{ meta: "." }`.
Similar to `characterClass()`, the new "meta" key contains pure data
that represents the execution of the metacharacter during search.
(baseNFA, literal): Rename the existing `baseNFA()` to `literal()`.
Then add a new `baseNFA()` function that decides between a character
literal and a metacharacter.
(buildNFAStep): Instead of checking the type of `token`, we check if
`token` has the "operator" attribute, since we now have
metacharacters that also aren't strings.
(classStateStep): Add missing "caret" key to the final metacharacter
output. It was already being detected, just not included in the
result.
(escapingStateStep): Stick to 80 columns.
* tests/paca.mjs
(test_characterClass, test_wildcard, test_baseNFA): Add obligatory
test cases.
(test_buildNFAStep): Include test case for metacharacter.
|
|
The character class `[a-z]`, and specially the wildcard `.`, aren't
operators: they really do represent themselves with their own special
semantics, and they take no operands. So instead of have the "operator"
type behave in two ways, with and without arguments, we instead have
this new type, the "meta" character. In equivalence to the literal
character, the metacharacter represents itself, and also takes no
argument. We also can not touch the precedence parsing of operators by
tainting it with special conditions for "." and "class", since they
should behave just like literal characters: be pushed directly onto the
stack.
As of now, there are only 2 meta characters: "class" and ".".
* src/paca.mjs
(operatorChars): Remove "." from the set of operator characters.
(classStateStep): Return `{ meta: "class" }` instead of
`{ operator: "class" }`.
(isMeta): Add equivalent to `isTransition()` and `isOperator()`.
(opFor, tokenizeRegexStep): Add new `opFor()` function for classifying
a given character, choosing between an operator, a metacharacter and
a literal character, and use this function in the body of
`tokenizeRegexStep()`.
(PRECEDENCE): Remove early entry of precedence values for "class" and
".".
(toPostfixStep): Instead of just checking if a character is a literal
one before pushing it onto the stack, check that it isn't an
operator just by checking if it is an object that has the `operator`
attribute.
* tests/paca.mjs
(test_isOperator): Remove test case for ".", as it is no longer
considered an operator.
(classStateStep): Update to rename from `{ operator: "class" }` to
`{ meta: "class" }`.
(test_toPostfixStep, test_toPostfix): Add test cases for meta
characters.
(test_OPERATOR_FNS): BONUS - Use direct assignment to reset the array
to an empty value instead of `arr.splice(0)`.
|
|
* src/paca.mjs
(escapingStateStep): Return an error when escaping non-metacharacters.
This way cases like \d, which is syntax for [0-9] which will
eventually be recognized, will not change its behaviour from a noop
escape of "d" to matching digits.
(operatorChars, isOperator): Hoist both of these up before their usage
in `escapingStateStep()`.
* tests/paca.mjs
(test_isOperator): Hoist its definition and position inside the
`runTests([...])` array to match src/paca.mjs.
(test_escapingStateStep): Adjust existing cases and add test case for
good/bad escapes.
(test_tokenizeRegexStep): Fix bad starting escape, that broke because
it was escaping a non-metacharacter.
|
|
* src/paca.mjs
(isTransition): Add new function as an improved version of the raw
usage of `stateTransitionOperators`, equivalent to `isAnchor()` and
`isOperator()`.
(operatorChars, isOperator): Add new static set `operatorChars` as
backing data of `isOperator()`, instead of ad-hoc conditional in its
implementation. Also now add the `.` character as an operator by
including it in the `operatorChars` set.
(tokenizeRegexStep): Use the new `isTransition()` function instead of
checking the set directly. Also tweak ternary to fit in 80 columns.
(PRECEDENCE): Add `.` operator with lowest precedence, as it is not
really operating on anything, and is instead a target to be operated
on.
* tests/paca.mjs
(test_isTransition): Add obligatory test cases.
(test_isOperator): Include test case for `.` wildcard operator.
|
|
* src/paca.mjs
(ANCHOR_FNS): Add simple handlers for ^ and $ anchors, that only look
for the position of the character in the pattern as validation
during tokenization.
(isAnchor): Add simple boolean function to identify anchor characters.
(tokenizeRegexStep): Include check if character `isAnchor()`, and call
the appropriate `ANCHOR_FNS[char]` when true.
* tests/paca.mjs
(test_ANCHOR_FNS): Add test with 4 cases - 2 for success and 2 for
errors for ^ and $.
(test_isAnchor): Add obligatory simple test cases.
(test_tokenizeRegexStep): Include test case for tokenizing patterns
with character class.
|
|
position in runTests
|
|
|
|
|
|
|
|
* src/paca.mjs
(escapingStateStep): Use `shouldConcat()` instead of only checking if
we're on the last char. We abuse it a bit by passing `null` as the
first argument, since it is being escaped.
(nonConcatOperators, shouldConcat): Hoist the definition of both above
`escapingStateStep()`, so that they're defined before being used.
* tests/paca.mjs (test_shouldConcat): Add test case where `null` is
explicitly passed as the first argument.
|
|
* src/paca.mjs (classStateStep): New function equivalent to
`rangeStateStep()` for character class expressions. For now it knowns
how to handle escaping ([abc\-_]), simple ranges ([a-z]), negation
([^abc]) and the hyphen literal as the first char ([-a-z_]).
* tests.paca.mjs (test_classStateStep): New test entry has a test case
each scenario described above.
|
|
|
|
|
|
|
|
|
|
|
|
* tests/paca.mjs (test_rangeStateStep): Add first test case, for when we
find a closing "}" when no comma was seen.
|
|
* src/paca.mjs (numFromDigits): Move it to before the *StateStep
functions, as it is now used in `rangeStateStep()` function. So
instead of letting it be defined afters its usage, move it up.
* tests/paca.mjs: Do the same hoisting to the import of the
`numFromDigits` name, to the definition of `test_numFromDigits` and
its inclusion in the order of the call to `runTests()`.
|
|
Introducing "[" now we will start to write the code to parse the
character class expressions, i.e. [a-z0-9]. The `context` key will
contain a `set` with all the literal characters that were found, and all
the ranges too. For parsing the ranges, a `range` key equivalent to the
one for the {m,n} range is used. Despite the superficial syntax being
simmilar, its logic, semantic and implementation will be different.
* src/paca.mjs (TRANSITION_FNS) <"[">: Add new transition function for
handling the start of a character class expression.
* tests/paca.mjs (TRANSITION_FNS): Add a singular test entry, that
exercises the conditionless body of the function.
|
|
* src/paca.mjs (TRANSITION_FNS): Add trailing underscore to ignored
arguments, even though it breaks the name of the `_state` and
`context` destructuring arguments.
* tests/paca.mjs (test_TRANSITION_FNS): Add new test function with a
single case for each transition character. Since these transitions
are unconditional and contain no logic, this single sample test is
enough to cover for all of its behaviour.
|
|
This reverts commit 15f206e4940cb80ff98eea7c376d9c618f80ed0e.
|
|
|
|
When handling a custom state, dispatch it to the appropriate function in
`STATE_FNS`; and when looking for chars that enters these custom states,
dispatch it to the appropriate function in `TRANSITION_FNS`.
The body of each part didn't change, so no tests had to be modified.
But now we can write specific tests for each case, and remove the bulk
of the logic out of `tokenizeRegexFn()`.
|
|
|
|
|
|
|
|
singleton array
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|