summaryrefslogtreecommitdiff
path: root/tests (unfollow)
Commit message (Collapse)AuthorFilesLines
2025-08-01src/paca.mjs: Improve implementation of interpretMetacharactersHEADmainEuAndreh1-28/+172
2025-07-31Makefile: Install like a Node.js packageEuAndreh2-6/+10
2025-07-20tests/paca.mjs: Add WIP testsEuAndreh1-0/+9
2025-07-20src/paca.mjs: Rename buildDFA -> toDFAEuAndreh2-22/+25
2025-07-20src/paca.mjs: Support returning multiple options from `performTransition()`EuAndreh1-5/+6
2025-07-20Add initial support for caret and dollar metacharactersEuAndreh2-22/+200
2025-07-17.gitignore: Remove trailing slash from node_modules ruleEuAndreh1-1/+1
2025-07-17src/paca.mjs: Rename {start,end}ID => {start,end}EuAndreh1-44/+42
2025-07-17Do away with the "nextID" attributeEuAndreh2-53/+10
Instead of being an increment over "end" that is carried along on NFA transformation, now the id is computed directly as an increment on "end". During this refactor, I even saw that "end" and "nextID" of `concat()` are computed differently, despite arriving at the same result: "end" is rhs.end, while "nextID" is the max of the nextID from lhs and rhs.
2025-07-16Support searching in the NFA using the metacharacters.EuAndreh2-1/+123
* src/paca.mjs (searchNFAStep): Now instead of just checking if the node has a transition via a character literal directly, we also check (via the `performTransition()` function) if a metacharacter interpretation allows a transition to happen. (intepretMetacharacter): Add function that "executes" the action representation in the "meta" attribute of the object, when present. It is somewhat ad-hoc now, doing checks that implicitly only exist for "." or "class" metacharacters, but OK for now, given the possibilities. (performTransition): Do the fallback to `interpretMetacharacter()`, giving it an empty object when the node doesn't have the "meta" attribute. * tests/paca.mjs (test_{interpretMetacharacter,performTransition): Add routine test implementation.
2025-07-16Compress character class when compiling NFA.EuAndreh2-3/+174
Do not change any observable behaviour outside of `characterClass()`, as the new output is 100% semantically compatible, but faster and most importantly, much smaller. * src/paca.mjs (characterClass): Leverage `compressCharacterRanges()` when processing the given set. Also use the same compressed range to filter out set matches when they are already within the range. (compareRange): Add function that sorts based on the first numeric element of the range. When they're equal, use the second element as the tie breaker. (compressRangeStep): Add function that takes ranges in sorted order tries to mush them together when the start of the second (`from`) is contained by the first (`curr[1]`). Since the ranges are given to us sorted, we already know that the start of the second is greater than or equal to the start of the first. When this is the case, we pick the largest ending to merge the ranges, otherwise we just place them one next to the other, sequentially. (compressCharacterRanges): Add function that sorts the ranges and reduces them with `compressRangeStep()`. Here we have to give an empty array as the initial value to prevent `compressRangeStep()` to be given ranges as `[m, n]`, instead of either `[]` or `[[m, n]]`. This is also why there's a check for `!curr` in `compressRangeStep()` - to plug the other side of this adaptation. (inRange): Add function that checks if the given character is contained by any of the ranges of the given object. At the end, instead of looking through every `from`, all we need is a single match, so we use `.some()` instead. * tests/paca.mjs (test_characterClass): Add test case for a range that collapses many literal character matches and many ranges into a single range. (test_{compareRange,compressRangeStep,compressCharacterRanges,inRange): Add routine sample-based test cases.
2025-07-16Build NFA nodes for "." and "class" metacharactersEuAndreh2-38/+305
* src/paca.mjs (characterClass): Add function that builds the NFA node for `{ meta: "class" }`. This node leaves the "direct" and "transitions" keys empty, and add its data under the "meta" key. One option was to use an inline function that could simply be called directly during the search to check for a match, but instead I chose a data representation instead, in order to keep the NFA literal as obvious and self-representing as possible. Later, the searching part will have to properly interpret the data of "meta" properly, instead of blindly executing an opaque function. This does separate the compilation from execution logic, but keep the NFA clean of opaque closures. (wildcard): Add function that buildl the NFA node for `{ meta: "." }`. Similar to `characterClass()`, the new "meta" key contains pure data that represents the execution of the metacharacter during search. (baseNFA, literal): Rename the existing `baseNFA()` to `literal()`. Then add a new `baseNFA()` function that decides between a character literal and a metacharacter. (buildNFAStep): Instead of checking the type of `token`, we check if `token` has the "operator" attribute, since we now have metacharacters that also aren't strings. (classStateStep): Add missing "caret" key to the final metacharacter output. It was already being detected, just not included in the result. (escapingStateStep): Stick to 80 columns. * tests/paca.mjs (test_characterClass, test_wildcard, test_baseNFA): Add obligatory test cases. (test_buildNFAStep): Include test case for metacharacter.
2025-07-16Differentiate an "operator" from a "meta" characterEuAndreh2-24/+72
The character class `[a-z]`, and specially the wildcard `.`, aren't operators: they really do represent themselves with their own special semantics, and they take no operands. So instead of have the "operator" type behave in two ways, with and without arguments, we instead have this new type, the "meta" character. In equivalence to the literal character, the metacharacter represents itself, and also takes no argument. We also can not touch the precedence parsing of operators by tainting it with special conditions for "." and "class", since they should behave just like literal characters: be pushed directly onto the stack. As of now, there are only 2 meta characters: "class" and ".". * src/paca.mjs (operatorChars): Remove "." from the set of operator characters. (classStateStep): Return `{ meta: "class" }` instead of `{ operator: "class" }`. (isMeta): Add equivalent to `isTransition()` and `isOperator()`. (opFor, tokenizeRegexStep): Add new `opFor()` function for classifying a given character, choosing between an operator, a metacharacter and a literal character, and use this function in the body of `tokenizeRegexStep()`. (PRECEDENCE): Remove early entry of precedence values for "class" and ".". (toPostfixStep): Instead of just checking if a character is a literal one before pushing it onto the stack, check that it isn't an operator just by checking if it is an object that has the `operator` attribute. * tests/paca.mjs (test_isOperator): Remove test case for ".", as it is no longer considered an operator. (classStateStep): Update to rename from `{ operator: "class" }` to `{ meta: "class" }`. (test_toPostfixStep, test_toPostfix): Add test cases for meta characters. (test_OPERATOR_FNS): BONUS - Use direct assignment to reset the array to an empty value instead of `arr.splice(0)`.
2025-07-15Only tolerate escaping of special charsEuAndreh2-38/+59
* src/paca.mjs (escapingStateStep): Return an error when escaping non-metacharacters. This way cases like \d, which is syntax for [0-9] which will eventually be recognized, will not change its behaviour from a noop escape of "d" to matching digits. (operatorChars, isOperator): Hoist both of these up before their usage in `escapingStateStep()`. * tests/paca.mjs (test_isOperator): Hoist its definition and position inside the `runTests([...])` array to match src/paca.mjs. (test_escapingStateStep): Adjust existing cases and add test case for good/bad escapes. (test_tokenizeRegexStep): Fix bad starting escape, that broke because it was escaping a non-metacharacter.
2025-07-15Support tokenizing `.` wildcard operator.EuAndreh2-5/+29
* src/paca.mjs (isTransition): Add new function as an improved version of the raw usage of `stateTransitionOperators`, equivalent to `isAnchor()` and `isOperator()`. (operatorChars, isOperator): Add new static set `operatorChars` as backing data of `isOperator()`, instead of ad-hoc conditional in its implementation. Also now add the `.` character as an operator by including it in the `operatorChars` set. (tokenizeRegexStep): Use the new `isTransition()` function instead of checking the set directly. Also tweak ternary to fit in 80 columns. (PRECEDENCE): Add `.` operator with lowest precedence, as it is not really operating on anything, and is instead a target to be operated on. * tests/paca.mjs (test_isTransition): Add obligatory test cases. (test_isOperator): Include test case for `.` wildcard operator.
2025-07-15Support tokenizing `^` and `$` anchorsEuAndreh2-1/+242
* src/paca.mjs (ANCHOR_FNS): Add simple handlers for ^ and $ anchors, that only look for the position of the character in the pattern as validation during tokenization. (isAnchor): Add simple boolean function to identify anchor characters. (tokenizeRegexStep): Include check if character `isAnchor()`, and call the appropriate `ANCHOR_FNS[char]` when true. * tests/paca.mjs (test_ANCHOR_FNS): Add test with 4 cases - 2 for success and 2 for errors for ^ and $. (test_isAnchor): Add obligatory simple test cases. (test_tokenizeRegexStep): Include test case for tokenizing patterns with character class.
2025-07-15tests/paca.mjs (test_shouldConcat): Also hoist import, definition and ↵EuAndreh1-35/+35
position in runTests
2025-07-15tests/paca.mjs (test_compileNFA): Add test case for empty regexEuAndreh1-0/+14
2025-07-15src/paca.mjs (PRECEDENCE): Add "range" and "class" keysEuAndreh1-0/+2
2025-07-15tests/paca.mjs (test_escapingStateStep): Add tests for `escapingStateStep()`EuAndreh1-0/+36
2025-07-15Use `shouldConcat()` in decision of `escapingStateSte()`EuAndreh2-11/+15
* src/paca.mjs (escapingStateStep): Use `shouldConcat()` instead of only checking if we're on the last char. We abuse it a bit by passing `null` as the first argument, since it is being escaped. (nonConcatOperators, shouldConcat): Hoist the definition of both above `escapingStateStep()`, so that they're defined before being used. * tests/paca.mjs (test_shouldConcat): Add test case where `null` is explicitly passed as the first argument.
2025-07-15Support tokenizing character class expressions [a-z]EuAndreh2-5/+349
* src/paca.mjs (classStateStep): New function equivalent to `rangeStateStep()` for character class expressions. For now it knowns how to handle escaping ([abc\-_]), simple ranges ([a-z]), negation ([^abc]) and the hyphen literal as the first char ([-a-z_]). * tests.paca.mjs (test_classStateStep): New test entry has a test case each scenario described above.
2025-07-13tests/paca.mjs (test_rangeStateStep): Finish test cases for rangeStateStepEuAndreh1-0/+139
2025-07-13src/paca.mjs (rangeStateStep): Fix typo in SyntaxError messageEuAndreh1-1/+1
2025-07-13src/paca.mjs (rangeStateStep): Refine indentation and alignmentEuAndreh1-3/+3
2025-07-13src/paca.mjs ({escaping,range}StateStep): Add leading underscore to ignored argsEuAndreh1-2/+2
2025-07-13src/paca.mjs (rangeStateStep): Return ValueError when range number decreasesEuAndreh1-1/+2
2025-07-13Add first test for `rangeStateStep()`EuAndreh1-0/+17
* tests/paca.mjs (test_rangeStateStep): Add first test case, for when we find a closing "}" when no comma was seen.
2025-07-13Hoist `numFromDigits()` before its usageEuAndreh2-22/+22
* src/paca.mjs (numFromDigits): Move it to before the *StateStep functions, as it is now used in `rangeStateStep()` function. So instead of letting it be defined afters its usage, move it up. * tests/paca.mjs: Do the same hoisting to the import of the `numFromDigits` name, to the definition of `test_numFromDigits` and its inclusion in the order of the call to `runTests()`.
2025-07-13Add "[" to the possible characters of TRANSITION_FNS.EuAndreh2-0/+30
Introducing "[" now we will start to write the code to parse the character class expressions, i.e. [a-z0-9]. The `context` key will contain a `set` with all the literal characters that were found, and all the ranges too. For parsing the ranges, a `range` key equivalent to the one for the {m,n} range is used. Despite the superficial syntax being simmilar, its logic, semantic and implementation will be different. * src/paca.mjs (TRANSITION_FNS) <"[">: Add new transition function for handling the start of a character class expression. * tests/paca.mjs (TRANSITION_FNS): Add a singular test entry, that exercises the conditionless body of the function.
2025-07-13Add simple test for TRANSITION_FNSEuAndreh2-2/+35
* src/paca.mjs (TRANSITION_FNS): Add trailing underscore to ignored arguments, even though it breaks the name of the `_state` and `context` destructuring arguments. * tests/paca.mjs (test_TRANSITION_FNS): Add new test function with a single case for each transition character. Since these transitions are unconditional and contain no logic, this single sample test is enough to cover for all of its behaviour.
2025-07-12Revert "src/paca.mjs: Temporarily export internal functions"EuAndreh1-4/+4
This reverts commit 15f206e4940cb80ff98eea7c376d9c618f80ed0e.
2025-07-12src/paca.mjs: Temporarily export internal functionsEuAndreh1-4/+4
2025-07-11src/paca.mjs (tokenizeRegexStep): Simplify bodyEuAndreh1-105/+127
When handling a custom state, dispatch it to the appropriate function in `STATE_FNS`; and when looking for chars that enters these custom states, dispatch it to the appropriate function in `TRANSITION_FNS`. The body of each part didn't change, so no tests had to be modified. But now we can write specific tests for each case, and remove the bulk of the logic out of `tokenizeRegexFn()`.
2025-07-11tests/paca.mjs (test_tokenizeRegexStep): Simplify table valuesEuAndreh1-704/+322
2025-07-11src/paca.mjs (tokenizeRegexStep): Fix missing concat when escapingEuAndreh1-5/+8
2025-07-11tests/paca.mjs: Add tests for numFromDigits()EuAndreh1-0/+17
2025-07-11src/paca.mjs: Remove calls to arr.concat([]) with unneeded wrapping ↵EuAndreh1-14/+6
singleton array
2025-07-11src/paca.mjs (tokenizeRegexStep): Support tokenizing range exps {m,n}EuAndreh2-2/+638
2025-07-11src/paca.mjs (tokenizeRegexStep): Include `context` key in reduced stateEuAndreh2-121/+152
2025-07-11src/paca.mjs: Move error detection from tokenizeRegexStep => tokenizeRegexEuAndreh2-26/+34
2025-07-11tests/paca.mjs (test_tokenizeRegexStep): Compute `char` and `index`EuAndreh1-20/+7
2025-07-11src/paca.mjs: Remove unused repeat(3) importEuAndreh1-1/+1
2025-07-09Makefile: Minor fixes of indentation and alignmentEuAndreh1-3/+3
2025-07-09Finish implementation of unit testsEuAndreh2-31/+881
2025-07-07Implement v0 version of NFA and DFA; WIP testsEuAndreh6-0/+1898