path: root/tests/paca.mjs
Commit message | Author | Age | Files | Lines
* Differentiate an "operator" from a "meta" character | EuAndreh | 2025-07-16 | 1 | -16/+58

    The character class `[a-z]`, and especially the wildcard `.`, aren't
    operators: they really do represent themselves with their own special
    semantics, and they take no operands. So instead of having the
    "operator" type behave in two ways, with and without arguments, we
    introduce a new type, the "meta" character. Like a literal character,
    a metacharacter represents itself and takes no argument.

    This also lets us leave the precedence parsing of operators untouched,
    rather than tainting it with special conditions for "." and "class",
    since they should behave just like literal characters: be pushed
    directly onto the stack.

    As of now, there are only 2 meta characters: "class" and ".".

    * src/paca.mjs
    (operatorChars): Remove "." from the set of operator characters.
    (classStateStep): Return `{ meta: "class" }` instead of
    `{ operator: "class" }`.
    (isMeta): Add equivalent to `isTransition()` and `isOperator()`.
    (opFor, tokenizeRegexStep): Add new `opFor()` function for classifying
    a given character, choosing between an operator, a metacharacter and a
    literal character, and use this function in the body of
    `tokenizeRegexStep()`.
    (PRECEDENCE): Remove the earlier entries of precedence values for
    "class" and ".".
    (toPostfixStep): Instead of checking whether a character is a literal
    one before pushing it onto the stack, check that it isn't an operator,
    i.e. that it isn't an object that has the `operator` attribute.

    * tests/paca.mjs
    (test_isOperator): Remove test case for ".", as it is no longer
    considered an operator.
    (classStateStep): Update to rename from `{ operator: "class" }` to
    `{ meta: "class" }`.
    (test_toPostfixStep, test_toPostfix): Add test cases for meta
    characters.
    (test_OPERATOR_FNS): BONUS - Use direct assignment to reset the array
    to an empty value instead of `arr.splice(0)`.
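The three-way classification described in this commit can be sketched roughly as follows. This is only an illustration under assumptions: the contents of `operatorChars` and `metaChars`, and the helper name `isOperatorToken`, are hypothetical; only the names `opFor`, `isOperator`, `isMeta` and the `{ operator: ... }` / `{ meta: ... }` token shapes come from the commit message.

```javascript
// Assumed operator set; the log only states that "." was removed from it.
const operatorChars = new Set(["*", "+", "?", "|", "(", ")"]);
// Assumed: "." is the only metacharacter recognised from a single char
// ("class" tokens would come from the character class state instead).
const metaChars = new Set(["."]);

const isOperator = char => operatorChars.has(char);
const isMeta = char => metaChars.has(char);

// Classify a character as an operator, a metacharacter or a literal.
const opFor = char =>
  isOperator(char) ? { operator: char } :
  isMeta(char) ? { meta: char } :
  char;

// Hypothetical helper: only objects carrying an `operator` attribute are
// operators; literals (strings) and meta tokens are plain operands.
const isOperatorToken = token =>
  typeof token === "object" && token !== null && "operator" in token;
```

With this shape, a postfix conversion step could push any token for which `isOperatorToken()` is false straight onto the output, without giving "." or "class" a precedence entry.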
* Only tolerate escaping of special chars | EuAndreh | 2025-07-15 | 1 | -26/+37

    * src/paca.mjs
    (escapingStateStep): Return an error when escaping non-metacharacters.
    This way, a sequence like \d, shorthand for [0-9] that will eventually
    be recognized, will not silently change its behaviour from a no-op
    escape of "d" to matching digits.
    (operatorChars, isOperator): Hoist both of these up before their usage
    in `escapingStateStep()`.

    * tests/paca.mjs
    (test_isOperator): Hoist its definition and position inside the
    `runTests([...])` array to match src/paca.mjs.
    (test_escapingStateStep): Adjust existing cases and add test case for
    good/bad escapes.
    (test_tokenizeRegexStep): Fix bad starting escape, that broke because
    it was escaping a non-metacharacter.
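A minimal sketch of the rule this commit introduces, assuming a hypothetical helper `resolveEscape()` and an assumed set of escapable characters (the real logic lives in `escapingStateStep()`, whose signature is not shown in this log):

```javascript
// Assumed set of characters that may follow a backslash.
const specialChars = new Set([
  "\\", ".", "*", "+", "?", "|", "(", ")", "[", "]", "{", "}", "^", "$",
]);

// Hypothetical helper: accept the escape only for special characters, so
// that sequences such as \d stay free to gain a meaning later.
const resolveEscape = char =>
  specialChars.has(char)
    ? { ok: true, char }
    : { ok: false, error: `invalid escape: \\${char}` };
```

Here `resolveEscape("*")` yields the literal `*`, while `resolveEscape("d")` yields an error instead of quietly matching the letter "d".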
* Support tokenizing `.` wildcard operator. | EuAndreh | 2025-07-15 | 1 | -0/+19

    * src/paca.mjs
    (isTransition): Add new function as an improved version of the raw
    usage of `stateTransitionOperators`, equivalent to `isAnchor()` and
    `isOperator()`.
    (operatorChars, isOperator): Add new static set `operatorChars` as
    backing data of `isOperator()`, instead of an ad-hoc conditional in its
    implementation. Also add the `.` character as an operator by including
    it in the `operatorChars` set.
    (tokenizeRegexStep): Use the new `isTransition()` function instead of
    checking the set directly. Also tweak ternary to fit in 80 columns.
    (PRECEDENCE): Add `.` operator with lowest precedence, as it is not
    really operating on anything, and is instead a target to be operated
    on.

    * tests/paca.mjs
    (test_isTransition): Add obligatory test cases.
    (test_isOperator): Include test case for `.` wildcard operator.
* Support tokenizing `^` and `$` anchors | EuAndreh | 2025-07-15 | 1 | -0/+194

    * src/paca.mjs
    (ANCHOR_FNS): Add simple handlers for ^ and $ anchors, that only look
    at the position of the character in the pattern as validation during
    tokenization.
    (isAnchor): Add simple boolean function to identify anchor characters.
    (tokenizeRegexStep): Include check if character `isAnchor()`, and call
    the appropriate `ANCHOR_FNS[char]` when true.

    * tests/paca.mjs
    (test_ANCHOR_FNS): Add test with 4 cases - 2 for success and 2 for
    errors for ^ and $.
    (test_isAnchor): Add obligatory simple test cases.
    (test_tokenizeRegexStep): Include test case for tokenizing patterns
    with character class.
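The position-only validation described above might look roughly like this. The handler signature `(pattern, index)` and the `{ anchor }` / `{ error }` result shapes are assumptions made for illustration; only the names `ANCHOR_FNS` and `isAnchor` come from the commit message.

```javascript
// Assumed signature: each handler receives the whole pattern and the index
// of the anchor character, and validates nothing but that position.
const ANCHOR_FNS = {
  "^": (pattern, index) =>
    index === 0
      ? { anchor: "^" }
      : { error: `"^" is only valid at the start (found at ${index})` },
  "$": (pattern, index) =>
    index === pattern.length - 1
      ? { anchor: "$" }
      : { error: `"$" is only valid at the end (found at ${index})` },
};

// True only for characters that have their own anchor handler.
const isAnchor = char => Object.hasOwn(ANCHOR_FNS, char);
```

This mirrors the four test cases mentioned in the commit: one success and one error for each of `^` and `$`.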
* tests/paca.mjs (test_shouldConcat): Also hoist import, definition and position in runTests | EuAndreh | 2025-07-15 | 1 | -35/+35
* tests/paca.mjs (test_compileNFA): Add test case for empty regex | EuAndreh | 2025-07-15 | 1 | -0/+14
* tests/paca.mjs (test_escapingStateStep): Add tests for `escapingStateStep()` | EuAndreh | 2025-07-15 | 1 | -0/+36
* Use `shouldConcat()` in decision of `escapingStateStep()` | EuAndreh | 2025-07-15 | 1 | -0/+4

    * src/paca.mjs
    (escapingStateStep): Use `shouldConcat()` instead of only checking if
    we're on the last char. We abuse it a bit by passing `null` as the
    first argument, since the character is being escaped.
    (nonConcatOperators, shouldConcat): Hoist the definition of both above
    `escapingStateStep()`, so that they're defined before being used.

    * tests/paca.mjs
    (test_shouldConcat): Add test case where `null` is explicitly passed as
    the first argument.
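To illustrate why passing `null` works, here is a rough guess at what a `shouldConcat(char, next)` predicate could look like. Everything except the two names `shouldConcat` and `nonConcatOperators` is an assumption, including the argument order, the set contents and the exact rules.

```javascript
// Assumed: characters that must not be preceded by an implicit
// concatenation operator.
const nonConcatOperators = new Set(["*", "+", "?", "|", ")"]);

// Decide whether an implicit concatenation goes between `char` and `next`.
// An escaped character is always a literal, so a caller can pass `null`
// as `char` and let the decision depend on `next` alone.
const shouldConcat = (char, next) =>
  next !== undefined &&
  char !== "(" &&
  char !== "|" &&
  !nonConcatOperators.has(next);
```

Under these assumptions `shouldConcat(null, "b")` is true and `shouldConcat(null, ")")` is false, which is a stricter check than only asking whether the escaped character is the last one.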
* Support tokenizing character class expressions [a-z] | EuAndreh | 2025-07-15 | 1 | -2/+224

    * src/paca.mjs
    (classStateStep): New function equivalent to `rangeStateStep()` for
    character class expressions. For now it knows how to handle escaping
    ([abc\-_]), simple ranges ([a-z]), negation ([^abc]) and the hyphen
    literal as the first char ([-a-z_]).

    * tests/paca.mjs
    (test_classStateStep): New test entry has a test case for each scenario
    described above.
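The four scenarios listed above can be illustrated with a small whole-string parser. This is not the step-wise reducer from the commit: `parseClassBody()` is a hypothetical helper that takes the text between the brackets, and its result shape (`negated`, `set`, `ranges`) is an assumption. It also does not handle escaped range endpoints or validate range order.

```javascript
// Hypothetical helper: parse the body of a character class, i.e. the text
// between "[" and "]".
const parseClassBody = body => {
  const chars = [...body];
  const negated = chars[0] === "^"; // [^abc]
  const set = [];
  const ranges = [];
  let i = negated ? 1 : 0;

  while (i < chars.length) {
    if (chars[i] === "\\") {
      // Escaped character, e.g. \- : always a literal member.
      if (i + 1 >= chars.length) {
        throw new Error("dangling escape in character class");
      }
      set.push(chars[i + 1]);
      i += 2;
    } else if (chars[i + 1] === "-" && i + 2 < chars.length) {
      // Simple range such as a-z.
      ranges.push([chars[i], chars[i + 2]]);
      i += 3;
    } else {
      // Plain literal, including a leading or trailing hyphen.
      set.push(chars[i]);
      i += 1;
    }
  }

  return { negated, set, ranges };
};
```

For example, the body of `[-a-z_]` gives the literals `-` and `_` plus the range `a` to `z`, because the leading hyphen has no character to its left to start a range.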
* tests/paca.mjs (test_rangeStateStep): Finish test cases for rangeStateStep | EuAndreh | 2025-07-13 | 1 | -0/+139
* Add first test for `rangeStateStep()` | EuAndreh | 2025-07-13 | 1 | -0/+17

    * tests/paca.mjs
    (test_rangeStateStep): Add first test case, for when we find a closing
    "}" when no comma was seen.
* Hoist `numFromDigits()` before its usage | EuAndreh | 2025-07-13 | 1 | -17/+17

    * src/paca.mjs
    (numFromDigits): Move it to before the *StateStep functions, as it is
    now used in the `rangeStateStep()` function. So instead of letting it
    be defined after its usage, move it up.

    * tests/paca.mjs: Do the same hoisting to the import of the
    `numFromDigits` name, to the definition of `test_numFromDigits` and to
    its position in the call to `runTests()`.
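As an illustration of how `numFromDigits()` could feed the `{m,n}` range logic, here is a sketch that assumes the function takes an array of digit characters. The helper `parseRangeBody()` and its `{ min, max }` result are hypothetical and work on the whole text between the braces rather than one character at a time.

```javascript
// Assumed input: an array of digit characters, e.g. ["1", "2", "3"].
const numFromDigits = digits =>
  digits.reduce((acc, digit) => acc * 10 + Number(digit), 0);

// Hypothetical helper: parse the text between "{" and "}".
// "3" means exactly 3 (a closing "}" with no comma seen), "2,5" means
// from 2 to 5. Open-ended forms such as "2," are rejected in this sketch.
const parseRangeBody = body => {
  const parts = body.split(",");
  const isDigits = part => /^[0-9]+$/.test(part);
  if (parts.length > 2 || !parts.every(isDigits)) {
    throw new Error(`bad range: {${body}}`);
  }
  const [min, max = min] = parts.map(part => numFromDigits([...part]));
  return { min, max };
};
```

Defining `numFromDigits` before `parseRangeBody` matches the hoisting rationale of this commit: each `const` is defined before the code that uses it.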
* Add "[" to the possible characters of TRANSITION_FNS. | EuAndreh | 2025-07-13 | 1 | -0/+18

    With "[" introduced, we start writing the code to parse character
    class expressions, i.e. [a-z0-9]. The `context` key will contain a
    `set` with all the literal characters that were found, and all the
    ranges too. For parsing the ranges, a `range` key equivalent to the
    one for the {m,n} range is used. Although the syntax is superficially
    similar, its logic, semantics and implementation will be different.

    * src/paca.mjs
    (TRANSITION_FNS) <"[">: Add new transition function for handling the
    start of a character class expression.

    * tests/paca.mjs
    (TRANSITION_FNS): Add a single test entry, that exercises the
    conditionless body of the function.
* Add simple test for TRANSITION_FNS | EuAndreh | 2025-07-13 | 1 | -0/+33

    * src/paca.mjs
    (TRANSITION_FNS): Add trailing underscore to ignored arguments, even
    though it breaks the name of the `_state` and `context` destructuring
    arguments.

    * tests/paca.mjs
    (test_TRANSITION_FNS): Add new test function with a single case for
    each transition character. Since these transitions are unconditional
    and contain no logic, this single sample test is enough to cover all
    of their behaviour.
* tests/paca.mjs (test_tokenizeRegexStep): Simplify table values | EuAndreh | 2025-07-11 | 1 | -704/+322
* tests/paca.mjs: Add tests for numFromDigits() | EuAndreh | 2025-07-11 | 1 | -0/+17
* src/paca.mjs (tokenizeRegexStep): Support tokenizing range exps {m,n} | EuAndreh | 2025-07-11 | 1 | -0/+540
* src/paca.mjs (tokenizeRegexStep): Include `context` key in reduced state | EuAndreh | 2025-07-11 | 1 | -118/+144
* src/paca.mjs: Move error detection from tokenizeRegexStep => tokenizeRegex | EuAndreh | 2025-07-11 | 1 | -18/+22
* tests/paca.mjs (test_tokenizeRegexStep): Compute `char` and `index` | EuAndreh | 2025-07-11 | 1 | -20/+7
* Finish implementation of unit tests | EuAndreh | 2025-07-09 | 1 | -17/+864
* Implement v0 version of NFA and DFA; WIP tests | EuAndreh | 2025-07-07 | 1 | -0/+1351