Commit message  Author  Age  Files  Lines
* Only tolerate escaping of special chars  EuAndreh  2025-07-15  2  -38/+59
|   * src/paca.mjs (escapingStateStep): Return an error when escaping non-metacharacters. This way, sequences like \d (syntax for [0-9] that will eventually be recognized) will not change behaviour from a no-op escape of "d" to matching digits.
|   (operatorChars, isOperator): Hoist both of these up before their usage in `escapingStateStep()`.
|   * tests/paca.mjs (test_isOperator): Hoist its definition and position inside the `runTests([...])` array to match src/paca.mjs.
|   (test_escapingStateStep): Adjust existing cases and add test case for good/bad escapes.
|   (test_tokenizeRegexStep): Fix bad starting escape, which broke because it was escaping a non-metacharacter.
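|
A minimal sketch of the rule described in this commit, not the actual `escapingStateStep()`: the `metaChars` set, the `escapeChar()` name and the `{ literal }` / `{ error }` return shape are all assumptions made for illustration.

```javascript
// Hypothetical set of metacharacters; the real `operatorChars` in
// src/paca.mjs may have different members.
const metaChars = new Set([
  "\\", "*", "+", "?", "|", "(", ")", "[", "]", "{", "}", ".", "^", "$",
]);

// Hypothetical helper: accept an escape only when it targets a
// metacharacter, so that "\d" is rejected today instead of silently
// meaning the literal "d".
function escapeChar(char) {
  if (!metaChars.has(char)) {
    return { error: new SyntaxError(`unknown escape sequence: \\${char}`) };
  }
  return { literal: char };
}
```

Rejecting `\d` now keeps the door open for giving it the [0-9] meaning later without changing the behaviour of any pattern that was previously accepted.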
* Support tokenizing `.` wildcard operator.  EuAndreh  2025-07-15  2  -5/+29
|   * src/paca.mjs (isTransition): Add new function as an improved version of the raw usage of `stateTransitionOperators`, equivalent to `isAnchor()` and `isOperator()`.
|   (operatorChars, isOperator): Add new static set `operatorChars` as backing data of `isOperator()`, instead of ad-hoc conditional in its implementation. Also now add the `.` character as an operator by including it in the `operatorChars` set.
|   (tokenizeRegexStep): Use the new `isTransition()` function instead of checking the set directly. Also tweak ternary to fit in 80 columns.
|   (PRECEDENCE): Add `.` operator with lowest precedence, as it is not really operating on anything, and is instead a target to be operated on.
|   * tests/paca.mjs (test_isTransition): Add obligatory test cases.
|   (test_isOperator): Include test case for `.` wildcard operator.
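|
A rough sketch of set-backed predicates in the style this commit describes. The names come from the commit message, but the members of both sets are assumptions: the message only confirms that `.` is in `operatorChars`.

```javascript
// Assumed members, except for "." which the commit message confirms.
const operatorChars = new Set(["*", "+", "?", "|", "(", ")", "."]);
const isOperator = char => operatorChars.has(char);

// Assumed members for the characters that enter a custom tokenizer state.
const stateTransitionOperators = new Set(["\\", "{", "["]);
const isTransition = char => stateTransitionOperators.has(char);
```

With the data in a set, supporting a new operator is a one-element change rather than another branch in a conditional.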
* Support tokenizing `^` and `$` anchors  EuAndreh  2025-07-15  2  -1/+242
|   * src/paca.mjs (ANCHOR_FNS): Add simple handlers for ^ and $ anchors, that only look for the position of the character in the pattern as validation during tokenization.
|   (isAnchor): Add simple boolean function to identify anchor characters.
|   (tokenizeRegexStep): Include check if character `isAnchor()`, and call the appropriate `ANCHOR_FNS[char]` when true.
|   * tests/paca.mjs (test_ANCHOR_FNS): Add test with 4 cases: 2 for success and 2 for errors for ^ and $.
|   (test_isAnchor): Add obligatory simple test cases.
|   (test_tokenizeRegexStep): Include test case for tokenizing patterns with character class.
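|
One way the position-only validation could look, as a sketch. The handler signature `(pattern, index)` and the `{ token }` / `{ error }` return shape are assumptions; only the names `ANCHOR_FNS` and `isAnchor` come from the commit message.

```javascript
// Hypothetical handlers: each one only checks where the anchor sits.
const ANCHOR_FNS = {
  // "^" is accepted only as the first character of the pattern.
  "^": (pattern, index) => index === 0
    ? { token: { anchor: "^" } }
    : { error: new SyntaxError(`^ not at the start (index ${index})`) },
  // "$" is accepted only as the last character of the pattern.
  "$": (pattern, index) => index === pattern.length - 1
    ? { token: { anchor: "$" } }
    : { error: new SyntaxError(`$ not at the end (index ${index})`) },
};

const isAnchor = char => char === "^" || char === "$";
```

The four calls below mirror the 2 success and 2 error cases mentioned for `test_ANCHOR_FNS`: `ANCHOR_FNS["^"]("^ab", 0)`, `ANCHOR_FNS["$"]("ab$", 2)`, `ANCHOR_FNS["^"]("a^b", 1)` and `ANCHOR_FNS["$"]("a$b", 1)`.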
* tests/paca.mjs (test_shouldConcat): Also hoist import, definition and position in runTests  EuAndreh  2025-07-15  1  -35/+35
* tests/paca.mjs (test_compileNFA): Add test case for empty regex  EuAndreh  2025-07-15  1  -0/+14
|
* src/paca.mjs (PRECEDENCE): Add "range" and "class" keys  EuAndreh  2025-07-15  1  -0/+2
|
* tests/paca.mjs (test_escapingStateStep): Add tests for `escapingStateStep()`  EuAndreh  2025-07-15  1  -0/+36
|
* Use `shouldConcat()` in decision of `escapingStateStep()`  EuAndreh  2025-07-15  2  -11/+15
|   * src/paca.mjs (escapingStateStep): Use `shouldConcat()` instead of only checking if we're on the last char. We abuse it a bit by passing `null` as the first argument, since it is being escaped.
|   (nonConcatOperators, shouldConcat): Hoist the definition of both above `escapingStateStep()`, so that they're defined before being used.
|   * tests/paca.mjs (test_shouldConcat): Add test case where `null` is explicitly passed as the first argument.
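|
A guess at why `null` works as the first argument, sketched under assumptions: the body of `shouldConcat()` and the members of `nonConcatOperators` below are illustrative, not taken from src/paca.mjs.

```javascript
// Assumed: operators that never get an implicit concat placed before them.
const nonConcatOperators = new Set(["*", "+", "?", "|", ")"]);

// `char` is the current character, or null when it was escaped and is
// therefore a plain literal; `next` is the following character, or
// undefined at the end of the pattern.
function shouldConcat(char, next) {
  if (next === undefined) return false;           // nothing follows
  if (char === "(" || char === "|") return false; // nothing to join yet
  return !nonConcatOperators.has(next);
}
```

Under these assumptions `shouldConcat("(", "a")` is false while `shouldConcat(null, "a")` is true, so an escaped `\(` followed by `a` is concatenated like any other literal.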
* Support tokenizing character class expressions [a-z]  EuAndreh  2025-07-15  2  -5/+349
|   * src/paca.mjs (classStateStep): New function equivalent to `rangeStateStep()` for character class expressions. For now it knows how to handle escaping ([abc\-_]), simple ranges ([a-z]), negation ([^abc]) and the hyphen literal as the first char ([-a-z_]).
|   * tests/paca.mjs (test_classStateStep): New test entry with a test case for each scenario described above.
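|
The four scenarios above can be sketched with a standalone parser. This is not `classStateStep()`, which works one character at a time inside the tokenizer; `parseClassBody()` and its result shape are hypothetical and take the text between "[" and "]" all at once.

```javascript
// Hypothetical helper: parse the body of a character class.
function parseClassBody(body) {
  const chars = [...body];
  const result = { negated: false, literals: new Set(), ranges: [] };
  let i = 0;
  // Negation: a leading "^" inverts the class.
  if (chars[i] === "^") {
    result.negated = true;
    i++;
  }
  // A hyphen in the first position is a literal, not a range marker.
  if (chars[i] === "-") {
    result.literals.add("-");
    i++;
  }
  while (i < chars.length) {
    const char = chars[i];
    if (char === "\\") {
      // Escaping: the next character is always taken literally.
      if (i + 1 >= chars.length) throw new SyntaxError("dangling escape");
      result.literals.add(chars[i + 1]);
      i += 2;
    } else if (chars[i + 1] === "-" && i + 2 < chars.length) {
      // Simple range: start, hyphen, end.
      const end = chars[i + 2];
      if (end.codePointAt(0) < char.codePointAt(0)) {
        throw new RangeError(`decreasing range: ${char}-${end}`);
      }
      result.ranges.push([char, end]);
      i += 3;
    } else {
      result.literals.add(char);
      i++;
    }
  }
  return result;
}
```

For example, `parseClassBody("-a-z_")` yields the literals "-" and "_" plus the range ["a", "z"].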
* tests/paca.mjs (test_rangeStateStep): Finish test cases for rangeStateStep  EuAndreh  2025-07-13  1  -0/+139
|
* src/paca.mjs (rangeStateStep): Fix typo in SyntaxError message  EuAndreh  2025-07-13  1  -1/+1
|
* src/paca.mjs (rangeStateStep): Refine indentation and alignment  EuAndreh  2025-07-13  1  -3/+3
|
* src/paca.mjs ({escaping,range}StateStep): Add leading underscore to ignored args  EuAndreh  2025-07-13  1  -2/+2
|
* src/paca.mjs (rangeStateStep): Return ValueError when range number decreases  EuAndreh  2025-07-13  1  -1/+2
|
* Add first test for `rangeStateStep()`  EuAndreh  2025-07-13  1  -0/+17
|   * tests/paca.mjs (test_rangeStateStep): Add first test case, for when we find a closing "}" when no comma was seen.
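|
A sketch of the {m,n} cases touched by the `rangeStateStep()` commits: a closing "}" with no comma seen, and a range whose second number is smaller than the first. `parseRange()` is a hypothetical whole-string helper, the body of `numFromDigits()` is a guess based on its name, the open-ended {m,} form is an assumption, and `RangeError` stands in for the project's ValueError, which is not a JavaScript built-in.

```javascript
// Guessed behaviour: fold an array of digit characters into a number.
function numFromDigits(digits) {
  return digits.reduce((acc, digit) => acc * 10 + (digit.charCodeAt(0) - 48), 0);
}

// Hypothetical helper: parse "{m}", "{m,n}" or "{m,}".
function parseRange(text) {
  if (text[0] !== "{") throw new SyntaxError("expected {");
  let digits = [];
  let min = null;
  let seenComma = false;
  for (const char of text.slice(1)) {
    if (char >= "0" && char <= "9") {
      digits.push(char);
    } else if (char === "," && !seenComma && digits.length > 0) {
      min = numFromDigits(digits);
      digits = [];
      seenComma = true;
    } else if (char === "}") {
      if (!seenComma) {
        // Closing brace with no comma seen: {m} means exactly m.
        if (digits.length === 0) throw new SyntaxError("empty range");
        const n = numFromDigits(digits);
        return { min: n, max: n };
      }
      const max = digits.length === 0 ? Infinity : numFromDigits(digits);
      if (max < min) throw new RangeError(`decreasing range: ${min},${max}`);
      return { min, max };
    } else {
      throw new SyntaxError(`bad character in range: ${char}`);
    }
  }
  throw new SyntaxError("missing }");
}
```

With this sketch, `parseRange("{3}")` gives a min and max of 3, and `parseRange("{5,2}")` throws.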
* Hoist `numFromDigits()` before its usage  EuAndreh  2025-07-13  2  -22/+22
|   * src/paca.mjs (numFromDigits): Move it to before the *StateStep functions, as it is now used in the `rangeStateStep()` function. So instead of letting it be defined after its usage, move it up.
|   * tests/paca.mjs: Do the same hoisting to the import of the `numFromDigits` name, to the definition of `test_numFromDigits` and its inclusion in the order of the call to `runTests()`.
* Add "[" to the possible characters of TRANSITION_FNS.  EuAndreh  2025-07-13  2  -0/+30
|   With "[" introduced, we can now start writing the code to parse character class expressions, e.g. [a-z0-9]. The `context` key will contain a `set` with all the literal characters that were found, and all the ranges too. For parsing the ranges, a `range` key equivalent to the one for the {m,n} range is used. Despite the superficial syntax being similar, its logic, semantics and implementation will be different.
|   * src/paca.mjs (TRANSITION_FNS) <"[">: Add new transition function for handling the start of a character class expression.
|   * tests/paca.mjs (TRANSITION_FNS): Add a singular test entry, that exercises the conditionless body of the function.
* Add simple test for TRANSITION_FNS  EuAndreh  2025-07-13  2  -2/+35
|   * src/paca.mjs (TRANSITION_FNS): Add trailing underscore to ignored arguments, even though it breaks the name of the `_state` and `context` destructuring arguments.
|   * tests/paca.mjs (test_TRANSITION_FNS): Add new test function with a single case for each transition character. Since these transitions are unconditional and contain no logic, this single sample test is enough to cover for all of its behaviour.
* Revert "src/paca.mjs: Temporarily export internal functions"  EuAndreh  2025-07-12  1  -4/+4
|   This reverts commit 15f206e4940cb80ff98eea7c376d9c618f80ed0e.
* src/paca.mjs: Temporarily export internal functions  EuAndreh  2025-07-12  1  -4/+4
|
* src/paca.mjs (tokenizeRegexStep): Simplify body  EuAndreh  2025-07-11  1  -105/+127
|   When handling a custom state, dispatch it to the appropriate function in `STATE_FNS`; and when looking for chars that enter these custom states, dispatch it to the appropriate function in `TRANSITION_FNS`.
|   The body of each part didn't change, so no tests had to be modified. But now we can write specific tests for each case, and remove the bulk of the logic out of `tokenizeRegexFn()`.
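|
The dispatch described in this commit could be sketched as a reducer over the pattern's characters. Only the names `STATE_FNS` and `TRANSITION_FNS` come from the commit message; the accumulator shape, the state names, the handler bodies and the `tokenizeStep()` function are assumptions made to keep the example small.

```javascript
// Hypothetical transitions: characters that enter a custom state.
const TRANSITION_FNS = {
  "\\": acc => ({ ...acc, state: "escaping" }),
  "[": acc => ({ ...acc, state: "class", context: { set: [] } }),
};

// Hypothetical handlers: one per custom state.
const STATE_FNS = {
  // The escaped character is emitted as-is, then we leave the state.
  escaping: (acc, char) => ({
    ...acc, out: [...acc.out, char], state: "accepting",
  }),
  // Collect characters until the closing "]", then emit one class token.
  class: (acc, char) => char === "]"
    ? { out: [...acc.out, { class: acc.context.set }], state: "accepting", context: null }
    : { ...acc, context: { set: [...acc.context.set, char] } },
};

// Hypothetical step function: the body only decides where to dispatch.
function tokenizeStep(acc, char) {
  if (acc.state !== "accepting") {
    return STATE_FNS[acc.state](acc, char);
  }
  const transition = TRANSITION_FNS[char];
  if (transition !== undefined) {
    return transition(acc, char);
  }
  return { ...acc, out: [...acc.out, char] };
}

const initial = { out: [], state: "accepting", context: null };
const tokens = [..."a\\*[bc]"].reduce(tokenizeStep, initial).out;
```

Because each handler is a separate entry in a table, it can be tested on its own with a hand-built accumulator, which is the benefit the commit message points at.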
* tests/paca.mjs (test_tokenizeRegexStep): Simplify table values  EuAndreh  2025-07-11  1  -704/+322
|
* src/paca.mjs (tokenizeRegexStep): Fix missing concat when escaping  EuAndreh  2025-07-11  1  -5/+8
|
* tests/paca.mjs: Add tests for numFromDigits()  EuAndreh  2025-07-11  1  -0/+17
|
* src/paca.mjs: Remove calls to arr.concat([]) with unneeded wrapping singleton array  EuAndreh  2025-07-11  1  -14/+6
* src/paca.mjs (tokenizeRegexStep): Support tokenizing range exps {m,n}  EuAndreh  2025-07-11  2  -2/+638
|
* src/paca.mjs (tokenizeRegexStep): Include `context` key in reduced state  EuAndreh  2025-07-11  2  -121/+152
|
* src/paca.mjs: Move error detection from tokenizeRegexStep => tokenizeRegex  EuAndreh  2025-07-11  2  -26/+34
|
* tests/paca.mjs (test_tokenizeRegexStep): Compute `char` and `index`  EuAndreh  2025-07-11  1  -20/+7
|
* src/paca.mjs: Remove unused repeat(3) import  EuAndreh  2025-07-11  1  -1/+1
|
* Makefile: Minor fixes of indentation and alignment  EuAndreh  2025-07-09  1  -3/+3
|
* Finish implementation of unit tests  EuAndreh  2025-07-09  2  -31/+881
|
* Implement v0 version of NFA and DFA; WIP tests  EuAndreh  2025-07-07  6  -0/+1898
|
* Initial empty commit  EuAndreh  2025-06-30  0  -0/+0