paca - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message	Author	Files	Lines
2025-08-01	src/paca.mjs: Improve implementation of interpretMetacharacters (HEAD, main)	EuAndreh	1	-28/+172

2025-07-31	Makefile: Install like a Node.js package	EuAndreh	2	-6/+10

2025-07-20	tests/paca.mjs: Add WIP tests	EuAndreh	1	-0/+9

2025-07-20	src/paca.mjs: Rename buildDFA -> toDFA	EuAndreh	2	-22/+25

2025-07-20	src/paca.mjs: Support returning multiple options from `performTransition()`	EuAndreh	1	-5/+6

2025-07-20	Add initial support for caret and dollar metacharacters	EuAndreh	2	-22/+200

2025-07-17	.gitignore: Remove trailing slash from node_modules rule	EuAndreh	1	-1/+1

2025-07-17	src/paca.mjs: Rename {start,end}ID => {start,end}	EuAndreh	1	-44/+42

2025-07-17	Do away with the "nextID" attribute	EuAndreh	2	-53/+10
	Instead of being an increment over "end" that is carried along on NFA transformation, now the id is computed directly as an increment on "end". During this refactor, I even saw that "end" and "nextID" of `concat()` are computed differently, despite arriving at the same result: "end" is rhs.end, while "nextID" is the max of the nextID from lhs and rhs.
2025-07-16	Support searching in the NFA using the metacharacters.	EuAndreh	2	-1/+123
	* src/paca.mjs (searchNFAStep): Instead of only checking whether the node has a direct transition via a character literal, also check (via the `performTransition()` function) whether a metacharacter interpretation allows a transition to happen.
	(interpretMetacharacter): Add function that "executes" the action representation in the "meta" attribute of the object, when present. It is somewhat ad-hoc now, doing checks that implicitly only exist for "." or "class" metacharacters, but OK for now, given the possibilities.
	(performTransition): Do the fallback to `interpretMetacharacter()`, giving it an empty object when the node doesn't have the "meta" attribute.
	* tests/paca.mjs (test_{interpretMetacharacter,performTransition}): Add routine test implementation.
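The fallback described in this commit can be illustrated with a minimal sketch. The node shapes are assumptions made up for the example (a `transitions` map from character to target state, and a `meta` object with hypothetical `op`, `to`, `chars`, `ranges` and `caret` fields); the log does not show paca's real representation, and whether "." should match newlines is left open here.

```javascript
// Hypothetical node shapes (NOT taken from paca's source):
//   literal:  { transitions: { a: 2 } }
//   wildcard: { transitions: {}, meta: { op: ".", to: 2 } }
//   class:    { transitions: {}, meta: { op: "class", to: 2,
//               chars: ["_"], ranges: [[97, 122]], caret: false } }

const inAnyRange = (ranges, code) =>
  ranges.some(([from, to]) => from <= code && code <= to);

// Interprets the plain-data "meta" description against one character.
// Returns the target state, or undefined when no transition applies.
const interpretMetacharacter = (meta, char) => {
  if (meta.op === ".") {
    return meta.to; // assumption: the wildcard accepts any character
  }
  if (meta.op === "class") {
    const hit =
      (meta.chars ?? []).includes(char) ||
      inAnyRange(meta.ranges ?? [], char.codePointAt(0));
    // A negated class ([^...]) flips the result.
    return hit !== Boolean(meta.caret) ? meta.to : undefined;
  }
  return undefined; // also covers the empty-object fallback
};

// Literal transitions win; otherwise fall back to the metacharacter,
// passing {} when the node has no "meta" attribute.
const performTransition = (node, char) => {
  const literalTarget = node.transitions[char];
  return literalTarget !== undefined
    ? literalTarget
    : interpretMetacharacter(node.meta ?? {}, char);
};
```

Keeping `meta` as data means the same interpreter can later be extended for other metacharacters without changing how nodes are built.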
2025-07-16	Compress character class when compiling NFA.	EuAndreh	2	-3/+174
	Do not change any observable behaviour outside of `characterClass()`, as the new output is 100% semantically compatible, but faster and, most importantly, much smaller.
	* src/paca.mjs (characterClass): Leverage `compressCharacterRanges()` when processing the given set. Also use the same compressed range to filter out set matches when they are already within the range.
	(compareRange): Add function that sorts based on the first numeric element of the range. When they're equal, use the second element as the tie breaker.
	(compressRangeStep): Add function that takes ranges in sorted order and tries to merge them when the start of the second (`from`) is contained by the first (`curr[1]`). Since the ranges are given to us sorted, we already know that the start of the second is greater than or equal to the start of the first. When this is the case, we pick the largest ending to merge the ranges, otherwise we just place them one next to the other, sequentially.
	(compressCharacterRanges): Add function that sorts the ranges and reduces them with `compressRangeStep()`. Here we have to give an empty array as the initial value to prevent `compressRangeStep()` from being given ranges as `[m, n]`, instead of either `[]` or `[[m, n]]`. This is also why there's a check for `!curr` in `compressRangeStep()` - to plug the other side of this adaptation.
	(inRange): Add function that checks if the given character is contained by any of the ranges of the given object. At the end, instead of looking through every `from`, all we need is a single match, so we use `.some()`.
	* tests/paca.mjs (test_characterClass): Add test case for a range that collapses many literal character matches and many ranges into a single range.
	(test_{compareRange,compressRangeStep,compressCharacterRanges,inRange}): Add routine sample-based test cases.
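The sort-then-reduce merge described above could look roughly like the following. This is a sketch reconstructed from the commit message only: the function names come from the message, but the signatures, the `[from, to]` code point pairs and the argument order of `inRange()` are assumptions.

```javascript
// Ranges are assumed to be [from, to] pairs of code points, inclusive.

// Sort by the first element; the second element breaks ties.
const compareRange = ([a1, a2], [b1, b2]) =>
  a1 !== b1 ? a1 - b1 : a2 - b2;

// Reducer step: `acc` holds the ranges merged so far, in sorted order.
const compressRangeStep = (acc, [from, to]) => {
  const curr = acc[acc.length - 1];
  if (!curr || from > curr[1]) {
    // First range, or no overlap: place it next to the others.
    return acc.concat([[from, to]]);
  }
  // Overlap: keep the start of `curr` and pick the largest ending.
  return acc.slice(0, -1).concat([[curr[0], Math.max(curr[1], to)]]);
};

// The empty array as initial value guarantees that the step always
// receives `[]` or `[[m, n], ...]`, never a bare `[m, n]`.
const compressCharacterRanges = ranges =>
  [...ranges].sort(compareRange).reduce(compressRangeStep, []);

// A single matching range is enough, hence `.some()`.
const inRange = (ranges, char) => {
  const code = char.codePointAt(0);
  return ranges.some(([from, to]) => from <= code && code <= to);
};

// a-f, d-n, 0-9 and 2-4 collapse into 0-9 and a-n.
const example = compressCharacterRanges([
  [97, 102], [100, 110], [48, 57], [50, 52],
]);
console.log(example); // [ [ 48, 57 ], [ 97, 110 ] ]
```

Note that this sketch only merges overlapping ranges; whether directly adjacent ranges such as `[a-c]` and `[d-f]` are also joined is not stated in the commit message.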
2025-07-16	Build NFA nodes for "." and "class" metacharacters	EuAndreh	2	-38/+305
	* src/paca.mjs (characterClass): Add function that builds the NFA node for `{ meta: "class" }`. This node leaves the "direct" and "transitions" keys empty, and adds its data under the "meta" key. One option was to use an inline function that could simply be called directly during the search to check for a match, but I chose a data representation instead, in order to keep the NFA literal as obvious and self-representing as possible. Later, the searching part will have to properly interpret the data of "meta", instead of blindly executing an opaque function. This does separate the compilation from the execution logic, but keeps the NFA clean of opaque closures.
	(wildcard): Add function that builds the NFA node for `{ meta: "." }`. Similar to `characterClass()`, the new "meta" key contains pure data that represents the execution of the metacharacter during search.
	(baseNFA, literal): Rename the existing `baseNFA()` to `literal()`. Then add a new `baseNFA()` function that decides between a character literal and a metacharacter.
	(buildNFAStep): Instead of checking the type of `token`, we check if `token` has the "operator" attribute, since we now have metacharacters that also aren't strings.
	(classStateStep): Add missing "caret" key to the final metacharacter output. It was already being detected, just not included in the result.
	(escapingStateStep): Stick to 80 columns.
	* tests/paca.mjs (test_characterClass, test_wildcard, test_baseNFA): Add obligatory test cases.
	(test_buildNFAStep): Include test case for metacharacter.
2025-07-16	Differentiate an "operator" from a "meta" character	EuAndreh	2	-24/+72
	The character class `[a-z]`, and especially the wildcard `.`, aren't operators: they really do represent themselves with their own special semantics, and they take no operands. So instead of having the "operator" type behave in two ways, with and without arguments, we have this new type, the "meta" character. In equivalence to the literal character, the metacharacter represents itself, and also takes no argument. This also lets us leave the precedence parsing of operators untouched, instead of tainting it with special conditions for "." and "class", since they should behave just like literal characters: be pushed directly onto the stack. As of now, there are only 2 meta characters: "class" and ".".
	* src/paca.mjs (operatorChars): Remove "." from the set of operator characters.
	(classStateStep): Return `{ meta: "class" }` instead of `{ operator: "class" }`.
	(isMeta): Add equivalent to `isTransition()` and `isOperator()`.
	(opFor, tokenizeRegexStep): Add new `opFor()` function for classifying a given character, choosing between an operator, a metacharacter and a literal character, and use this function in the body of `tokenizeRegexStep()`.
	(PRECEDENCE): Remove early entry of precedence values for "class" and ".".
	(toPostfixStep): Instead of just checking if a character is a literal one before pushing it onto the stack, check that it isn't an operator just by checking if it is an object that has the `operator` attribute.
	* tests/paca.mjs (test_isOperator): Remove test case for ".", as it is no longer considered an operator.
	(classStateStep): Update to rename from `{ operator: "class" }` to `{ meta: "class" }`.
	(test_toPostfixStep, test_toPostfix): Add test cases for meta characters.
	(test_OPERATOR_FNS): BONUS - Use direct assignment to reset the array to an empty value instead of `arr.splice(0)`.
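The three-way classification can be sketched as below. The contents of `operatorChars` and the `metaChars` set are assumptions for illustration (the log does not list them), and `hasOperator()` is a hypothetical helper name standing in for the check described for `toPostfixStep()`.

```javascript
// Assumed operator set; the real one lives in src/paca.mjs.
const operatorChars = new Set(["*", "+", "?", "|", "(", ")"]);
// Hypothetical name; "." is the only single-character metacharacter here.
const metaChars = new Set(["."]);

const isOperator = char => operatorChars.has(char);
const isMeta = char => metaChars.has(char);

// Classify a character as an operator token, a metacharacter token,
// or a plain literal (which stays a string).
const opFor = char =>
  isOperator(char) ? { operator: char } :
  isMeta(char)     ? { meta: char } :
  char;

// Literals and metacharacters both take the "push onto the stack" path:
// anything that is not an object with an `operator` attribute.
const hasOperator = token =>
  typeof token === "object" && token !== null && "operator" in token;

console.log([..."a.b*"].map(opFor));
```

With this split, only tokens for which `hasOperator()` is true need to go through precedence handling.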
2025-07-15	Only tolerate escaping of special chars	EuAndreh	2	-38/+59
	* src/paca.mjs (escapingStateStep): Return an error when escaping non-metacharacters. This way, when syntax like \d (shorthand for [0-9]) is eventually recognized, its behaviour will not change from a no-op escape of "d" to matching digits.
	(operatorChars, isOperator): Hoist both of these up before their usage in `escapingStateStep()`.
	* tests/paca.mjs (test_isOperator): Hoist its definition and position inside the `runTests([...])` array to match src/paca.mjs.
	(test_escapingStateStep): Adjust existing cases and add test case for good/bad escapes.
	(test_tokenizeRegexStep): Fix bad starting escape, that broke because it was escaping a non-metacharacter.
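A small sketch of the "only special characters may be escaped" rule. The `escapable` set, the `escapeChar()` name and the `{ ok }` / `{ error }` result shapes are all assumptions for the example; the commit only says that an error is returned.

```javascript
// Assumed set of characters that carry special meaning in a pattern.
const escapable = new Set([
  "\\", "*", "+", "?", "|", "(", ")", "[", "]", "{", "}", ".", "^", "$",
]);

// Returns the escaped character as a literal, or an error value for
// escapes such as \d that are reserved for future syntax.
const escapeChar = char =>
  escapable.has(char)
    ? { ok: char }
    : { error: new SyntaxError(`unknown escape sequence: \\${char}`) };

console.log(escapeChar("*"));
console.log(escapeChar("d").error.message);
```

Rejecting `\d` today means a later release can give it a meaning without silently changing how existing patterns match.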
2025-07-15	Support tokenizing `.` wildcard operator.	EuAndreh	2	-5/+29
	* src/paca.mjs (isTransition): Add new function as an improved version of the raw usage of `stateTransitionOperators`, equivalent to `isAnchor()` and `isOperator()`.
	(operatorChars, isOperator): Add new static set `operatorChars` as backing data of `isOperator()`, instead of an ad-hoc conditional in its implementation. Also now add the `.` character as an operator by including it in the `operatorChars` set.
	(tokenizeRegexStep): Use the new `isTransition()` function instead of checking the set directly. Also tweak ternary to fit in 80 columns.
	(PRECEDENCE): Add `.` operator with lowest precedence, as it is not really operating on anything, and is instead a target to be operated on.
	* tests/paca.mjs (test_isTransition): Add obligatory test cases.
	(test_isOperator): Include test case for `.` wildcard operator.
2025-07-15	Support tokenizing `^` and `$` anchors	EuAndreh	2	-1/+242
	* src/paca.mjs (ANCHOR_FNS): Add simple handlers for ^ and $ anchors, that only look for the position of the character in the pattern as validation during tokenization.
	(isAnchor): Add simple boolean function to identify anchor characters.
	(tokenizeRegexStep): Include check if character `isAnchor()`, and call the appropriate `ANCHOR_FNS[char]` when true.
	* tests/paca.mjs (test_ANCHOR_FNS): Add test with 4 cases - 2 for success and 2 for errors for ^ and $.
	(test_isAnchor): Add obligatory simple test cases.
	(test_tokenizeRegexStep): Include test case for tokenizing patterns with character class.
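Position-only validation of anchors might be sketched like this. The handler signature `(pattern, index)`, the `{ token }` / `{ error }` results and the `{ anchor: ... }` token shape are hypothetical; the sketch also assumes that "^" is valid only as the first character and "$" only as the last, which is the simplest reading of the commit message.

```javascript
// Hypothetical handlers: each one validates the anchor by position only.
const ANCHOR_FNS = {
  "^": (_pattern, index) =>
    index === 0
      ? { token: { anchor: "^" } }
      : { error: new SyntaxError(`"^" found at index ${index}, expected 0`) },
  "$": (pattern, index) =>
    index === pattern.length - 1
      ? { token: { anchor: "$" } }
      : { error: new SyntaxError(`"$" found at index ${index}, expected the end`) },
};

const isAnchor = char => char === "^" || char === "$";

// Usage: dispatch on the character while walking the pattern.
const pattern = "^ab$";
[...pattern].forEach((char, index) => {
  if (isAnchor(char)) {
    console.log(char, ANCHOR_FNS[char](pattern, index));
  }
});
```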
2025-07-15	tests/paca.mjs (test_shouldConcat): Also hoist import, definition and position in runTests	EuAndreh	1	-35/+35

2025-07-15	tests/paca.mjs (test_compileNFA): Add test case for empty regex	EuAndreh	1	-0/+14

2025-07-15	src/paca.mjs (PRECEDENCE): Add "range" and "class" keys	EuAndreh	1	-0/+2

2025-07-15	tests/paca.mjs (test_escapingStateStep): Add tests for `escapingStateStep()`	EuAndreh	1	-0/+36

2025-07-15	Use `shouldConcat()` in decision of `escapingStateStep()`	EuAndreh	2	-11/+15
	* src/paca.mjs (escapingStateStep): Use `shouldConcat()` instead of only checking if we're on the last char. We abuse it a bit by passing `null` as the first argument, since it is being escaped.
	(nonConcatOperators, shouldConcat): Hoist the definition of both above `escapingStateStep()`, so that they're defined before being used.
	* tests/paca.mjs (test_shouldConcat): Add test case where `null` is explicitly passed as the first argument.
2025-07-15	Support tokenizing character class expressions [a-z]	EuAndreh	2	-5/+349
	* src/paca.mjs (classStateStep): New function equivalent to `rangeStateStep()` for character class expressions. For now it knows how to handle escaping ([abc\-_]), simple ranges ([a-z]), negation ([^abc]) and the hyphen literal as the first char ([-a-z_]).
	* tests/paca.mjs (test_classStateStep): New test entry has a test case for each scenario described above.
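The four scenarios listed above can be illustrated with a simplified, non-incremental sketch. Unlike `classStateStep()`, which is a per-character step function, the hypothetical `parseClassBody()` below takes the whole text between the brackets at once; the `{ caret, set, ranges }` result shape is an assumption loosely based on the keys mentioned elsewhere in this log.

```javascript
// Parses the text between "[" and "]" (hypothetical helper, not paca's).
// Escapes used as range endpoints are not handled in this sketch.
const parseClassBody = body => {
  const chars = [...body];
  const caret = chars[0] === "^"; // negation: [^abc]
  const rest = caret ? chars.slice(1) : chars;
  const result = { caret, set: [], ranges: [] };

  for (let i = 0; i < rest.length; i++) {
    const char = rest[i];
    if (char === "\\") {
      // Escaping: the next character is always a literal ([abc\-_]).
      if (i + 1 >= rest.length) {
        throw new SyntaxError("dangling escape in character class");
      }
      result.set.push(rest[i + 1]);
      i += 1;
    } else if (rest[i + 1] === "-" && i + 2 < rest.length) {
      // Simple range: [a-z] becomes a pair of code points.
      result.ranges.push([char.codePointAt(0), rest[i + 2].codePointAt(0)]);
      i += 2;
    } else {
      // Plain literal, including a hyphen in first position ([-a-z_]).
      result.set.push(char);
    }
  }
  return result;
};

console.log(parseClassBody("-a-z_"));
// { caret: false, set: [ '-', '_' ], ranges: [ [ 97, 122 ] ] }
```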
2025-07-13	tests/paca.mjs (test_rangeStateStep): Finish test cases for rangeStateStep	EuAndreh	1	-0/+139

2025-07-13	src/paca.mjs (rangeStateStep): Fix typo in SyntaxError message	EuAndreh	1	-1/+1

2025-07-13	src/paca.mjs (rangeStateStep): Refine indentation and alignment	EuAndreh	1	-3/+3

2025-07-13	src/paca.mjs ({escaping,range}StateStep): Add leading underscore to ignored args	EuAndreh	1	-2/+2

2025-07-13	src/paca.mjs (rangeStateStep): Return ValueError when range number decreases	EuAndreh	1	-1/+2

2025-07-13	Add first test for `rangeStateStep()`	EuAndreh	1	-0/+17
	* tests/paca.mjs (test_rangeStateStep): Add first test case, for when we find a closing "}" when no comma was seen.
2025-07-13	Hoist `numFromDigits()` before its usage	EuAndreh	2	-22/+22
	* src/paca.mjs (numFromDigits): Move it to before the StateStep functions, as it is now used in the `rangeStateStep()` function. So instead of letting it be defined after its usage, move it up.
	* tests/paca.mjs: Do the same hoisting to the import of the `numFromDigits` name, to the definition of `test_numFromDigits` and its inclusion in the order of the call to `runTests()`.
2025-07-13	Add "[" to the possible characters of TRANSITION_FNS.	EuAndreh	2	-0/+30
	Introducing "[" now, we will start to write the code to parse character class expressions, i.e. [a-z0-9]. The `context` key will contain a `set` with all the literal characters that were found, and all the ranges too. For parsing the ranges, a `range` key equivalent to the one for the {m,n} range is used. Despite the superficial syntax being similar, its logic, semantics and implementation will be different.
	* src/paca.mjs (TRANSITION_FNS) <"[">: Add new transition function for handling the start of a character class expression.
	* tests/paca.mjs (TRANSITION_FNS): Add a singular test entry, that exercises the conditionless body of the function.
2025-07-13	Add simple test for TRANSITION_FNS	EuAndreh	2	-2/+35
	* src/paca.mjs (TRANSITION_FNS): Add trailing underscore to ignored arguments, even though it breaks the name of the `_state` and `context` destructuring arguments.
	* tests/paca.mjs (test_TRANSITION_FNS): Add new test function with a single case for each transition character. Since these transitions are unconditional and contain no logic, this single sample test is enough to cover for all of its behaviour.
2025-07-12	Revert "src/paca.mjs: Temporarily export internal functions"	EuAndreh	1	-4/+4
	This reverts commit 15f206e4940cb80ff98eea7c376d9c618f80ed0e.
2025-07-12	src/paca.mjs: Temporarily export internal functions	EuAndreh	1	-4/+4

2025-07-11	src/paca.mjs (tokenizeRegexStep): Simplify body	EuAndreh	1	-105/+127
	When handling a custom state, dispatch it to the appropriate function in `STATE_FNS`; and when looking for chars that enter these custom states, dispatch them to the appropriate function in `TRANSITION_FNS`. The body of each part didn't change, so no tests had to be modified. But now we can write specific tests for each case, and remove the bulk of the logic out of `tokenizeRegexFn()`.
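The table-driven dispatch described above can be sketched as a reducer. Only the names `STATE_FNS`, `TRANSITION_FNS` and `tokenizeRegexStep` come from the log; the accumulator shape `{ out, state, context }`, the handler signatures and the simplified `{n}` range (no comma, no validation) are assumptions made to keep the example short.

```javascript
// Characters that enter a custom state (assumed handler signature).
const TRANSITION_FNS = {
  "\\": ({ out }) => ({ out, state: "escaping", context: null }),
  "{": ({ out }) => ({ out, state: "range", context: { digits: "" } }),
};

// One handler per custom state; each returns the next accumulator.
const STATE_FNS = {
  escaping: ({ out }, char) =>
    ({ out: out.concat(char), state: null, context: null }),
  range: ({ out, context }, char) =>
    char === "}"
      ? {
          out: out.concat({ operator: "range", count: Number(context.digits) }),
          state: null,
          context: null,
        }
      : { out, state: "range", context: { digits: context.digits + char } },
};

// The step itself only dispatches; the logic lives in the two tables.
const tokenizeRegexStep = (acc, char) => {
  if (acc.state) {
    return STATE_FNS[acc.state](acc, char);
  }
  const enter = TRANSITION_FNS[char];
  if (enter) {
    return enter(acc, char);
  }
  return { ...acc, out: acc.out.concat(char) };
};

const tokenizeRegex = pattern =>
  [...pattern].reduce(tokenizeRegexStep, { out: [], state: null, context: null }).out;

console.log(tokenizeRegex("a\\{b{3}"));
// [ 'a', '{', 'b', { operator: 'range', count: 3 } ]
```

Because every handler is a plain function in a table, each one can be tested in isolation with a hand-built accumulator.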
2025-07-11	tests/paca.mjs (test_tokenizeRegexStep): Simplify table values	EuAndreh	1	-704/+322

2025-07-11	src/paca.mjs (tokenizeRegexStep): Fix missing concat when escaping	EuAndreh	1	-5/+8

2025-07-11	tests/paca.mjs: Add tests for numFromDigits()	EuAndreh	1	-0/+17

2025-07-11	src/paca.mjs: Remove calls to arr.concat([]) with unneeded wrapping singleton array	EuAndreh	1	-14/+6

2025-07-11	src/paca.mjs (tokenizeRegexStep): Support tokenizing range exps {m,n}	EuAndreh	2	-2/+638

2025-07-11	src/paca.mjs (tokenizeRegexStep): Include `context` key in reduced state	EuAndreh	2	-121/+152

2025-07-11	src/paca.mjs: Move error detection from tokenizeRegexStep => tokenizeRegex	EuAndreh	2	-26/+34

2025-07-11	tests/paca.mjs (test_tokenizeRegexStep): Compute `char` and `index`	EuAndreh	1	-20/+7

2025-07-11	src/paca.mjs: Remove unused repeat(3) import	EuAndreh	1	-1/+1

2025-07-09	Makefile: Minor fixes of indentation and alignment	EuAndreh	1	-3/+3

2025-07-09	Finish implementation of unit tests	EuAndreh	2	-31/+881

2025-07-07	Implement v0 version of NFA and DFA; WIP tests	EuAndreh	6	-0/+1898