author: Ryo Nihei <nihei.dev@gmail.com>, 2021-09-11 00:40:05 +0900
committer: Ryo Nihei <nihei.dev@gmail.com>, 2021-09-11 22:57:17 +0900
commit: 96a555a00f000704c618c226485fa6d87ce66d9d
tree: 9d7398033a2c015390f0de7ab69b6fd37bb1ba30 /README.md
parent: Remove --debug option from the lex command
Define a lexical specification interface
Diffstat (limited to 'README.md')
-rw-r--r-- README.md | 29
1 files changed, 14 insertions, 15 deletions
diff --git a/README.md b/README.md
index 5ec3111..44e5a6f 100644
--- a/README.md
+++ b/README.md
@@ -47,18 +47,18 @@ If you want to make sure that the lexical specification behaves as expected, you
⚠️ An encoding that `maleeni lex` and the driver can handle is only UTF-8.
```sh
-$ echo -n 'The truth is out there.' | maleeni lex clexspec.json | jq -r '[.kind_id, .kind_name, .text, .eof] | @csv'
-2,"word","The",false
-1,"whitespace"," ",false
-2,"word","truth",false
-1,"whitespace"," ",false
-2,"word","is",false
-1,"whitespace"," ",false
-2,"word","out",false
-1,"whitespace"," ",false
-2,"word","there",false
-3,"punctuation",".",false
-0,"","",true
+$ echo -n 'The truth is out there.' | maleeni lex clexspec.json | jq -r '[.kind_name, .lexeme, .eof] | @csv'
+"word","The",false
+"whitespace"," ",false
+"word","truth",false
+"whitespace"," ",false
+"word","is",false
+"whitespace"," ",false
+"word","out",false
+"whitespace"," ",false
+"word","there",false
+"punctuation",".",false
+"","",true
```
The JSON format of tokens that `maleeni lex` command prints is as follows:
@@ -72,8 +72,7 @@ The JSON format of tokens that `maleeni lex` command prints is as follows:
| kind_name | string | A name of a lexical kind. |
| row | integer | A row number where a lexeme appears. |
| col | integer | A column number where a lexeme appears. Note that `col` is counted in code points, not bytes. |
-| match | array of integers | A byte sequense of a lexeme. |
-| text | string | A string representation of a lexeme. |
+| lexeme | array of integers | A byte sequense of a lexeme. |
| eof | bool | When this field is `true`, it means the token is the EOF token. |
| invalid | bool | When this field is `true`, it means the token is an error token. |
@@ -336,7 +335,7 @@ For instance, you can define a subset of [the string literal of golang](https://
In the above specification, when the `"` mark appears in default mode (it's the initial mode), the driver transitions to the `string` mode and interprets character sequences (`char_seq`) and escape sequences (`escaped_char`). When the `"` mark appears the next time, the driver returns to the `default` mode.
```sh
-$ echo -n '"foo\nbar"foo' | maleeni lex go-string-cspec.json | jq -r '[.mode_name, .kind_name, .text, .eof] | @csv'
+$ echo -n '"foo\nbar"foo' | maleeni lex go-string-cspec.json | jq -r '[.mode_name, .kind_name, .lexeme, .eof] | @csv'
"default","string_open","""",false
"string","char_seq","foo",false
"string","escaped_char","\n",false