Installation • Uninstallation • How to use Chatette? • Chatette vs Chatito? • Development • Credits
Chatette is a Python program that generates training datasets for Rasa NLU given template files. If you want to make large datasets of example data for Natural Language Understanding tasks without too much of a headache, Chatette is a project for you.
Specifically, Chatette implements a Domain Specific Language (DSL) that allows you to define templates to generate a large number of sentences, which are then saved in the input format(s) of Rasa NLU.
The DSL used is a near-superset of the excellent project Chatito created by Rodrigo Pimentel. (Note: the DSL is actually a superset of Chatito v2.1.x for Rasa NLU, not for all possible adapters.)
An interactive mode is available as well.
To run Chatette, you will need to have Python installed. Chatette works with both Python 2.7 and 3.x (>= 3.4).
Chatette is available on PyPI, and can thus be installed using pip:
pip install chatette
Alternatively, you can clone the GitHub repository and install the requirements:
pip install -r requirements/common.txt
You can then install the project (as an editable package) using pip, by executing the following command from the directory Chatette/chatette/:
pip install -e .
You can then run the module by using the commands below in the cloned directory.
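To confirm that the installation worked before running anything, a quick check like the one below can help. It is only a sketch: it assumes Python 3.4 or later (`importlib.util.find_spec` is not available on Python 2.7), and the helper name `is_module_available` is made up for illustration.

```python
import importlib.util


def is_module_available(name):
    """Return True if Python's import system can locate a top-level module `name`."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "chatette" is the module name used by `python -m chatette`.
    print("chatette importable:", is_module_available("chatette"))
```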
You can just use pip to uninstall Chatette:
pip uninstall chatette
The data that Chatette uses and generates is loaded from and saved to files. You will thus have:
- One or several input file(s) containing the templates. There is no need for a specific file extension. The syntax of the DSL to make those templates is described on the wiki.
- One or several output file(s), which will be generated by Chatette and will contain the generated examples. Those files can be formatted in JSON (by default) or in Markdown and can be directly fed to Rasa NLU. It is also possible to use a JSONL format.
Once Chatette is installed and you have created the template files, run the following command:
python -m chatette <path_to_template>
where python is your Python interpreter (some operating systems use python3 as the alias to the Python 3.x interpreter).
You can specify the output directory as follows:
python -m chatette <path_to_template> -o <output_directory_path>
<output_directory_path> is specified relative to the directory from which the script is being executed.
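If you want to trigger generation from another Python script, the command above can be wrapped with the standard `subprocess` module. This is a minimal sketch, not part of Chatette itself: the helper names are hypothetical, and only the `-o` argument shown above is used.

```python
import subprocess
import sys


def build_chatette_command(template_path, output_dir=None, python=sys.executable):
    """Build the argument list for `python -m chatette <template> [-o <dir>]`."""
    command = [python, "-m", "chatette", template_path]
    if output_dir is not None:
        command += ["-o", output_dir]
    return command


def run_chatette(template_path, output_dir=None):
    """Run Chatette as a subprocess; raises CalledProcessError on a non-zero exit code."""
    subprocess.check_call(build_chatette_command(template_path, output_dir))


if __name__ == "__main__":
    # "templates/master.chatette" is a hypothetical path used for illustration.
    print(build_chatette_command("templates/master.chatette", "output"))
```

Using `sys.executable` by default sidesteps the `python` versus `python3` alias question, since the same interpreter that runs the script is reused.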
The output file(s) will then be saved in numbered .json files in <output_directory_path>/train and <output_directory_path>/test. If you didn't specify a path for the output directory, the default one is output.
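To sanity-check a run, you could count the examples that ended up in each folder. The sketch below assumes the default JSON output and assumes that each file follows Rasa NLU's JSON training-data layout (a top-level `rasa_nlu_data` object holding a `common_examples` list); if your files differ, adapt the lookup. The function name `count_examples` is hypothetical.

```python
import glob
import json
import os


def count_examples(output_dir="output"):
    """Count generated examples per split, assuming Rasa NLU's JSON layout in each file."""
    counts = {}
    for split in ("train", "test"):
        pattern = os.path.join(output_dir, split, "*.json")
        total = 0
        for path in sorted(glob.glob(pattern)):
            with open(path) as json_file:
                data = json.load(json_file)
            # Assumed layout: {"rasa_nlu_data": {"common_examples": [...]}}
            total += len(data.get("rasa_nlu_data", {}).get("common_examples", []))
        counts[split] = total
    return counts


if __name__ == "__main__":
    print(count_examples("output"))
```

A missing or empty folder simply yields a count of 0 for that split.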
Other program arguments are described in the wiki.
TL;DR: main selling point: it is easier to deal with large projects using Chatette, and you can transform most Chatito projects into a Chatette one without any modification.
A perfectly legitimate question is:
Why does Chatette exist when Chatito already fulfills the same purposes?
The two projects actually have different goals:
Chatito aims to be a generic but powerful DSL, that should stay very legible. While it is perfectly fine for small projects, when projects get larger, the simplicity of its DSL may become a burden: your template file becomes overwhelmingly large, to the point you get lost inside it.
Chatette defines a more complex DSL to be able to manage larger projects and tries to stay as interoperable with Chatito as possible. Here is a non-exhaustive list of features Chatette has and that Chatito does not have:
- Ability to break down templates into multiple files
- Possibility to specify the probability of generating some parts of the sentences
- Conditional generation of some parts of the sentences, given which other parts were generated
- Choice syntax to prevent copy-pasting rules with only a few changes and to easily modify the generation behavior of parts of sentences
- Ability to define the value of each slot (entity) whatever the generated example
- Syntax for generating words with different case for the leading letter
- Argument support so that some templates may be filled by different strings in different situations
- Indentation is permissive and must only be somewhat coherent
- Support for synonyms
- Interactive command interpreter
- Output for Rasa in JSON or in Markdown formats
As Chatette's DSL is a superset of Chatito's, input files used for Chatito are most of the time completely usable with Chatette (but not the other way around). Hence, it is easy to start using Chatette if you have used Chatito before.
As an example, this Chatito data:
// This template defines different ways to ask for the location of toilets (Chatito version)
%[ask_toilet]('training': '3')
    ~[sorry?] ~[tell me] where the @[toilet#singular] is ~[please?]?
    ~[sorry?] ~[tell me] where the @[toilet#plural] are ~[please?]?

~[sorry]
    sorry
    Sorry
    excuse me
    Excuse me

~[tell me]
    ~[can you?] tell me
    ~[can you?] show me

~[can you]
    can you
    could you
    would you

~[please]
    please

@[toilet#singular]
    toilet
    loo

@[toilet#plural]
    toilets
could be directly given as input to Chatette, but this Chatette template would produce the same results:
// This template defines different ways to ask for the location of toilets (Chatette version)
%[&ask_toilet](3)
    ~[sorry?] ~[tell me] where the @[toilet#singular] is [please?]?
    ~[sorry?] ~[tell me] where the @[toilet#plural] are [please?]?

~[sorry]
    sorry
    excuse me

~[tell me]
    ~[can you?] [tell|show] me

~[can you]
    [can|could|would] you

@[toilet#singular]
    toilet
    loo

@[toilet#plural]
    toilets
The Chatito version is arguably easier to read, but the Chatette version is shorter, which may be very useful when dealing with lots of templates and potential repetition.
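To see why the choice syntax saves lines, the toy expander below turns a rule such as `[can|could|would] you` into the three rules written out in the Chatito version. It is an illustrative sketch only, not Chatette's actual parser: it handles plain `[a|b|c]` groups, leaves single-option groups such as `[please?]` and all modifiers untouched, and the name `expand_choices` is hypothetical.

```python
import itertools
import re

# Matches a bracketed group that contains at least one "|" and no nested brackets.
CHOICE = re.compile(r"\[([^\[\]]*\|[^\[\]]*)\]")


def expand_choices(rule):
    """Expand every `[a|b|c]` group in `rule` into the list of all combinations."""
    # With one capturing group, re.split alternates literal text and group contents.
    parts = CHOICE.split(rule)
    options = [
        part.split("|") if index % 2 else [part]
        for index, part in enumerate(parts)
    ]
    return ["".join(combination) for combination in itertools.product(*options)]


if __name__ == "__main__":
    for line in expand_choices("~[can you?] [tell|show] me"):
        print(line)
```

Each alternative keeps the order in which it appears inside the brackets, so the expanded rules line up with the hand-written Chatito ones.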
Beware that, as always with machine learning, having too much data may cause your models to perform worse because of overfitting. While this script can be used to generate thousands upon thousands of examples, doing so isn't advised for machine learning tasks.
Chatette is named after Chatito: -ette in French could be translated to -ita or -ito in Spanish. Note that the last e in Chatette is not pronounced (as is the case in "note").
For developers, you can clone the repo and install the development requirements:
pip install -r requirements/develop.txt
Then, install the module as editable:
pip install -e <path-to-chatette-module>
- Run pylint:
tox -e pylint
- Run pycodestyle:
tox -e pycodestyle
- Run pytest:
tox -e pytest
Disclaimer: This is a side-project I'm not paid for, don't expect me to work 24/7 on it.
Many thanks to them!