Sentgen requires a text file (usually with the extension ".grm") containing a phrase-structure grammar script; the following outlines what that script must contain. Sentgen is run from the command line:

    sentgen grammarfile.grm -e noexamples -s seed -h humanoutput -t tlearnoutput -v

where -v turns on verbose output.
Comments are marked with an exclamation mark (!).
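Here is a minimal Python sketch of comment handling. It assumes, though the text above does not say so, that a ! starts a comment running to the end of the line, so both whole-line and trailing comments are dropped. The helper names strip_comment and clean_lines are hypothetical and not part of Sentgen.

```python
def strip_comment(line: str) -> str:
    """Return the line with any '!' comment removed.

    Assumption: a comment runs from the first '!' to the end of the line.
    """
    code, _, _ = line.partition("!")
    return code.rstrip()


def clean_lines(text: str) -> list:
    """Drop comments and blank lines from a grammar script."""
    cleaned = (strip_comment(line) for line in text.splitlines())
    return [line for line in cleaned if line.strip()]


if __name__ == "__main__":
    script = "! a comment line\nsome rule  ! trailing comment\n\n"
    print(clean_lines(script))  # ['some rule']
```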
Each .grm file must begin with the sentence node (the highest-ranked constituent).
Terminal nodes are elements for which no rewrite rule exists. In the above example, the terminals dog, cat, eats and sleeps are chosen at random (uniformly) for print-out.
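The terminal rule can be sketched in Python as follows. This is an illustration only and not Sentgen's implementation. The grammar is a hypothetical in-memory dict rather than .grm syntax, and its first key stands in for the sentence node that must head the file. Any symbol without an entry (dog, cat, eats, sleeps) is treated as a terminal, and each alternative of a node is picked uniformly with random.choice.

```python
import random

# Hypothetical in-memory grammar (not Sentgen's .grm syntax): each key is a
# node with a rewrite rule; its value lists the alternatives, each a sequence
# of symbols. Any symbol that is not a key has no rewrite rule -> terminal.
GRAMMAR = {
    "S": [["NP", "VP"]],          # first key plays the role of the sentence node
    "NP": [["N"]],
    "VP": [["V"]],
    "N": [["dog"], ["cat"]],
    "V": [["eats"], ["sleeps"]],
}


def expand(symbol, grammar, rng):
    """Rewrite a symbol recursively; terminals are returned unchanged."""
    if symbol not in grammar:              # no rewrite rule: terminal node
        return [symbol]
    option = rng.choice(grammar[symbol])   # uniform choice among alternatives
    words = []
    for child in option:
        words.extend(expand(child, grammar, rng))
    return words


def generate(grammar, rng):
    """Expand from the first node, which heads the grammar."""
    start = next(iter(grammar))
    return " ".join(expand(start, grammar, rng))


if __name__ == "__main__":
    rng = random.Random(1)                 # fixed seed for reproducible output
    for _ in range(3):
        print(generate(GRAMMAR, rng))
```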
Rewrite probabilities are specified with a period after the constituent, as follows:
In the above example, a prepositional phrase (PP) is printed with probability .3.
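Here is a minimal sketch of how such a rewrite probability could behave. It assumes each element is kept independently with its own probability. The (symbol, p) pairs and the realise helper are hypothetical and are not Sentgen's format. A PP marked .3 should appear in roughly 30% of generated phrases.

```python
import random

# Hypothetical representation: each element of a rule is (symbol, p), where p
# is the probability that the element is rewritten/printed at all. p = 1.0
# means "always"; ("PP", 0.3) mirrors a PP given a .3 probability.
RULE_VP = [("V", 1.0), ("PP", 0.3)]


def realise(rule, rng):
    """Return the symbols of the rule that survive their probability draw."""
    # rng.random() is in [0, 1), so p = 1.0 always keeps the element.
    return [symbol for symbol, p in rule if rng.random() < p]


if __name__ == "__main__":
    rng = random.Random(42)
    trials = 10_000
    with_pp = sum("PP" in realise(RULE_VP, rng) for _ in range(trials))
    # Fraction of phrases containing a PP; expected to be near 0.3.
    print(with_pp / trials)
```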
Options (alternative rewrites) can be included within a node as follows:
Any number of options may be given:
Note that each option consists only of the elements delimited by commas. In the previous example, the transitive and intransitive verbs are the options, while NP and PP keep their own respective rewrite probabilities.
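Options and element-level probabilities can be combined in one hypothetical Python sketch. A node's list of alternatives stands in for the comma-delimited options. One option is chosen, assumed here to be uniform, and each element inside it is then kept with its own probability. The grammar, the probabilities .9 and .3, and the terminal "near" are illustrative assumptions and are not taken from Sentgen.

```python
import random

# Hypothetical representation (not .grm syntax): a node's options are the
# comma-delimited alternatives; each option is a list of (symbol, p) pairs.
GRAMMAR = {
    "S": [[("NP", 1.0), ("VP", 1.0)]],    # first key heads the grammar
    "VP": [
        [("Vt", 1.0), ("NP", 0.9)],       # option 1: transitive verb, NP at .9
        [("Vi", 1.0), ("PP", 0.3)],       # option 2: intransitive verb, PP at .3
    ],
    "NP": [[("N", 1.0)]],
    "PP": [[("P", 1.0), ("NP", 1.0)]],
    "N": [[("dog", 1.0)], [("cat", 1.0)]],
    "Vt": [[("eats", 1.0)]],
    "Vi": [[("sleeps", 1.0)]],
    "P": [[("near", 1.0)]],
}


def expand(symbol, grammar, rng):
    """Pick one option for the node, then keep each element with its own p."""
    if symbol not in grammar:              # no rewrite rule: terminal
        return [symbol]
    option = rng.choice(grammar[symbol])   # assumed uniform choice of option
    words = []
    for child, p in option:
        if rng.random() < p:               # element-level rewrite probability
            words.extend(expand(child, grammar, rng))
    return words


def generate(grammar, rng):
    start = next(iter(grammar))
    return " ".join(expand(start, grammar, rng))


if __name__ == "__main__":
    rng = random.Random(7)
    for _ in range(5):
        print(generate(GRAMMAR, rng))
```

Because the probability is attached to each element rather than to the option, choosing the transitive option never affects whether a PP appears, and vice versa.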