Zenlish
¶ ↑
What is Zenlish ?¶ ↑
Zenlish
= Zen + English
Zenlish will be a Controlled Natural Language based on English.
A Controlled Natural Language is a subset of a natural language -here English- limited to specific problem domains.
What is the purpose of Zenlish ?¶ ↑
The goal of this project is to implement a toolkit for a subset of the English language. With Zenlish it should be possible for a Ruby application to interact with users with a language that is close enough to English.
Project status¶ ↑
The project is still in inception. Currently, zenlish is able to parse all sentences from lessons 1-A up to 3-G from {Learn These Words First}[http://learnthesewordsfirst.com/].
The parser is able to cope with syntactical ambiguities generating parse forests instead of parse trees.
The intent is to deliver gem versions in small increments.
Zenlish
as a library (gem)¶ ↑
Over time, the zenlish gem will contain: - A tokenizer (tagging, lemmatizer) - A lexicon [STARTED] - A context-free grammar [STARTED] - A parser [STARTED] - Feature unification (for number, gender agreement,…) - A simplified ontology
Some project metrics (v. 0.2.00)¶ ↑
|Metric|Value|
|:-:|:-:| | Number of lemmas in dictionary | 141 | | Coverage 100 commonest English words | 61% | | Number of production rules in grammar | 185 | | Number of lessons covered | 23 | | Number of sentences in spec files | 352 |
Installation…¶ ↑
…with Rubygem¶ ↑
Install the gem yourself as:
$ gem install zenlish
…with Bundler¶ ↑
Add this line to your application's Gemfile:
gem 'zenlish'
And then execute:
$ bundle
Some code snippets¶ ↑
Interacting with the dictionary:¶ ↑
require 'zenlish' # Retrieving a "word" (more precisely, a lexeme) from the dictionary. lexeme = Zenlish::Lang::Dictionary.get_lexeme('move') # What is the Ruby class of a lexeme? p lexeme.class # => Zenlish::Lex::Lexeme # What is the word class of verb 'move'? p lexeme.wclass.class # => Zenlish::WClasses::RegularVerb # Here is some Zenlish text to analyze: some_text = 'one person can move to the same place.' p some_text some_text.scan(/(?:\w+)|[\.,:"]/).each do |entry| lexeme = Zenlish::Lang::Dictionary.get_lexeme(entry) p lexeme.wclass.class end # Loop result should be: # Zenlish::WClasses::Cardinal # Zenlish::WClasses::CommonNoun # Zenlish::WClasses::ModalVerbCan # Zenlish::WClasses::RegularVerb # Zenlish::WClasses::Preposition # Zenlish::WClasses::DefiniteArticle # Zenlish::WClasses::Adjective # Zenlish::WClasses::CommonNoun # Rley::Syntax::Terminal
Demo of lexeme inflections
# Demo inflection (aka declension, conjugation) require 'zenlish' # The Zenlish dictionary is more than a list of words... dict = Zenlish::Lang::Dictionary # What are the spellings of a given common noun? noun_body = dict.get_lexeme('body') p noun_body.all_inflections # => ["body", "bodies"] # What are the word forms of a personal pronoun (3rd person)? p_3rd_pn = dict.get_lexeme('it') p p_3rd_pn.all_inflections # => ["she", "her", "he", "him", "it", "they", "them"] # What are the distinct forms of a regular verb? vb_touch = dict.get_lexeme('touch') p vb_touch.all_inflections # => ["touch", "touching", "touched", "touches"] # What are the forms of the (highly) irregular verb be? vb_be = dict.get_lexeme('be', Zenlish::WClasses::IrregularVerbBe) p vb_be.all_inflections # => ["am", "being", "was", "been", "are", "were", "is"]
More to come…
Principles behind the Zenlish
language¶ ↑
Minimalism¶ ↑
The name of the language is a combination of 'Zen' and 'English'.
It reflects a desire to make Zenlish
a simple language:
- The focus is put on a simplified syntax, - A limited lexicon. Priority on most commonly used words.
Expressiveness¶ ↑
Zenlish
should be rich enough to express ideas, facts in a fluid way (vs. contrived, artificial way). Litmus test: a Zenlish
text should be easy to read to a English reading person.
Roadmap¶ ↑
Here a tentative roadmap:
A) Ability to parse sentences from Learn These Words First¶ ↑
STARTED. 24% complete
This website advocates the idea of a multi-layered dictionary. At the core, there are about 300 essential words.
The choice of these words is inspired by the semantic primitives of {NSM (Natural Semantic Metalanguage)}[https://en.wikipedia.org/wiki/Natural_semantic_metalanguage].
The essential words are introduced in twelve lessons. Each lesson put the words in exemplar sentences and pictures.
The milestone sub-goals are: - To inject the 300 core words into Zenlish
lexicon, - Zenlish
should be able to parse all the example sentences
B) Associate lexical features to terms in lexicon¶ ↑
STARTED The sub-goals are: - To enrich the lexicon entries with lexical and syntactical features. - Zenlish
should be able to derive the declensions of nouns, conjugation of verbs, - Also Zenlish
should detect agreement errors - Ideally, Zenlish
should have a lemmatizer
C) Enrich lexicon entries with semantical features and relationships¶ ↑
The sub-goals are: - To enrich the lexicon entries with lexical and syntactical features. - Zenlish
should be able to derive the declensions of nouns, conjugation of verbs, - Also Zenlish
should detect agreement errors
D) Build a generic ontology and map Zenlish
text to it.¶ ↑
The sub-goals are: - To have a simplified ontology that covers the concepts covered in the lesson sentences. - Hopefully Zenlish
should be answer to queries related to the lesson sentences.
E) Capability to parse a complete book¶ ↑
A good candidate book is “The Edge of the Sky” by Roberto Trotta (ISBN 978-0-465-04471-9 : hardcover, ISBN 978-0-465-04490-0 : ebook).
Professor Trotta challenged himself by writing a book on Cosmology with the 1000 most used words. More details here.
In order to achieve this goal, Zenlish
should: - Incorporate the 1000 words in its lexicon - Have a grammar that allows the parsing of the sentences in the book.
F) Capability to interpret the meaning of a complete book¶ ↑
Probably, far-fetched. But it will be nice to launch query to Zenlish
to check if it has some understanding of the text it reads (i.e. has a semantic representation).
Usage¶ ↑
TODO: Write usage instructions here
Contributing¶ ↑
Bug reports and pull requests are welcome on GitHub at github.com/famished-tiger/Zenlish. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.
License¶ ↑
The gem is available as open source under the terms of the MIT License.
Code of Conduct¶ ↑
Everyone interacting in the Zenlish
project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.