Notes on Formal Language Design
by Crutcher Dunnavant
current status: "Work in Progress"
last updated: 2005-04-14
Abstract
A great deal of work has been done in Linguistics, Semiotics, Semantics,
Computer Science, and Mathematics towards developing methods for
analyzing the articulations of formal languages, describing their semantic
fields, and the relationships between them, and producing translators and
interpreters by which these languages may be given impetus to affect the
world. By contrast, very little work has been done to provide teachable
techniques for the design and development of the semantic fields and
articulations of formal languages. This work attempts to address some of
the issues in this space. Given the importance of formal languages in a
technical society, any addition to the field of language design would be
of immense value.
Preface
This document is much more than a draft; it is the kernel of my doctoral
dissertation. As such, it will undergo many, many revisions during the next
several years. Owing to positive past experiences, I hold continuous
review in exceedingly high esteem, so the most current version of this
document will always be online. Sections will frequently change, or be
woefully incomplete, and spell checking, which I view as a cleanup process,
will be done with incredible infrequency. I welcome your feedback, just
not on the spelling.
1. Introduction
1.1. What is a Formal Language?
In which Natural and Formal Language are described, and in which the
question "Does a Formal Semantics exist for this language?" is given as the
decision process that distinguishes between the two.
1.2. Why Formal Language Design is Needed
In which Formal Language Design is placed in context with other fields.
1.3. Who Should Read This Book
In which the educational background of the assumed reader is described,
along with needs which this book can meet for different individuals.
2. Do you need a new Formal Language?
Overview: A discussion which will guide the reader through a cost/benefit analysis of
the language the reader wishes to implement. Note that this discussion
must be biased towards "No, you don't need a new language"; most situations
do not require a new language (implementation and education costs will
never be free) and the act of proving that a new language is
needed, by listing points of inadequacy of existing languages, will have
the side effect of pre-labeling most of the points of articulation around
which the new language should be structured (though the reader may not
realize this while they construct their proof).
2.1. A Language Creation Decision Process
In which is presented a process by which one can decide whether a new
language is indicated. This process will generate a collection of needs
which will be used in later stages.
3. Linguistics for Formal Language Design
3.1. History
In which is presented a framework for understanding the history of
linguistics up to the present, with citations and references to major
works and influential writers, including de Saussure [saussure:linguistics],
Chomsky, Derrida, Lyons, and others.
3.2. Signification
In which is presented a sketch of Semiotics.
3.3. Dialog
In which is presented a sketch of Semantics.
3.4. Differentiation and Analogical Reasoning
In which is presented the forces of differentiation and analogical
reasoning on language, and the dynamic balance which exists between them.
This includes an attempt to characterize languages which are in balance
(good, effective, efficient) against those which are not (verbose, ambiguous).
3.5. A Linguistic Framework
In which are presented Semiotics, Semantics, Grammar, and Transformation as
a framework for the analysis and understanding of language.
4. Semiotics for Formal Language Design
Overview: A presentation of Semiotics, followed by a discussion which guides the
reader through an evolution of their proof of need to a description of the
paradigmatic fields of their language. The presentation of Semiotics should be
grounded in theory (with appropriate references for deeper study), but
extremely light on history.
4.1. The Structure of the Sign
Overview: An introduction to the internal structure of the sign, denotation,
connotation, and paradigmatic structural connotation.
4.2. The Arbitrariness of the Sign
Overview: An introduction to the Arbitrariness of the Sign, motivated and
unmotivated sign choice, and more and less relatively motivated languages.
4.3. Oppositional Relationships
Overview: An introduction to semantic fields / paradigms.
4.4. Positional Relationships
Overview: An introduction to syntactic relationships.
4.5. Lexical Fields
The sense of a lexeme is therefore a conceptual area
within a conceptual field, and any conceptual area that is
associated with a lexeme, as its sense, is a concept.
[lyons:semantics1 pp. 254]
Additionally, the set of lexemes which collectively cover a conceptual field makes up the covering lexical field. To use the canonical example, if we wish to discuss color in a language, then the collection of all conceptual understandings of color makes up the conceptual field of color; we break this field up into various conceptual areas, each of which we associate with a lexeme. The set of these lexemes makes up the lexical field of color in the language.
5. Semantics for Formal Language Design
Overview: A presentation of Semantics, followed by a discussion which guides the
reader through an evolution of their proof of need and paradigmatic fields to a
description of the syntactic relationships of their language. The
presentation of Semantics should be grounded in theory (with appropriate
references for deeper study), but extremely light on history.
5.1. A Pattern Language for Computer Language Structure
In which is presented a Pattern Language for Computer Language Structure,
as an extension of the language structure work in
A Pattern Language for Language Implementation.
This pattern language is intended to guide the design of language semantics
around structures which are common in computing.
6. Grammar for Formal Language Design
Overview: A presentation of Grammar, followed by a discussion which guides the reader
through an evolution of their proof of need and paradigmatic and syntactic
fields to a developed formal grammar in a known family of parsable
languages. The presentation of Grammar should be grounded in theory (with
appropriate references for deeper study), but extremely light on history.
7. Transformation for Formal Language Design
Overview: A presentation of Language Transformation, followed by a discussion which
guides the reader through an evolution of their proof of need, paradigmatic
and syntactic fields, and developed formal grammar to a description of a
transformation system for giving impetus to their language's semantics.
The presentation of Transformation should be grounded in theory (with
appropriate references for deeper study), but extremely light on history.
The presentation of Transformation should mention non-deterministic
transformation, but should focus on deterministic transformation.
Interpreters shall be considered a form of transformation, as the target
language is syntactically structured in time, rather than space.
8. Testing Formal Languages
Overview: The development of a new Formal Language should not stop with the
completion of a transformation environment.
The main purpose of developing a new Formal Language is to provide good
semantic compression for a given domain, so it now becomes necessary to
test the language.
This section details how one tests and debugs a new language in parallel
with its implementation. Techniques include:
9. A Language Design Process
Overview: A description of a full design process for Formal Language design
and maintenance.
9.1. The Language Waterfall
In which is presented the Waterfall process, explicitly tuned for the needs
of language design.
9.2. The Language Lifecycle
In which is presented a lifecycle of language evolution, from initial
design, rapid prototyping, and deployment, through refactoring, performance
tuning, and mature feature integration; all the way to graceful obsolescence
and replacement integration.
10. Language Design in the Software Development Process
Overview: A speculative discussion of how to integrate language design into
the software development lifecycle.
10.1. Semantic Abstraction
The basic idea: strong programmers / architects write compilers for
application specific languages (ASLs), and everyone else writes the
application in the ASLs. Benefit: strong programmers act as a multiplier on
everyone else's talent.
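As a minimal sketch of this division of labor, assume a hypothetical ASL for state machines with one transition per line, written "STATE -> NEXT on EVENT". The strong programmer's contribution is the small compiler below (compile_transitions, which also reports malformed or duplicate lines); everyone else only edits the specification text. The ASL syntax and the names compile_transitions and run are illustrative assumptions, not an existing tool.

```python
# Hypothetical ASL: one transition per line, "STATE -> NEXT on EVENT".
SPEC = """
idle    -> running on start
running -> paused  on pause
paused  -> running on resume
running -> idle    on stop
"""

def compile_transitions(spec):
    """Compile the ASL text into a {(state, event): next_state} table."""
    table = {}
    for lineno, line in enumerate(spec.strip().splitlines(), start=1):
        parts = line.split()
        # Expect exactly: STATE "->" NEXT "on" EVENT
        if len(parts) != 5 or parts[1] != "->" or parts[3] != "on":
            raise SyntaxError(f"line {lineno}: expected 'STATE -> NEXT on EVENT'")
        state, _, nxt, _, event = parts
        if (state, event) in table:
            raise SyntaxError(f"line {lineno}: duplicate transition for {state!r} on {event!r}")
        table[(state, event)] = nxt
    return table

def run(table, start, events):
    """Drive the compiled machine; a KeyError marks an illegal transition."""
    state = start
    for event in events:
        state = table[(state, event)]
    return state

table = compile_transitions(SPEC)
print(run(table, "idle", ["start", "pause", "resume", "stop"]))  # idle
```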
10.2. Semantic Compression
A stage is added to the development cycle which attempts to maintain a
feature set while increasing documentation and reducing total token count.
(This is done by adding generation layers in cheap languages.)
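A minimal sketch of one such generation layer, assuming a hypothetical "cheap language" in which each line declares a record as "Name: field1 field2 ...". The maintained source shrinks to the spec, while the generated layer supplies the longer, documented Python classes; the spec format and the generate function are illustrative assumptions.

```python
# Hypothetical "cheap language": one record per line, written "Name: field1 field2 ...".
SPEC = "Point: x y\nColor: red green blue"

def generate(spec):
    """Expand the compact spec into longer, documented Python class definitions."""
    out = []
    for line in spec.splitlines():
        name, field_text = line.split(":")
        fields = field_text.split()
        out.append(f"class {name.strip()}:")
        out.append(f'    """Record with fields: {", ".join(fields)}."""')
        out.append(f"    def __init__(self, {', '.join(fields)}):")
        out.extend(f"        self.{field} = {field}" for field in fields)
        out.append("")
    return "\n".join(out)

source = generate(SPEC)
namespace = {}
exec(source, namespace)          # load the generated layer
point = namespace["Point"](1, 2)
print(len(SPEC.split()), "tokens maintained;", len(source.split()), "tokens generated")
```

Only the spec is edited by hand; the documentation strings and constructors are regenerated from it, so the feature set is kept while the maintained token count drops.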
11. Glossary
Overview: A complete glossary of all technical terms used in the document, with
references to their point of introduction.
Appendix A. Normative Linguistics
Part of the evolution of modern linguistics has been a deliberate movement away from Normative Linguistics towards Positive Linguistics: a democratization of the comparative study of language, built upon a de-emphasis of a culture's economically and educationally preferred proper and literary languages (the variant which evolves as the proper written form of a culture's language). It has been a principle of comparative linguistics that languages are not better or worse, but only different. While this process has produced cleaner discussions of social sub-groups and class systems, and has greatly aided the teaching of language (and indeed other subjects, as teaching material is now sometimes adapted for various dialects), it leaves the modern language designer without a basic vocabulary for making value judgments when comparing languages. We seek to address this problem in the development of an art of language design.
1. Efficiency
Our first concept will therefore be efficiency. The efficiency of a language varies inversely with the expected length of a statement in that language. Notionally, we can define the expected length as the sum, over every possible statement in the language, of the statement's length multiplied by its probability of occurring. In practice, since many languages can produce an infinite number of statements, this is not a metric we are likely ever to calculate exactly, so we will settle for estimators of the expected length. All other things being equal, we prefer more efficient languages to less efficient ones: when comparing two languages covering the same domain, the language with the lower expected length is the more efficient.
2. Balance
Our second concept will be balance, but this first requires that the ancillary concepts of linguistic distance and semantic distance be defined. We begin with linguistic distance.
We shall say that the linguistic distance between two utterances is the edit distance between, not their lexical representations (which are linear), but their structural representations (the concrete syntax tree of each expression). While it would be possible to describe the edit distance between two statements in this way mathematically, it will not be necessary for our purposes.
Note: The concept of edit distance is much discussed in computer science as it applies to strings, and we abstract it here to a general form: the edit distance between two statements is the minimal number of edit operations (usually given as replacement, insertion, and deletion) which must be applied to one statement in order to produce the other.
Next is semantic distance. We shall say that the semantic distance between two statements is the edit distance between their meanings (their deep structure, or model form). Unfortunately, this will always be an ambiguous definition, but in any given formal language context it should be possible to roughly describe the model form (the non-serialized, multi-dimensional structural form which the language models) which statements in that language communicate.
Now, given linguistic distance and semantic distance, we are ready to discuss balance. A language is balanced to the extent that the expected linguistic distance between two statements is proportional to the expected semantic distance between them.
3. Putting it Together
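As a minimal sketch of how these measures might be estimated in practice, the Python below assumes that statements are available as token lists with observed probabilities or frequencies (for efficiency) and as concrete syntax trees written as hypothetical (label, children) tuples (for linguistic distance). The tree distance is the standard unit-cost ordered tree edit distance (insert, delete, relabel) computed by memoized recursion over forests; it is simple but only practical for small trees. All names and sample statements are illustrative assumptions.

```python
from functools import lru_cache

# Trees are (label, children) pairs, children being a tuple of trees; a forest is a tuple of trees.

def expected_length(sample):
    """Estimate expected statement length from (tokens, weight) pairs; weights are normalized."""
    total = sum(weight for _, weight in sample)
    return sum(len(tokens) * weight for tokens, weight in sample) / total

def forest_size(forest):
    return sum(1 + forest_size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost ordered tree edit distance between two forests."""
    if not f or not g:
        return forest_size(f) + forest_size(g)
    (v, v_kids), (w, w_kids) = f[-1], g[-1]
    return min(
        forest_distance(f[:-1] + v_kids, g) + 1,   # delete rightmost root v
        forest_distance(f, g[:-1] + w_kids) + 1,   # insert rightmost root w
        forest_distance(v_kids, w_kids)            # map v to w, relabeling if they differ
        + forest_distance(f[:-1], g[:-1]) + (v != w),
    )

def tree_distance(a, b):
    return forest_distance((a,), (b,))

def leaf(label):
    return (label, ())

# Hypothetical concrete syntax trees for "x + 1", "x + 2" and "x * 1".
plus_1 = ("+", (leaf("x"), leaf("1")))
plus_2 = ("+", (leaf("x"), leaf("2")))
times_1 = ("*", (leaf("x"), leaf("1")))

print(tree_distance(plus_1, plus_2))    # 1
print(tree_distance(plus_1, times_1))   # 1
print(expected_length([("x + 1".split(), 0.75), ("f ( x )".split(), 0.25)]))  # 3.25
```

Under the further assumption that model forms can be written down as trees, the same tree_distance could serve as a rough semantic distance; a language would then be balanced to the extent that the ratio of linguistic to semantic distance stays roughly constant across pairs of statements.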
Therefore, we desire balanced, efficient languages; when comparing two languages for a given domain, we will prefer the language which is more balanced and more efficient, though we must make judgment calls when one language is more balanced and the other more efficient.
Appendix B. General Properties of Language
In discussing the question "What is Language?", linguists frequently attempt to describe the characteristic features which any language must possess. While many features have been proposed as general properties, only four are accepted by all schools of linguistics: Arbitrariness, Duality, Productivity, and Discreteness [lyons:semantics1 pp. 70-79].
1. Arbitrariness
The arbitrariness of the sign [saussure:linguistics pp. ??],
upon which the study of Semiotics is based, is a core feature of language.
This is a complex concept which needs further development.
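As a rough illustration only, assume a hypothetical prefix-notation calculator: the same meanings (addition and multiplication) are bound to two different, equally arbitrary sets of signs, and the evaluator produces identical results under either lexicon. This shows only that the choice of sign is conventional; it does not capture the fuller concept. The lexicons and the evaluate function are illustrative assumptions.

```python
import operator

# The same meanings (signifieds) bound to two arbitrary sets of signs (signifiers).
SEMANTICS = {"add": operator.add, "mul": operator.mul}
LEXICON_A = {"+": "add", "*": "mul"}
LEXICON_B = {"plus": "add", "times": "mul"}

def evaluate(tokens, lexicon):
    """Evaluate a prefix expression such as ['+', '2', '3'] under a given lexicon."""
    def walk(pos):
        tok = tokens[pos]
        if tok in lexicon:
            left, pos = walk(pos + 1)
            right, pos = walk(pos)
            return SEMANTICS[lexicon[tok]](left, right), pos
        return int(tok), pos + 1

    value, end = walk(0)
    if end != len(tokens):
        raise SyntaxError("trailing tokens")
    return value

print(evaluate("* + 2 3 4".split(), LEXICON_A))         # 20
print(evaluate("times plus 2 3 4".split(), LEXICON_B))  # 20
```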
2. Duality
Duality, or double articulation, is the property whereby the discrete
elements of a language expression themselves make up second-level language
elements. In text, these are the lexical and grammatical levels.
Duality greatly enables Productivity.
[lyons:semantics1 pp. 71-72]
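The two levels can be sketched in Python for a hypothetical arithmetic notation: lex works at the first level, grouping characters (which mean nothing on their own) into tokens, and parse works at the second, grouping tokens into a grammatical structure using the assumed grammar expr := term ('+' term)*, term := atom ('*' atom)*. Neither function describes any particular existing language.

```python
import re

def lex(text):
    """Level 1: group characters into tokens (numbers, names, single symbols)."""
    return re.findall(r"\d+|[A-Za-z_]\w*|\S", text)

def parse(tokens):
    """Level 2: group tokens into a tree; '*' binds tighter than '+'."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def atom():
        tok = take()
        if tok == "(":
            node = expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or not (tok.isdigit() or tok.isidentifier()):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    def term():
        node = atom()
        while peek() == "*":
            take()
            node = ("*", node, atom())
        return node

    def expr():
        node = term()
        while peek() == "+":
            take()
            node = ("+", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"trailing token {peek()!r}")
    return tree

print(lex("rate * (hours + 2)"))         # ['rate', '*', '(', 'hours', '+', '2', ')']
print(parse(lex("rate * (hours + 2)")))  # ('*', 'rate', ('+', 'hours', '2'))
```

A small, fixed set of characters yields many tokens, and a small, fixed set of token kinds yields many structures, which is how duality feeds productivity.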
3. Productivity
By productivity, as we shall employ the term, is meant that property of the
language-system which enables native speakers to construct and understand
an indefinitely large number of utterances, including utterances that they
have never previously encountered.
[lyons:semantics1 pp. 78]
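A minimal sketch of productivity, using a hypothetical two-rule grammar E -> x | (E+E): the grammar is finite, yet the number of distinct utterances it admits grows without bound as the permitted nesting depth increases. Because every compound utterance is fully parenthesized, distinct pairs of sub-utterances give distinct strings, so the count obeys c(d) = 1 + c(d-1)^2.

```python
def utterances(depth):
    """All sentences of E -> 'x' | '(' E '+' E ')' with nesting depth <= depth."""
    if depth == 0:
        return {"x"}
    smaller = utterances(depth - 1)
    return {"x"} | {f"({a}+{b})" for a in smaller for b in smaller}

for d in range(4):
    print(d, len(utterances(d)))  # counts 1, 2, 5, 26 for depths 0..3
```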
4. Discreteness
The term discreteness applies to the signal-elements of a semiotic system.
If the elements are discrete, in the sense that the difference between them
is absolute and does not admit of graduation in terms of more or less, the
system is said to be discrete; otherwise it is continuous.
[lyons:semantics1 pp. 78]