Troff version at unlser1.unl.csi.cuny.edu
Bill Campbell and Joel Spolsky (joelonsoftware.com) state that they have been informed personally by Doug Klunder that this text is in the "public domain" (see copyright info). I claim no rights to my HTML version (byteshift webdesign/info@byteshift.de), but a backlink would be appreciated should you want to mirror this page.
This document describes a set of naming conventions used by the IEMIS project in development of the software. The initial naming conventions were taken from a NAMING CONVENTIONS document authored by Doug Klunder at Microsoft. These conventions commonly go by the name "Hungarian," referring both to the nationality of their original developer, Charles Simonyi, and also to the fact that to an uninitiated programmer they are somewhat confusing. Once you have gained familiarity with Hungarian, however, we believe that you will find that the clarity of code is enhanced. For convenience, this memo first describes how to use Hungarian, and then describes why it is useful; the general approach is from a programming viewpoint, rather than a mathematical one.
Hungarian is largely language independent; it is equally applicable to a microprocessor assembly language and to a fourth-generation database application language (and has been used in both). However, there is a little flavor of C, in that arrays and pointers to arrays are not clearly distinguished. While this may sound confusing, in practice there is little ambiguity.
< prefix > < base type > < qualifier >
The most common type of identifier is a variable name. All variable names are composed of three elements: prefixes, base type, and qualifier. (These are also referred to as constructors, tag, and qualifier.) Not all elements are present in all variable names; the only part that is always present is the base type. This type should not be confused with the types supported directly by the programming language; most types are application specific. For example, an lbl type could refer to a structure containing symbol information; a co could be a value specifying a color.
Types that are not defined must be added. As the above examples indicate, tags should be short (typically two or three letters) and somewhat mnemonic. Because of the brevity, the mnemonic value will be useful only as a reminder to someone who knows the application, and has been told what the basic types are; the name will not be sufficient to inform (by itself) a casual viewer what is being referred to. For example, a co could just as easily refer to a geometric coordinate, or to a commanding officer. Within the context of a given application, however, a co would always have a specific meaning; all co's would refer to the same type of object, and all references to such an object would use the term co.
One should resist the natural first impulse to use a short descriptive generic English term as a type name. This is almost always a mistake. One should not preempt the most useful English phrases for the provincial purposes of any given version of a given program. Chances are that the same generic term could be equally applicable to many more types in the same program. How will we know which is the one with the pretty "logical" name, and which have the more arbitrary variants typically obtained by omitting various vowels or by other disfigurement? Also, in communicating with other programmers, how do we distinguish the generic use of the common term from the reserved technical usage? In practice, it seems best to use some abbreviated form of the generic term, or perhaps an acronym. In speech, the tag may be spelled out, or a pronounceable nickname may be used. In time, the exact derivation of the tag may be forgotten, but its meaning will still be clear.
As is probably obvious from the above, it is essential that all tags used in a given application be clearly documented. This is extremely useful in helping a new programmer learn the code; it not only enables him (or her) to decode the otherwise cryptic names, but it also serves to describe the underlying concepts of the program, since the data types tend to determine how the program works. It is also worth pointing out that this is not nearly as onerous as it sounds; while there may be tens of thousands of variables in a program, the number of types is likely to be quite small.
Although most types are particular to a given application, there are a few standard ones that appear in many different ones; synonyms for these types should never be used:
There are some more types that appear in many applications; they should only be used for the most generic purposes:
There are a few types that are used widely within the applications group, but may not be applicable to others:
Base types are not by themselves sufficient to fully describe the type of a variable, since variables often refer to more complex items. The more complex items are always derived from some combination of simple items, with a few operations. For example, there may be a pointer to an lbl, or an array of them, or a count of co's. These operations are represented in Hungarian by prefixes; the combination of the prefixes and base type represents the complete type of an entity. Note that a type may consist of multiple prefixes in addition to the base type (e.g., a pointer to a count of co's); the prefixes are read right to left, with each prefix applying to the remainder of the type (see examples below). The term constructor is used because a new type is constructed from the combination of the operation and the base type.
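The composition of prefixes and base type can be sketched in C roughly as follows. The tags lbl and co come from the text above; the structure contents, the procedure name DemoPrefixes, and the reading of p as pointer, rg as array, and c as count are assumptions based on common Hungarian usage, since the prefix list itself is not reproduced in this copy.

```c
#include <assert.h>

typedef struct LBL { int w; } LBL;   /* lbl: symbol information (field invented) */
typedef int CO;                      /* co: a value specifying a color */

/* Hypothetical demo: stores a count of co's in an lbl and returns it. */
int DemoPrefixes(void)
{
    CO   rgco[3] = { 0, 1, 2 };                      /* rg + co: array of co's   */
    int  cco = (int)(sizeof rgco / sizeof rgco[0]);  /* c + co: count of co's    */
    int *pcco = &cco;                                /* p + c + co: pointer to a
                                                        count of co's            */
    LBL  rglbl[2] = { { 0 }, { 0 } };                /* rg + lbl: array of lbl's */
    LBL *plbl = &rglbl[1];                           /* p + lbl: pointer to lbl  */

    plbl->w = *pcco;                 /* each name states what it may hold */
    assert(rglbl[1].w == cco);
    return rglbl[1].w;
}
```

In pcco the p applies to everything after it (cco), which is how a multi-prefix name is decoded.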
In theory, new prefixes can be created, just as new types are routinely created for each application. In practice, very few new prefixes have been created over the years, as the set that already exists is rather comprehensive for operations likely to be applied to types. Prefixes that have been added tend to deal with the specifics of machine architecture, and are variations on existing prefixes (i.e., different flavors of pointers). One can go overboard in refusing to create a new prefix, however; some new concepts really are logically expressed as prefixes, not types. A couple of examples of incorrect usage in the list below derived from the reluctance to create a new prefix.
The standard prefixes are:
Since the prefixes and base types both appear in lower case, with no separating punctuation, ambiguity can arise. Is pfc a tag of its own (e.g., for a private first class), or is it a pointer to an fc? Such questions can be answered only if one is familiar with the specific types used in a program. To avoid problems like this it is often wise to avoid creating base type names that begin with any of the common prefixes. In practice, ambiguity does not seem to be a problem. The idea of additional punctuation to remove the ambiguity has been shown to be impractical.
The following list contains both common and rarer usages:
While the prefixes and base type are sufficient to fully specify the type of a variable, this may not be sufficient to distinguish the variable. If there are two variables of the same type within the same context, further specification is required to disambiguate. This is done with qualifiers. A qualifier is a short descriptive word (or facsimile; good English is not required) that indicates what the variable is used for. In some cases, multiple words may be used. Some distinctive punctuation should be used to separate the qualifier from the type; in C and other languages that support it, this is done by making the first letter of the qualifier upper-case. (If multiple words are used, the first letter of each should be upper-case; the remainder of the name, both type and qualifier, is always lower-case. There is one special case to watch out for; defined constants specifying the size of a type are often of the form cbFOO or cwFOO, where foo is the type. Strictly speaking only the F in FOO should be capitalized, but the incorrect usage is fairly common.)
Exactly what constitutes a naming context is language specific; within C the contexts are individual blocks (compound statements), procedures, data structures (for naming fields), or the entire program (globals). As a matter of good programming style, it is not recommended that hiding of names be used; this means that any context should be considered to include all of its subcontexts. (In other words, don't give a local the same name as a global.) If there is no conflict within a given context (only one variable of a given type), it is not necessary to use a qualifier; the type alone serves to identify the variable. In small contexts (data structures or small procedures), a qualifier should not be used except in case of conflict; in larger contexts it is often a good idea to use a qualifier even when not necessary, since later modification of the code may make it necessary. In cases of ambiguity, one of the variables may be left with no qualifier; this should only be done if it is clearly more important than the other variables of the same type (no qualifier implies primary usage).
Since many uses of variables fall into the same basic categories, there are several standard qualifiers. If applicable, one of these should be used, since they specify meaning with no chance of confusion. In the case of multiple word qualifiers, the order of the words is not crucial, and should be chosen for clarity; if one of the words is a standard qualifier, it should probably come last (unfortunately, this suggestion is by no means uniformly followed). The standard qualifiers are:
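As a small illustration of qualifiers in C, consider several variables of type co in one procedure. The tag co is from the text; the qualifiers Fore, Back, and SelFore, the color values, and the procedure CoForText are hypothetical names chosen only for this sketch.

```c
#include <assert.h>

typedef int CO;    /* co: a value specifying a color */

/* Hypothetical helper: returns the color in which to draw text. */
CO CoForText(int fSelected)
{
    CO coFore = 1;              /* type co + qualifier Fore */
    CO coBack = 0;              /* type co + qualifier Back */
    CO coSelFore = coBack;      /* two-word qualifier: each word capitalized */

    assert(coFore != coBack);   /* two co's in one context, told apart by qualifier */
    return fSelected ? coSelFore : coFore;
}
```

The lower-case part of each name gives the type, and the capitalized part says which co it is.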
for(ich=ichFirst; ich<=ichLast; ich++)
for (cp=cpFirst,cpLim=cpFirst+dcp; cp<cpLim; cp++)
for(pch=pchMin; pch<pchMax; pch++)
for (pch=pchMin; pch<=pchMost; pch++)
Note that the above qualifiers have a strict relationship: Min<=Mic<=First<=Last<=Most<Lim<=Mac<=Max
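The ordering above can be turned into assertions. The sketch below covers only some of the qualifiers and assumes their usual readings, which are not spelled out in this copy: Min and Max bound the allocated array, Mac is the current upper limit, First and Last bound the range being processed (both valid), and Lim is the first invalid index. The procedure CchBlank and its parameters are hypothetical.

```c
#include <assert.h>

enum { ichMin = 0, ichMax = 16 };   /* allocated bounds of rgch (assumed meanings) */

/* Counts the blanks in rgch[ichFirst..ichLast], both ends inclusive.
   ichMac is the current upper limit: the number of characters in use. */
int CchBlank(const char rgch[], int ichMac, int ichFirst, int ichLast)
{
    int ich;
    int ichLim = ichLast + 1;       /* first index that is NOT valid */
    int cch = 0;

    /* The ordering from the text, for the qualifiers used here. */
    assert(ichMin <= ichFirst && ichFirst <= ichLast);
    assert(ichLast < ichLim && ichLim <= ichMac && ichMac <= ichMax);

    for (ich = ichFirst; ich < ichLim; ich++)
        if (rgch[ich] == ' ')
            cch++;
    return cch;
}
```

Note that Last is compared with <= and Lim with <; the qualifier tells the reader which comparison is correct.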
envSav=envMem;
if (SetJmp(&envMem))
envMem=envSav;
rwSav=rwAct;
for (rwAct=rwFirst; rwAct<=rwLast; rwAct++)
rwAct=rwSav;
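The rwSav fragment above might be fleshed out as follows. This is only a sketch: the globals, their initial values, and the procedures VisitRwAct and VisitRows are invented for illustration, and rw is assumed to be a row tag.

```c
#include <assert.h>

int rwAct = 5;       /* the active row; global state (value invented) */
int crwSeen = 0;     /* count of rows visited so far */

/* Hypothetical per-row routine that works on the global rwAct. */
void VisitRwAct(void)
{
    crwSeen++;
}

/* Visits rows rwFirst..rwLast inclusive, leaving rwAct as it was found. */
void VisitRows(int rwFirst, int rwLast)
{
    int rwSav = rwAct;           /* Sav: value saved while rwAct is borrowed */

    for (rwAct = rwFirst; rwAct <= rwLast; rwAct++)
        VisitRwAct();

    /* The loop has clobbered rwAct, which is why it was saved. */
    assert(rwFirst > rwLast || rwAct == rwLast + 1);
    rwAct = rwSav;               /* restore the caller's active row */
}
```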
When possible, structure members are named the same way variables are. Since the context is small (only the structure), conflicts are less likely, and qualifiers are often neither needed nor used. If the language does not support separate contexts for each structure (e.g., masm), the structure name is appended to the member as a qualifier. Thus, the following declarations are equivalent (the one on the left is for C, the one on the right for masm):
    typedef struct FOO          FOO     struc
    {
        char *pch;              pchFoo  dw ?
        int w;                  wFoo    dw ?
        char rgch[10];          rgchFoo db 10 dup(?)
    } FOO;                      FOO     ends
In some cases, one type is a special instance of another type. When this is the case, the special instance names should consist of the base instance name plus a character. For example, in Word there is a base type of CHR (character run); special instances are CHRF (formula character run), CHRT (tab character run), and CHRV (vanished character run).
Unfortunately, the simple rules used for variable names do not work as well for procedures. Whereas the type of a variable is always quite important, specifying how that variable may be used, the important part of a procedure is typically what it does; this is especially true for procedures that don't return a value. In addition, the context for procedures is usually the entire program, so there is more chance for conflict. To handle these issues, a few modifications are made to the simple rules:
Macros should be handled exactly the same way as procedures; for historical reasons, you may find some macros that do not follow the correct rules (e.g., min, bltbyte).
Labels can be considered to be a variant on procedures; they are after all effectively identifiers specifying a chunk of code. Within C, they are named similarly to procedures; they obviously neither return a value nor take parameters, so no types are specified. The first letter is upper case, and the name itself is just a few words specifying the condition that causes the label to be reached (either by falling through, or via a goto). Since the context of a label is limited to its procedure, these can be pretty generic terms; typical examples are GotErr, OutOfMem, LoopDone.
Within assembly, labels are somewhat trickier. First off, there are many more labels used. Second, depending on the assembler, all labels may have global (or at least filewide) context. To deal with these constraints, the rules may be modified somewhat. For labels that are inserted solely because of assembler constraints (i.e., jumps corresponding to high level control flow constructs), temporary labels should be used. If the assembler supports true temporary labels (valid only within the current procedure, or up to the next global label), they should be used, in ascending numeric order. If true temporary labels are not available, the most common convention is to use the initials of the procedure, followed by a number, in ascending order. Of course, gaps should be left between numbers to facilitate later modification (initially setting to multiples of 10 works well). This is far from perfect, and can create conflicts between procedures that have the same initials; some people prefer to give all labels, temporary or not, full English names for clarity. For labels that correspond to true C labels, C conventions can be used; to avoid conflict, it is often useful to prefix with the procedure initials.
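A minimal sketch of C labels named this way is shown below. The label names GotErr and OutOfMem are the examples from the text; the procedure PchCopy, its parameters, and the qualifiers Src and New are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copies cch characters from pchSrc into a new zero-terminated buffer.
   Returns the buffer (the caller frees it), or NULL on failure. */
char *PchCopy(const char *pchSrc, size_t cch)
{
    char *pchNew;

    assert(cch < (size_t)-1);    /* cch + 1 must not wrap around */
    if (pchSrc == NULL)
        goto GotErr;
    pchNew = malloc(cch + 1);
    if (pchNew == NULL)
        goto OutOfMem;
    memcpy(pchNew, pchSrc, cch);
    pchNew[cch] = '\0';
    return pchNew;

OutOfMem:    /* reached via goto when the allocation fails */
GotErr:      /* reached via goto, or by falling through from OutOfMem */
    return NULL;
}
```

Because labels are local to the procedure, another procedure can reuse GotErr without conflict.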
As much as possible, defined constants should look just like variables of the same type. For many types, defined constants will exist for the Nil, Max, Min, and/or Last values. The program text will read exactly as if they are variables. There are three common exceptions, all originating in the mists of time, and unlikely to change soon. NULL is defined to be 0, and is used with all pointer types; TRUE and FALSE are defined to be 1 and 0, and are used with f types (correct Hungarian, practiced by some projects, uses fTrue and fFalse instead of TRUE and FALSE).
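A short sketch of defined constants that read like variables follows. fTrue and fFalse are the spellings mentioned above; the typedef F, the values chosen for coNil, coMin, and coMax, and both procedures are assumptions made for illustration.

```c
#include <assert.h>

typedef int CO;              /* co: a value specifying a color */
typedef int F;               /* f: flag (assumed typedef) */

#define fTrue  1             /* "correct Hungarian" spelling of TRUE */
#define fFalse 0
#define coNil  (-1)          /* hypothetical "no color" value */
#define coMin  0             /* hypothetical lowest valid co */
#define coMax  8             /* hypothetical upper limit; valid co's are < coMax */

/* Returns fTrue when co lies in the valid range [coMin, coMax). */
F FValidCo(CO co)
{
    return (co >= coMin && co < coMax) ? fTrue : fFalse;
}

/* Returns the first valid color in rgco[0..cco-1], or coNil if there is none. */
CO CoFirstValid(const CO rgco[], int cco)
{
    int ico;

    assert(cco >= 0);
    for (ico = 0; ico < cco; ico++)
        if (FValidCo(rgco[ico]))
            return rgco[ico];
    return coNil;
}
```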
There are often types for which each value is a defined constant; these are essentially equivalent to enumeration types supported by some languages (including some variations of C). These are typically types used for table-driven algorithms, for specifying options to a procedure, or specifying possible return values. Note that these types are in fact separate types; they are not all examples of the same type, nor are they values for the w type. Since they are special purpose (you can't pass an option for procedure x to procedure y with any meaning), they must be a new type. You can of course use your own method to name these types; it is often convenient to just use the initials of the procedure that takes or returns the type (watching out for conflicts). Since they are typed quantities, the type must be present in the names; possible values for colors are not RED, BLUE, YELLOW, GREEN, BLACK, etc., but rather could be coRed, coBlue, coYellow, coGreen, coBlack, etc.
Each structure is almost by definition its own type, and should be named as a type (two or three letters with some possible mnemonic value). By convention in C, the entire type name is capitalized in the definition of the type. The same rules apply to unions. Many of the projects find it convenient to include typedef's for each structure type; this means that the word struct is not included in declarations, and allows the redefinition of the type to an array, or even a simple type, without having to change all the declarations. Typedef's may also be suitable for non-structure types, particularly any that are not simple int's.
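The color example can be written with a C enumeration, as sketched here. The constant names are those given in the text; using an enum rather than #define, and the procedure FPrimaryCo, are choices made only for this illustration.

```c
#include <assert.h>

typedef int CO;    /* co: a value specifying a color */

/* Every value of the co type is a named constant carrying the co tag. */
enum { coRed, coBlue, coYellow, coGreen, coBlack };

/* Returns 1 if co is one of the primary colors, else 0. */
int FPrimaryCo(CO co)
{
    assert(co >= coRed && co <= coBlack);   /* co must be a defined color */
    switch (co) {
    case coRed:
    case coBlue:
    case coYellow:
        return 1;
    default:
        return 0;
    }
}
```

A call such as FPrimaryCo(coBlue) shows at a glance that the argument has the type the procedure expects.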
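As a brief sketch of the typedef convention, the FOO structure used earlier in this memo can be declared and used without the word struct. The procedure InitFoo is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* The type name is fully capitalized; variables of the type use the tag foo. */
typedef struct FOO
{
    char *pch;
    int   w;
    char  rgch[10];
} FOO;

/* Clears *pfoo and points its pch at the start of its own rgch. */
void InitFoo(FOO *pfoo)
{
    assert(pfoo != NULL);
    memset(pfoo, 0, sizeof *pfoo);
    pfoo->pch = pfoo->rgch;
}
```

If FOO were later redefined through the typedef, declarations such as FOO *pfoo would not need to change.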
The two questions are actually very closely related; answering the first will usually give an answer to the second, since knowing the goals allows one to see how closely a solution will meet the goals. For naming conventions, many of the goals are well-known, if not often formalized; most good programmers already attempt to meet the goals in a variety of ways.
An important need in naming objects in a program is the ability to remember what the name is, so that when the object is used, the programmer can quickly determine the name (which is the only way it can be used). Traditionally, this need has been met by using descriptive names for variables; for a given programmer working continually on a given program this is usually adequate. Problems arise, however, when a different programmer works on the project, or when the same programmer returns after a hiatus. What was once descriptive now has to be relearned. Hungarian helps somewhat in this respect, though it is not complete. The first part of a variable name can always be determined with no effort (it is the type), and if it is a standard use, the qualifier can also be determined (since it is one of the standard qualifiers). Non-standard qualifiers and procedure names cannot be immediately determined; however, the situation is certainly no worse than the traditional situation, since the qualifier or procedure name has as much descriptive value as a traditional name. Furthermore, since there are fewer names that must be remembered (since one need not remember the standard ones), it is easier to remember them.
At least as important as being able to go from an object to a name (the mnemonic value) is the ability to go from a name to an object (the suggestive value). This is most important when reading code written by someone else; this affects almost all programs today, either because multiple people are working on them, or because they are outgrowths of earlier programs. Again, the traditional approach has been to use names descriptive in some manner; Hungarian again improves the situation somewhat. For the relatively small cost of learning the types used in a given program, a reader gains a much better understanding of what the program does, since the types used in a statement often help determine the meaning of the statement. This is enhanced even more by the use of standard qualifiers; again, the non-standard qualifiers are at least as clear as the traditional names.
Partially an aesthetic idea ("the code looks better"), this is also an important concept for the readability of code. Just as Americans often have an extremely difficult job following the action of Russian novels, since the same character goes by many different names, a programmer will have a difficult time understanding code in which the same object is referred to in unrelated ways. Similarly, it is confusing to find the same name referring to unrelated objects. This is a serious problem in traditional contexts, since English is a rich enough language to have many terms that roughly describe the same concept, and also terms that can describe multiple concepts. This problem is exacerbated when programmers resolve name conflicts by use of abbreviations, variant spellings, or homonyms; all of these methods are prone to accidental misuse, through typographical errors or simple failure to understand subtle differences. Hungarian resolves this problem by the use of detailed rules; since all names are created using the same rules, they are consistent in usage.
It is desirable to minimize the amount of time spent on determining names; in a sense this is wasted time, since getting the "right" name doesn't improve the program's efficiency or functionality. Since the traditional naming methods rely on good descriptions to meet the above goals, a programmer has to spend a goodly amount of time to in fact invent good descriptions; speedy name decisions are likely to result in unmnemonic, unsuggestive, or inconsistent names. In Hungarian, on the other hand, only a few name the same names, everyone's code will be similar, and therefore easy to read and modify. Traditional naming schemes are extremely unlikely to reach this goal, since English has far too many ambiguities to expect different individuals to describe things in identical terms. It would be naive to expect that Hungarian will cause all programmers to write code identically, or even to use identical names. The names are likely to be much more similar, however, since they are composed using the same rules, with the same types and standard qualifiers.
Hungarian is a useful set of rules used to determine the names used in a program. There is no denying that it takes a little time to become familiar with it; true enlightenment comes only with effort. We strongly believe the results are worth the effort. The Applications Development group has been using Hungarian since its inception in 1981, and people at Xerox PARC were using it even earlier. The consistent use of Hungarian makes the programmer's job easier; it is both easier to write in Hungarian (there are fewer superfluous choices to make) and easier to read and modify existing code. The set of conventions is sufficient to deal with most current situations by itself; it has also proven adaptable to changes in the programming environment. Perhaps the best testimonial for Hungarian is the fact that a number of programmers have continued to use Hungarian even after leaving the jobs in which they encountered it; they have felt that the advantages were great enough to warrant the effort necessary to promote its use elsewhere. We hope that you will feel this way as well, once you become familiar with Hungarian's usage.
Date      Action
09/04/87  Original (DBK)
09/15/87  Moved to Word; added some rarer types and prefixes and cleaned up other definitions in response to feedback. (DBK)
01/18/88  Added v type, sh prefix, and explanation of ha and l as prefixes applied to p for huge and far pointers. Also clarified use of From in procedure names. (DBK)
04/10/90  Moved to nroff, changed references to OSAC to IEMIS (WKC)
Date: Fri, 20 Sep 91 15:12:53 -0700
From: sgihbtn!billc@uunet.UU.NET (Bill Campbell)
Message-Id: <9109202212.AA04579@shared>
To: uunet!umiacs.UMD.EDU!dalamb@uunet.UU.NET
Subject: Re: Hungarian Notation References

... I have just talked with Doug Klunder at Microsoft. He says that publication of his paper is perfectly okay, they have not copyrighted it. It is, of course, their preference that the document be represented as his/MS's work.

From: "Joel Spolsky" <spolsky@fogcreek.com>
To: <info@byteshift.de>
Sent: Thursday, May 12, 2005 12:57 AM
Subject: RE: joelonsoftware.com/articles/Wrong.html - Hungarian Notation/D. Klunder: link to clean HTML version

... Also I have heard personally from Doug Klunder himself that this file is in the "public domain".