Entities
An XML document can be assembled from pieces of information drawn from different places. These pieces are called entities. Programmers may find it easier to think of an entity as a macro or an alias: a single entity name can take the place of a large amount of text. Entity references cut down on typing, because whenever you need that text you simply use the alias and the processor expands its contents for you.
Entities let you refer to other data as shortcuts, so the same information does not have to be repeated in a document or DTD. An entity declaration associates a name with some other fragment of content. That fragment can be a chunk of regular text, a chunk of the document type declaration, or a reference to an external file containing either text or binary data.
Entities are declared in the DTD, similar to elements and attributes.
Parsed vs. Unparsed Entities
Entities may be either parsed or unparsed. A parsed entity’s contents are referred to as its replacement text; this text is considered an integral part of the document. An unparsed entity is a resource whose contents may or may not be text, and if text, may not be XML. Each unparsed entity has an associated notation, identified by name. Beyond a requirement that an XML processor make the identifiers for the entity and notation available to the application, XML places no constraints on the contents of unparsed entities. Parsed entities are invoked by name using entity references; unparsed entities by name, given in the value of ENTITY or ENTITIES attributes.
General Entities vs. Parameter Entities
General entities (or simply entities) are entities for use within the document content. Parameter entities are parsed entities for use within the DTD. These two types of entities use different forms of reference and are recognized in different contexts. Furthermore, they occupy different namespaces; a parameter entity and a general entity with the same name are two distinct entities.
The name given in an entity declaration identifies the entity in an entity reference or, in the case of an unparsed entity, in the value of an ENTITY or ENTITIES attribute. If the same entity is declared more than once, the first declaration encountered is binding; at user option, an XML processor may issue a warning if entities are declared multiple times.
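The "first declaration is binding" rule can be checked with a small sketch. It assumes Python's standard xml.etree.ElementTree parser; the second value "Sci-Fi" is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The same general entity is declared twice.
# The second value ("Sci-Fi") is hypothetical and only there to show which one wins.
DOC = """<!DOCTYPE genre [
<!ENTITY SF "Science Fiction">
<!ENTITY SF "Sci-Fi">
]>
<genre>&SF;</genre>"""

genre = ET.fromstring(DOC)
print(genre.text)  # the first declaration is the one that binds
```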
Example (general) entity declaration:
<!DOCTYPE videocollection [
<!ENTITY R "Romance">
<!ENTITY WAR "War">
<!ENTITY COM "Comedy">
<!ENTITY SF "Science Fiction">
<!ENTITY ACT "Action">
]>
A general entity reference refers to the content of a named entity. References to parsed general entities use an ampersand (&) and a semicolon (;) as delimiters. The entities declared above are then used in an XML document like this:
<videocollection>
<title id="1">Tootsie</title>
<genre>&COM;</genre>
<year>1982</year>
<title id="2">Jurassic Park</title>
<genre>&SF;</genre>
<year>1993</year>
<title id="3">Mission Impossible</title>
<genre>&ACT;</genre>
<year>1996</year>
</videocollection>
As in HTML, the name of the entity is preceded with an ampersand (&) and followed by a semicolon (;).
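To see the expansion happen, the document above can be fed to a parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module and only the three entities that the document actually references.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset plus the document body from the example above.
DOC = """<!DOCTYPE videocollection [
<!ENTITY COM "Comedy">
<!ENTITY SF "Science Fiction">
<!ENTITY ACT "Action">
]>
<videocollection>
<title id="1">Tootsie</title>
<genre>&COM;</genre>
<year>1982</year>
<title id="2">Jurassic Park</title>
<genre>&SF;</genre>
<year>1993</year>
<title id="3">Mission Impossible</title>
<genre>&ACT;</genre>
<year>1996</year>
</videocollection>"""

root = ET.fromstring(DOC)
titles = [t.text for t in root.findall("title")]
genres = [g.text for g in root.findall("genre")]

# By the time the application sees the tree, the references are already expanded.
for title, genre in zip(titles, genres):
    print(f"{title}: {genre}")
```

The application never sees &COM; or &SF; here: the parser hands over the replacement text in their place.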
Parameter entities are used for shortcuts within the DTD. The general form of a parameter entity declaration is:
<!ENTITY % NAME "text that you want to be represented by the entity">
For example:
<!ENTITY % pub "Éditions Gallimard" >
<!ENTITY rights "All rights reserved" >
<!ENTITY book "La Peste: Albert Camus, © 1947 %pub;. &rights;" >
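Parameter-entity references use a percent sign (%) and a semicolon (;) as delimiters. In the declaration of book above, %pub; is a parameter-entity reference. Within the internal DTD subset, parameter-entity references may not appear inside a markup declaration, so a declaration like this one belongs in an external DTD.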
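The replacement text for the entity "book" is: La Peste: Albert Camus, © 1947 Éditions Gallimard. &rights; The parameter-entity reference %pub; is expanded when the declaration is read, whereas the general-entity reference &rights; is left as it is and is expanded only when &book; is referenced in the document content.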
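The way this replacement text is built can be sketched in a few lines of Python. The helper replacement_text below is hypothetical and deliberately simplified: it is not part of any XML library, it ignores character references, and it does not expand parameter entities nested inside other parameter entities.

```python
import re

# Hypothetical helper for illustration only, not part of any XML library.
# It mimics how the replacement text of an internal entity is derived from
# the literal value written in the declaration.
def replacement_text(literal, parameter_entities):
    """Expand %name; references and leave &name; references untouched."""
    def expand(match):
        return parameter_entities[match.group(1)]
    return re.sub(r"%([A-Za-z_][\w.\-]*);", expand, literal)

parameter_entities = {"pub": "Éditions Gallimard"}
literal = "La Peste: Albert Camus, © 1947 %pub;. &rights;"

book = replacement_text(literal, parameter_entities)
print(book)
```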
XML expands the power of entities in a big way. There are three kinds of entities: internal, external, and predefined.
Internal Entities
If the entity definition is an EntityValue, the defined entity is called an internal entity. There is no separate physical storage object, and the content of the entity is given in the declaration.
Internal entities are defined in the DTD so that they can be used throughout the rest of the document. They let you define shortcuts for frequently typed text or for text that is expected to change, such as the revision status of a document, and they help avoid misspellings and retyping of the same information. If, for instance, a phrase such as "Science Fiction" occurs frequently in a document, an internal entity declared in the DTD saves typing the whole phrase each time.
An internal entity is a parsed entity. Example of an internal entity declaration:
<!ENTITY SF "Science Fiction">
Whenever the full term needs to be used in the document, it is sufficient to type &SF;
Internal entities can include references to other internal entities, but it is an error for them to be recursive.
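Both halves of that rule can be tried out with a short sketch, again assuming Python's xml.etree.ElementTree. The entity names SFFILM, A and B are invented for the example.

```python
import xml.etree.ElementTree as ET

# An internal entity that refers to another internal entity (allowed).
NESTED = """<!DOCTYPE genre [
<!ENTITY SF "Science Fiction">
<!ENTITY SFFILM "&SF; film">
]>
<genre>&SFFILM;</genre>"""

# Two entities that refer to each other (an error once one is referenced).
RECURSIVE = """<!DOCTYPE genre [
<!ENTITY A "&B;">
<!ENTITY B "&A;">
]>
<genre>&A;</genre>"""

nested_text = ET.fromstring(NESTED).text
print(nested_text)

try:
    ET.fromstring(RECURSIVE)
    recursion_rejected = False
except ET.ParseError as err:
    recursion_rejected = True
    print("parse error:", err)
```

Note that the recursive declarations themselves are accepted; the error only surfaces when &A; is actually referenced and the parser tries to expand it.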
External Entities
If the entity is not internal, it is an external entity. External entities are typically used when the replacement text is very long; the information is then kept in another file.
External entities allow an XML document to refer to an external file. External entities contain either text or binary data. If they contain text, the content of the external file is inserted at the point of reference and parsed as part of the referring document. Binary data is not parsed and may only be referenced in an attribute. Binary entities are used to reference figures and other non-XML content in the document.
External entities let you pull separately stored documents into the XML file by reference, rather than cutting and pasting the contents of separate files together. You specify an entity whose content is defined outside the document by using the SYSTEM keyword, as in these two declarations:
<!ENTITY LIagreement SYSTEM "http://www.mydomain.com/license.xml">
<!ENTITY LOGO SYSTEM "http://www.mydomain.com/logo.gif" NDATA GIF87A>
For the LIagreement entity, the XML processor will parse the content of license.xml as if that content had been typed at the location of the entity reference.
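How a processor splices a parsed external entity into the document can be sketched with Python's low-level xml.parsers.expat module. This is only an illustration under stated assumptions: the RESOURCES dictionary stands in for fetching the URL, the content of license.xml is invented, and the agreement and terms element names are hypothetical.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE agreement [
<!ENTITY LIagreement SYSTEM "http://www.mydomain.com/license.xml">
]>
<agreement>&LIagreement;</agreement>"""

# Stand-in for the network: system identifier -> file content (invented).
RESOURCES = {
    "http://www.mydomain.com/license.xml": "<terms>All rights reserved</terms>",
}

def parse_with_external_entities(doc, resources):
    """Return the parse events, with external entities parsed in place."""
    events = []
    parser = xml.parsers.expat.ParserCreate()

    def install(p):
        p.StartElementHandler = lambda name, attrs: events.append(("start", name))
        p.CharacterDataHandler = lambda data: events.append(("text", data))

    def on_external_entity(context, base, system_id, public_id):
        # Parse the external content with a child parser at the reference point.
        child = parser.ExternalEntityParserCreate(context)
        install(child)
        child.Parse(resources[system_id], True)
        return 1  # non-zero tells expat the entity was handled

    install(parser)
    parser.ExternalEntityRefHandler = on_external_entity
    parser.Parse(doc, True)
    return events

events = parse_with_external_entities(DOC, RESOURCES)
print(events)
```

Resolving external entities from untrusted documents is a known security risk, which is why a lookup table is used here instead of a real download.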
The LOGO entity is also an external entity, but its content is binary. The NDATA keyword marks it as unparsed and names its notation (GIF87A here), which must itself be declared in the DTD. The LOGO entity can only be used as the value of an ENTITY (or ENTITIES) attribute (on a graphic element, perhaps). The XML processor will pass this information along to an application, but it does not attempt to process the content of logo.gif.
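What "passing this information along" looks like can be sketched with xml.parsers.expat, which reports declarations through handler callbacks. The page and graphic elements, the src attribute, and the notation's system identifier "image/gif" are assumptions made for the example.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE page [
<!NOTATION GIF87A SYSTEM "image/gif">
<!ENTITY LOGO SYSTEM "http://www.mydomain.com/logo.gif" NDATA GIF87A>
<!ATTLIST graphic src ENTITY #REQUIRED>
]>
<page><graphic src="LOGO"/></page>"""

notations = {}         # notation name -> system identifier
unparsed = {}          # entity name -> (system identifier, notation name)
attribute_values = []  # values of the ENTITY attribute as seen by the application

def on_notation(name, base, system_id, public_id):
    notations[name] = system_id

def on_entity(name, is_parameter_entity, value, base, system_id, public_id, notation_name):
    # Only unparsed entities carry a notation name.
    if notation_name is not None:
        unparsed[name] = (system_id, notation_name)

def on_start(name, attrs):
    if "src" in attrs:
        attribute_values.append(attrs["src"])

parser = xml.parsers.expat.ParserCreate()
parser.NotationDeclHandler = on_notation
parser.EntityDeclHandler = on_entity
parser.StartElementHandler = on_start
parser.Parse(DOC, True)

print(unparsed, attribute_values)
```

The parser never opens logo.gif. The application receives the entity name from the attribute and can look up the identifiers itself.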
Predefined Entities
There are five predefined XML entities, most of which should be well known to HTML coders:
&lt; produces the left angle bracket <
&gt; produces the right angle bracket >
&amp; produces the ampersand &
&apos; produces a single quote character '
&quot; produces a double quote character "
You can also use entity references within attribute values. For example, consider the following:
<INVOICE CLIENT="&IBM;" product="&product_id_8762;" quantity="5">
An attribute value may not reference an external entity. The replacement text of an entity referenced in an attribute value may not contain the < character either, because it would cause a parsing error in the element when replaced.
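These attribute rules can be exercised with a brief sketch using xml.etree.ElementTree. The value of the IBM entity and the expr entity are invented for illustration; the third case shows that writing the predefined &lt; reference in the entity value keeps the attribute legal.

```python
import xml.etree.ElementTree as ET

# The replacement text for IBM is an assumption; the source does not define it.
GOOD = """<!DOCTYPE INVOICE [
<!ENTITY IBM "International Business Machines">
]>
<INVOICE CLIENT="&IBM;" quantity="5"/>"""

# Hypothetical entity whose replacement text contains a literal "<".
BAD = """<!DOCTYPE INVOICE [
<!ENTITY expr "1 < 2">
]>
<INVOICE CLIENT="&expr;" quantity="5"/>"""

# Same entity, but using the predefined &lt; reference instead.
ESCAPED = """<!DOCTYPE INVOICE [
<!ENTITY expr "1 &lt; 2">
]>
<INVOICE CLIENT="&expr;" quantity="5"/>"""

client = ET.fromstring(GOOD).attrib["CLIENT"]
escaped_client = ET.fromstring(ESCAPED).attrib["CLIENT"]

try:
    ET.fromstring(BAD)
    bad_rejected = False
except ET.ParseError as err:
    bad_rejected = True
    print("parse error:", err)

print(client, "|", escaped_client)
```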
Note
- There may not be any whitespace embedded in an entity reference: & SF; or &SF ; will cause errors.
- Entities MUST be declared in an XML document before they are referenced.
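Both notes can be checked with a short sketch, assuming xml.etree.ElementTree. The helper is_well_formed is a name made up for this example, and &WAR; is deliberately left undeclared.

```python
import xml.etree.ElementTree as ET

# Only SF is declared; WAR is intentionally missing.
DTD = '<!DOCTYPE genre [<!ENTITY SF "Science Fiction">]>'

def is_well_formed(body):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(DTD + body)
        return True
    except ET.ParseError:
        return False

references = ("&SF;", "& SF;", "&SF ;", "&WAR;")
results = {ref: is_well_formed(f"<genre>{ref}</genre>") for ref in references}

for ref, ok in results.items():
    print(repr(ref), "accepted" if ok else "rejected")
```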
