Tuesday, May 19, 2009

What is HTML

HTML stands for HyperText Markup Language. It is a markup language, not a programming language. A markup language is a set of markup tags, and each tag is a keyword enclosed in angle brackets. HTML describes the structure of text-based information in a document, and it can also describe the document's appearance and meaning. The first publicly available description of HTML was a document called "HTML Tags," first mentioned on the Internet by Tim Berners-Lee in late 1991. It describes 22 elements comprising the initial, relatively simple design of HTML. Thirteen of these elements still exist in HTML 4. Web browsers read HTML to format and display the text and images of a web page.
HTML markup consists of several key components. These include elements (and their attributes), character-based data types, character references, and entity references. Another component is the document type declaration, which specifies the Document Type Definition (DTD). In HTML5, whose first public working draft was published in January 2008, the document type declaration no longer refers to a DTD.
Elements are the basic building blocks of HTML markup. Elements have two basic properties: attributes and content. Each attribute and each element's content has certain restrictions that must be followed for an HTML document to be considered valid. An element usually has a start tag (e.g. <element-name>) and an end tag (e.g. </element-name>). The element's attributes are contained in the start tag, and the content is located between the tags (e.g. <element-name attribute="value">Content</element-name>). Some elements, such as <br>, do not have any content and must not have a closing tag. HTML uses several types of markup elements. Structural markup describes the purpose of the text. Presentational markup describes the appearance of the text. Hypertext markup links parts of the document to other documents.
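The kinds of elements described above might look like this in practice. This is only a small sketch; the heading text, the sentences, and the example.com address are placeholders made up for illustration.

```html
<!-- Structural markup: h2 marks its content as a second-level heading. -->
<h2>Golf</h2>

<!-- An element with a start tag, content, and an end tag.
     The br element inside it is empty and has no end tag. -->
<p>First line of the paragraph.<br>
Second line, after the line break.</p>

<!-- Presentational markup: b asks for bold text without saying why. -->
<p>This word is <b>bold</b>.</p>

<!-- Hypertext markup: the href attribute points to another document.
     The address is a placeholder. -->
<a href="http://www.example.com/">An example link</a>
```

Saving these lines in a file ending in .html and opening it in a browser should show a heading, a two-line paragraph, a bold word, and a link.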
Most of the attributes of an element are name-value pairs, separated by "=", and written within the start tag of an element, after the element's name. The value may be enclosed in single or double quotes, although values consisting of certain characters can be left unquoted in HTML. Leaving attribute values unquoted is considered unsafe. There are several common attributes that elements can take. The id attribute provides a document-wide unique identifier for an element. The class attribute provides a way of classifying similar elements for presentation purposes. An author may use the style attribute to assign presentational properties to a particular element. The title attribute is used to attach a subtextual explanation to an element.
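As a rough illustration of the four attributes just described, here is one paragraph element that uses all of them. The names "intro" and "note" and the color are arbitrary choices for this example, not anything HTML requires.

```html
<!-- id must be unique in the document; class may be shared by many elements. -->
<p id="intro" class="note" style="color: green;" title="Shown as a tooltip in most browsers">
  A paragraph carrying four common attributes.
</p>

<!-- An unquoted value works for a simple token like this one,
     but quoting the value is the safer habit. -->
<p class=note>Same class, unquoted value.</p>
```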
A character entity reference is a reference to a particular entity that has been predefined or explicitly declared in a Document Type Definition. It allows individual characters to be written via simple markup, rather than literally. The ability to escape characters allows the characters “<” and “&” to be interpreted as character data, rather than markup. Escaping also allows characters that are not easily typed, or that are not even available in the document's character encoding, to be represented. HTML defines several data types for element content, such as script data and style sheet data, and a plethora of types for attribute values, including IDs, names, URIs, numbers, units of length, languages, media descriptors, colors, character encodings, dates and times, and so on. All of these data types are specializations of character data.
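A short sketch of escaping in practice is shown below. The sentences are invented for the example; the references themselves (&lt;, &gt;, &amp;, &copy; and the numeric forms) are standard HTML.

```html
<!-- &lt; and &gt; display as < and >, and &amp; displays as &,
     so the browser treats them as text instead of markup. -->
<p>Write &lt;p&gt; to show a literal tag, and &amp; to show an ampersand.</p>

<!-- A named entity reference for the copyright sign. -->
<p>Named: &copy; 2009</p>

<!-- Numeric character references for the same character (U+00A9),
     first in decimal, then in hexadecimal. -->
<p>Numeric: &#169; and &#xA9;</p>
```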
HTML documents are required to start with a Document Type Declaration (informally, a "doctype"). In browsers, the function of the doctype is to select the rendering mode, in particular to avoid Quirks Mode, which browsers use to maintain backward compatibility with older pages. HTML gives great leeway to the browser to decide how to display a page, which leaves little control to the HTML author. For example, the author may mark a heading as a primary section header, but the browser displays it in whatever font, typeface, and color it chooses.
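Putting the pieces together, a complete document might look something like the sketch below. It uses the HTML 4.01 Strict doctype; the title and body text are placeholders for this example.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html>
  <head>
    <title>What is HTML?</title>
  </head>
  <body>
    <!-- The author marks this as the primary heading;
         the browser picks its font, size, and color. -->
    <h1>What is HTML?</h1>
    <p>A paragraph of ordinary text.</p>
  </body>
</html>
```

In HTML5 the first two lines shrink to a single short line, <!DOCTYPE html>, which names no DTD but still keeps browsers out of Quirks Mode.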
Reference: “HTML Made Really Easy” by James Marshall. December 25, 2008.