cswithpathak BY DR. AJAY KUMAR PATHAK, ASSISTANT PROFESSOR : UNIT 2 MJ–12 (Th):- WEB PROGRAMMING (UNIT NAME :- UNDERSTANDING XML )

FOR B. Sc. IT.

SEM 6 F.Y.U.G.P.

B. Sc. IT. SEMESTER 6 NOTES BASED ON NEP

Objective:

· The objective of this course is to provide students with a comprehensive understanding of network security concepts and techniques. The course aims to develop students' skills in identifying network vulnerabilities, implementing security measures, and ensuring the confidentiality, integrity, and availability of networked systems.

Learning Outcome:- After completion of this course, a student will be able to–

· Understand the principles and concepts of network security.

· Identify potential security threats and vulnerabilities in networked systems.

· Implement security measures to protect network infrastructure.

· Apply encryption and authentication techniques to secure network communication.

· Analyze and respond to security incidents in networked environments

Semester Examination and Distribution of Marks

INTERNAL MARKS :- 25 (NO PRACTICAL IN MJ–12 (WEB PROGRAMMING))

End Semester Examination (ESE) : 75 Marks

-: NOTES READ FROM HERE :-

SYLLABUS OF UNIT 2 :- Understanding XML: Overview of XML, Creating XML Documents, Rules for Well-Formed XML, Adding Comments, CDATA Sections, Creating a DTD-The Concept of a Valid XML Document, Creating a DTD for an existing XML File.

UNIT- 2 :- UNDERSTANDING XML:-

INTRODUCTION TO XML: Work on XML started in 1996, and XML 1.0 was first published as a W3C Recommendation on February 10, 1998. It was developed by a working group of the World Wide Web Consortium (W3C), in which James Clark was one of the key contributors.

XML stands for Extensible Markup Language. A markup language is a set of codes, or tags, that describes the text in a digital document. XML is a markup language similar to HTML, but XML tags are not predefined: you define your own tags, designed specifically for your needs. This is a powerful way to keep data in a format that can be stored, searched, transported and shared. XML is platform independent and language independent.

For example, Microsoft Office versions 2007 and later use XML for their document structure. So, when you save a document in Word, Excel or PowerPoint, the file extension ends with an "x", which stands for XML. For a Word document, the file name ends with ".docx".

V V I XML Tree

XML documents are built as element trees.

An XML tree begins with a root element and branches to child elements.

All elements can have child elements (sub-elements):

<root>

<child>

</child>

</root>

NOTE VVI FOR COMMENT LINES

<!-- This is a comment that spans

multiple lines. -->

XML Syntax

<?xml version = "1.0"?>

<contact-info>

<name>Ajay Kumar Pathak </name>

<company>You are learning XML Program</company>

</contact-info>

YOU CAN SEE THERE ARE TWO TYPES OF DATA IN THE ABOVE MODEL –

Markup, such as <contact-info>

The text, or the character information, such as Ajay Kumar Pathak and You are learning XML Program.

SIMPLE EXAMPLE OF A XML PROGRAM

<?xml version="1.0" encoding="UTF-8"?>
<Company>
   <Employee>
      <FirstName>AJAY</FirstName>
      <MiddleName>KUMAR</MiddleName>
      <LastName>PATHAK</LastName>
      <Address>
         <City>Jamshedpur</City>
         <State>Jharkhand</State>
      </Address>
   </Employee>
</Company>

/// The first line is the prolog. It specifies that the file contains XML version 1.0 data, encoded using Unicode Transformation Format 8 (UTF-8). UTF-8 can represent every Unicode character and is identical to ASCII for plain English text. ///

RULES:- An XML declaration must abide by the following rules −

1) If the XML declaration is present, it must be the very first thing in the XML document, with nothing (not even a space or a blank line) before it.

2) If the XML declaration is included, it must contain the version attribute.

3) The Parameter names and values are case-sensitive.

4) The names are always in lower case.

5) The order of placing the parameters is important. The correct order is: version, encoding and standalone.

6) Either single or double quotes may be used.

7) The XML declaration has no closing tag i.e. </?xml>
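
The rules above can be illustrated with one correct declaration and a few incorrect ones. This is only a minimal sketch: the root element note and its text are hypothetical, and the incorrect forms are kept inside a comment so that the file itself stays well-formed.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- The declaration above follows every rule: it comes first, it includes
     version, and the order is version, encoding, standalone.

     Forms that break a rule (a parser would reject each of them):
       <?XML version="1.0"?>                    name must be in lower case
       <?xml encoding="UTF-8"?>                 version is missing
       <?xml encoding="UTF-8" version="1.0"?>   wrong order of parameters
       <?xml version=1.0?>                      value is not in quotes
-->
<note>Declaration rules demo</note>
```

Single quotes would work equally well, for example version='1.0'.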

STRUCTURE OF AN XML DOCUMENT:

An XML document is made up of the following parts.

(1).. XML Declaration / definition :- The XML declaration provides basic information about the format of the rest of the XML document. It looks like a processing instruction and can have the attributes version, encoding and standalone. It is optional, but when used, it must appear on the first line of the XML document.

EXAMPLE:- <?xml version="1.0" encoding="utf-8"?>

/// UTF stands for Unicode Transformation Format. UTF-8 is an encoding system for Unicode: it can translate any Unicode character to a matching unique binary string, and can also translate the binary string back to a Unicode character. ///

(2).. Elements :- XML elements can be defined as building blocks of an XML. Elements can behave as containers to hold text, elements, attributes, media objects or all of these.

Each XML document contains one or more elements, the scope of which is either enclosed by start and end tags, or for empty elements, by an empty-element tag.

Syntax

Following is the syntax to write an XML element −

<element-name attribute1="value1" attribute2="value2">
   ....content
</element-name>

Example:- Empty element

<MyElement/>

Example:- Element with no content

<MyElement></MyElement>

Example:- Element with text

<MyElement>Some text</MyElement>

(3).. Text :-In XML text is the content enclosed within elements. XML is designed to represent structured data, and text within elements serves as the data that you want to store or transmit.

Example:

<book>
   <title>XML Basics</title>
   <author>Ajay Pathak</author>
</book>

In this example, the <title> and <author> elements contain text content, which is the data associated with those elements.

(4).. Attributes :- Attributes are used to provide additional information about an element. Attributes are name-value pairs placed within the start tag of an element. The value has to be in double (" ") or single (' ') quotes, and each attribute name must be unique within its element.

Example: - <elementName attributeName="attributeValue">elementContent</elementName>

Let's break down the parts:

<elementName>: This is the name of the XML element.

attributeName: This is the name of the attribute.

attributeValue: This is the value associated with the attribute.

elementContent: This is the content of the XML element.

Example (the attribute values are illustrative):

<person firstName="Ajay" lastName="Pathak" age="45">
   <address>
      <street>123 Bistupur</street>
      <city>Jamshedpur</city>
   </address>
</person>

We have an XML element named "person" with three attributes: "firstName", "lastName", and "age". These attributes provide information about the person, such as their first name, last name, and age.

(5). Entities :- Entities in XML play a role similar to variables in programming languages. A variable is a named storage container: instead of writing a value explicitly every time, you store it once and refer to it by name throughout your code. In the same way, an entity is declared once and then referenced by name wherever its value is needed in the document.

Entities are used to define shortcuts to special characters within the XML documents. Entities can be primarily of four types −

1..Built-in entities

2..Character entities

3..General entities

4..Parameter entities
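
The four kinds of entity listed above can be seen together in one small document. This is a minimal sketch: the element name demo and the entity names demoDecl and author are hypothetical, chosen only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE demo [
   <!-- Parameter entity: declared with %, used only inside the DTD as %demoDecl; -->
   <!ENTITY % demoDecl "<!ELEMENT demo (#PCDATA)>">
   %demoDecl;
   <!-- General entity: declared here, used in the document as &author; -->
   <!ENTITY author "Ajay Kumar Pathak">
]>
<demo>
   Built-in entities: &lt; &gt; &amp; &quot; &apos;
   Character entity: &#169; (the copyright sign, written by its code number)
   General entity: &author;
</demo>
```

Built-in and character entities need no declaration; general and parameter entities must be declared in the DTD before they are used.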

Entity Declaration Syntax

In general, entities can be declared internally or externally. Let us understand each of these and their syntax as follows −

Internal Entity:- If the value of an entity is given directly in its declaration inside the DTD, it is called an internal entity.

Syntax

Following is the syntax for internal entity declaration −

<!ENTITY entity_name "entity_value">

In the above syntax −

entity_name is the name of the entity, followed by its value within double or single quotes.

entity_value holds the value for the entity name.

The value of an internal entity is de-referenced by adding the prefix & and the suffix ; to the entity name, i.e. &entity_name;.

Example

Following is a simple example for internal entity declaration −

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE address [
   <!-- #PCDATA means parsed character data -->
   <!ELEMENT address (#PCDATA)>
   <!ENTITY name "Tanmaypatil">
   <!ENTITY company "AjayTutorials">
   <!ENTITY phone_no "014154234567">
]>
<address>
   &name;
   &company;
   &phone_no;
</address>

In the above example, the respective entity names name, company and phone_no are replaced by their values in the XML document. The entity values are de-referenced by adding the prefix & and the suffix ; to the entity name.

Save this file as sample.xml and open it in any browser, you will notice that the entity values for name, company, phone_no are replaced respectively.

External Entity :- If the content of an entity is stored outside the declaration, in a separate file or resource, it is called an external entity. You can refer to an external entity by using either a system identifier or a public identifier.

Syntax

Following is the syntax for External Entity declaration −

<!ENTITY name SYSTEM "URI/URL">

In the above syntax −

name is the name of entity.

SYSTEM is the keyword.

URI/URL is the address of the external source enclosed within the double or single quotes.
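
The syntax above can be sketched in a complete document. This is only an illustration: it assumes a hypothetical text file named footer.txt stored in the same folder as the XML file, and the entity name footer is also made up for the example.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE address [
   <!ELEMENT address (#PCDATA)>
   <!-- The content of this entity comes from the file footer.txt (assumed to exist) -->
   <!ENTITY footer SYSTEM "footer.txt">
]>
<address>
   &footer;
</address>
```

Many browsers and parsers do not load external entities by default for security reasons, so &footer; may stay empty unless the parser is configured to resolve it.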

XML NAMING RULES

XML elements must follow these naming rules:

1) Element names are case-sensitive

2) Element names must start with a letter or underscore

3) Element names cannot start with the letters xml (or XML, or Xml, etc)

4) Element names can contain letters, digits, hyphens, underscores, and periods

5) Element names cannot contain spaces

6) Any name can be used, no words are reserved (except xml).
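
A short sketch of the naming rules in practice. All element names here are hypothetical; the invalid names are kept inside a comment because they would stop the parser.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<students>
   <!-- Valid names: start with a letter or underscore,
        then letters, digits, hyphens, underscores or periods -->
   <student_name>Ajay</student_name>
   <_rollNo>12</_rollNo>
   <first-name>Ajay</first-name>
   <address.city>Jamshedpur</address.city>
   <City2>Ranchi</City2>

   <!-- Invalid names:
        <1student>       starts with a digit
        <student name>   contains a space
        <-city>          starts with a hyphen
   -->
</students>
```

Because names are case-sensitive, <City2> and <city2> would be two different elements.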

XML Advantages

1) XML is extendable.

2) It is human-readable: the tags describe the data they hold.

3) Completely portable and also compatible with Java.

4) XML is a platform-independent markup language (not a programming language), hence it can be used by any system.

5) XML supports Unicode.

6) Using XML, data can be stored and transported at any point in time without affecting data presentation.

7) An XML parser rejects any document with a syntax error, so a document that parses successfully is guaranteed to follow the syntax rules.

8) Data sharing between various systems is simplified using XML.

XML Disadvantages

1) Compared to other text-based formats, XML is redundant and verbose.

2) When data volume is large, it results in high storage and transportation cost due to redundancy in XML syntax.

3) Compared to other text-based formats, XML is less readable.

4) Due to its lengthy nature, the XML file size is very large.

5) XML has no built-in array type; a list has to be written as repeated elements.

DIFFERENCE BETWEEN XML AND HTML

HTML	XML
Is a markup language.	Is a standard markup language that defines other markup languages.
Is not case sensitive.	Is case sensitive.
Doubles up as a presentation language.	Is not a presentation language nor a programming language.
Has its own predefined tags.	Tags are defined as per the need of the programmer. XML is flexible as tags can be defined when needed.
Closing tags are not necessarily needed.	Closing tags are used mandatorily.
White spaces are not preserved.	Capable of preserving white spaces.
Showcases the design of a web page in the way it is displayed on client-side.	Enables transportation of data from database and related applications.
Used for displaying data.	Used for transferring data.
Static in nature.	Dynamic in nature.
Offers native support.	With the help of elements and attributes, objects are expressed by conventions.
Null value is natively recognized.	Xsi:nil on elements is needed in an XML instance document.
Extra application code is not needed to parse text.	XML DOM (Document Object Model )application and implementation code is needed to map text back into JavaScript objects.
EXAMPLE: <!DOCTYPE html> <html><head><title>Page title</title></head><body><h1>First Heading</h1><p>First paragraph.</p></body></html>	<?xml version="1.0"?> <address><name>AJAY KUMAR PATHAK</name><contact>1234567890</contact><email>ajay1@gmail.com</email><birthdate>1980-09-27</birthdate></address>

VVI :XML DOES NOT DO ANYTHING

Maybe it is a little hard to understand, but XML does not DO anything.

<note>
   <from>JAMSHEDPUR</from>
   <body>WELCOME TO ALL</body>
</note>

The XML above is quite self-descriptive:

1. It has sender information

2. It has a message body

But still, the XML above does not DO anything. XML is just information wrapped in tags.

CREATING XML DOCUMENTS : -

STEPS:-

1. Open your text editor of choice.

2. On the first line, write an XML declaration.

3. Set your root element below the declaration.

4. Add your child elements within the root element.

5. Review your file for errors.

6. Save your file with the .xml file extension.

7. Test your file by opening it in the browser window.

EXAMPLE WITH QUESTIONS

EXAMPLE 1: SIMPLE XML FOR BOOK LIST (LIB1.XML)

<?xml version="1.0" encoding="UTF-8"?>
<!-- The line above is the document prolog. This is an XML file containing BOOKS records -->
<!-- From <library> to the last line (</library>) is the document element section -->
<library>
   <book id="1">
      <title>XML Basics</title>
      <author>DR. AJAY KR PATHAK</author>
   </book>
   <book id="2">
      <title>Learning Python</title>
      <author>MIHIR KUMAR</author>
   </book>
</library>

Explanation of the Above Code

<library> is the root element.
<book> is a child element and has an attribute id (the id values shown here are illustrative).
Inside each <book> are nested tags: <title>, <author>.

Save this file as lib1.xml.

/// ( ( Document Prolog Section :- The document prolog comes at the top of the document, before the root element. This section contains the XML declaration and the document type declaration. ) ) ///

END

EXAMPLE NO 2 :- CREATING XML DOCUMENTS FILE OF LIBRARY ( LIB2 . XML)

<?xml version='1.0' encoding='utf-8'?>
<library>
   <book>
      <title>The Guide</title>
      <author>R.K. Narayan</author>
      <genre>Fiction</genre>
   </book>
   <book>
      <title>Wings of Fire</title>
      <author>A.P.J. Abdul Kalam</author>
      <genre>Autobiography</genre>
   </book>
   <book>
      <title>Train to Pakistan</title>
      <author>Khushwant Singh</author>
      <genre>Historical Fiction</genre>
   </book>
   <book>
      <title>God of Small Things</title>
      <author>Arundhati Roy</author>
      <genre>Literary Fiction</genre>
   </book>
</library>

EXAMPLE 3: USING HTML AND XSLT (ALTERNATIVE TO JAVASCRIPT)

To display an XML file in a browser using only HTML and XML, without JavaScript, the best way is XSLT (Extensible Stylesheet Language Transformations).

STEP 1: CREATE AN XML FILE (BOOKS.XML)

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="books.xsl"?>
<library>
   <book>
      <title>WEB TECHNOLOGY</title>
      <author>DR. AJAY KUMAR PATHAK</author>
      <price>800</price>
   </book>
   <book>
      <title>The Guide</title>
      <author>DR. AJAY KUMAR PATHAK</author>
      <price>800</price>
   </book>
</library>

//// The second line links to an XSLT file (books.xsl) that will format the XML for display. ////

STEP 2: CREATE AN XSLT FILE (BOOKS.XSL)

THIS FILE FORMATS AND DISPLAYS THE XML DATA IN A TABLE WHEN OPENED IN A BROWSER.

<?xml version="1.0" encoding="UTF-8"?>
<!-- <xsl:stylesheet> defines an XSLT file; the namespace must be exactly http://www.w3.org/1999/XSL/Transform -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <!-- <xsl:template match="/"> starts formatting the entire XML document -->
   <xsl:template match="/">
      <html>
         <head>
            <title>Library Books</title>
         </head>
         <body>
            <h2>Library Books</h2>
            <table border="1">
               <tr>
                  <th>Title</th>
                  <th>Author</th>
                  <th>Price</th>
               </tr>
               <xsl:for-each select="library/book">
                  <tr>
                     <td><xsl:value-of select="title"/></td>
                     <td><xsl:value-of select="author"/></td>
                     <td><xsl:value-of select="price"/></td>
                  </tr>
               </xsl:for-each>
            </table>
         </body>
      </html>
   </xsl:template>
</xsl:stylesheet>

Step 3: How to Run the Files

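Save both files (books.xml and books.xsl) in the same folder.
Open books.xml in your browser (Chrome, Edge, Firefox). Some browsers block stylesheets for files opened directly from disk; if the table does not appear, open the folder through a local web server instead.
The book list will be displayed as a table.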

Final Output in Browser

Title	Author	Price
WEB TECHNOLOGY	DR. AJAY KUMAR PATHAK	800
The Guide	DR. AJAY KUMAR PATHAK	800

THE END

RULES FOR WELL-FORMED XML : - When a document follows the XML markup syntax rules, it is said to be well-formed . Documents that have incorrect syntax are referred to as malformed .

Well-Formed XML

The primary rules for a well-formed XML document are:

There may be no whitespace (character spaces or line returns) before the XML declaration, if there is one.
There is a single "root" element that contains all the other elements.
An element must have both an opening and closing tag, unless it is an empty element.
If an element is empty, it must contain a closing slash before the end of the tag (for example, <MyElement/>).
All opening and closing tags must nest correctly and not overlap.
There may not be whitespace between the opening < and the element name in a tag.
All element attribute values must be in straight quotation marks (either single or double quotes).
An element may not have two attributes with the same name.
Comments and processing instructions may not appear inside tags.
No unescaped < or & signs may occur in the character data of an element or attribute.
The document must have a single root element, a unique element that encloses the entire document. The root element may be used only once in the document.
The element tags are case-sensitive; the beginning and end tags must match exactly. Tag names cannot contain any of the characters !"#$%&'()*+,/;<=>?@[\]^`{|}~, nor a space character, and cannot start with -, ., or a numeric digit.
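
Some of the rules above are easier to remember as mistake-and-fix pairs. In this minimal sketch each comment shows a malformed form, and the line after it shows the well-formed version; all element names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<examples>
   <!-- Malformed: <b><i>text</b></i>  (tags overlap) -->
   <b><i>text</i></b>

   <!-- Malformed: <br>  (empty element without a closing slash) -->
   <br/>

   <!-- Malformed: <book id=b1>  (attribute value is not in quotes) -->
   <book id="b1"/>

   <!-- Malformed: <Name>Ajay</name>  (case of the two tags does not match) -->
   <Name>Ajay</Name>

   <!-- Malformed: <math>5 < 10</math>  (unescaped < in character data) -->
   <math>5 &lt; 10</math>
</examples>
```

The single <examples> root element wrapping everything is itself one of the rules: without it, the file would have several root elements and would be malformed.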

COMPLETE EXAMPLE OF A WELL-FORMED XML DOCUMENT:-

<?xml version="1.0" encoding="UTF-8"?>
<school>
   <student>
      <name>MOHIT SINHA</name>
      <email>mohit@gmail.com</email>
   </student>
   <student>
      <name>SANDIP NANDI</name>
      <email>sanip@gmail.com</email>
   </student>
</school>

THE END

DISCERNING STRUCTURE : -

In XML, "Discerning Structure" refers to understanding and identifying the hierarchical organization (that is, the tree structure) and components of an XML document. It is about recognizing how the elements are nested, how they relate to one another, and how data is represented in a tree-like format.

Key Concepts in Discerning XML Structure:

(NOTE:- An XML document is always descriptive. The tree structure is often referred to as XML Tree and plays an important role to describe any XML document easily. The tree structure contains root (parent) elements, child elements and so on. By using tree structure, you can get to know all succeeding branches and sub-branches starting from the root. The parsing starts at the root, then moves down the first branch to an element, take the first branch from there, and so on to the leaf nodes )

1. Root Element:- Every XML document must have one root element that contains all other elements.

Example: <bookstore>  </bookstore>

2. Child Elements (Sub-elements):- Elements nested inside another element.

Example: <book> <title> XML Basics </title>

<author> DR. AJAY KR PATHAK </author>

</book>

3. Attributes:- Provide additional information about elements.

Example:- <book category="programming"> <title>XML Guide</title> </book>

4. Hierarchy (Parent-Child Relationship):-

· Shows how elements are nested within each other.

· The bookstore is the parent of book, and book is the parent of title, author, etc.

5. Well-Formedness:

· Proper nesting and closing of tags.

· Example of well-formed:

<note>

<to>Student</to>

<from>Teacher</from>

</note>

6. Comments and Declarations:-

· XML Declaration: <?xml version="1.0" encoding="UTF-8"?>

· Comments: <!-- This is a comment -->

EXAMPLE: DISCERNING STRUCTURE:-

<?xml version="1.0" encoding="UTF-8"?>
<library>
   <book id="b1">
      <title>Learn XML</title>
      <author>ANSHU SINGH</author>
   </book>
   <book id="b2">
      <title>Advanced XML</title>
      <author>DR. AJAY KR PATHAK</author>
   </book>
</library>

EXPLANATION :-

· Root Element: library

· Child Elements of library: book (2 of them)

· Attributes: id="b1" and id="b2"

· Sub-elements of book: title, author

WORKING WITH MIXED CONTENT:-

(Mixed content in an XML file means an element can contain:

Text
Child elements )

(“Mixed content means combining plain text and XML tags inside the same element.")

Example Program: Let’s say we want to write a paragraph where some text is bold and some is italic.

<message>
   Hello, <b>this is bold</b> and <i>this is italic</i> text!
</message>

What does it mean?

Hello, = plain text
<b>this is bold</b> = bold text inside a child element
<i>this is italic</i> = italic text inside a child element
text! = again plain text

So, the <message> element contains text and tags together; this is mixed content.

Step-by-Step Explanation:

Step 1: Start with XML Declaration

<?xml version="1.0" encoding="UTF-8"?>

This tells the XML processor about the version and encoding.

Step 2: Define the Root Element

<message> ... </message>

This is the main element wrapping the entire content.

Step 3: Mix Text and Elements

Inside <message>, we put:

Normal text: Hello,
Child element: <b>this is bold</b>
Normal text: and
Child element: <i>this is italic</i>
Normal text: text!

Full XML Code:

<?xml version="1.0" encoding="UTF-8"?>
<message>
   Hello, <b>this is bold</b> and <i>this is italic</i> text!
</message>

Important Rules for Mixed Content:

Text and elements must be allowed in the element’s structure (in DTD or schema).
Whitespace is considered text too.
Mixed content is common in document-type XML, not database-type.

Optional: Sample DTD for Mixed Content

If you're using a DTD to validate the XML, define it like this:

<!DOCTYPE message [
   <!ELEMENT message (#PCDATA | b | i)*>
   <!ELEMENT b (#PCDATA)>
   <!ELEMENT i (#PCDATA)>
]>

Here,

#PCDATA = Parsed Character Data (text)
| = OR
* = Zero or more times

So, message can have text, <b>, or <i> in any order.

MORE EXAMPLE WITH DEFINITION OF WORKING WITH MIXED CONTENT:-

What are "Mixed Content" models in XML?

In XML, "Mixed Content" refers to an element content model that allows both text and child elements within an element. This means that the element can contain both character data (i.e. text content) and child elements in any order. For example, consider an XML element named "paragraph" that can contain both text and a child element named "emphasis":

<!ELEMENT paragraph (#PCDATA | emphasis)*>

<!ELEMENT emphasis (#PCDATA)>

In this example, the "paragraph" element is defined as having mixed content by using the asterisk (*) to indicate that any number of occurrences of either character data or the "emphasis" element can appear in any order. The "emphasis" element is defined as containing only character data.

With this content model, the following are all valid examples of "paragraph" elements:

<paragraph>This is a simple paragraph.</paragraph>

<paragraph><emphasis>This</emphasis> is an <emphasis>important</emphasis> paragraph.</paragraph>

<paragraph><emphasis>This</emphasis> paragraph contains <emphasis>multiple</emphasis> emphases.</paragraph>

Mixed content models can be useful when the content of an element needs to contain both text and other elements, such as in the case of HTML-like markup languages or rich-text document formats. However, they can also make it more difficult to validate and process the XML document, as the order and content of child elements can vary.

XML Mixed Content

Mixed content models enable you to include both text and element content within a single content model. To create a mixed content model in XML Schemas, simply include the mixed attribute with the value true in your <complexType> definition, like so:

<element name="description">
   <complexType mixed="true">
      <choice minOccurs="0" maxOccurs="unbounded">
         <element name="em" type="string"/>
         <element name="strong" type="string"/>
         <element name="br" type="string"/>
      </choice>
   </complexType>
</element>

The preceding example declares a <description> element, which can contain an unlimited number of <em>, <strong>, and <br> elements. Because the complex type is declared as mixed, text can be interspersed throughout these elements. An allowable <description> element might look like the following:

<description>Joe is a developer &amp; author for <em>Beginning XML</em> <strong>5th edition</strong></description>

In this <description> element, textual content is interspersed throughout the elements declared within the content model. As the schema validator is processing the preceding example, it skips over the textual content and entities while performing standard validation on the elements. Because the elements <em>, <strong>, and <br> may appear repeatedly (maxOccurs="unbounded"), the example is valid.

END

ADDING COMMENTS IN XML :-

In XML, comments are notes or explanations written within the code that help humans understand the document but are ignored by the XML parser (i.e., the software reading the XML).

In XML, as in every programming language, comments play an important role because they can be used to explain a particular piece of code. An XML comment can be a single word, a line or a whole paragraph; it documents the file and helps readers understand the tags used in it. Use inline comments sparingly, as too many make the code harder to read. Unlike JSON, XML supports comments out of the box. In XML, a comment is written between the markers <!-- and -->; you can add a text note between them.

Syntax of XML Comments

<!-- Your comment text here -->

Comments do not affect the structure or the data and are often used for:

Documentation
Descriptions
Reminders
Temporary disabling of code/data

Single-Line Comment | <!-- This is a comment --> | Simple descriptions

Multi-Line Comment | <!-- This comment spans more than one line --> | Detailed notes/documentation

Commented XML Block | <!-- <name>Rahul Mehta</name> --> | Temporarily disabling code/data

Rules of Writing XML Comments:-

1	A comment must start with <!-- and end with -->.
2	Comments cannot be nested (no comments inside comments).
3	The string -- is not allowed inside comments.
4	Comments can be placed almost anywhere in the XML document, except before the XML declaration and inside a tag.
5	Comments are not displayed or processed by the parser.
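
The rules above can be sketched in one small file. The element names are hypothetical; the cases that are not allowed are only described in words inside the last comment, because writing them out literally would make the file malformed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Valid: a comment after the declaration and before the root element -->
<students>
   <!-- Valid: a comment between elements -->
   <student>
      <name>Ravi Kumar</name> <!-- Valid: a comment after an element -->
   </student>
   <!-- Valid: a comment may
        span several lines -->
</students>
<!-- Not allowed (each of these would be an error):
       a comment placed before the XML declaration
       a comment inside a tag, such as inside the start tag of student
       a comment that contains two hyphens in a row
       a comment nested inside another comment
-->
```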

Types of Comment Lines in XML

Technically, XML supports only one type of comment syntax, but based on how and where comments are used, they can be grouped as:

1 Single-Line Comments

2 Multi-Line Comments

EXAMPLE OF THE COMMENTS

<?xml version="1.0" encoding="UTF-8"?>
<students>
   <student>
      <name>Ravi Kumar</name>
   </student>
   <student>
      <name>Priya Sharma</name>
   </student>
   <!-- The following student is on hold, will be activated later
   <student>
      <name>Rahul Mehta</name>
   </student>
   -->
</students>

END

WHAT IS XML PARSER:- (In simple words: an XML parser is like a reader or checker that goes through your XML file to read the content, check that it is correct (follows the XML rules), and understand the structure and data. Think of it like a teacher checking your exam paper: if everything follows the rules, all good; if there are mistakes like missing tags or wrong characters, error!

Example:- Valid XML (parser accepts it):

<person>
   <name>Ajay</name>
</person>

Invalid XML (parser will show an error, because <name> is never closed):

<student>
   <name>Ajay
</student>

)

Full explanation of the parser: -XML parser is a software library or a package that provides interface for client applications to work with XML documents. It checks for proper format of the XML document and may also validate the XML documents. Modern day browsers have built-in XML parsers.

The XML parser sits between the XML document and the client application: it reads the document and passes its content on to the application.

To ease the process of parsing, some commercial products are available that facilitate the breakdown of XML document and yield more reliable results.

Some commonly used parsers are listed below :-

MSXML (Microsoft Core XML Services) − This is a standard set of XML tools from Microsoft that includes a parser.
System.Xml.XmlDocument − This class is part of .NET library, which contains a number of different classes related to working with XML.
Java built-in parser − The Java library has its own parser. The library is designed such that you can replace the built-in parser with an external implementation such as Xerces from Apache or Saxon.
Saxon − Saxon offers tools for parsing, transforming, and querying XML.
Xerces − Xerces is implemented in Java and is developed by the famous open source Apache Software Foundation.

CDATA SECTIONS:- (Full form of CDATA: Character Data; it is also called unparsed character data.) A CDATA section is a block of text that the parser does not parse: everything inside it is passed through as plain character data, even if it looks like markup.

The predefined entities such as &lt; (less than, <), &gt; (greater than, >) and &amp; (ampersand, &) require extra typing and are generally difficult to read in the markup. In such cases, a CDATA section can be used. By using a CDATA section, you are telling the parser that the particular section of the document contains no markup and should be treated as regular text.

WHY DO WE NEED CDATA? Certain characters have a special meaning in XML and are normally escaped in text content:

1.. < (less than)

2.. > (greater than)

3.. & (ampersand)

If we include these characters in XML content, it will cause errors or confusion for the XML parser. So, we wrap such content in <![CDATA[ ... ]]> to safely include any text.

CDATA Syntax:

<![CDATA[

Your text here...

]]>

Everything inside <![CDATA[ and ]]> will be treated as text, not XML code.

EXAMPLE:-

Let’s take a simple XML example without CDATA and see the problem.

Example Without CDATA (Problem):

<note>
   Use 5 < 10 in your code
</note>

This will cause an error! Because < is interpreted as the start of a new tag.

CORRECT EXAMPLE USING CDATA:

<note>
   <![CDATA[Use 5 < 10 in your code]]>
</note>

Explanation:

<![CDATA[ starts the CDATA section

Use 5 < 10 in your code is now just plain text

]]> ends the CDATA section

Now, the XML parser will NOT try to interpret < 10 as a tag.

ANOTHER REAL-LIFE EXAMPLE

<code>

<![CDATA[

if (x < 5 && y > 10) {

console.log("Hello, world!");

}

]]>

</code>

Here:

<, >, &&, and other symbols will NOT break your XML
It's a safe way to store code or scripts

Key Points to Remember:

Feature	Description
Start	<![CDATA[
End	]]>
Safe for	Symbols like <, >, &, etc.
Used for	Including raw text, code, special characters
Parser Behavior	Ignores tags or special characters inside CDATA
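
As a side-by-side comparison, the same text can be written either with entity references or inside a CDATA section. This is a minimal sketch; the element names examples and note are used only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<examples>
   <!-- Written with entity references: every special character is escaped -->
   <note>Use 5 &lt; 10 &amp;&amp; 10 &gt; 5 in your code</note>

   <!-- The same text inside a CDATA section: characters are written directly -->
   <note><![CDATA[Use 5 < 10 && 10 > 5 in your code]]></note>
</examples>
```

Both <note> elements give the application the same text. The one thing a CDATA section cannot contain is the sequence ]]> itself, because that sequence ends the section.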

Final XML CDATA Example:

<book>

<title>Learning XML</title>

<description><![CDATA[

This book explains XML basics.

It also includes code like <tag> and &entity; safely.

]]></description>

</book>

Here, the description contains:

<tag> (won’t be confused as an XML tag)
&entity; (won’t be treated as a special entity)

END

PCDATA IN XML :- Meaning: It is the text that XML can read, understand, and process. In XML, when you write normal text inside a tag, that text is called PCDATA. This data is read and checked (parsed) by the XML parser. (In simple terms: the plain data we write inside the tags of an XML program is called PCDATA.)

Simple Example: <name>AJAY KUMAR</name> Here, AJAY KUMAR is PCDATA. It is inside the <name> tag, and XML will read and understand it as the value for the name.

Special Characters must be written carefully

Some characters have special meanings in XML, so you can’t use them directly in PCDATA.

Symbol	Use in XML PCDATA
<	&lt;
>	&gt;
&	&amp;

Example: <note>5 &lt; 10</note> (This means: 5 < 10. XML will understand it correctly because < is written as &lt;)

EXAMPLE OF PROGRAM OF XML USING PCDATA (Student Information)

<?xml version="1.0" encoding="UTF-8"?>
<student>
   <name>AJAY KUMAR</name>
   <college>MRS. KMPM VC College</college>
</student>

HERE THE TEXT INSIDE THE TAGS (AJAY KUMAR AND MRS. KMPM VC College) IS CALLED PCDATA. PCDATA MEANS PLAIN TEXT INSIDE THE TAGS, WHICH XML CAN UNDERSTAND.

Parsed Character Data (PCDATA) is a data definition that originated in Standard Generalized Markup Language (SGML), and is used also in Extensible Markup Language (XML) Document Type Definition (DTD) to designate mixed content XML elements.

DIFFERENCE BETWEEN CDATA AND PCDATA IN XML :-

Point	PCDATA (Parsed Character Data)	CDATA (Character Data)
Full Form	Parsed Character Data	Character Data
Parser checks the text?	Yes, the XML parser reads and checks it	No, the XML parser does not interpret the content
Allows special characters like <, >?	No (must use codes like &lt;)	Yes (write characters directly)
Can include XML tags inside?	Yes (they are parsed as real tags)	Yes (but treated as plain text, not as real tags)
Used for	Normal data like name, city, roll no.	Raw text, code, HTML, scripts
Syntax Example	<name>AJAY KUMAR</name>	<![CDATA[ 5 < 10 ]]>
Parsed by XML?	Yes	No
Best for	Safe and clean text data	Complex or technical text data

WHAT IS DTD (CREATING A DTD ):-

DTD :- (In hindi:- jo bhi data transmit hora hai kishi others documents mai through the .DTD (kiyeki . XML, . DTD and. .XSD (XML Schema Definition) only kisibhi data / information ko carry karta hai display nahi karta hai kishi bhi browser mai) ko and wah data ak well formed mai hai ya nahi usko legal checked karta hai because agar .DTD ka data ham transmit kiye kishi others applications mai or wah DTD data others application mai access ho kar waha per errors dene lagega to waha per problems hoga esliye DTD data ko pahele check and verify kar lega ki yah data kishi others documents mai execute ho raha hai ya nahi)

DOCUMENT TYPE DEFINITIONS (DTD) :- The XML Document Type Definition, commonly known as DTD, is a way to describe an XML language precisely (an application can use a DTD to verify that XML data is valid). DTDs check the vocabulary (the names of elements and attributes) and the validity of the structure of XML documents against the grammatical rules of the appropriate XML language.

An XML DTD can be either specified inside the document, or it can be kept in a separate document and then linked separately.

A document type definition (DTD) provides you with the means to validate XML files against a set of rules. When you create a DTD file, you can specify rules that control the structure of any XML files that reference the DTD file.

A DTD can contain declarations that define elements, attributes, notations, and entities for any XML files that reference the DTD file. It also establishes constraints for how each element, attribute, notation, and entity can be used within any of the XML files that reference the DTD file.

To be considered a valid XML file, the document must be accompanied by (go along with) a DTD (or an XML schema), and conform to all of the declarations in the DTD (or XML schema).

Certain XML parsers have the ability to read DTDs and check whether the XML file they are reading follows all of those rules. While the parser is reading the XML file, it checks that the content conforms to the rules that are laid out (put, arranged) in the DTD file. If there is a problem, the parser generates an error and points to where the error occurs in the XML file. This kind of parser is called a validating parser because it validates the content of the XML file against the DTD.

Purpose of DTD :- Its main purpose is to define the structure of an XML document. It contains a list of legal elements and defines the structure with the help of them.

Syntax

Basic syntax of a DTD is as follows −

<!DOCTYPE element DTD identifier

[

declaration1

declaration2

........

]>

IN THE ABOVE SYNTAX,

The DTD starts with <!DOCTYPE delimiter.

An element tells the parser to parse the document from the specified root element.

DTD identifier is an identifier for the document type definition, which may be the path to a file on the system or URL to a file on the internet. If the DTD is pointing to external path, it is called External Subset.

The square brackets [ ] enclose an optional list of entity declarations called Internal Subset.

A complete example of well-formed and valid XML document. It follows all the rules of DTD.

Save file :- employee.xml

<?xml version="1.0"?>

<!DOCTYPE employee SYSTEM "employee.dtd"> <!-- EXTERNAL DTD: EMPLOYEE.DTD IS THE FILE NAME -->

<employee>

<firstname>AJAY</firstname>

<lastname>PATHAK</lastname>

<email>ajay@example.com</email>

</employee>

In the above example, the DOCTYPE declaration refers to an external DTD file. The content of that file is shown below.

File name :- employee.dtd

<!ELEMENT employee (firstname,lastname,email)>

<!ELEMENT firstname (#PCDATA)>

<!ELEMENT lastname (#PCDATA)>

<!ELEMENT email (#PCDATA)>
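
What a validating parser checks for the rule <!ELEMENT employee (firstname,lastname,email)> can be sketched in Python. This is only a toy check under stated assumptions: Python's standard library does not validate against a DTD, so the rule is written by hand as a list, and the function name and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for: <!ELEMENT employee (firstname,lastname,email)>
# (assumption: a real validating parser would read this rule from employee.dtd)
EXPECTED_CHILDREN = ["firstname", "lastname", "email"]

def follows_employee_rule(xml_text):
    """Return True if the root is <employee> with its children in the DTD order."""
    root = ET.fromstring(xml_text)
    child_names = [child.tag for child in root]
    return root.tag == "employee" and child_names == EXPECTED_CHILDREN

# Illustrative documents (not taken from the notes).
complete = ("<employee><firstname>AJAY</firstname><lastname>PATHAK</lastname>"
            "<email>ajay@example.com</email></employee>")
missing_email = "<employee><firstname>AJAY</firstname><lastname>PATHAK</lastname></employee>"
wrong_order = ("<employee><lastname>PATHAK</lastname><firstname>AJAY</firstname>"
               "<email>ajay@example.com</email></employee>")

print(follows_employee_rule(complete))
print(follows_employee_rule(missing_email))
print(follows_employee_rule(wrong_order))
```

All three documents are well formed, but only the first one follows the order required by the DTD rule.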


DATA TYPES IN XML

There are 2 data types:

(1) PCDATA is parsed character data.

(2) CDATA is character data, which is not parsed.

(1).. PCDATA TYPE :- PCDATA (Parsed Character Data) is text that will be parsed by the XML parser.

Tags inside the PCDATA will be treated as markup and entities will be expanded.

PCDATA refers to the character data within an XML element that will be parsed by the XML processor. PCDATA can contain text, but certain characters like '<', '>', and '&' must be escaped using predefined entities (&lt;, &gt;, &amp;) to ensure they are treated as data and not markup.

Example demonstrating the use of PCDATA in an XML document:

SAVE AS FILE:- *.XML

<?xml version="1.0" encoding="UTF-8"?>

<bookstore>

<book>

<title>PROGRAMMING</title>

<author>AJAY KUMAR PATHAK </author>

<description>

This is a PCDATA example. It contains text with special characters like &lt;, &gt;, and &amp;,

which must be escaped in XML.

</description>

</book>

</bookstore>

In this example:

<title>, <author>, and <description> are elements in the XML.

The text within the <title> and <author> elements represents PCDATA.

The <description> element contains PCDATA that includes special characters like <, >, and &, which are escaped as entities (&lt;, &gt;, &amp;) to ensure proper parsing by an XML processor.

PCDATA allows you to include textual data within XML elements while ensuring that special characters are properly represented and parsed as data rather than XML markup.

EXAMPLE OF PCDATA TYPE PROGRAM WITH .DTD FOR PRACTICAL ALSO

XML FILE (example.xml):

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE library SYSTEM "example.dtd">

<library>

<book>

<title>XML Basics</title>

<author>AJAY PATHAK</author>

<published_year>2023</published_year>

<description>This is a book about XML basics.</description>

</book>

<book>

<title>Learning XPath</title>

<author>PATHAK AJAY</author>

<published_year>2022</published_year>

<description>An introduction to XPath for XML querying.</description>

</book>

</library>

DTD File (example.dtd):

<!ELEMENT library (book+)>

<!ELEMENT book (title, author, published_year, description)>

<!ELEMENT title (#PCDATA)>

<!ELEMENT author (#PCDATA)>

<!ELEMENT published_year (#PCDATA)>

<!ELEMENT description (#PCDATA)>

Explanation:

In the XML file, there is a <library> element containing multiple <book> elements.

Each <book> element consists of child elements like <title>, <author>, <published_year>, and <description>.

The DTD file defines the structure of the XML document:

<!ELEMENT library (book+)> specifies that the library element must contain one or more book elements.

<!ELEMENT book (title, author, published_year, description)> defines the structure of the book element, which contains the title, author, published_year, and description elements in that order.

<!ELEMENT title (#PCDATA)>, <!ELEMENT author (#PCDATA)>, <!ELEMENT published_year (#PCDATA)>, <!ELEMENT description (#PCDATA)> declare that the child elements contain PCDATA (Parsed Character Data), meaning they can contain text content.

This structure outlines the hierarchy and content model for the XML document using the DTD, specifying that certain elements must contain text data.


THE END

(2). CDATA TYPE: -

CDATA stands for Character Data and is a way to include blocks of text in XML documents without the need for escaping special characters (such as <, >, &, etc.). CDATA sections begin with <![CDATA[ and end with ]]> and are used to encapsulate blocks of text that might contain characters that would otherwise be recognized as markup.

Here's an example demonstrating the use of CDATA in an XML document:

SAVE AS FILE:- *.XML

<?xml version="1.0" encoding="UTF-8"?>

<bookstore>

<book>

<title>PROGRAMMING</title>

<author>AJAY KUMAR PATHAK</author>

<description><![CDATA[

The web tech is Markup language.

This CDATA section allows including text without worrying about escaping special characters,

such as <, >, or &.

]]></description>

</book>

</bookstore>

In this example:

<title>, <author>, and <description> are elements in the XML.

The text within the CDATA section of the <description> element contains characters like <, >, and &. However, because it is wrapped within <![CDATA[ and ]]>, these characters are treated as regular text and do not need to be escaped.

CDATA sections are useful when you want to include large blocks of text within an XML element without having to escape special characters. This makes it easier to include content that contains a lot of reserved or problematic characters without worrying about XML parsing or validation issues.
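
A short Python sketch can show that a CDATA section and escaped PCDATA give the application the same text. It assumes Python's standard xml.etree.ElementTree module; the sample content and variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The same text written two ways: inside a CDATA section, and with escaped characters.
with_cdata = "<description><![CDATA[if (a < b && b > c) { run(); }]]></description>"
with_escapes = "<description>if (a &lt; b &amp;&amp; b &gt; c) { run(); }</description>"

cdata_text = ET.fromstring(with_cdata).text
escaped_text = ET.fromstring(with_escapes).text

print(cdata_text)
print(cdata_text == escaped_text)
```

The CDATA form is easier to write and read when the text contains many special characters; the parsed result is the same in both cases.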

EXAMPLE OF CDATA TYPE PROGRAM WITH .DTD FOR PRACTICAL ALSO

CDATA sections begin with <![CDATA[ and end with ]]>.

XML File (example.xml):

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE notes SYSTEM "example.dtd">

<notes>

<note>

<content><![CDATA[

This is a sample note with special characters like < and >.

It doesn't affect the XML parsing because it's wrapped in a CDATA section.

Special characters like & and ' also don't need escaping here.

]]></content>

</note>

</notes>

DTD File (example.dtd):

<!ELEMENT notes (note*)>

<!ELEMENT note (title, content)>

<!ELEMENT title (#PCDATA)>

<!ELEMENT content (#PCDATA)>

Explanation of <!ELEMENT notes (note*)>:

<!ELEMENT ...>: This declares an element in DTD.
notes: This is the name of the element being defined.
(note*): This specifies the content model – what kind of child elements the <notes> element can contain.

MEANING:

The notes element can contain zero or more note elements.
The asterisk * means "zero or more occurrences".

EXAMPLE:

Here’s how this would look in a real XML document that follows this DTD rule:

<notes>

<note>

<to>Alice</to>

<message>Hello!</message>

</note>

<note>

<to>Charlie</to>

<from>David</from>

<message>Hi there!</message>

</note>

</notes>

This is valid because:

<notes> contains two <note> elements.
Even if there were no <note> elements inside <notes>, it would still be valid.
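
The meaning of the occurrence indicators * and + can be sketched in Python. This is a toy illustration, not a DTD validator: each content model is rewritten by hand as a regular expression over the child element names, and the function and constant names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written stand-ins for DTD content models (assumption: one regex per model).
ZERO_OR_MORE_NOTES = r"(note,)*"   # plays the role of (note*)
ONE_OR_MORE_BOOKS = r"(book,)+"    # plays the role of (book+)

def children_match(xml_text, pattern):
    """Return True if the root's child element names match the pattern."""
    root = ET.fromstring(xml_text)
    child_names = "".join(child.tag + "," for child in root)
    return re.fullmatch(pattern, child_names) is not None

print(children_match("<notes></notes>", ZERO_OR_MORE_NOTES))
print(children_match("<notes><note/><note/></notes>", ZERO_OR_MORE_NOTES))
print(children_match("<library></library>", ONE_OR_MORE_BOOKS))
print(children_match("<library><book/></library>", ONE_OR_MORE_BOOKS))
```

An empty <notes> element is accepted under *, while an empty <library> element is rejected under +.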

In the example.xml file above, the <![CDATA[ and ]]> markers wrap the text inside the <title> and <content> elements. The DTD (example.dtd) defines the structure of the XML document, specifying that the title and content elements contain parsed character data (#PCDATA), which can include text or CDATA sections.

CDATA sections are especially useful when you want to include blocks of text containing characters like <, >, &, or ', which have special meanings in XML. The parser treats the content inside a CDATA section as regular character data, not as markup, avoiding potential conflicts with the XML syntax.


CREATING A DTD FOR AN EXISTING XML FILE.

TYPE OF DTD :

1. Internal DTD

2. External DTD

1. Internal DTD:- A DTD is referred to as an internal DTD if its elements are declared within the XML file itself; the file is saved as file_name.xml. Because nothing outside the file is needed, the standalone attribute in the XML declaration can be set to yes. This means the declaration works independently of any external source.

Syntax

Following is the syntax of internal DTD −

<!DOCTYPE root-element [element-declarations]>

where root-element is the name of root element and element-declarations is where you declare the elements.

Example

Simple example of internal DTD (save as file_name.xml):-

<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?> <!-- STANDALONE = "YES" MEANS THIS XML FILE IS COMPLETE BY ITSELF AND DOESN'T NEED ANY EXTERNAL FILE LIKE A DTD (DOCUMENT TYPE DEFINITION). -->

<!DOCTYPE address [

<!ELEMENT address (name,company,phone)>

<!ELEMENT name (#PCDATA)> <!-- PCDATA means parsed character data -->

<!ELEMENT company (#PCDATA)>

<!ELEMENT phone (#PCDATA)>

]>

<address>

<name>ajay pathak</name>

<company>MRSKMPM VC </company>

</address>

LET US GO THROUGH THE ABOVE CODE −

Start Declaration − Begin the XML declaration with the following statement.

<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>

DTD − Immediately after the XML header, the document type declaration follows, commonly referred to as the DOCTYPE −

<!DOCTYPE address [

The DOCTYPE declaration has an exclamation mark (!) at the start of the element name. The DOCTYPE informs the parser that a DTD is associated with this XML document.

DTD Body − The DOCTYPE declaration is followed by body of the DTD, where you declare elements, attributes, entities, and notations.

<!ELEMENT address (name,company,phone)>

<!ELEMENT name (#PCDATA)>

<!ELEMENT company (#PCDATA)>

<!ELEMENT phone (#PCDATA)>

Several elements are declared here that make up the vocabulary (terminology) of the <address> document. <!ELEMENT name (#PCDATA)> defines the element name to be of type "#PCDATA". Here #PCDATA means parse-able text data.

End Declaration − Finally, the declaration section of the DTD is closed using a closing bracket and a closing angle bracket (]>). This effectively ends the definition, and thereafter, the XML document follows immediately.

VVI : RULES FOR INTERNAL DTD

1. THE DOCUMENT TYPE DECLARATION MUST APPEAR AT THE START OF THE DOCUMENT (PRECEDED ONLY BY THE XML HEADER) − IT IS NOT PERMITTED ANYWHERE ELSE WITHIN THE DOCUMENT.

2. SIMILAR TO THE DOCTYPE DECLARATION, THE ELEMENT DECLARATIONS MUST START WITH AN EXCLAMATION MARK.

3. THE NAME IN THE DOCUMENT TYPE DECLARATION MUST MATCH THE ELEMENT TYPE OF THE ROOT ELEMENT.

EXAMPLE : (1) PROGRAM OF INTERNAL DTD FOR PRACTICAL

Save file_name .xml

<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>

<!DOCTYPE address [

<!ELEMENT address (name,company,phone)>

<!ELEMENT name (#PCDATA)> <!-- PCDATA means parsed character data -->

<!ELEMENT company (#PCDATA)>

<!ELEMENT phone (#PCDATA)>

]>

<address>

<name>ajay pathak</name>

<company>MRSKMPM VC </company>

</address>

EXAMPLE : (2) PROGRAM OF INTERNAL DTD FOR PRACTICAL

Save file_name .xml

<?xml version="1.0"?>

<!DOCTYPE note [

<!ELEMENT note (to,from,heading,body)>

<!ELEMENT to (#PCDATA)>

<!ELEMENT from (#PCDATA)>

<!ELEMENT heading (#PCDATA)>

<!ELEMENT body (#PCDATA)>

]>

<note>

<to>AJAY KUMAR PATHAK</to>

<from>PAWAN, RAJU SINGH </from>

<heading>MEETING </heading>

<body>Meeting on Next Sunday Morning at 12 Noon.</body>

</note>


EXAMPLE : (3) PROGRAM OF INTERNAL DTD FOR PRACTICAL

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE note [

<!ELEMENT note (to, from, message)>

<!ELEMENT to (#PCDATA)>

<!ELEMENT from (#PCDATA)>

<!ELEMENT message (#PCDATA)>

]>

<note>

<from>Seema</from>

<message>Hello! How are you?</message>

</note>

2. External DTD :- (SAVE AS FILE_NAME.DTD)

In an external DTD, elements are declared outside the XML file. They are accessed by specifying the SYSTEM keyword followed by either the path to a .dtd file or a valid URL. For an external DTD, the standalone attribute in the XML declaration should be set to no (this is also the value assumed when the attribute is left out).

This means the declaration includes information from the external source.

Syntax

Following is the syntax for external DTD −

<!DOCTYPE root-element SYSTEM "file-name">

WHERE FILE-NAME IS THE FILE WITH . DTD EXTENSION.

Example

The following example shows external DTD usage −

<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?> <!-- STANDALONE = "NO" MEANS THE XML NEEDS HELP FROM ANOTHER FILE, USUALLY AN EXTERNAL DTD (DOCUMENT TYPE DEFINITION), TO UNDERSTAND OR VALIDATE THE STRUCTURE. -->

<!DOCTYPE address SYSTEM "address.dtd"> <!-- HERE WE REFER TO THE EXTERNAL DTD FILE "address.dtd", WHICH IS CREATED AS A SEPARATE FILE -->

<address>

<name>AJAY PATHAK</name>

<company>MRSKMPM VC </company>

</address>

The content of the DTD file address.dtd is as shown −

<!ELEMENT address (name,company,phone)>

<!ELEMENT name (#PCDATA)>

<!ELEMENT company (#PCDATA)>

<!ELEMENT phone (#PCDATA)>

Explanation Types

You can refer to an external DTD by using either system identifiers or public identifiers.

System Identifiers

A system identifier enables you to specify the location of an external file containing DTD declarations. Syntax is as follows −

<!DOCTYPE name SYSTEM "address.dtd" [...]>

As you can see, it contains keyword SYSTEM and a URI reference pointing to the location of the document.
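
How a program can read the root name and the system identifier from a DOCTYPE line can be sketched with Python's standard xml.dom.minidom module. The sample document below is illustrative only, and this sketch does not load or check address.dtd itself; it only reads what the DOCTYPE declaration says.

```python
from xml.dom import minidom

# Illustrative document that points to an external DTD through a system identifier.
document_text = (
    '<!DOCTYPE address SYSTEM "address.dtd">'
    "<address><name>AJAY PATHAK</name></address>"
)

document = minidom.parseString(document_text)
doctype = document.doctype

print(doctype.name)      # name given after <!DOCTYPE
print(doctype.systemId)  # location given after the SYSTEM keyword
```

The name printed first must match the root element of the document, as stated in the rules for the document type declaration.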

EXAMPLE : (1) PROGRAM OF EXTERNAL DTD FOR PRACTICAL

SAVE AS FILE_NAME . XML

<?xml version="1.0"?>

<!DOCTYPE note SYSTEM "note.dtd">

<note>

<to>AJAY KUMAR PATHAK</to>

<from>PAWAN, RAJ SINGH</from>

<heading>MEETING </heading>

<body>Meeting on NEXT SUNDAY Morning at 12 NOON.</body>

</note>

SAVE AS FILE_NAME .DTD (note.dtd)

<!ELEMENT note (to,from,heading,body)>

<!ELEMENT to (#PCDATA)>

<!ELEMENT from (#PCDATA)>

<!ELEMENT heading (#PCDATA)>

<!ELEMENT body (#PCDATA)>


THE END

THE CONCEPT OF A VALID XML DOCUMENT:-

Most XML browsers will check your document to see if it is well formed. Some of them can also check whether it's valid. An XML document is valid if there is a document type definition (DTD) or XML schema associated with it, and if the document complies with that DTD or schema.

An XML document is called valid when it is well-formed AND it follows the rules defined in a DTD (Document Type Definition) or XML Schema (XSD). So, "valid" means the XML file is correct in structure and it follows a defined set of rules.

Well-formed means:

1. Tags are properly opened and closed.

2. Tags are nested correctly.

3. There is one root element.

4. Attribute values are in quotes.
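
The four rules above can be checked with a small Python sketch. It assumes Python's standard xml.etree.ElementTree parser, which raises ParseError for text that is not well formed; the helper name is_well_formed and the sample strings are hypothetical. Note that this checks well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Illustrative samples, one per rule.
good = '<student id="1"><name>Ajay</name><age>20</age></student>'
not_closed = "<student><name>Ajay</student>"
bad_nesting = "<student><name>Ajay</student></name>"
two_roots = "<student/><student/>"
unquoted_attribute = "<student id=1/>"

for sample in (good, not_closed, bad_nesting, two_roots, unquoted_attribute):
    print(is_well_formed(sample), sample)
```

Only the first sample is accepted; each of the others breaks one of the four rules.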

Example (Well-formed XML):

</student>

Valid XML (with DTD) Example: Valid XML with Internal DTD

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE student [

<!ELEMENT student (name, age)>

<!ELEMENT name (#PCDATA)>

<!ELEMENT age (#PCDATA)>

]>

<student>

</student>

(FOR MORE ON WELL-FORMED XML, PLEASE SEE THE PREVIOUS NOTES ABOVE.)

DTD – COMPONENTS :-

A DTD will basically contain declarations of the following XML components

(1)..Element : - XML elements can be defined as building blocks of an XML document. Elements can behave as a container to hold text, elements, attributes, media objects or mix of all.

A DTD element is declared with an ELEMENT declaration. When an XML file is validated by DTD, parser initially checks for the root element and then the child elements are validated.

Each XML document contains one or more elements, the boundaries of which are either delimited by start-tags and end-tags, or empty elements.

Example

A simple example of XML elements

<name>

AJAY PATHAK

</name>

As we can see, we have defined a <name> tag. There is text between the start and end tags of <name>.

(2)..Attributes : -

Attributes are part of XML elements. An element can have any number of unique attributes. An attribute gives more information about the XML element or, more precisely (exactly), it defines a property of the element. An XML attribute is always a name-value pair.

Example

A simple example of XML attributes

<img src="flower.jpg"/>

Here img is the element name whereas src is an attribute name and flower.jpg is a value given for the attribute src.

If attributes are used in an XML DTD then these need to be declared.
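
The name-value pair of an attribute can be read back with a short Python sketch. It assumes Python's standard xml.etree.ElementTree module and uses the img element described above; the variable name is only illustrative.

```python
import xml.etree.ElementTree as ET

# The img element with one attribute: name "src", value "flower.jpg".
img = ET.fromstring('<img src="flower.jpg"/>')

print(img.tag)         # element name
print(img.get("src"))  # value of the src attribute
print(img.attrib)      # all attributes as name-value pairs
```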

(3).. Entities :- Entities are placeholders in XML. Entities are used to define shortcuts to special characters or to reusable text within XML documents. Entities are primarily of four types: built-in entities, character entities, general entities, and parameter entities.

(Note:- Entities are also divided into two parts: (a) Internal entities (b) External entities.)

(a) Internal entities :- If an entity is declared within a DTD it is called as internal entity.

Syntax

Following is the syntax for internal entity declaration:-

<!ENTITY entity_name "entity_value">

In the above syntax

entity_name is the name of entity followed by its value within the double quotes or single quote.

entity_value holds the value for the entity name.

The entity value of the internal entity is de-referenced by adding the prefix & and the suffix ; to the entity name, i.e. &entity_name;

Example of internal entities :-

<?xml version = "1.0" encoding = "UTF-8" standalone = "yes"?>

<!DOCTYPE address [

<!ELEMENT address (#PCDATA)>

<!ENTITY name "RAJ SINGH">

<!ENTITY company "RAJSINGH.COM">

<!ENTITY phone_no "12345678">

]>

<address>

&name;

&company;

&phone_no;

</address>

NOTE:- SAVE THIS FILE AS SAMPLE.XML AND OPEN IT IN ANY BROWSER, YOU WILL NOTICE THAT THE ENTITY VALUES FOR NAME, COMPANY, PHONE_NO ARE REPLACED RESPECTIVELY.
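
The same replacement can be observed without a browser using a short Python sketch. It assumes Python's standard xml.etree.ElementTree parser, which expands internal entities declared in the DOCTYPE; the entities are placed on one line here only to keep the expected text simple.

```python
import xml.etree.ElementTree as ET

# Internal DTD with three internal entities, followed by the document that uses them.
document_text = (
    "<!DOCTYPE address [\n"
    "<!ELEMENT address (#PCDATA)>\n"
    '<!ENTITY name "RAJ SINGH">\n'
    '<!ENTITY company "RAJSINGH.COM">\n'
    '<!ENTITY phone_no "12345678">\n'
    "]>\n"
    "<address>&name; &company; &phone_no;</address>"
)

address = ET.fromstring(document_text)
print(address.text)  # each &entity_name; has been replaced by its value
```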

(b) External entities :-

If the content of an entity is stored outside the DTD, in a separate file, it is called an external entity. You can refer to an external entity by using either system identifiers or public identifiers.

Syntax

Following is the syntax for External Entity declaration −

<!ENTITY name SYSTEM "URI/URL">

In the above syntax :−

name is the name of entity.

SYSTEM is the keyword.

URI/URL is the address of the external source enclosed within the double or single quotes.

Types of external entities :-

You can refer to an external DTD by either using −

(1).. System Identifiers − A system identifier enables you to specify the location of an external file containing DTD declarations. Syntax is as follows −

<!DOCTYPE name SYSTEM "address.dtd" [...]>

As you can see, it contains the keyword SYSTEM and a URI reference pointing to the document's location.

(2).. Public Identifiers − Public identifiers provide a mechanism to locate DTD resources and are written as below −

<!DOCTYPE name PUBLIC "-//Beginning XML//DTD Address Example//EN" "address.dtd">

As you can see, it begins with the keyword PUBLIC, followed by a specialized identifier. In XML, the public identifier must be followed by a system identifier (here "address.dtd"), which the parser can fall back on. Public identifiers are used to identify an entry in a catalog. Public identifiers can follow any format; however, a commonly used format is called Formal Public Identifiers, or FPIs.

Example

Let us understand the external entity with the following example −

<?xml version = "1.0" encoding = "UTF-8" standalone = "no"?>

<!DOCTYPE address SYSTEM "address.dtd">

<address>

<name>

RAJ SINGH

</name>

<company>

RAJSINGH.COM

</company>

<phone>

12345678

</phone>

</address>

BELOW IS THE CONTENT OF THE DTD FILE ADDRESS.DTD

<!ELEMENT address (name, company, phone)>

<!ELEMENT name (#PCDATA)>

<!ELEMENT company (#PCDATA)>

<!ELEMENT phone (#PCDATA)>

THE END

The four types of entities are:

(1).. Built-in entities (also called entity references or predefined entities)

(2).. Character entities

(3).. General entities

(4).. Parameter entities

(1).. Built-in entities (entity references or predefined entities) :- In XML, predefined entity references are used to represent special characters that have special meanings in XML syntax. These entities help in displaying reserved characters within XML content without causing syntax errors.

In general, you can use these entity references anywhere you can use normal text within the XML document, such as in element content and attribute values.

There are five built-in entities that are available in every well-formed XML document. They are :-

(1). &lt; for < (less than)

(2). &gt; for > (greater than)

(3). &amp; for & (ampersand)

(4). &quot; for " (double quote)

(5). &apos; for ' (single quote or apostrophe)

EXAMPLE OF BUILT-IN ENTITIES OR ENTITIES REFERENCES OR PRE DEFINE ENTITIES, HOW TO USE IN PROGRAM PRACTICAL

SAVE AS FILE_NAME .XML

<?xml version="1.0" encoding="UTF-8"?>

<document>

<title>Example of Predefined Entity References</title>

<example1> xml & html </example1> <!-- THIS IS WRONG: A BARE & IS NOT ALLOWED -->

<example1> xml &amp; html </example1> <!-- THIS IS CORRECT -->

<example2> 10 < 20 </example2> <!-- THIS IS WRONG: A BARE < IS NOT ALLOWED -->

<example2> 10 &lt; 20 </example2> <!-- THIS IS CORRECT -->

<example3>20 >10 </example3> <!-- ACCEPTED BY PARSERS, BUT NOT RECOMMENDED -->

<example3>20 &gt; 10</example3> <!-- THIS IS CORRECT -->

<example4>xml " html </example4> <!-- ALLOWED IN ELEMENT TEXT, BUT MUST BE ESCAPED INSIDE A DOUBLE-QUOTED ATTRIBUTE VALUE -->

<example4>xml &quot; html</example4> <!-- THIS IS CORRECT -->

<example5>xml ' html</example5> <!-- ALLOWED IN ELEMENT TEXT, BUT MUST BE ESCAPED INSIDE A SINGLE-QUOTED ATTRIBUTE VALUE -->

<example5>xml &apos; html</example5> <!-- THIS IS CORRECT -->

<!-- NOTE: IF WE WANT TO WRITE < (LESS THAN) IN AN XML FILE, WE CANNOT WRITE IT DIRECTLY. THEREFORE, IN PLACE OF <, > AND &, WE WRITE &lt;, &gt; AND &amp; -->

<content>

<paragraph>

This is an example of using predefined entity references in XML. </paragraph>

<paragraph>

XML supports special characters like &lt;, &gt;, &amp;, &quot;, and &apos;.

</paragraph>

<special_characters>

&lt; &gt; &amp; &quot; &apos;

</special_characters>

</content>

</document>


IN ABOVE EXAMPLE:

The <title> element contains simple text without any special characters.

The <paragraph> elements showcase text content using &lt;, &gt;, &amp;, &quot;, and &apos; to display the reserved characters <, >, &, ", and '.

The <special_characters> element directly displays the reserved characters using their respective entity references.

When this XML file is parsed, the XML processor will interpret the entity references and display the reserved characters as intended without causing syntax errors.
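
How a parser interprets the five predefined entity references can be sketched in Python. The sketch assumes Python's standard xml.etree.ElementTree parser; the helper name parses and the sample strings are illustrative only.

```python
import xml.etree.ElementTree as ET

# The five predefined entity references inside one element.
element = ET.fromstring(
    "<special_characters>&lt; &gt; &amp; &quot; &apos;</special_characters>"
)
print(element.text)  # the reserved characters themselves

def parses(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(parses("<example1> xml & html </example1>"))      # bare & in text
print(parses("<example1> xml &amp; html </example1>"))  # escaped &
print(parses("<example2> 10 < 20 </example2>"))         # bare < in text
print(parses("<example3>20 > 10</example3>"))           # bare > in text
```

In this sketch the bare & and the bare < are rejected, while a bare > in element text is accepted by the parser.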

(2).. Character entities : -

Character entities are symbolic or numeric representations of characters. Characters that are difficult or impossible to type on the keyboard can be substituted by character entities.

Example

Following example demonstrates the character entity declaration −

<?xml version = "1.0" encoding = "UTF-8" standalone = "yes"?>

<!DOCTYPE author[

<!ELEMENT author (#PCDATA)>

<!ENTITY writer "AJAY PATHAK">

<!ENTITY copyright "&#169;">

]>

<author>&writer;&copyright;</author>

You will notice here we have used &#169; as the value for the copyright character (169 is the character number of ©). Save this file as sample.xml and open it in your browser and you will see that copyright is replaced by the character ©.
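
The link between a character number and the character it produces can be checked with a short Python sketch. It assumes Python's standard xml.etree.ElementTree parser and uses the numeric character reference &#169; for the copyright sign; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Internal DTD declaring a text entity and an entity built from a character reference.
document_text = (
    "<!DOCTYPE author [\n"
    "<!ELEMENT author (#PCDATA)>\n"
    '<!ENTITY writer "AJAY PATHAK">\n'
    '<!ENTITY copyright "&#169;">\n'
    "]>\n"
    "<author>&writer;&copyright;</author>"
)

author = ET.fromstring(document_text)
print(author.text)

# chr() gives the character for a number: 169 is the copyright sign, 123 is a brace.
print(chr(169), chr(123))
```

The number in the reference decides the character: 169 gives ©, while 123 would give { instead.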

(3). General entities : -

General entities must be declared within the DTD before they can be used within an XML document. Instead of representing only a single character, general entities can represent characters, paragraphs, and even entire documents.

Syntax

To declare a general entity, use a declaration of this general form in your DTD −

<!ENTITY ename "text">

Example

Following example demonstrates the general entity declaration −

<?xml version = "1.0"?>

<!DOCTYPE note [

<!ENTITY source-text "AJAYPATHAK">

]>

<note>

&source-text;

</note>

Whenever an XML parser encounters a reference to source-text entity, it will supply the replacement text to the application at the point of the reference.

(4). Parameter entities : -

The purpose of a parameter entity is to enable you to create reusable sections of replacement text.

Syntax

Following is the syntax for parameter entity declaration −

<!ENTITY % ename "entity_value">

entity_value is any text that does not contain '&', '%' or '"'.

Example

Following example demonstrates the parameter entity declaration. Suppose you have element declarations as below –

<!ELEMENT residence (name, street, pincode, city, phone)>

<!ELEMENT apartment (name, street, pincode, city, phone)>

<!ELEMENT office (name, street, pincode, city, phone)>

<!ELEMENT shop (name, street, pincode, city, phone)>

Now suppose you want to add an additional element country; then you need to add it to all four declarations. Hence we can go for a parameter entity reference. Using parameter entity references, the above example becomes −

<!ENTITY % area "name, street, pincode, city">

<!ENTITY % contact "phone">

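Parameter entities are dereferenced in the same way as a general entity reference, only with a percent sign instead of an ampersand. References placed inside a declaration, as below, are allowed only in an external DTD file, not in an internal DTD −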

<!ELEMENT residence (%area;, %contact;)>

<!ELEMENT apartment (%area;, %contact;)>

<!ELEMENT office (%area;, %contact;)>

<!ELEMENT shop (%area;, %contact;)>

When the parser reads these declarations, it substitutes the entity's replacement text for the entity reference.
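
The substitution described above can be imitated with a small Python sketch. This is a toy text replacement, not a DTD parser: the dictionary stands in for the two <!ENTITY % ...> declarations, and the function name expand_parameter_entities is hypothetical.

```python
import re

# Stand-in for: <!ENTITY % area "name, street, pincode, city"> and <!ENTITY % contact "phone">
parameter_entities = {
    "area": "name, street, pincode, city",
    "contact": "phone",
}

def expand_parameter_entities(declaration):
    """Replace every %name; reference with the text stored for that name."""
    return re.sub(
        r"%([A-Za-z_][A-Za-z0-9_.-]*);",
        lambda match: parameter_entities[match.group(1)],
        declaration,
    )

expanded = expand_parameter_entities("<!ELEMENT residence (%area;, %contact;)>")
print(expanded)
```

To add a country element to all four declarations, only the text stored for area would need to change.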

END OF THE UNDERSTANDING XML , UNIT 2 IS COMPLETED


Monday, March 16, 2026


