The XML C parser and toolkit of Gnome

Note: this is the flat content of the web site

libxml, a.k.a. gnome-xml

"Programming with libxml2 is like the thrilling embrace of an exotic stranger." Mark Pilgrim

Libxml2 is the XML C parser and toolkit developed for the Gnome project (but usable outside of the Gnome platform). It is free software available under the MIT License. XML itself is a metalanguage for designing markup languages, i.e. text languages where semantics and structure are added to the content using extra "markup" information enclosed between angle brackets. HTML is the most well-known markup language. Though the library is written in C, a variety of language bindings make it available in other environments.

Libxml2 is known to be very portable; the library should build and work without serious trouble on a variety of systems (Linux, Unix, Windows, Cygwin, MacOS, MacOS X, RISC OS, OS/2, VMS, QNX, MVS, ...).

Libxml2 implements a number of existing standards related to markup languages:

In most cases libxml2 tries to implement the specifications in a relatively strict, compliant way. As of release 2.4.16, libxml2 passed all 1800+ tests from the OASIS XML Test Suite.

To some extent libxml2 provides support for the following additional specifications but doesn't claim to implement them completely:

A partial implementation of XML Schema Part 1: Structures is being worked on, but it is far too early to make any conformance statement about it.

Separate documents:

Logo designed by Marc Liyanage.

Introduction

This document describes libxml, the XML C parser and toolkit developed for the Gnome project. XML is a standard for building tag-based structured documents/data.

Here are some key points about libxml:

Warning: unless you are forced to because your application links with a Gnome-1.X library requiring it, do not use libxml1; use libxml2.

FAQ

Table of Contents:

License(s)

  1. Licensing Terms for libxml

    libxml2 is released under the MIT License; see the file Copyright in the distribution for the precise wording.

  2. Can I embed libxml2 in a proprietary application?

    Yes. The MIT License allows you to keep your changes to libxml proprietary, but it would be gracious to send back bug fixes and improvements as patches for possible incorporation in the main development tree.

Installation

  1. Do Not Use libxml1, use libxml2
  2. Where can I get libxml?

    The original distribution comes from xmlsoft.org or gnome.org.

    Most Linux and BSD distributions include libxml; installing the distribution's package is probably the safest way for end-users to use libxml.

    David Doolin provides precompiled Windows versions at http://www.ce.berkeley.edu/~doolin/code/libxmlwin32/

  3. I see libxml and libxml2 releases, which one should I install?
  4. I can't install the libxml package, it conflicts with libxml0

    You probably have an old libxml0 package used to provide the shared library for libxml.so.0; you can probably safely remove it. The libxml packages provided on xmlsoft.org provide libxml.so.0.

  5. I can't install the libxml(2) RPM package due to failed dependencies

    The most generic solution is to re-fetch the latest src.rpm, and rebuild it locally with

    rpm --rebuild libxml(2)-xxx.src.rpm

    If everything goes well it will generate two binary rpm packages (one providing the shared libs and xmllint, and the other one, the -devel package, providing includes, static libraries and scripts needed to build applications with libxml(2)) that you can install locally.

Compilation

  1. What is the process to compile libxml2?

    Like most UNIX libraries, libxml2 follows the "standard" procedure:

    gunzip -c xxx.tar.gz | tar xvf -

    cd libxml-xxxx

    ./configure --help

    to see the options, then the compilation/installation proper

    ./configure [possible options]

    make

    make install

    At that point you may have to rerun ldconfig or a similar utility to update your list of installed shared libs.

  2. What other libraries are needed to compile/install libxml2?

    Libxml2 does not require any other library; the normal ANSI C API should be sufficient (please report any violation of this rule you may find).

    However, if they are found at configuration time, libxml2 will detect and use the following libs:

  3. Make check fails on some platforms

    Sometimes the regression tests' results don't completely match the value produced by the parser, and the makefile uses diff to print the delta. On some platforms the diff return breaks the compilation process; if the diff is small this is probably not a serious problem.

    Sometimes (especially on Solaris) make checks fail due to limitations in make. Try using GNU make instead.

  4. I use the SVN version and there is no configure script

    The configure script (and other Makefiles) are generated. Use the autogen.sh script to regenerate the configure script and Makefiles, like:

    ./autogen.sh --prefix=/usr --disable-shared

  5. I have trouble when running make tests with gcc-3.0

    It seems the initial release of gcc-3.0 has a problem with the optimizer which miscompiles the URI module. Please use another compiler.

Developer corner

  1. Troubles compiling or linking programs using libxml2

    Usually the problem comes from the fact that the compiler doesn't get the right compilation or linking flags. A small shell script, xml2-config, is installed as part of the usual libxml2 install process and provides those flags. Use

    xml2-config --cflags

    to get the compilation flags and

    xml2-config --libs

    to get the linker flags. Usually this is done directly from the Makefile as:

    CFLAGS=`xml2-config --cflags`

    LIBS=`xml2-config --libs`

  2. I want to install my own copy of libxml2 in my home directory and link my programs against it, but it doesn't work

    There are many different ways to accomplish this. Here is one way to do this under Linux. Suppose your home directory is /home/user. Then:

  3. xmlDocDump() generates output on one line.

    Libxml2 will not invent spaces in the content of a document since all spaces in the content of a document are significant. If you build a tree from the API and want indentation:

    1. the correct way is to generate those yourself too.
    2. the dangerous way is to ask libxml2 to add those blanks to your content modifying the content of your document in the process. The result may not be what you expect. There is NO way to guarantee that such a modification won't affect other parts of the content of your document. See xmlKeepBlanksDefault () and xmlSaveFormatFile ()
  4. Extra nodes in the document:

    For an XML file as below:

    <?xml version="1.0"?>
    <PLAN xmlns="http://www.argus.ca/autotest/1.0/">
    <NODE CommFlag="0"/>
    <NODE CommFlag="1"/>
    </PLAN>

    after parsing it with the function pxmlDoc=xmlParseFile(...);

    I want to get the content of the first node (node with the CommFlag="0")

    so I did the following:

    xmlNodePtr pnode;
    pnode=pxmlDoc->children->children;

    but it does not work. If I change it to

    pnode=pxmlDoc->children->children->next;

    then it works. Can someone explain it to me?

    In XML all characters in the content of the document are significant including blanks and formatting line breaks.

    The extra nodes you are wondering about are just that: text nodes holding the formatting spaces, which are part of the document but which people tend to forget. There is a function xmlKeepBlanksDefault () to remove those at parse time, but that's a heuristic, and its use should be limited to cases where you are certain there is no mixed content in the document.

  5. I get compilation errors in existing code, for example when accessing the root or child fields of nodes.

    You are compiling code developed for libxml version 1 while using a libxml2 development environment. Either switch back to libxml v1 devel or, even better, fix the code to compile with libxml2 (or both) by following the instructions.

  6. I get compilation errors about nonexistent xmlRootNode or xmlChildrenNode fields.

    The source code you are using has been upgraded to be able to compile with both libxml and libxml2, but you need to install a more recent version: libxml(-devel) >= 1.8.8 or libxml2(-devel) >= 2.1.0

  7. Random crashes in threaded applications

    Read and follow all the advice on the thread safety page, and make 100% sure you never call xmlCleanupParser() while the library or an XML document might still be in use by another thread.

  8. The example provided in the web page does not compile.

    It's hard to maintain the documentation in sync with the code <grin/> ...

    Check points 1 and 2 above, and please send patches.

  9. Where can I get more examples and information than provided on the web page?

    Ideally a libxml2 book would be nice. I have no such plan ... But you can:

  10. What about C++?

    libxml2 is written in pure C in order to allow easy reuse on a number of platforms, including embedded systems. I don't intend to convert to C++.

    There is however a C++ wrapper which may fulfill your needs:

  11. How to validate a document a posteriori?

    It is possible to validate documents which had not been validated at initial parsing time or documents which have been built from scratch using the API. Use the xmlValidateDtd() function. It is also possible to simply add a DTD to an existing document:

    xmlDocPtr doc; /* your existing document */
    xmlDtdPtr dtd = xmlParseDTD(NULL, filename_of_dtd); /* parse the DTD */

    dtd->name = xmlStrdup((const xmlChar *)"root_name"); /* use the given root */

    doc->intSubset = dtd;
    if (doc->children == NULL) xmlAddChild((xmlNodePtr)doc, (xmlNodePtr)dtd);
    else xmlAddPrevSibling(doc->children, (xmlNodePtr)dtd);
              
  12. So what is this funky "xmlChar" used all the time?

    It is a null-terminated sequence of UTF-8 characters. And only UTF-8! You need to convert strings encoded in different ways to UTF-8 before passing them to the API. This can be accomplished with the iconv library for instance.
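
    To make the conversion step concrete, here is a small hand-written sketch for one specific case, ISO-8859-1 (Latin-1) input. It is not the iconv-based approach mentioned above and covers no other encoding; latin1_to_utf8 is a hypothetical helper name. It relies on the fact that libxml2 defines xmlChar as unsigned char, so the result can be cast to (const xmlChar *) when handed to the API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated ISO-8859-1 string into a
 * newly allocated, NUL-terminated UTF-8 string.
 * Returns NULL if memory cannot be allocated; the caller frees the
 * result with free(). */
static unsigned char *latin1_to_utf8(const char *in)
{
    size_t len, i;
    unsigned char *out, *p;

    assert(in != NULL);
    len = strlen(in);
    /* Worst case: every input byte expands to two UTF-8 bytes. */
    out = malloc(2 * len + 1);
    if (out == NULL)
        return NULL;

    p = out;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];

        if (c < 0x80) {
            /* ASCII is encoded identically in UTF-8. */
            *p++ = c;
        } else {
            /* U+0080..U+00FF become a two-byte sequence 110xxxxx 10xxxxxx. */
            *p++ = (unsigned char)(0xC0 | (c >> 6));
            *p++ = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    *p = '\0';
    return out;
}
```

    For any other source encoding, a general-purpose converter such as iconv remains the more appropriate tool.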

  13. etc ...

Developer Menu

There are several on-line resources related to using libxml:

  1. Use the search engine to look up information.
  2. Check the FAQ.
  3. Check the extensive documentation automatically extracted from code comments.
  4. Look at the documentation about libxml internationalization support.
  5. This page provides a global overview and some examples on how to use libxml.
  6. Code examples
  7. John Fleck's libxml2 tutorial: html or pdf.
  8. If you need to parse large files, check the xmlReader API tutorial
  9. James Henstridge wrote some nice documentation explaining how to use the libxml SAX interface.
  10. George Lebl wrote an article for IBM developerWorks about using libxml.
  11. Check the TODO file.
  12. Read the 1.x to 2.x upgrade path description. If you are starting a new project using libxml you should really use the 2.x version.
  13. And don't forget to look at the mailing-list archive.

Reporting bugs and getting help

Well, bugs or missing features are always possible, and I will make a point of fixing them in a timely fashion. The best way to report a bug is to use the Gnome bug tracking database (make sure to use the "libxml2" module name). I look at reports there regularly and it's good to have a reminder when a bug is still open. Be sure to specify that the bug is for the package libxml2.

For small problems you can try to get help on IRC: the #xml channel on irc.gnome.org (port 6667) usually has a few people subscribed who may help (but there is no guarantee, and if a real issue is raised it should go to the mailing-list for archival).

There is also a mailing-list xml@gnome.org for libxml, with an on-line archive (old). To subscribe to this list, please visit the associated Web page and follow the instructions. Do not send code, I won't debug it (but patches are really appreciated!).

Please note that with the current amount of viruses and spam, sending mail to the list without being subscribed won't work. There are *far too many bounces* (on the order of a thousand a day!) and I cannot approve them manually anymore. If your mail to the list bounced waiting for administrator approval, it is LOST! Repost it and fix the problem triggering the error. Also please note that emails with a legal warning asking not to copy or freely redistribute the information they contain are NOT acceptable for the mailing-list. Such mail will as much as possible be discarded automatically, and is less likely to be answered if it makes it to the list. DO NOT post to the list from an email address where such legal requirements are automatically added; get private paid support if you can't share information.

Check the following before posting:

Then send the bug with associated information to reproduce it to the xml@gnome.org list; if it's really libxml related I will approve it. Please do not send mail to me directly: it makes things really hard to track, and in some cases I am not the best person to answer a given question. Ask on the list.

To be really clear about support:

Of course, bugs reported with a suggested patch for fixing them will probably be processed faster than those without.

If you're looking for help, a quick look at the list archive may actually provide the answer. I usually send source samples when answering libxml2 usage questions. The auto-generated documentation is not as polished as I would like (I need to learn more about DocBook), but it's a good starting point.

How to help

You can help the project in various ways. The best thing to do first is to subscribe to the mailing-list as explained before, then check the archives and the Gnome bug database:

  1. Provide patches when you find problems.
  2. Provide the diffs when you port libxml2 to a new platform. They may not be integrated in all cases but help pinpoint portability problems.
  3. Provide documentation fixes (either as patches to the code comments or as HTML diffs).
  4. Provide new documentation pieces (translations, examples, etc ...).
  5. Check the TODO file and try to close one of the items.
  6. Take one of the points raised in the archive or the bug database and provide a fix. Get in touch with me beforehand to avoid synchronization problems and to check that the suggested fix will fit in nicely :-)

Downloads

The latest versions of libxml2 can be found on the xmlsoft.org server (FTP and rsync are available). There are also mirrors (Australia (Web), France), and the sources are on the Gnome FTP server as a source archive. Antonin Sprinzl also provides a mirror in Austria. (NOTE that you need both the libxml(2) and libxml(2)-devel packages installed to compile applications using libxml.)

You can find all the history of libxml(2) and libxslt releases in the old directory. The precompiled Windows binaries made by Igor Zlatovic are available in the win32 directory.

Binary ports:

If you know other supported binary ports, please contact me.

Snapshot:

Contributions:

I do accept external contributions, especially if compiling on another platform; get in touch with the list to upload the package. Wrappers for various languages have been provided and can be found in the bindings section.

Libxml2 is also available from SVN:

Releases

Items not finished and still being worked on; get in touch with the list if you want to help with those

The change log describes the recent commits to the SVN code base.

Here is the list of public releases:

2.7.3: Jan 18 2009

2.7.2: Oct 3 2008

2.7.1: Sep 1 2008

2.7.0: Aug 30 2008

2.6.32: Apr 8 2008

2.6.31: Jan 11 2008

2.6.30: Aug 23 2007

2.6.29: Jun 12 2007

2.6.28: Apr 17 2007

2.6.27: Oct 25 2006

2.6.26: Jun 6 2006

2.6.25: Jun 6 2006

Do not use or package 2.6.25

2.6.24: Apr 28 2006

2.6.23: Jan 5 2006

2.6.22: Sep 12 2005

2.6.21: Sep 4 2005

2.6.20: Jul 10 2005

2.6.19: Apr 02 2005

2.6.18: Mar 13 2005

2.6.17: Jan 16 2005

2.6.16: Nov 10 2004

2.6.15: Oct 27 2004

2.6.14: Sep 29 2004

2.6.13: Aug 31 2004

2.6.12: Aug 22 2004

2.6.11: July 5 2004

2.6.10: May 17 2004

2.6.9: Apr 18 2004

2.6.8: Mar 23 2004

2.6.7: Feb 23 2004

2.6.6: Feb 12 2004

2.6.5: Jan 25 2004

2.6.4: Dec 24 2003

2.6.3: Dec 10 2003

2.6.2: Nov 4 2003

2.6.1: Oct 28 2003

2.6.0: Oct 20 2003

2.5.11: Sep 9 2003

A bugfix only release:

2.5.10: Aug 15 2003

A bugfixes only release

2.5.9: Aug 9 2003

2.5.8: Jul 6 2003

2.5.7: Apr 25 2003

2.5.6: Apr 1 2003

2.5.5: Mar 24 2003

2.5.4: Feb 20 2003

2.5.3: Feb 10 2003

2.5.2: Feb 5 2003

2.5.1: Jan 8 2003

2.5.0: Jan 6 2003

2.4.30: Dec 12 2002

2.4.29: Dec 11 2002

2.4.28: Nov 22 2002

2.4.27: Nov 17 2002

2.4.26: Oct 18 2002

2.4.25: Sep 26 2002

2.4.24: Aug 22 2002

2.4.23: July 6 2002

2.4.22: May 27 2002

2.4.21: Apr 29 2002

This release is both a bug fix release and also contains the early XML Schemas structures and datatypes code. Beware: all interfaces are likely to change, there are huge holes, and it is clearly a work in progress. Don't even think of putting this code in a production system; it's actually not compiled in by default. The real fixes are:

2.4.20: Apr 15 2002

2.4.19: Mar 25 2002

2.4.18: Mar 18 2002

2.4.17: Mar 8 2002

2.4.16: Feb 20 2002

2.4.15: Feb 11 2002

2.4.14: Feb 8 2002

2.4.13: Jan 14 2002

2.4.12: Dec 7 2001

2.4.11: Nov 26 2001

2.4.10: Nov 10 2001

2.4.9: Nov 6 2001

2.4.8: Nov 4 2001

2.4.7: Oct 30 2001

2.4.6: Oct 10 2001

2.4.5: Sep 14 2001

1.8.16: Sep 14 2001

2.4.4: Sep 12 2001

2.4.3: Aug 23 2001

2.4.2: Aug 15 2001

2.4.1: July 24 2001

2.4.0: July 10 2001

2.3.14: July 5 2001

2.3.13: June 28 2001

1.8.14: June 28 2001

2.3.12: June 26 2001

2.3.11: June 17 2001

2.3.10: June 1 2001

2.3.9: May 19 2001

Lots of bugfixes, and added a basic SGML catalog support:

1.8.13: May 14 2001

2.3.8: May 3 2001

2.3.7: April 22 2001

2.3.6: April 8 2001

2.3.5: Mar 23 2001

2.3.4: Mar 10 2001

2.3.3: Mar 1 2001

2.3.2: Feb 24 2001

2.3.1: Feb 15 2001

2.3.0: Feb 8 2001 (2.2.12 was on 25 Jan but I didn't keep track)

2.2.11: Jan 4 2001

2.2.10: Nov 25 2000

2.2.9: Nov 25 2000

2.2.8: Nov 13 2000

2.2.7: Oct 31 2000

2.2.6: Oct 25 2000

2.2.5: Oct 15 2000

2.2.4: Oct 1 2000

2.2.3: Sep 17 2000

1.8.10: Sep 6 2000

2.2.2: August 12 2000

2.2.1: July 21 2000

2.2.0: July 14 2000

1.8.9: July 9 2000

2.1.1: July 1 2000

2.1.0 and 1.8.8: June 29 2000

2.0.0: Apr 12 2000

2.0.0beta: Mar 14 2000

1.8.7: Mar 6 2000

1.8.6: Jan 31 2000

1.8.5: Jan 21 2000

1.8.4: Jan 13 2000

1.8.3: Jan 5 2000

1.8.2: Dec 21 1999

1.8.1: Dec 18 1999

1.8.0: Dec 12 1999

1.7.4: Oct 25 1999

1.7.3: Sep 29 1999

1.7.1: Sep 24 1999

1.7.0: Sep 23 1999

XML

XML is a standard for markup-based structured documents. Here is an example XML document:

<?xml version="1.0"?>
<EXAMPLE prop1="gnome is great" prop2="&amp; linux too">
  <head>
   <title>Welcome to Gnome</title>
  </head>
  <chapter>
   <title>The Linux adventure</title>
   <p>bla bla bla ...</p>
   <image href="linus.gif"/>
   <p>...</p>
  </chapter>
</EXAMPLE>

The first line specifies that it is an XML document and gives useful information about its encoding. Then the rest of the document is a text format whose structure is specified by tags between brackets. Each tag opened has to be closed. XML is pedantic about this. However, if a tag is empty (no content), a single tag can serve as both the opening and closing tag if it ends with /> rather than with >. Note that, for example, the image tag has no content (just an attribute) and is closed by ending