UNICODE
=======

Log4cplus uses the expression "UNICODE" in at least two not so equal
meanings:

1. the [Unicode][unicode] standard as defined by the Unicode Consortium

2. compiler's and/or C++ standard library's support for strings of
   `wchar_t`s and their manipulation

[unicode]: http://unicode.org/


`wchar_t` support
-----------------

Log4cplus is aimed to be portable and to have as little 3rd party
dependencies as possible. To fulfill this goal it has to use
facilities offered by the operating systems and standard libraries it
runs on. To offer the best possible level of support of national
character, it has to support usage of `wchar_t` and it has to use
`wchar_t` support (especially on Windows) provided by operating system
and standard C and C++ libraries.

This approach to portability has some limitations. One of the
limitations is lacking support for C++ locales in various operating
systems and standard C++ libraries. Some standard C++ libraries do not
support other than the "C" and "POSIX" locales. This usually means
that `wchar_t`↔`char` conversion using `std::codecvt<>` facet is
impossible. On such deficient platforms, log4cplus can use either
standard C locale support or `iconv()` (through libiconv or
built--in).


Unicode and file appenders
--------------------------

Another limitation related to Unicode support is then inability to
write `wchar_t` messages that contain national characters that do not
map to any code point in single byte code page to log files using
`FileAppender`. This is a problem mainly on Windows. Linux and other
Unix--like systems can avoid it because they do not need to use
`wchar_t` interfaces to have Unicode aware applications. They usually
(as of year 2012) use UTF-8 based locales. With proper C++ locale
setup in client applications, national characters can come through
into log files unharmed. But if they choose to use `wchar_t` strings,
they face the problem as well.


Unix--like platforms
--------------------

To support output of non-ASCII characters in `wchar_t` message on
Unix--like platforms, it is necessary to use UTF-8 based locale (e.g.,
`en_US.UTF-8`) and to set up global locale with `std::codecvt<>` facet
or imbue individual `FileAppender`s with that facet. The following
code can be used to get such `std::locale` instance and to set it into
global locale:

~~~~{.cpp}
std::locale::global (     // set global locale
    std::locale (         // using std::locale constructed from
        std::locale (),   // global locale
                          // and codecvt facet from user locale
        new std::codecvt_byname<wchar_t, char, std::mbstate_t>("")));
~~~~


Windows
-------

Windows do not support UTF-8 based locales. The above approach will
yield a `std::locale` instance converting `wchar_t`s to current
process' code page. Such locale will not be able to convert Unicode
code points outside the process' code page. This is true at least with
the `std::codecvt` facet implemented in Visual Studio 2010. Instead,
with Visual Studio 2010 and later, it is possible to use
`std::codecvt_utf8` facet:

~~~~{.cpp}
std::locale::global (     // set global locale
    std::locale (         // using std::locale constructed from
        std::locale (),   // global locale
                          // and codecvt_utf8 facet
        new std::codecvt_utf8<tchar, 0x10FFFF,
            static_cast<std::codecvt_mode>(std::consume_header
                | std::little_endian)>));
~~~~