UNICODE
=======

Log4cplus uses the expression "UNICODE" in at least two distinct
meanings:

1. the [Unicode][unicode] standard as defined by the Unicode Consortium

2. the compiler's and/or C++ standard library's support for strings of
   `wchar_t`s and their manipulation

[unicode]: http://unicode.org/

`wchar_t` support
-----------------

Log4cplus aims to be portable and to have as few third-party
dependencies as possible. To fulfill this goal, it has to use
facilities offered by the operating systems and standard libraries it
runs on. To offer the best possible level of support for national
characters, it has to support the use of `wchar_t`, and it has to use
the `wchar_t` support (especially on Windows) provided by the operating
system and the standard C and C++ libraries.

This approach to portability has some limitations. One of them is the
lack of support for C++ locales in various operating systems and
standard C++ libraries. Some standard C++ libraries do not support any
locales other than the "C" and "POSIX" locales. This usually means
that `wchar_t`↔`char` conversion using the `std::codecvt<>` facet is
impossible. On such deficient platforms, log4cplus can use either
standard C locale support or `iconv()` (through libiconv or the
built-in implementation).
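
As a rough illustration of the "standard C locale support" fallback
mentioned above, the following sketch converts a `wchar_t` string to
`char` with `std::wcsrtombs()`. The helper name `narrow_with_c_locale`
is hypothetical and not part of log4cplus; the conversion log4cplus
performs internally may differ.

~~~~{.cpp}
#include <clocale>
#include <cwchar>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not a log4cplus API): converts a wide string to a
// narrow string using the current C locale. Conversion stops at the first
// embedded L'\0'. Call std::setlocale(LC_ALL, "") beforehand to pick up
// the user's locale; without it the "C" locale is in effect.
std::string narrow_with_c_locale(const std::wstring& in)
{
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* src = in.c_str();

    // First pass with a null destination: returns the number of bytes
    // needed, not counting the terminating '\0'.
    std::size_t len = std::wcsrtombs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1))
        throw std::runtime_error(
            "character not representable in the current C locale");

    // Second pass: perform the actual conversion.
    std::vector<char> buf(len + 1);
    state = std::mbstate_t();
    src = in.c_str();
    std::wcsrtombs(buf.data(), &src, buf.size(), &state);
    return std::string(buf.data(), len);
}
~~~~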

Unicode and file appenders
--------------------------

Another limitation related to Unicode support is the inability to
write `wchar_t` messages to log files using `FileAppender` when they
contain national characters that do not map to any code point in a
single-byte code page. This is a problem mainly on Windows. Linux and
other Unix-like systems can avoid it because they do not need to use
`wchar_t` interfaces to have Unicode-aware applications. They usually
(as of 2012) use UTF-8 based locales. With proper C++ locale setup in
client applications, national characters can pass through into log
files unharmed. However, if such applications choose to use `wchar_t`
strings, they face the same problem.

Unix-like platforms
-------------------

To support output of non-ASCII characters in `wchar_t` messages on
Unix-like platforms, it is necessary to use a UTF-8 based locale (e.g.,
`en_US.UTF-8`) and either to set up the global locale with a
`std::codecvt<>` facet or to imbue individual `FileAppender`s with that
facet. The following code can be used to get such a `std::locale`
instance and to install it as the global locale:

~~~~{.cpp}
// Requires <locale> (and <cwchar> for std::mbstate_t).
std::locale::global (     // set global locale
    std::locale (         // using std::locale constructed from
        std::locale (),   // global locale
        // and codecvt facet from user locale
        new std::codecvt_byname<wchar_t, char, std::mbstate_t>("")));
~~~~
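
The alternative mentioned above, imbuing an individual output instead
of changing the global locale, can be sketched with a plain
`std::wofstream` standing in for a `FileAppender`. The helper names
`with_codecvt_from` and `write_line` are hypothetical and are used for
illustration only; they are not log4cplus APIs.

~~~~{.cpp}
#include <cwchar>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: returns a copy of `base` whose wchar_t<->char
// conversion facet is taken from the locale called `name` ("" selects the
// user's environment). The facet constructor throws std::runtime_error
// if the platform does not know the requested locale name.
std::locale with_codecvt_from(const std::locale& base, const char* name)
{
    return std::locale(base,
        new std::codecvt_byname<wchar_t, char, std::mbstate_t>(name));
}

// Hypothetical helper: writes one line of wide text to `path` through a
// stream imbued with `loc`. Returns false if the file cannot be opened
// or the text cannot be converted and written.
bool write_line(const char* path, const std::locale& loc,
    const std::wstring& text)
{
    std::wofstream out;
    out.imbue(loc);   // imbue before open() so the facet is used from the start
    out.open(path);
    if (!out)
        return false;
    out << text << L'\n';
    out.flush();
    return out.good();
}

// Possible use, assuming a UTF-8 based user locale:
//   write_line("app.log", with_codecvt_from(std::locale(), ""), L"message");
~~~~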

Windows
-------

Windows does not support UTF-8 based locales. The above approach will
yield a `std::locale` instance that converts `wchar_t`s to the current
process' code page. Such a locale will not be able to convert Unicode
code points outside the process' code page. This is true at least with
the `std::codecvt` facet implemented in Visual Studio 2010. Instead,
with Visual Studio 2010 and later, it is possible to use the
`std::codecvt_utf8` facet:

~~~~{.cpp}
// Requires <locale> and <codecvt>; tchar is log4cplus::tchar.
std::locale::global (     // set global locale
    std::locale (         // using std::locale constructed from
        std::locale (),   // global locale
        // and codecvt_utf8 facet
        new std::codecvt_utf8<tchar, 0x10FFFF,
            static_cast<std::codecvt_mode>(std::consume_header
                | std::little_endian)>));
~~~~
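
To check what the `std::codecvt_utf8` facet produces independently of
the process' code page, a small sketch like the following can be used.
The helper names `to_utf8` and `utf8_locale` are hypothetical and not
part of log4cplus. Note that `<codecvt>` and `std::wstring_convert` are
deprecated since C++17, so newer compilers may emit warnings.

~~~~{.cpp}
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: encodes wide text as UTF-8 using the same facet
// type as above. Throws std::range_error if a character cannot be
// converted.
std::string to_utf8(const std::wstring& in)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t> > conv;
    return conv.to_bytes(in);
}

// Hypothetical helper: returns a copy of `base` that uses codecvt_utf8
// for wchar_t<->char conversion, suitable for std::locale::global() or
// for imbuing a single stream.
std::locale utf8_locale(const std::locale& base = std::locale())
{
    return std::locale(base, new std::codecvt_utf8<wchar_t>);
}

// Possible use: to_utf8(L"\u00e9") yields the two bytes 0xC3 0xA9.
~~~~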