Class Unicode::CEncoding

Unicorn XML Toolkit
Version 1.50.00

Namespace Unicode
Class CEncoding

class CEncoding: public CInterface {
public:
    CEncoding();
    virtual ~CEncoding();
public:
    virtual const WCHAR *GetName() = 0;
    virtual int Get(
        const BYTE *pSource,
        int nSourceLength,
        int &nSourceAdvance,
        WCHAR *chBuffer,
        int nBufferLimit,
        bool &bUnderflow) = 0;
    virtual int Put(
        const WCHAR *chSource,
        int nSourceLength,
        BYTE *pBuffer,
        int nBufferLimit,
        int &nBufferAdvance,
        bool &bUnsupported) = 0;
    virtual int Skip(
        const WCHAR *chSource,
        int nSourceLength) = 0;
    virtual const WCHAR *Signature(int &nLength) = 0;
    };
typedef XInterface<CEncoding> XEncoding;

The abstract interface to Unicode encoding algorithms.

An application must provide an implementation of this interface for each supported encoding. Standard subclasses implementing UTF-8, UTF-16 and ISO-8859-1 are provided by this namespace.

Since:: 1.00.00
Version:: 1.50.00
Author:: Alexey Gokhberg
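
As an illustration of what an application-provided implementation might look like, the sketch below defines a hypothetical 7-bit ASCII encoding. It is not part of the toolkit. The BYTE and WCHAR typedefs and the reduced CEncoding declaration (without the CInterface base) are stand-ins so that the fragment compiles on its own, the class name CAsciiEncoding is invented for this example, and the mapping of bytes above 0x7F to U+FFFD is an assumption, since the interface does not specify how invalid input is reported.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for the toolkit types (assumed; the real ones come from the toolkit headers).
typedef unsigned char BYTE;
typedef wchar_t WCHAR;

// Reduced copy of the interface documented on this page; the CInterface base is omitted.
class CEncoding {
public:
    virtual ~CEncoding() {}
    virtual const WCHAR *GetName() = 0;
    virtual int Get(const BYTE *pSource, int nSourceLength, int &nSourceAdvance,
                    WCHAR *chBuffer, int nBufferLimit, bool &bUnderflow) = 0;
    virtual int Put(const WCHAR *chSource, int nSourceLength, BYTE *pBuffer,
                    int nBufferLimit, int &nBufferAdvance, bool &bUnsupported) = 0;
    virtual int Skip(const WCHAR *chSource, int nSourceLength) = 0;
    virtual const WCHAR *Signature(int &nLength) = 0;
};

// Hypothetical encoding that supports U+0000..U+007F only.
class CAsciiEncoding: public CEncoding {
    static bool Supported(WCHAR ch) { return (unsigned long)ch < 0x80UL; }
public:
    virtual const WCHAR *GetName() { return L"US-ASCII"; }

    virtual int Get(const BYTE *pSource, int nSourceLength, int &nSourceAdvance,
                    WCHAR *chBuffer, int nBufferLimit, bool &bUnderflow) {
        assert(nSourceLength >= 0 && nBufferLimit >= 0);
        int n = nSourceLength < nBufferLimit ? nSourceLength : nBufferLimit;
        for (int i = 0; i < n; i++) {
            // Assumption: bytes above 0x7F decode to U+FFFD.
            chBuffer[i] = pSource[i] < 0x80 ? (WCHAR)pSource[i] : (WCHAR)0xFFFD;
        }
        nSourceAdvance = n;
        bUnderflow = false;  // one byte per character, so a character is never split
        return n;
    }

    virtual int Put(const WCHAR *chSource, int nSourceLength, BYTE *pBuffer,
                    int nBufferLimit, int &nBufferAdvance, bool &bUnsupported) {
        int n = 0;
        bUnsupported = false;
        while (n < nSourceLength && n < nBufferLimit) {
            if (!Supported(chSource[n])) {
                bUnsupported = true;  // stop in front of the unsupported character
                break;
            }
            pBuffer[n] = (BYTE)chSource[n];
            n++;
        }
        nBufferAdvance = n;  // one byte written per character converted
        return n;
    }

    virtual int Skip(const WCHAR *chSource, int nSourceLength) {
        int n = 0;
        while (n < nSourceLength && !Supported(chSource[n])) n++;
        return n;
    }

    virtual const WCHAR *Signature(int &nLength) {
        nLength = 0;  // this encoding uses no signature
        return NULL;
    }
};
```

Because every ASCII character occupies exactly one byte, Get never reports an underflow here; a multi-byte encoding would have to track incomplete trailing sequences instead.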

Constructor/Destructor Summary
`CEncoding();`	Constructs the character encoding.
`~CEncoding();`	Destroys the character encoding.

Function Summary
`const WCHAR *`	`GetName();`	Returns the name of this character encoding.
`int`	`Get(const BYTE *pSource, int nSourceLength, int &nSourceAdvance, WCHAR *chBuffer, int nBufferLimit, bool &bUnderflow);`	Converts (decodes) an array of bytes into an array of Unicode characters according to this character encoding.
`int`	`Put(const WCHAR *chSource, int nSourceLength, BYTE *pBuffer, int nBufferLimit, int &nBufferAdvance, bool &bUnsupported);`	Converts (encodes) an array of characters into an array of bytes according to this character encoding.
`int`	`Skip(const WCHAR *chSource, int nSourceLength);`	Skips a contiguous group of characters unsupported by this encoding.
`const WCHAR *`	`Signature(int &nLength);`	Returns an array of characters which is used as a signature by this encoding.

Constructor/Destructor Detail

CEncoding

CEncoding();

Constructs the character encoding.

~CEncoding

virtual ~CEncoding();

Destroys the character encoding.

Function Detail

GetName

virtual const WCHAR *GetName() = 0;

Returns the name of this character encoding.

Get

virtual int Get(
    const BYTE *pSource,
    int nSourceLength,
    int &nSourceAdvance,
    WCHAR *chBuffer,
    int nBufferLimit,
    bool &bUnderflow) = 0;

Converts (decodes) an array of bytes into an array of Unicode characters according to this character encoding.

This function converts as many bytes from the input array as possible. Converting the entire input array may be impossible if the end of the result buffer is reached, or if the last few bytes of the input array do not form a complete representation of a valid Unicode character. The latter case can occur when a large byte array is processed in smaller buffers and the byte sequence representing a single character is split at a buffer boundary.

Parameters:: pSource - the input array of bytes; nSourceLength - the number of bytes in the input array; nSourceAdvance - on return, the number of bytes in the input array actually converted; chBuffer - the result buffer; on return, contains the result of conversion; nBufferLimit - the capacity of the result buffer, in characters; bUnderflow - on return, true if the last few bytes of the input array do not provide the full representation of a valid Unicode character; false otherwise

Returns:: the number of characters placed into the result buffer
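
The split-character case described above can be handled by carrying unconverted bytes over to the next call. The following is a minimal sketch of such a loop, not the toolkit's own reader. ToyUtf16LE is a hypothetical stand-in that exposes only Get with the documented signature, DecodeChunked is an invented helper name, and the BYTE and WCHAR typedefs are assumptions made so that the fragment compiles on its own.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Stand-ins for the toolkit types (assumed).
typedef unsigned char BYTE;
typedef wchar_t WCHAR;

// Toy little-endian UTF-16 decoder with the documented Get signature.
// Surrogate pairs are not treated specially in this sketch.
struct ToyUtf16LE {
    int Get(const BYTE *pSource, int nSourceLength, int &nSourceAdvance,
            WCHAR *chBuffer, int nBufferLimit, bool &bUnderflow) {
        int count = 0;
        nSourceAdvance = 0;
        while (count < nBufferLimit && nSourceAdvance + 2 <= nSourceLength) {
            chBuffer[count++] =
                (WCHAR)(pSource[nSourceAdvance] | (pSource[nSourceAdvance + 1] << 8));
            nSourceAdvance += 2;
        }
        // A single trailing byte cannot form a complete 16-bit unit.
        bUnderflow = (nSourceLength - nSourceAdvance == 1);
        return count;
    }
};

// Feeds data to enc.Get in pieces of chunkSize bytes and keeps the bytes that
// Get did not convert so that they are offered again with the next piece.
template <class Encoding>
std::wstring DecodeChunked(Encoding &enc, const BYTE *data, int length, int chunkSize) {
    assert(chunkSize > 0);
    std::wstring result;
    std::vector<BYTE> pending;        // bytes not yet converted
    const int kBufferLimit = 4;       // deliberately small result buffer
    WCHAR buffer[kBufferLimit];
    for (int pos = 0; pos < length; pos += chunkSize) {
        int n = std::min(chunkSize, length - pos);
        pending.insert(pending.end(), data + pos, data + pos + n);
        int used = 0;
        for (;;) {
            int advance = 0;
            bool underflow = false;
            int count = enc.Get(pending.data() + used,
                                static_cast<int>(pending.size()) - used,
                                advance, buffer, kBufferLimit, underflow);
            result.append(buffer, count);
            used += advance;
            // Stop when more input is needed, everything was consumed,
            // or no progress was made.
            if (underflow || advance == 0 || used == static_cast<int>(pending.size()))
                break;
        }
        pending.erase(pending.begin(), pending.begin() + used);
    }
    // Bytes still left in pending at this point indicate truncated input.
    return result;
}
```

The inner loop repeats while the result buffer was the limiting factor; the outer loop resumes once more input is available, which is the situation bUnderflow reports.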

Put

virtual int Put(
    const WCHAR *chSource,
    int nSourceLength,
    BYTE *pBuffer,
    int nBufferLimit,
    int &nBufferAdvance,
    bool &bUnsupported) = 0;

Converts (encodes) an array of characters into an array of bytes according to this character encoding.

This function converts as many characters from the input array as possible. Converting the entire input array may be impossible if the end of the result buffer is reached, or if a character not supported by this encoding is encountered in the input array.

Parameters:: chSource - the input array of characters; nSourceLength - the number of characters in the input array; pBuffer - the result buffer; on return, contains the result of conversion; nBufferLimit - the capacity of the result buffer, in bytes; nBufferAdvance - on return, the number of bytes placed into the result buffer; bUnsupported - on return, true if a character not supported by this encoding was encountered in the input buffer

Returns:: the number of characters in the input array actually converted

Skip

virtual int Skip(
    const WCHAR *chSource,
    int nSourceLength) = 0;

Skips a contiguous group of characters unsupported by this encoding.

If the first character in the specified input buffer is supported by this encoding, this function returns 0. If the first character in the input buffer is not supported by this encoding, this function returns the length of the longest contiguous group of input characters unsupported by this encoding, starting with the first character in the buffer.

Parameters:: chSource - the input buffer; nSourceLength - the number of characters in the input buffer

Returns:: the number of characters skipped
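
One plausible way to combine Put and Skip is sketched below: encode until Put reports an unsupported character, ask Skip how long the unsupported run is, substitute something for it, and continue. This is an illustration only. ToyAscii is a hypothetical stand-in exposing Put and Skip with the documented signatures, EncodeWithReplacement is an invented helper name, and replacing each unsupported character with '?' is an assumption of this example; an XML writer might emit numeric character references instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-ins for the toolkit types (assumed).
typedef unsigned char BYTE;
typedef wchar_t WCHAR;

// Toy encoder supporting U+0000..U+007F only.
struct ToyAscii {
    static bool Supported(WCHAR ch) { return (unsigned long)ch < 0x80UL; }

    int Put(const WCHAR *chSource, int nSourceLength, BYTE *pBuffer,
            int nBufferLimit, int &nBufferAdvance, bool &bUnsupported) {
        int n = 0;
        bUnsupported = false;
        while (n < nSourceLength && n < nBufferLimit) {
            if (!Supported(chSource[n])) { bUnsupported = true; break; }
            pBuffer[n] = (BYTE)chSource[n];
            n++;
        }
        nBufferAdvance = n;
        return n;
    }

    int Skip(const WCHAR *chSource, int nSourceLength) {
        int n = 0;
        while (n < nSourceLength && !Supported(chSource[n])) n++;
        return n;
    }
};

// Encodes text through a small fixed buffer, writing one '?' for every
// character that the encoding does not support.
template <class Encoding>
std::string EncodeWithReplacement(Encoding &enc, const std::wstring &text) {
    std::string out;
    const int kBufferLimit = 4;       // deliberately small result buffer
    BYTE buffer[kBufferLimit];
    const WCHAR *p = text.data();
    int left = static_cast<int>(text.size());
    assert(left >= 0);
    while (left > 0) {
        int advance = 0;
        bool unsupported = false;
        int taken = enc.Put(p, left, buffer, kBufferLimit, advance, unsupported);
        out.append(reinterpret_cast<const char *>(buffer),
                   static_cast<std::size_t>(advance));
        p += taken;
        left -= taken;
        int skipped = 0;
        if (unsupported) {
            // Measure the run of unsupported characters and step over it.
            skipped = enc.Skip(p, left);
            out.append(static_cast<std::size_t>(skipped), '?');
            p += skipped;
            left -= skipped;
        }
        if (taken + skipped == 0) break;  // no progress: give up
    }
    return out;
}
```

Calling Skip only after Put has set bUnsupported keeps the common path (fully supported text) free of extra scans.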

Signature

virtual const WCHAR *Signature(int &nLength) = 0;

Returns an array of characters which is used as a signature by this encoding.

Some encoding algorithms interpret one or more leading characters in the encoded text as signatures, which contain hints about details of the encoding algorithm (for example, the byte order mark).

Parameters:: nLength - on return, the number of characters in the signature; 0 if this encoding does not use signatures

Returns:: the signature array; NULL if this encoding does not use signatures
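
As a sketch of how a caller might use the returned array, the helper below checks whether decoded text begins with the encoding's signature, so that the signature can be stepped over before further processing. ToyWithBom, ToyPlain and SignaturePrefixLength are names invented for this example, the WCHAR typedef is a stand-in, and the use of U+FEFF as the signature is an assumption based on the byte order mark mentioned above.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the toolkit type (assumed).
typedef wchar_t WCHAR;

// Toy encoding whose signature is the single character U+FEFF.
struct ToyWithBom {
    const WCHAR *Signature(int &nLength) {
        static const WCHAR bom[] = { L'\xFEFF' };
        nLength = 1;
        return bom;
    }
};

// Toy encoding without a signature.
struct ToyPlain {
    const WCHAR *Signature(int &nLength) {
        nLength = 0;
        return NULL;
    }
};

// Returns the number of leading characters of text that match the signature
// of enc, or 0 if there is no signature or the text does not start with it.
template <class Encoding>
int SignaturePrefixLength(Encoding &enc, const WCHAR *text, int length) {
    assert(length >= 0);
    int sigLength = 0;
    const WCHAR *sig = enc.Signature(sigLength);
    if (sig == NULL || sigLength == 0 || sigLength > length) return 0;
    for (int i = 0; i < sigLength; i++) {
        if (text[i] != sig[i]) return 0;
    }
    return sigLength;
}
```

Checking both the returned pointer and nLength guards against implementations that set only one of the two when no signature is used.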