crosspad communication protocol
$Id: PROTOCOL,v 1.11 1998/11/11 06:42:04 itojun Exp $


Protocol
========
crosspad>pc: 02 xx xx xx xx cc
	02: sync byte (we may be able to deduce the communication baud rate
		from this byte)
	xx xx xx xx: data length
	cc: checksum, xor'ing all "xx" bytes (correct?)
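
A minimal Python sketch of checking this packet.  It assumes the checksum
really is the xor of the four "xx" bytes (marked "correct?" above) and that
the length is big-endian; neither is confirmed.  parse_header is a
hypothetical helper name.

```python
def parse_header(packet: bytes) -> int:
    """Validate a '02 xx xx xx xx cc' packet and return the data length."""
    if len(packet) != 6 or packet[0] != 0x02:
        raise ValueError("not a header packet")
    length_bytes = packet[1:5]
    # Assumption: cc is the xor of the four length bytes.
    checksum = 0
    for b in length_bytes:
        checksum ^= b
    if checksum != packet[5]:
        raise ValueError("checksum mismatch")
    # Assumption: the length is stored big-endian.
    return int.from_bytes(length_bytes, "big")
```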

pc>crosspad: 06
	06: acknowledgement

crosspad>pc: 01 ss SS xx xx .... yy
	01: sync byte
	ss: sector# (starts from 00)
	SS: ff ^ sector# (starts from ff, and descends)
	xx xx: data length of the sector, at most 0x1000 (4096)
	....: (data length) bytes
	yy: extra byte, meaning unknown (checksum?)

	NOTE: what happens if sector# exceeds 0xff?
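
The sector packet layout above could be checked along these lines.  This is
only a sketch: parse_sector is a hypothetical name, the length is assumed to
be big-endian, and the trailing "yy" byte is returned untouched because its
meaning is unknown.

```python
from typing import Tuple

def parse_sector(packet: bytes) -> Tuple[int, bytes, int]:
    """Split a '01 ss SS xx xx .... yy' packet into (sector#, data, yy)."""
    if len(packet) < 6 or packet[0] != 0x01:
        raise ValueError("not a sector packet")
    sector, inverse = packet[1], packet[2]
    if sector ^ inverse != 0xFF:
        raise ValueError("SS is not ff ^ sector#")
    # Assumption: the two length bytes are big-endian.
    length = int.from_bytes(packet[3:5], "big")
    if length > 0x1000:
        raise ValueError("sector longer than 0x1000 bytes")
    if len(packet) != 5 + length + 1:
        raise ValueError("packet size does not match data length")
    data = packet[5:5 + length]
    extra = packet[-1]  # meaning unknown (checksum?), not verified here
    return sector, data, extra
```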

pc>crosspad: 06
	06: acknowledgement

crosspad>pc: 07
	07: no more to send
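
The whole exchange above can be sketched as a small loop.  The read and
write callables are hypothetical stand-ins for a serial port (read(n)
returns up to n bytes, write(b) sends bytes); the big-endian lengths are an
assumption, and the header checksum and the trailing "yy" byte are not
verified here.

```python
import io

ACK = b"\x06"

def download(read, write):
    """Run the pc side of the exchange; return (announced length, data)."""
    header = read(6)
    if len(header) != 6 or header[0] != 0x02:
        raise ValueError("expected 02 header packet")
    announced = int.from_bytes(header[1:5], "big")  # assumption: big-endian
    write(ACK)
    payload = bytearray()
    while True:
        sync = read(1)
        if sync == b"\x07":  # no more to send
            break
        if sync != b"\x01":
            raise ValueError("unexpected sync byte")
        sector, inverse, hi, lo = read(4)
        if sector ^ inverse != 0xFF:
            raise ValueError("SS is not ff ^ sector#")
        payload += read((hi << 8) | lo)
        read(1)  # extra byte of unknown meaning, ignored
        write(ACK)
    return announced, bytes(payload)

# Usage with an in-memory stream standing in for the pad:
stream = io.BytesIO(bytes.fromhex("020000000303 0100ff0003aabbcc00 07"))
sent = []
result = download(stream.read, sent.append)
```

How the announced length relates to the sum of the sector lengths is not
stated above, so the sketch returns both and does not compare them.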


File format
===========
NOTE: this is not the *.nbk file format.  This section describes the raw data
sent from the crosspad device.

** Date format

MM DD YY HH MM SS:
	all digits are in BCD.  For example, "Fri Sep 11 21:38:52 JST 1998"
	would be 09 11 98 21 38 52.
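
A short Python sketch of decoding the six BCD bytes.  The century pivot
(years 70..99 map to 19xx, everything else to 20xx) is an assumption of this
sketch, not something the device is known to do; parse_date is a
hypothetical helper name.

```python
from datetime import datetime

def bcd(byte: int) -> int:
    """Convert one BCD byte (e.g. 0x38) to its decimal value (38)."""
    high, low = byte >> 4, byte & 0x0F
    if high > 9 or low > 9:
        raise ValueError("not a BCD byte: %02x" % byte)
    return high * 10 + low

def parse_date(raw: bytes) -> datetime:
    """Decode 'MM DD YY HH MM SS' into a datetime."""
    if len(raw) != 6:
        raise ValueError("date field must be 6 bytes")
    month, day, year, hour, minute, second = (bcd(b) for b in raw)
    # Assumption: two-digit years 70..99 are 19xx, 00..69 are 20xx.
    year += 1900 if year >= 70 else 2000
    return datetime(year, month, day, hour, minute, second)
```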

** format outline for "version 0" format

	There are several versions of the ink data encoding.  CrossPad emits
	the "version 0" format.

	A data file contains one or more segments.
	A segment is formatted as follows:

	00 00 00 00 xx MM DD YY HH MM SS yy ...
		00 00 00 00: version identifier.  this denotes "version 0"
			format.
		xx: segment code
		MM DD YY HH MM SS: the date the segment was generated.
		yy ...: data, length and format determined by code

	NOTE: The data is like a transaction log.  The order of the data stream
	is defined by date/time.  Therefore, the page# may go back and forth if
	the user flips pages back and forth and writes on various pages.

** Segments

01: huffman encoded stroke data (data len=variable)
	00 00 00 00 01 MM DD YY HH MM SS ll xx xx yy yy zz ..... ff ff pp pp
		ll: data length from "ll" to "pp pp"
		xx xx yy yy: x/y coordinate of starting point
		zz ...: huffman encoded stroke data
		ff ff: terminator of huffman encoded stroke data
			(it is actually 0xff 0xff)
		pp pp: # of points described by huffman encoded stroke data,
			includes starting point
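
A sketch of splitting the part of a code=01 segment that starts at "ll".
Big-endian, unsigned 16-bit fields are assumed (whether coordinates can be
negative is not known); parse_stroke_body is a hypothetical name.

```python
def parse_stroke_body(body: bytes):
    """Split 'll xx xx yy yy zz ... ff ff pp pp' into its fields.

    Returns (x, y, encoded, npoints), where encoded is the huffman
    encoded stroke data without the ff ff terminator.
    """
    # ll counts itself: 1 + 4 (start point) + 2 (ff ff) + 2 (pp pp) = 9
    if len(body) < 9 or body[0] != len(body):
        raise ValueError("length byte does not match body size")
    x = int.from_bytes(body[1:3], "big")  # assumption: unsigned, big-endian
    y = int.from_bytes(body[3:5], "big")
    if body[-4:-2] != b"\xff\xff":
        raise ValueError("missing ff ff terminator")
    encoded = body[5:-4]
    npoints = int.from_bytes(body[-2:], "big")
    return x, y, encoded, npoints
```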

02: huffman encoded stroke data (data len=variable)
	Mostly the same as code=01, but code=02 indicates that the pen
	left the crosspad once and reached it again (i.e. code=02 seems to
	mean that the radio receiver in the pad has lost the transmitter in
	the pen)

03: keyword selection (data len=0x05?)
	00 00 00 00 03 MM DD YY HH MM SS 04 xx xx xx xx
		04: maybe length?
		xx xx xx xx: unknown, usually 00 xx 00 xx

04: page # (data len=0x04)
	00 00 00 00 04 MM DD YY HH MM SS xx xx xx xx
		xx xx xx xx: the page # is this value from here on
		NOTE: The data from the pad is like a transaction log.
			The order of the data stream is defined by date/time.
			Therefore, the page# may go back and forth if the user
			flips pages back and forth.

05: clock adjust (data len=0x06)
	00 00 00 00 05 MM DD YY HH MM SS MM DD YY HH MM SS 
		MM DD YY HH MM SS: the date/time clock was updated
			(usually same as the first set of date/time data)

06: various attributes (data len=0x0e)
	00 00 00 00 06 MM DD YY HH MM SS 00 00 00 01 02 04 00 d8 01 18 01 00 fe xx
		00 00 00 01 02 04 00 d8 01 18 01 00 fe: unknown
		xx: huffman encoding type (described later)
			Also, the coordinate system needs to be shifted to some
			extent.

		NOTE: appears in original CrossPad only.

0a: filename? (data len=0x08)
	00 00 00 00 0a MM DD YY HH MM SS xx xx xx xx xx xx xx xx
		xx ...: ASCII string, "File0001" by default

0d: keyword/title? (data len=0x08)
	00 00 00 00 0d MM DD YY HH MM SS xx xx xx xx xx xx xx xx
		xx ...: ASCII string, "unknown\0" by default

0e: unknown (data len=0x01)
	00 00 00 00 0e MM DD YY HH MM SS xx
		xx: unknown
			almost always 03 on data from CrossPad
			almost always 01 on data generated by SDK

1d: various attributes (data len=0x26)
	00 00 00 00 1d MM DD YY HH MM SS xx ....
		xx ...: unknown
		huffman encoding type (described later) is always "y negated"
		style.

		NOTE: appears in CrossPad XP only.

35: ??? (data len=0x00)
	00 00 00 00 35 MM DD YY HH MM SS

		NOTE: appears in CrossPad XP only.

36: bookmark (data len=0x00)
	00 00 00 00 36 MM DD YY HH MM SS 
		MM DD YY HH MM SS: the date/time the bookmark was placed on
			this page

39: ??? (data len=0x21)
	00 00 00 00 39 MM DD YY HH MM SS xx ...
		xx ...: unknown
		appears only in SDK-generated data

3a: last download date (data len=0x00)
	00 00 00 00 3a MM DD YY HH MM SS 
		MM DD YY HH MM SS: the last date/time download ("Upload Ink"
		in pad menu) was performed

3c: ??? (data len=0x0c)
	00 00 00 00 3c MM DD YY HH MM SS xx ...
		xx ...: unknown
		appears only in SDK-generated data

3d: ??? (data len=0x0a)
	00 00 00 00 3d MM DD YY HH MM SS xx ...
		xx ...: unknown
		appears only in SDK-generated data

3e: ??? (data len=0x04)
	00 00 00 00 3e MM DD YY HH MM SS xx ...
		xx ...: unknown
		appears only in SDK-generated data
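
Since the data length is determined by the segment code, a raw data file can
be walked with a lookup table built from the "data len" values listed above.
This is a sketch only: split_segments is a hypothetical name, the length for
code 03 is marked "0x05?" above and is therefore a guess, and unknown codes
simply raise an error.

```python
# Data lengths copied from the segment list above (03 is uncertain).
FIXED_BODY_LEN = {
    0x03: 0x05, 0x04: 0x04, 0x05: 0x06, 0x06: 0x0E, 0x0A: 0x08,
    0x0D: 0x08, 0x0E: 0x01, 0x1D: 0x26, 0x35: 0x00, 0x36: 0x00,
    0x39: 0x21, 0x3A: 0x00, 0x3C: 0x0C, 0x3D: 0x0A, 0x3E: 0x04,
}
HEADER_LEN = 11  # 4-byte version + 1-byte code + 6-byte BCD date

def split_segments(data: bytes):
    """Return a list of (code, date bytes, body bytes), one per segment."""
    segments = []
    pos = 0
    while pos < len(data):
        header = data[pos:pos + HEADER_LEN]
        if len(header) < HEADER_LEN or header[:4] != b"\x00\x00\x00\x00":
            raise ValueError("bad segment header at offset %d" % pos)
        code = header[4]
        body_start = pos + HEADER_LEN
        if code in (0x01, 0x02):
            # Stroke data: the first body byte "ll" counts itself.
            if body_start >= len(data):
                raise ValueError("truncated stroke segment")
            body_len = data[body_start]
        elif code in FIXED_BODY_LEN:
            body_len = FIXED_BODY_LEN[code]
        else:
            raise ValueError("unknown segment code %02x" % code)
        body = data[body_start:body_start + body_len]
        if len(body) != body_len:
            raise ValueError("truncated segment body")
        segments.append((code, header[5:], body))
        pos = body_start + body_len
    return segments
```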

** Encoding/decoding stroke data
    Coordinate system
	The coordinate system is a normal xy plane with the y axis pointing
	down (moving right increases x; moving toward the bottom
	increases y).

    Huffman table
	The following table is the huffman encoding/decoding table used
	in "version 0" format file.

	bit string	value
	---		---
	110101011111001	-16
	110101011110	-15
	1100100100	-14
	110010011	-13
	110101010	-12
	11010100	-11
	1000100		-10
	1010011		-9
	1101011		-8
	101000		-7
	110011		-6
	10000		-5
	11000		-4
	11011		-3
	1011		-2
	010		-1
	00		0
	011		1
	1001		2
	10101		3
	110100		4
	100011		5
	1100101		6
	1010010		7
	11001000	8
	10001010	9
	100010111	10
	100010110	11
	1100100101	12
	11010101110	13
	11010101100	14
	11010101101	15
	1101010111111	16
	11010101111101	18
	110101011111000xxxxxxxx	means "xxxxxxxx" in signed byte
	11010101111100000000000	means 0
	11111111	termination

	NOTE: missing values are to be filled.
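
	The table can be applied bit by bit, because no bit string in it is a
	prefix of another.  The Python sketch below copies the table as listed
	(including the gaps) and treats an incomplete bit string at the end of
	the data as padding, which is an assumption.  decode_values is a
	hypothetical name.

```python
HUFFMAN = {
    "110101011111001": -16, "110101011110": -15, "1100100100": -14,
    "110010011": -13, "110101010": -12, "11010100": -11,
    "1000100": -10, "1010011": -9, "1101011": -8, "101000": -7,
    "110011": -6, "10000": -5, "11000": -4, "11011": -3,
    "1011": -2, "010": -1, "00": 0, "011": 1, "1001": 2,
    "10101": 3, "110100": 4, "100011": 5, "1100101": 6,
    "1010010": 7, "11001000": 8, "10001010": 9, "100010111": 10,
    "100010110": 11, "1100100101": 12, "11010101110": 13,
    "11010101100": 14, "11010101101": 15, "1101010111111": 16,
    "11010101111101": 18,
}
ESCAPE = "110101011111000"  # followed by 8 bits holding a signed byte
TERMINATION = "11111111"
MAX_CODE_BITS = 15

def decode_values(data: bytes) -> list:
    """Decode huffman encoded bytes into a flat list of signed values."""
    bits = "".join(format(b, "08b") for b in data)
    values = []
    code = ""
    pos = 0
    while pos < len(bits):
        code += bits[pos]
        pos += 1
        if code == TERMINATION:
            break
        if code == ESCAPE:
            raw = bits[pos:pos + 8]
            if len(raw) < 8:
                raise ValueError("truncated escape sequence")
            value = int(raw, 2)
            values.append(value - 256 if value >= 128 else value)
            pos += 8
            code = ""
        elif code in HUFFMAN:
            values.append(HUFFMAN[code])
            code = ""
        elif len(code) >= MAX_CODE_BITS:
            raise ValueError("no matching bit string")
    # Assumption: leftover bits that form no complete code are padding.
    return values
```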

	If "huffman encoding type" in segment 06 is 01, the huffman table will
	be used for both x axis and y axis.
	If "huffman encoding type" in segment 06 is 02, the huffman table is
	used for x axis.  For y axis, negate the value (i.e. bit string
	"011" means -1, not 1).

    Encoding stroke data
	Assume the following stroke:
		start from (10, 10), go through (12, 12), (12, 14), (12, 16)
	In this case, starting point for segment 01 (or 02) will be (10, 10).
	We have 4 points, including starting point.
	Movement will be encoded by the differences between the coordinates.
	Therefore, we need to encode the following set of numbers:
		2 2 0 2 0 2
	Convert this into a bit string by using the huffman table described
	above (here let us assume that the huffman encoding type is 01):
		1001 1001 00 1001 00 1001
	By converting this to hexadecimal value, we'll get:
		99 24 90
	Resulting huffman encoded data (segment 01) will be:
		00 00 00 00 01 MM DD YY HH MM SS 0c 00 0a 00 0a 99 24 90 ff ff 00 04
		0c: length of data portion
		00 0a 00 0a: starting point is (10, 10)
		99 24 90: huffman encoded data
		ff ff: termination
		00 04: we have 4 points, including starting point

	NOTE: padding rule for the huffman encoded data portion is unknown.
		zero-fill should be okay.
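
	The encoding steps above can be sketched in Python as follows.  To
	stay short, CODES holds only a few entries of the huffman table, so
	larger movements raise KeyError.  Huffman encoding type 01, zero-fill
	padding and big-endian 16-bit fields are assumed; encode_stroke is a
	hypothetical name.

```python
# Subset of the huffman table: only small movements are covered here.
CODES = {-2: "1011", -1: "010", 0: "00", 1: "011", 2: "1001"}

def encode_stroke(points) -> bytes:
    """Build the 'll xx xx yy yy zz ... ff ff pp pp' part of segment 01."""
    x0, y0 = points[0]
    bits = ""
    for (px, py), (x, y) in zip(points, points[1:]):
        # Each step is stored as the x difference, then the y difference.
        bits += CODES[x - px] + CODES[y - py]
    # Assumption: pad the last byte with zero bits.
    bits += "0" * (-len(bits) % 8)
    encoded = int(bits, 2).to_bytes(len(bits) // 8, "big") if bits else b""
    body = (
        x0.to_bytes(2, "big")
        + y0.to_bytes(2, "big")
        + encoded
        + b"\xff\xff"
        + len(points).to_bytes(2, "big")
    )
    # ll counts itself, hence the + 1.
    return bytes([len(body) + 1]) + body
```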

    Decoding stroke data
	Let us try decoding segment 01 with the following data bytes:
		00 00 00 00 01 MM DD YY HH MM SS 0c 00 0a 00 0a 99 24 90 ff ff 00 04
	From length field (0c), data portion of the segment is:
		00 0a 00 0a 99 24 90 ff ff 00 04
	Starting point is (10, 10) since we have "00 0a 00 0a" for coordinate.
	We have 4 points, including starting point.
	Movement of the pen is described by the following hexadecimal values:
		99 24 90
	Writing this in binary, we get:
		1001 1001 0010 0100 1001 0000
	By performing longest match against the huffman table, we get:
		1001 1001 00 1001 00 1001 00 00
	Convert this into movement by using the huffman table:
		2 2 0 2 0 2 0 0 
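	The trailing "0 0" comes from the zero padding bits; since the point
	count is 4, only the first three (dx, dy) pairs are used.
	As a result, we can understand that the segment 01 means a stroke like: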
		start from (10, 10), go through (12, 12), (12, 14), (12, 16)
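
	The last step, turning the decoded movements back into points, can be
	sketched as below.  The point count decides how many movement pairs
	are used, so values produced by padding bits are dropped.  The
	encoding_type parameter follows the "huffman encoding type" rule
	(02 negates y); deltas_to_points is a hypothetical name.

```python
def deltas_to_points(start, values, npoints, encoding_type=0x01):
    """Rebuild the list of points from decoded movement values.

    values is the flat list dx, dy, dx, dy, ... read from the stroke data.
    """
    needed = 2 * (npoints - 1)
    if len(values) < needed:
        raise ValueError("not enough movement values for the point count")
    y_sign = -1 if encoding_type == 0x02 else 1
    x, y = start
    points = [(x, y)]
    for i in range(0, needed, 2):
        x += values[i]
        y += y_sign * values[i + 1]
        points.append((x, y))
    return points
```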