04. marts 2003 - 10:58Der er
4 kommentarer og 1 løsning
Codepage i OSQL kontra Query Analyzer
Jeg har et SQL-skript som gir forskejllig resultat, avhengig av om jeg kjører det med Query Analyzer eller OSQL. Med Query Analyzer blir "frá tjóðbankum" gjengitt korrekt, men OSQL legger den teksten i basen som "frß tj¾bankum". Hvordan får jeg OSQL til å tolke inputen sin som ISO-8859-1 data ? (Eventuelt UFT-8 eller UTF-16, bare jeg får kontroll over hva den bruker.)
Collation name er SQL_Latin1_General_CP1_CI_AI .
Skript: SET DATEFORMAT ymd DELETE ROKNSKAPARKOTA insert into ROKNSKAPARKOTA (ELK_DAGUR, ROKNSK_KT, RKT_MOT_ROKNSK_KT, ROKNSK_LYSING_TX_60, ROKNSK_LYSING_TX_38) values ( '2003-03-03', '101101100', '101101100', 'Renta frá tjóðbankum', 'Renta frá tjóðbankum' )
Den moderne arbejdsplads er i stigende grad afhængig af mødelokaler til at fremme samarbejde, men dette skift medfører også stigende sikkerhedsudfordringer.
sql_set_conversion $X->sql_set_conversion([$client_cs [, $server_cs [, $direction]]]);sql_set_conversion establishes a character conversion between client and server and/or vice versa for the given handle. If you don't provide a handle, the conversion is established for the handle returned by the first call to sql_init.
To start from the back, $direction specify in which direction you want the conversion to apply. There are three possible values:
TO_CLIENT_ONLY
Convert only data that comes back from the server.
TO_SERVER_ONLY
Convert only data that is sent to the server.
TO_SERVER_CLIENT
Convert both when sending and receiving. This is the default.
$client_cs is the character set for the data on the client side, and $server_cs is the character set for the data on the server side. The chosen characters sets do not need to agree with the official character sets of the client or the server.
Specify the character sets as code pages, with or without the leading ``CP''. For $client_cs you can also specify ``OEM'' or ``ANSI'' to use the OEM or ANSI character set of the client. These are read from the key SYSTEM\CurrentControlSet\Control\Nls\CodePage in the HKEY_LOCAL_MACHINE hive. For the server character set, you can specify ``iso_1'' as an alternative to ``CP1252'' to get Latin-1.
The default for $client_cs is the local OEM character set. The default for $server_cs is the character set of the server (SQL7 and earlier) or the default collation for the server (SQL2000 and later). (Use sp_helpsort to find out if you don't know.)
To set up conversion sql_set_conversion looks in the SYSTEM32 directory of the NT installation as pointed to by the environment variable SystemRoot. In this directory it looks for the file SSSSCCCC.CPX and if this fails an attempt is made with CCCCSSSS.CPX . SSSS is the code-page number for the server character set, and CCCC is the code-page number for the client character set. For instance the file for converting to CP850 to Latin-1 (CP1252) is 12520850.CPX. If none of the files are found, execution is aborted.
Notes:
The conversion facility was added because SQL6.5 had insuffcient support in this area. The GUI tools would unconditionally convert between ANSI and OEM when reading and saving from disk. In other situations you could not get the conversion you wanted. Tools in SQL7 improved in this area, not the least BCP which offers the full range of choices just like MSSQL::Sqllib. But OSQL in SQL7 could convert at all. OSQL in SQL2000 can, because you can now specify a data source. But you still can convert to OEM if your data is ASCII and your server is OEM, because OSQL is an OEM tool.
These character-set conversions typically are not fully round-trip. That is, if you convert one character from set to another and back, you may not have the same character you started with. This is because the character sets contain different characters, and if a character is missing in the target character set, it will be converted to some other character, for instance the most similar-looking character.
You may think that the idea of a unidirectional conversion is funny, but say that you have a table in which the data is the CP850 character set, and you want it to be in Latin-1. This could be simple way of doing it: use MSSQL::Sqllib(:DEFAULT :directions) my $X = sql_init(undef, "sa", "", "db"); $X->sql_set_conversion("CP850", "iso_1", TO_SERVER_ONLY); @table = $X->sql(<<SQLEND); SELECT * INTO #tbl FROM tbl WHERE 1 = 0 -- Clone the table. SELECT * FROM tbl SQLEND foreach $row (@table) { $X->sql_insert("#tbl", $row); } $X->sql(<<SQLEND); BEGIN TRANSCTION DELETE tbl INSERT tbl SELECT * FROM #tbl COMMIT TRANSACTION SQLENDHere we are converting the data as we send it back. Alternatively we could have said:
Jeg er nok i alle fall bedre tjent med BCP enn med OSQL. Men det ser ut til at osql alltid tolker data med samme codepage som er default for bcp, jeg vet ikke hvilken det er, men det er tydeligvis ikke hverken windows-1252 eller iso-8859-1.
UNICODE nytter ikke, både UTF-8, UTF-16 og ISO-10646-UCS-2 fører til "unexpected end of data" i bcp.
Med ISO-8859-1 data fungerer bcp -c -C ACP som betyr at windows-1252 skal brukes. For god ordens skyld endret jeg så dataene til også å være windows-1252.
Man kan endre "collation" for det enkelte felt i databasen, men så langt jeg kan se, påvirker det bare sorteringsrekkefølge, ikke mappingen mellom byte og tegn?
Tror jeg vet hvorfor UNICODE går feil: MS-produktene leter etter sitattegn og tabulaturer på byte-nivå, dvs FØR de setter multibyte unicode-tegn sammen til ett tegn. For UNICODE lest som byte-streng kan inneholde bytes som representerer et hvilket som helst ASCII-tegn.
Synes godt om
Ny brugerNybegynder
Din løsning...
Tilladte BB-code-tags: [b]fed[/b] [i]kursiv[/i] [u]understreget[/u] Web- og emailadresser omdannes automatisk til links. Der sættes "nofollow" på alle links.