Avatar billede janegil Nybegynder
04. marts 2003 - 10:58 Der er 4 kommentarer og
1 løsning

Codepage i OSQL kontra Query Analyzer

Jeg har et SQL-skript som gir forskejllig resultat, avhengig av om jeg kjører det med Query Analyzer eller OSQL. Med Query Analyzer blir "frá tjóðbankum" gjengitt korrekt, men OSQL legger den teksten i basen som "frß tj¾­bankum". Hvordan får jeg OSQL til å tolke inputen sin som ISO-8859-1 data ? (Eventuelt UFT-8 eller UTF-16, bare jeg får kontroll over hva den bruker.)

Collation name er SQL_Latin1_General_CP1_CI_AI .

Skript:
    SET DATEFORMAT ymd
        DELETE ROKNSKAPARKOTA
        insert into ROKNSKAPARKOTA (ELK_DAGUR, ROKNSK_KT, RKT_MOT_ROKNSK_KT, ROKNSK_LYSING_TX_60, ROKNSK_LYSING_TX_38)
            values (
                '2003-03-03',
                '101101100',
                '101101100',
                'Renta frá tjóðbankum',
                'Renta frá tjóðbankum'
            )
Avatar billede janegil Nybegynder
04. marts 2003 - 10:59 #1
PS: SQL Server 2000
Avatar billede larildsen Nybegynder
04. marts 2003 - 18:12 #2
Jeg fandt dette på nettet, håber det hjælper

sql_set_conversion
    $X->sql_set_conversion([$client_cs [, $server_cs [, $direction]]]);sql_set_conversion establishes a character conversion between client and server and/or vice versa for the given handle. If you don't provide a handle, the conversion is established for the handle returned by the first call to sql_init.

To start from the back, $direction specify in which direction you want the conversion to apply. There are three possible values:

TO_CLIENT_ONLY

Convert only data that comes back from the server.

TO_SERVER_ONLY

Convert only data that is sent to the server.

TO_SERVER_CLIENT

Convert both when sending and receiving. This is the default.

$client_cs is the character set for the data on the client side, and $server_cs is the character set for the data on the server side. The chosen characters sets do not need to agree with the official character sets of the client or the server.

Specify the character sets as code pages, with or without the leading ``CP''. For $client_cs you can also specify ``OEM'' or ``ANSI'' to use the OEM or ANSI character set of the client. These are read from the key SYSTEM\CurrentControlSet\Control\Nls\CodePage in the HKEY_LOCAL_MACHINE hive. For the server character set, you can specify ``iso_1'' as an alternative to ``CP1252'' to get Latin-1.

The default for $client_cs is the local OEM character set. The default for $server_cs is the character set of the server (SQL7 and earlier) or the default collation for the server (SQL2000 and later). (Use sp_helpsort to find out if you don't know.)

To set up conversion sql_set_conversion looks in the SYSTEM32 directory of the NT installation as pointed to by the environment variable SystemRoot. In this directory it looks for the file SSSSCCCC.CPX and if this fails an attempt is made with CCCCSSSS.CPX . SSSS is the code-page number for the server character set, and CCCC is the code-page number for the client character set. For instance the file for converting to CP850 to Latin-1 (CP1252) is 12520850.CPX. If none of the files are found, execution is aborted.

Notes:

The conversion facility was added because SQL6.5 had insuffcient support in this area. The GUI tools would unconditionally convert between ANSI and OEM when reading and saving from disk. In other situations you could not get the conversion you wanted. Tools in SQL7 improved in this area, not the least BCP which offers the full range of choices just like MSSQL::Sqllib. But OSQL in SQL7 could convert at all. OSQL in SQL2000 can, because you can now specify a data source. But you still can convert to OEM if your data is ASCII and your server is OEM, because OSQL is an OEM tool.

These character-set conversions typically are not fully round-trip. That is, if you convert one character from set to another and back, you may not have the same character you started with. This is because the character sets contain different characters, and if a character is missing in the target character set, it will be converted to some other character, for instance the most similar-looking character.

You may think that the idea of a unidirectional conversion is funny, but say that you have a table in which the data is the CP850 character set, and you want it to be in Latin-1. This could be simple way of doing it:
    use MSSQL::Sqllib(:DEFAULT :directions)    my $X = sql_init(undef, "sa", "", "db");
    $X->sql_set_conversion("CP850", "iso_1", TO_SERVER_ONLY);
    @table = $X->sql(<<SQLEND);
      SELECT * INTO #tbl FROM tbl WHERE 1 = 0  -- Clone the table.
      SELECT * FROM tbl
    SQLEND
    foreach $row (@table) {
      $X->sql_insert("#tbl", $row);
    }
    $X->sql(<<SQLEND);
      BEGIN TRANSCTION
      DELETE tbl
      INSERT tbl SELECT * FROM #tbl
      COMMIT TRANSACTION
    SQLENDHere we are converting the data as we send it back. Alternatively we could have said:

    $X->sql_set_conversion("ANSI", "CP850", TO_CLIENT_ONLY);
Avatar billede janegil Nybegynder
05. marts 2003 - 12:30 #3
Det mest nyttige jeg finner der, ser ut til å være at det finnes et alternativ til OSQL: BCP.

Den er sikkert bedre egnet til mitt formål, så jeg prøver den...
Avatar billede janegil Nybegynder
07. marts 2003 - 11:04 #4
Jeg er nok i alle fall bedre tjent med BCP enn med OSQL. Men det ser ut til at osql alltid tolker data med samme codepage som er default for bcp, jeg vet ikke hvilken det er, men det er tydeligvis ikke hverken windows-1252 eller iso-8859-1.

UNICODE nytter ikke, både UTF-8, UTF-16 og ISO-10646-UCS-2 fører til "unexpected end of data" i bcp.

Med ISO-8859-1 data fungerer
bcp -c -C ACP
som betyr at windows-1252 skal brukes. For god ordens skyld endret jeg så dataene til også å være windows-1252.

Man kan endre "collation" for det enkelte felt i databasen, men så langt jeg kan se, påvirker det bare sorteringsrekkefølge, ikke mappingen mellom byte og tegn?
Avatar billede janegil Nybegynder
07. marts 2003 - 11:46 #5
Tror jeg vet hvorfor UNICODE går feil: MS-produktene leter etter sitattegn og tabulaturer på byte-nivå, dvs FØR de setter multibyte unicode-tegn sammen til ett tegn. For UNICODE lest som byte-streng kan inneholde bytes som representerer et hvilket som helst ASCII-tegn.
Avatar billede Ny bruger Nybegynder

Din løsning...

Tilladte BB-code-tags: [b]fed[/b] [i]kursiv[/i] [u]understreget[/u] Web- og emailadresser omdannes automatisk til links. Der sættes "nofollow" på alle links.

Loading billede Opret Preview
Kategori
Computerworld tilbyder specialiserede kurser i database-management

Log ind eller opret profil

Hov!

For at kunne deltage på Computerworld Eksperten skal du være logget ind.

Det er heldigvis nemt at oprette en bruger: Det tager to minutter og du kan vælge at bruge enten e-mail, Facebook eller Google som login.

Du kan også logge ind via nedenstående tjenester