Using Unicode in SQL Server

Unicode is the standard used in the computing industry for encoding and representing any text in the most written languages ().

SQL Server supports Unicode, thus allowing the easy storage and manipulation of data in the most languages.

As I wanted to test this functionality I performed a simple experiment with the Cyrillic alphabet.

To this end I created two tables in SQL Server 2005, the one in non-Unicode and the other one in Unicode:

CREATE TABLE tStandard(
[name] varchar(100)
)
GO

CREATE TABLE tUnicode(
[name] nvarchar(100)
)
GO

As you can see, in the tStandard table the “name” field’s data type is varchar and in the tUnicode table the “name” field’s data type is nvarchar.

The difference between the varchar and nvarchar data types is that the former uses 1 byte for representing characters where the later uses 2 bytes thus supporting Unicode.

Then I created a text file in Unicode containing three records with my name in Russian:














The next step was to import the data into the two tables using the SQL Server Import and Export Wizard. As you can see from the screenshot below, when I tried to import the data into the tStandard table the process failed because of the fact that the data did not match the target code page (GREEK_CI_AS):



But what about if we want to represent any language without relying on the target database’s code page/collation?

The answer is easy; Use nvarchar(s)

Let’s see what happened with the rest of my experiment.

Here’s the result of trying to import the data into the tUnicode table:

































Yep, the data was imported successfully!
And here’s the SELECT * query’s results when executed against the two tables:




















Conclusion
The conclusion out of this experiment is that by using nvarchar, SQL Server can easily support the storage and representation of data in most written languages.

The only thing you need to have in mind, is that when using Unicode data types such as the nvarchar, the required storage will be doubled in comparison with using non-Unicode data types (i.e. varchar).

Labels: , , ,