MySQL, collations and Rails – or server, database, connection – that is not all

Last few weeks were somewhat crazy time. But I’m seeing finish line not so far away and I hope to find more time for blogging in upcoming weeks. At least I have few things to share, which could be interesting for other Rails developers.

So let’s start with MySQL and it’s collations. I have wrote about them few months ago. Few days ago, I had finally installed Linux on my notebook (last two months I was running Linux from USB drive). And it seems that there were some minor changes in my environment. I was working on CARtoteka, and when I was running migrations they were stopped in half with error like :

Mysql::Error: Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and
(utf8_general_ci,COERCIBLE) for operation '=': 
SELECT count(DISTINCT  companies`.id) AS count_all FROM `companies`  
LEFT OUTER JOIN `addresses` ON  addresses.company_id = companies.id 
WHERE (city = 'Pozna?' and moderated)

What the heck? I have doublechecked I have set connection collation and other – utf_general_ci as I could expect.

After some digging I have found what was a cause.

Continue reading

MySQL collate setting in Rails application

I do love MySQLs collate options. There are so many of them. One for server, one for database and one for client connection. There is one missing – One to rule them all ;))

I don’t know if it is FreeBSD specific (no it is not – I have checked only Linux box I got access to, and it is the same), but when You set UTF8 charset in database connection, You are not safe at all. Strange things still are waiting to happen :)

First – when You do use UTF8 You need give database hint how it should compare chars – it is important when somebody will send some non ASCII characters and want to compare against data You already have in database. This is what for collation is for – in most cases what You need is just utf8_general_ci. But MySQL, glorious product of Swedish engineering sets collation for latin1_swedish_ci. Do You smell cookings? :))

Now we enter into Rails world. You were smart and have started with MySQL server with --character-set-server=utf8 --collation-server=utf8_general_ci. You did MySQL dump and checked all tables that have set UTF8 as character set. Seems to be OK?

>>  ActiveRecord::Base.connection.collation
=> "utf8_general_ci"

It seems so :) but now client uses some national char for example in tag (and it seems that user is not Swedish – character is for example one from Polish chars) and then:

Mysql::Error: Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) 
for operation 'like': SELECT * FROM `tags`     
WHERE (name LIKE '. Naj?wi?tszego Serca Pana Jezusa.')  LIMIT 1

Now we can start dig more:

>>  r= ActiveRecord::Base.connection.execute("show variables like 'colla%'")
=> #<Mysql::Result:0x8cbdba0>
>> r.fetch_row
=> ["collation_connection", "latin1_swedish_ci"]
>> r.fetch_row
=> ["collation_database", "utf8_general_ci"]
>> r.fetch_row
=> ["collation_server", "utf8_general_ci"]

Now You have learned that connection.collation returns not connection collation but servers. But WTF? activerecord/lib/active_record/connection_adapters/mysql_adapter.rb:

        def connect
          encoding = @config[:encoding]
          if encoding
            @connection.options(Mysql::SET_CHARSET_NAME, encoding) rescue nil
          end
          @connection.ssl_set(@config[:sslkey], @config[:sslcert], @config[:sslca], @config[:sslcapath], @config[:sslcipher]) if @con
          @connection.real_connect(*@connection_options)
          execute("SET NAMES '#{encoding}'") if encoding

SET NAMES 'utf8' in line 472 should set all the collation variables, but somehow it does not, so as quick fix I just have added to environment.rb:

ActiveRecord::Base.connection.execute "SET collation_connection = 'utf8_general_ci' "

and

>>  ActiveRecord::Base.connection.execute("set names utf8")
=> nil
>>  r= ActiveRecord::Base.connection.execute("show variables like 'colla%'")
=> #<Mysql::Result:0x8cb1468>
>> r.fetch_row
=> ["collation_connection", "utf8_general_ci"]
>> r.fetch_row
=> ["collation_database", "utf8_general_ci"]
>> r.fetch_row
=> ["collation_server", "utf8_general_ci"]
>>

One more note. Collations for UTF8 in MySQL can be:

  • utf8_bin: compare strings by the binary value of each character
  • utf8_general_ci: compare strings using general language rules, case insensitive
  • utf8_general_cs: compare strings using general language case sensitive

In this thread on PHP Builder forum is explanation how MySQL will behave doing comparisons depending on collation setting. Worth to read and remember.