MySQL collate setting in Rails application

I do love MySQLs collate options. There are so many of them. One for server, one for database and one for client connection. There is one missing – One to rule them all ;))

I don’t know if it is FreeBSD specific (no it is not – I have checked only Linux box I got access to, and it is the same), but when You set UTF8 charset in database connection, You are not safe at all. Strange things still are waiting to happen :)

First – when You do use UTF8 You need give database hint how it should compare chars – it is important when somebody will send some non ASCII characters and want to compare against data You already have in database. This is what for collation is for – in most cases what You need is just utf8_general_ci. But MySQL, glorious product of Swedish engineering sets collation for latin1_swedish_ci. Do You smell cookings? :))

Now we enter into Rails world. You were smart and have started with MySQL server with --character-set-server=utf8 --collation-server=utf8_general_ci. You did MySQL dump and checked all tables that have set UTF8 as character set. Seems to be OK?

>>  ActiveRecord::Base.connection.collation
=> "utf8_general_ci"

It seems so :) but now client uses some national char for example in tag (and it seems that user is not Swedish – character is for example one from Polish chars) and then:

Mysql::Error: Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) 
for operation 'like': SELECT * FROM `tags`     
WHERE (name LIKE '. Naj?wi?tszego Serca Pana Jezusa.')  LIMIT 1

Now we can start dig more:

>>  r= ActiveRecord::Base.connection.execute("show variables like 'colla%'")
=> #<Mysql::Result:0x8cbdba0>
>> r.fetch_row
=> ["collation_connection", "latin1_swedish_ci"]
>> r.fetch_row
=> ["collation_database", "utf8_general_ci"]
>> r.fetch_row
=> ["collation_server", "utf8_general_ci"]

Now You have learned that connection.collation returns not connection collation but servers. But WTF? activerecord/lib/active_record/connection_adapters/mysql_adapter.rb:

        def connect
          encoding = @config[:encoding]
          if encoding
            @connection.options(Mysql::SET_CHARSET_NAME, encoding) rescue nil
          end
          @connection.ssl_set(@config[:sslkey], @config[:sslcert], @config[:sslca], @config[:sslcapath], @config[:sslcipher]) if @con
          @connection.real_connect(*@connection_options)
          execute("SET NAMES '#{encoding}'") if encoding

SET NAMES 'utf8' in line 472 should set all the collation variables, but somehow it does not, so as quick fix I just have added to environment.rb:

ActiveRecord::Base.connection.execute "SET collation_connection = 'utf8_general_ci' "

and

>>  ActiveRecord::Base.connection.execute("set names utf8")
=> nil
>>  r= ActiveRecord::Base.connection.execute("show variables like 'colla%'")
=> #<Mysql::Result:0x8cb1468>
>> r.fetch_row
=> ["collation_connection", "utf8_general_ci"]
>> r.fetch_row
=> ["collation_database", "utf8_general_ci"]
>> r.fetch_row
=> ["collation_server", "utf8_general_ci"]
>>

One more note. Collations for UTF8 in MySQL can be:

  • utf8_bin: compare strings by the binary value of each character
  • utf8_general_ci: compare strings using general language rules, case insensitive
  • utf8_general_cs: compare strings using general language case sensitive

In this thread on PHP Builder forum is explanation how MySQL will behave doing comparisons depending on collation setting. Worth to read and remember.

Join the Conversation

10 Comments

  1. Great thanks for this post. Just tripped over on this one and your post was great help to understand what went wrong.

    yogi

  2. Thank you for the very helpful post. I have an English/Persian site and your post really helped me get a handle on what rails was doing with mysql. Thanks! Charles

  3. WOW! Thanks for this. I have been trying to figure out how to straighten out the collation on our MySQL databases for a long time.

Leave a comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.