Will paginate to the rescue

Will paginate is a great plugin, used by almost every Rails application I have seen. So far I have only seen it used to generate handy navigation through huge datasets. That is what it was written for, but today I found another use for it.

How often does your application need to export a huge dataset, as CSV for example? Quite often. A first approach could be:

# Loads every record into memory at once
@l = Model.find :all
str = ""
CSV::Writer.generate(str) do |csv|
  @l.each do |l|
    # l.fields stands for the attributes to export
    csv << [ l.fields ]
  end
end
send_data str, :type => 'text/csv', :filename => 'model.csv'

This works, but if you are operating under a memory limit (as on most shared hosting setups or on a small VPS instance), it can lead to problems. One application I have seen used this approach to export 35k objects (each with one has_one association loaded with :include). Right after a restart, with just this one export done, the process was using 215 MB (as reported by passenger-memory-stats). On a VPS with 256 MB this results in the app being swapped out, and subsequent requests are handled with extra latency due to swapping. And the process won't give this memory back until Passenger kills the instance as part of its normal life cycle.

Will_paginate to the rescue!
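
A minimal sketch of the idea, assuming the point is to walk the dataset page by page so that only one page of records is loaded at a time. It is not the author's code: `fetch_page` is a hypothetical stand-in for a paginated finder (in a will_paginate app that would be something like `Model.paginate :page => n, :per_page => size`), rows are plain arrays, and the modern stdlib `CSV.generate_line` is used instead of the older `CSV::Writer`, so the snippet runs without Rails.

```ruby
require 'csv'

# Hypothetical stand-in for a paginated finder.
# Returns the records of one page, or an empty array past the end.
def fetch_page(records, page, per_page)
  records[(page - 1) * per_page, per_page] || []
end

# Appends CSV lines to io one page at a time, so at most
# per_page records need to be held in memory at once.
def export_csv(records, io, per_page = 1000)
  page = 1
  loop do
    batch = fetch_page(records, page, per_page)
    break if batch.empty?
    batch.each { |row| io << CSV.generate_line(row) }
    page += 1
  end
  io
end
```

In a controller, the resulting string could then be passed to send_data as before. The output string still grows with the dataset, but the loaded model objects, which are likely the bigger cost, are limited to one page.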

MySQL collation settings in a Rails application

I do love MySQL's collation options. There are so many of them: one for the server, one for the database and one for the client connection. Only one is missing: one to rule them all ;))

I thought this might be FreeBSD-specific, but it is not: I checked on the only Linux box I have access to, and it behaves the same. Even when you set the UTF8 charset for the database connection, you are not safe at all. Strange things are still waiting to happen :)

First: when you use UTF8, you need to give the database a hint about how it should compare characters. This matters when somebody sends non-ASCII characters and they have to be compared against data you already have in the database. This is what collation is for; in most cases all you need is utf8_general_ci. But MySQL, that glorious product of Swedish engineering, defaults the collation to latin1_swedish_ci. Do you smell something cooking? :))

Now we enter the Rails world. You were smart and started the MySQL server with --character-set-server=utf8 --collation-server=utf8_general_ci. You made a MySQL dump and checked that all tables have UTF8 set as their character set. Seems to be OK?

>>  ActiveRecord::Base.connection.collation
=> "utf8_general_ci"

It seems so :) But now a client uses some national character, for example in a tag (and it seems the user is not Swedish; the character is, say, one of the Polish ones), and then:

Mysql::Error: Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) 
for operation 'like': SELECT * FROM `tags`     
WHERE (name LIKE '. Naj?wi?tszego Serca Pana Jezusa.')  LIMIT 1

Now we can start digging deeper:

>>  r= ActiveRecord::Base.connection.execute("show variables like 'colla%'")
=> #<Mysql::Result:0x8cbdba0>
>> r.fetch_row
=> ["collation_connection", "latin1_swedish_ci"]
>> r.fetch_row
=> ["collation_database", "utf8_general_ci"]
>> r.fetch_row
=> ["collation_server", "utf8_general_ci"]
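
The same check can be scripted instead of eyeballed. This is only a sketch: `mismatched_collations` is a made-up helper name, and it takes the rows as a plain array of [variable, value] pairs. With the driver used in the session above, such an array could presumably be collected by calling fetch_row until it returns nil.

```ruby
# Hypothetical helper: given [variable, value] rows as returned by
# SHOW VARIABLES LIKE 'colla%', list the variables whose value
# differs from the expected collation.
def mismatched_collations(rows, expected)
  rows.reject { |_name, value| value == expected }.map { |name, _value| name }
end

# Rows copied from the console session above.
rows = [
  ["collation_connection", "latin1_swedish_ci"],
  ["collation_database",   "utf8_general_ci"],
  ["collation_server",     "utf8_general_ci"]
]

p mismatched_collations(rows, "utf8_general_ci")  # prints ["collation_connection"]
```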

Now you have learned that connection.collation does not return the connection collation; as far as I can tell it reads collation_database. But WTF? Look at activerecord/lib/active_record/connection_adapters/mysql_adapter.rb:

        def connect
          encoding = @config[:encoding]
          if encoding
            @connection.options(Mysql::SET_CHARSET_NAME, encoding) rescue nil
          end
          @connection.ssl_set(@config[:sslkey], @config[:sslcert], @config[:sslca], @config[:sslcapath], @config[:sslcipher]) if @con
          @connection.real_connect(*@connection_options)
          execute("SET NAMES '#{encoding}'") if encoding

SET NAMES 'utf8' in line 472 should also switch collation_connection to the utf8 default, but somehow it does not, so as a quick fix I just added this to environment.rb:

ActiveRecord::Base.connection.execute "SET collation_connection = 'utf8_general_ci' "

and

>>  ActiveRecord::Base.connection.execute("set names utf8")
=> nil
>>  r= ActiveRecord::Base.connection.execute("show variables like 'colla%'")
=> #<Mysql::Result:0x8cb1468>
>> r.fetch_row
=> ["collation_connection", "utf8_general_ci"]
>> r.fetch_row
=> ["collation_database", "utf8_general_ci"]
>> r.fetch_row
=> ["collation_server", "utf8_general_ci"]
>>
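
A slightly more explicit variant of the quick fix, as a sketch only: MySQL's SET NAMES accepts an optional COLLATE clause, so the charset and the connection collation can be set in one statement. The helper name `set_names_sql` is made up for this example; it only builds the SQL string, and the commented line shows where it would be handed to the connection.

```ruby
# Hypothetical helper: builds a SET NAMES statement that sets the
# charset and the connection collation together.
# Names are restricted to lowercase letters, digits and underscores,
# because they are interpolated directly into the SQL.
def set_names_sql(charset, collation)
  [charset, collation].each do |name|
    unless name =~ /\A[a-z0-9_]+\z/
      raise ArgumentError, "unexpected name: #{name.inspect}"
    end
  end
  "SET NAMES '#{charset}' COLLATE '#{collation}'"
end

puts set_names_sql("utf8", "utf8_general_ci")

# In environment.rb this would be used as (assumes a Rails app):
#   ActiveRecord::Base.connection.execute set_names_sql("utf8", "utf8_general_ci")
```

Like the original quick fix, this only affects the connection it is executed on, so a reconnect would need to run it again.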

One more note. Collations for UTF8 in MySQL include:

  • utf8_bin: compare strings by the binary value of each character
  • utf8_general_ci: compare strings using general language rules, case insensitive
  • utf8_general_cs: compare strings using general language rules, case sensitive (stock MySQL builds do not appear to ship this collation, so check SHOW COLLATION before relying on it)

This thread on the PHP Builder forum explains how MySQL behaves when doing comparisons, depending on the collation setting. Worth reading and remembering.