What Would Seth Godin Do – WordPress plugin patch

UPDATE
This patch applies to version 1.5 of WWSGD. If you use a more recent version, you need to make the change by hand, following the idea from this patch.

I use WordPress on a few sites and I like it. Well, it is not ideal, but it has a very strong community, which makes WP a perfect tool for blogging.

Community-provided plugins are great, and one that I use on every site is What Would Seth Godin Do. This plugin shows each new visitor a short message. The message invites them to subscribe to the RSS feed, or to do whatever else the blog author configures.

New visitors are recognized by cookies, and the message can be placed at the beginning or the end of a post. Trouble strikes when you put the message ahead of the post and the visitor's client does not support cookies. Then, in the default WWSGD setup, every post will begin with "If you’re new here, you may want to subscribe to my RSS feed. Thanks for visiting!" every single time. So what, you ask? One of those visitors is Googlebot, and believe me, you don’t want it to see this.

Now (short) story time. I have set up the 30 Day Trial blog to document my progress with different attempts at self-improvement. Of course I installed WWSGD, and this time I placed the message box before the post. After a few days I checked whether my site had been indexed by Google and:

[Screenshot: What Does Google See - Can You Spot a Pattern?]

My invitation to grab the RSS feed has become the main part of the excerpt Google serves in search results. Not good. But it is enough to skip this message for Googlebot, and so we have a simple patch:

Index: what_would_seth_godin_do.php
===================================================================
--- what_would_seth_godin_do.php        (revision 42)
+++ what_would_seth_godin_do.php        (working copy)
@@ -120,11 +120,16 @@
        }
 }

+//Skip Google bot - do not offer Google opportunity to sign to Your feed ;)
+function skip_this_user_agent() {
+       return 0 != preg_match( "/Googlebot/", $_SERVER['HTTP_USER_AGENT'] );
+}
+
 function wwsgd_message_filter($content = '')
 {
        global $wwsgd_visits, $wwsgd_settings, $wwsgd_messagedisplayed;

-       if ($wwsgd_messagedisplayed || is_feed())
+       if ($wwsgd_messagedisplayed || is_feed() || skip_this_user_agent() )
        {
                return $content;
        }

Of course it could check against more User Agent strings (Ask, Live.com, others?) but who cares :) It works for me:

[Screenshot: And what does Google see now?]

To patch or not to patch?

Much better. If you use this plugin, I suggest you patch WWSGD (of course, in case you need some help with patching…).
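The patch matches only Googlebot. Checking several crawlers at once, as mentioned above, could look roughly like the sketch below. It is written in Ruby rather than the plugin's PHP, just to show the idea. Only "Googlebot" comes from the patch; the other tokens (msnbot, Teoma, Slurp) and the method name are assumptions that should be checked against real server logs.

```ruby
# Hypothetical list of crawler tokens. Only "Googlebot" is taken from the
# patch; the remaining entries are assumptions for illustration.
CRAWLER_PATTERN = /Googlebot|msnbot|Teoma|Slurp/i

# Returns true when the User-Agent header looks like a known crawler.
# A missing header (nil) is treated as a regular visitor.
def skip_this_user_agent?(user_agent)
  CRAWLER_PATTERN.match?(user_agent.to_s)
end

puts skip_this_user_agent?("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
puts skip_this_user_agent?("Mozilla/5.0 (X11; Linux x86_64) Firefox/3.0")
```

The same alternation can be dropped into the preg_match pattern of the PHP patch.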

MySQL collate setting in Rails application

I do love MySQL's collation options. There are so many of them: one for the server, one for the database and one for the client connection. There is one missing: one to rule them all ;))

I wondered whether this is FreeBSD specific (it is not: I checked the only Linux box I have access to, and it behaves the same), but setting the UTF8 charset on the database connection does not make you safe at all. Strange things are still waiting to happen :)

First: when you use UTF8, you need to give the database a hint about how it should compare characters. This matters when somebody sends non-ASCII characters and they have to be compared against data you already have in the database. This is what a collation is for, and in most cases what you need is just utf8_general_ci. But MySQL, a glorious product of Swedish engineering, defaults to latin1_swedish_ci. Do you smell something cooking? :))

Now we enter the Rails world. You were smart and started the MySQL server with --character-set-server=utf8 --collation-server=utf8_general_ci. You did a MySQL dump and checked that all tables have UTF8 set as their character set. Seems to be OK?

>>  ActiveRecord::Base.connection.collation
=> "utf8_general_ci"

It seems so :) But now a client uses a national character, for example in a tag (and it seems the user is not Swedish: the character is, for example, a Polish one), and then:

Mysql::Error: Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) 
for operation 'like': SELECT * FROM `tags`     
WHERE (name LIKE '. Naj?wi?tszego Serca Pana Jezusa.')  LIMIT 1

Now we can start digging deeper:

>>  r= ActiveRecord::Base.connection.execute("show variables like 'colla%'")
=> #<Mysql::Result:0x8cbdba0>
>> r.fetch_row
=> ["collation_connection", "latin1_swedish_ci"]
>> r.fetch_row
=> ["collation_database", "utf8_general_ci"]
>> r.fetch_row
=> ["collation_server", "utf8_general_ci"]
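Reading the rows one by one works, but the odd one out is easy to miss. A minimal sketch of a helper that flags it, assuming the rows have already been collected into an array of [name, value] pairs (the helper name is made up for this example):

```ruby
# rows: array of [variable_name, value] pairs, e.g. gathered from
# repeated fetch_row calls on the "show variables" result.
# Returns "name=value" strings for every collation variable that
# differs from the expected collation.
def collation_mismatches(rows, expected)
  rows
    .select { |name, value| name.start_with?("collation_") && value != expected }
    .map { |name, value| "#{name}=#{value}" }
end

rows = [
  ["collation_connection", "latin1_swedish_ci"],
  ["collation_database", "utf8_general_ci"],
  ["collation_server", "utf8_general_ci"],
]

puts collation_mismatches(rows, "utf8_general_ci")
# prints: collation_connection=latin1_swedish_ci
```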

Now you have learned that connection.collation returns not the connection collation but the one set on the server side. But WTF? activerecord/lib/active_record/connection_adapters/mysql_adapter.rb:

        def connect
          encoding = @config[:encoding]
          if encoding
            @connection.options(Mysql::SET_CHARSET_NAME, encoding) rescue nil
          end
          @connection.ssl_set(@config[:sslkey], @config[:sslcert], @config[:sslca], @config[:sslcapath], @config[:sslcipher]) if @con
          @connection.real_connect(*@connection_options)
          execute("SET NAMES '#{encoding}'") if encoding

SET NAMES 'utf8' in line 472 should also set the connection collation, but somehow it does not, so as a quick fix I just added this to environment.rb:

ActiveRecord::Base.connection.execute "SET collation_connection = 'utf8_general_ci' "

and

>>  ActiveRecord::Base.connection.execute("set names utf8")
=> nil
>>  r= ActiveRecord::Base.connection.execute("show variables like 'colla%'")
=> #<Mysql::Result:0x8cb1468>
>> r.fetch_row
=> ["collation_connection", "utf8_general_ci"]
>> r.fetch_row
=> ["collation_database", "utf8_general_ci"]
>> r.fetch_row
=> ["collation_server", "utf8_general_ci"]
>>
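MySQL also accepts a COLLATE clause on SET NAMES, which pins the charset and the collation in one statement. Below is a small sketch of a helper that builds such a statement. The helper name is hypothetical, and whether this fits your setup better than the quick fix above is something to verify on your own server.

```ruby
# Builds a SET NAMES statement that sets charset and collation together.
# Both names are validated, because they are interpolated into SQL.
def set_names_sql(charset, collation)
  [charset, collation].each do |name|
    unless name.to_s.match?(/\A[A-Za-z0-9_]+\z/)
      raise ArgumentError, "suspicious name: #{name.inspect}"
    end
  end
  "SET NAMES '#{charset}' COLLATE '#{collation}'"
end

puts set_names_sql("utf8", "utf8_general_ci")
# prints: SET NAMES 'utf8' COLLATE 'utf8_general_ci'
```

The resulting string could be passed to ActiveRecord::Base.connection.execute in environment.rb, the same way as the quick fix.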

One more note. Collations for UTF8 in MySQL include:

  • utf8_bin: compares strings by the binary value of each character
  • utf8_general_ci: compares strings using general language rules, case insensitive
  • utf8_general_cs: compares strings using general language rules, case sensitive (stock MySQL does not appear to ship this one; utf8_bin is the usual choice for case-sensitive comparison)
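To get a feel for the difference between binary and case-insensitive comparison, here is a rough analogy in plain Ruby. It is only an illustration: real MySQL collations apply more rules than this, and the two helper names are invented for the example.

```ruby
# Compares the raw bytes of both strings, roughly in the spirit of a
# binary collation.
def binary_equal?(left, right)
  left.b == right.b
end

# Compares after Unicode case folding, roughly in the spirit of a
# case-insensitive collation (String#casecmp? needs Ruby 2.4 or newer).
def case_insensitive_equal?(left, right)
  left.casecmp?(right) == true
end

puts binary_equal?("Tag", "tag")            # false
puts case_insensitive_equal?("Tag", "tag")  # true
```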

This thread on the PHP Builder forum explains how MySQL behaves when doing comparisons, depending on the collation setting. Worth reading and remembering.

Session store – don’t get trapped

I’ve recently stumbled upon a design flaw in Rails applications. It looks like it is much more common than I thought…

The session hash can store whole objects. Don’t do that. I’ve recently seen applications storing whole ActiveRecord objects in the session. Why is it a dumb idea?

First: with the new Rails default, sessions are stored in browser cookies, so you get a very low size limit (AFAIR 4 kB). Second (and this is the real reason): if your schema changes, all objects kept in the session become invalid. With the new application code, those objects will be like guests from the deep past. They will be restored, but they will lack the new attributes, and when your application tries to use one of them: kaboom….
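The usual way around this is to keep only an identifier in the session and load the record on each request. A minimal sketch in plain Ruby, where a Hash stands in for the Rails session, another Hash stands in for the database, and all names are made up for the example:

```ruby
User = Struct.new(:id, :name)

# Stand-in for the users table.
USERS = { 1 => User.new(1, "alice") }

# Stand-in for the Rails session hash: store only the primary key.
session = {}
session[:user_id] = USERS[1].id

# Look the record up again whenever it is needed, so it always
# reflects the current schema and data.
def current_user(session, users)
  users[session[:user_id]]
end

puts current_user(session, USERS).name
# prints: alice
```

The session now holds a single small integer, which fits a cookie easily and does not go stale when the schema changes.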

[Image: Guests from the past (c)]

This is not only related to the session hash. It is a general problem with object serialization (and long-term storage). Try to avoid Marshal whenever it is not really required.
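The "guest from the past" effect is easy to reproduce with Marshal alone. The sketch below uses a made-up Profile class: an object is dumped, the class then gains a new attribute (as it would after a deploy), and the old dump is loaded again.

```ruby
class Profile
  attr_accessor :name

  def initialize(name)
    @name = name
  end
end

# What would end up in the session or any other long-term storage.
stored = Marshal.dump(Profile.new("alice"))

# Later deploy: the class gains a new attribute with a default value.
class Profile
  attr_accessor :email

  def initialize(name, email = "unknown@example.com")
    @name = name
    @email = email
  end
end

restored = Marshal.load(stored)
p restored.name   # "alice"
p restored.email  # nil, because Marshal.load does not call initialize
```

Any code that assumes email is always a string will now fail on the restored object, while freshly created objects work fine.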