will_paginate to the rescue

will_paginate is a great plugin, used by almost every Rails application I have seen. So far I have only seen it used to generate handy navigation through huge datasets; well, it was written with that in mind. But today I found another use for it.

How often does your application need to export a huge dataset, as CSV for example? Quite often. A first approach could be:

    # Load every record into memory at once, then dump them all to CSV
    @l = Model.find :all
    str = ""
    CSV::Writer.generate(str) do |csv|
      @l.each do |l|
        csv << [l.fields]
      end
    end
    send_data str, :type => 'text/csv', :filename => 'model.csv'

This works, but… if you are operating under a memory limit (as on most shared hosting setups or on a small VPS instance), it can lead to problems. One application I have seen took this approach to export 35k objects (each with a has_one association loaded via :include); just after a restart, with this export run once, the process had grown to 215 MB (as reported by passenger-memory-stats). On a VPS with 256 MB that means the app gets swapped out, and subsequent requests are served with extra latency because of the swap. And the process will not give the memory back until Passenger kills the instance as part of its normal life cycle.
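For illustration, the memory-hungry call in that application looked roughly like the sketch below; :details is a hypothetical name standing in for the real has_one association:

    # Loads all 35k records, plus one associated row each, into RAM at once
    # (:details is a made-up association name for this sketch)
    @l = Model.find :all, :include => :details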

will_paginate to the rescue!

Instead of using find :all, use paginate and walk through the data in smaller chunks, like this:

    step = 1000
    # The first query also tells us how many pages there are in total
    @l = Model.paginate :page => 1, :per_page => step
    str = ""
    CSV::Writer.generate(str) do |csv|
      (1..@l.total_pages).each do |i|
        # Fetch and write one chunk of `step` records at a time
        @mod = Model.paginate :page => i, :per_page => step
        @mod.each do |l|
          csv << [l.fields]
        end
      end
    end
    send_data str, :type => 'text/csv', :filename => 'model.csv'

And the results? On a VPS with 360 MB RAM this approach brought the Rails process to 129 MB instead of 215 MB (same application, same dataset, same export). Execution time? 23 seconds for the find :all approach, 22 seconds for will_paginate. Apparently, once you are hitting the memory limit (and swapping out part of the Rails process memory), the benefit of staying small outweighs the overhead of splitting the job into more queries. With an even bigger dataset (or a smaller memory limit) the gains from the will_paginate approach would be even greater.

Do you have other ideas on how to create such an export in a memory-efficient way?
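One candidate worth sketching: since Rails 2.3, ActiveRecord ships with find_in_batches, which walks a table in fixed-size chunks keyed on the primary key, so no will_paginate is needed at all. A minimal sketch, reusing the same hypothetical Model and fields from above:

    str = ""
    CSV::Writer.generate(str) do |csv|
      # Each batch is fetched with a "WHERE id > last_seen_id" query,
      # so memory use stays flat and later batches do not slow down
      # the way queries with large OFFSETs do
      Model.find_in_batches(:batch_size => 1000) do |batch|
        batch.each do |l|
          csv << [l.fields]
        end
      end
    end
    send_data str, :type => 'text/csv', :filename => 'model.csv'

Unlike offset-based pagination, each query here is bounded by the last seen id, so the database never has to scan past rows that were already exported.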

3 Comments

  1. I guess I had something else in mind: Paginator was primarily advertised as a tool for large batch jobs as well as a paginating solution, while will_paginate is rarely used for batch processing; in that way your example is quite unique (to me at least).
