becos life is about pushing the envelope playfully



Ruby Array vs Trie - Which Performed Better?

Recently I took a Ruby class, taught by Sarah Allen and one of the most fun things I learned was profiling. Performance testing is such a crucial part of software development. Prior to Sarah’s class, the most common way to store and retrieve data is via an array. For example in a dictionary example, I would normally do this:

To add a word to a dictionary
def add(word)
  @words « word
end

To find a word in a dictionary
def find(prefix)
 @words.find_all {|w| w =~ /^#{prefix}/}
end 

See dictionary_array.rb in my github account for more details.



In class, we were introduced a new data storing structure, called Trie, inspired by Tyler McMullen’s LA RubyConf 2010 talk: http://www.scribd.com/doc/27149829/Alternative-Data-Structures-in-Ruby It’s like storing each character in a tree and creating branches of a tree based on the word inserted into the dictionary.

To add a word to a dictionary
def add(word,subtree=@tree)
    if word.size == 0
      subtree[:terminal] = true
    else
      first_char = word[0]
      rest = word[1..-1]
      subtree[first_char] ||= {}
      add(rest, subtree[first_char])
    end
  end

To find a word in a dictionary
def find(prefix)
    subtree = walk(prefix)
    return [] unless subtree
    return words(subtree, prefix)
  end

See trie_dictionary.rb in my github account for more details.



So which performs better, an ARRAY or TRIE?

We used a profiling/performance testing tool called Ruby-Prof to find out! These were the steps taken to perform the test:

1) Install ruby-prof (http://ruby-prof.rubyforge.org/)

sudo gem install ruby-prof

2) Create a script to do the following:
a) Open and read a large data set, e.g. more than 10,000 rows of data. In my experiment, I used a file with lots of products.
http://github.com/aihui/ruby_prof_test/blob/master/products.csv

b) Insert the large data set (products.csv) into the array or trie. Add RubyProf.start to this section of code so that we can monitor the performance.

#read file and put words into dictionary
RubyProf.start
  file.each do |data|
   d.add(data)
  end
result = RubyProf.stop

c) Find products that begin with “fr” in the array or trie. Add RubyProf.start to this section of code so that we can monitor the performance.

RubyProf.start
  found = d.find(search_text)
foundresult = RubyProf.stop 

d) Write the results of each test into their respective files, e.g. results for adding words and results for finding words.

A dictionary_test.rb file was create for step(2). I switched the “require dictionary_array” and “require trie_dictionary” accordingly, depending on what test I’m running.

3) Run the ruby file and you’ll see the results file being produced too.
ruby dictionary_test.rb



Results
Based on the results below, it shows adding to an array is definitely faster. However, searching through a trie yields way better performance! It took 0.069465 seconds to search through an array while it ONLY took 0.000049 seconds to search through a trie!

4:17 pm, by aihui
permalink
tagged: rails, tech, ruby,


   Comments