Recently I took a Ruby class, taught by Sarah Allen and one of the most fun things I learned was profiling. Performance testing is such a crucial part of software development. Prior to Sarah’s class, the most common way to store and retrieve data is via an array. For example in a dictionary example, I would normally do this:
To add a word to a dictionary
def add(word)
@words « word
endTo find a word in a dictionary
def find(prefix)
@words.find_all {|w| w =~ /^#{prefix}/}
end
See dictionary_array.rb in my github account for more details.
In class, we were introduced a new data storing structure, called Trie, inspired by Tyler McMullen’s LA RubyConf 2010 talk: http://www.scribd.com/doc/27149829/Alternative-Data-Structures-in-Ruby It’s like storing each character in a tree and creating branches of a tree based on the word inserted into the dictionary.
To add a word to a dictionary
def add(word,subtree=@tree)
if word.size == 0
subtree[:terminal] = true
else
first_char = word[0]
rest = word[1..-1]
subtree[first_char] ||= {}
add(rest, subtree[first_char])
end
end
To find a word in a dictionary
def find(prefix)
subtree = walk(prefix)
return [] unless subtree
return words(subtree, prefix)
end
See trie_dictionary.rb in my github account for more details.
So which performs better, an ARRAY or TRIE?
We used a profiling/performance testing tool called Ruby-Prof to find out! These were the steps taken to perform the test:
1) Install ruby-prof (http://ruby-prof.rubyforge.org/)
sudo gem install ruby-prof
2) Create a script to do the following:
a) Open and read a large data set, e.g. more than 10,000 rows of data. In my experiment, I used a file with lots of products.
http://github.com/aihui/ruby_prof_test/blob/master/products.csv
b) Insert the large data set (products.csv) into the array or trie. Add RubyProf.start to this section of code so that we can monitor the performance.
#read file and put words into dictionary
RubyProf.start
file.each do |data|
d.add(data)
end
result = RubyProf.stop
c) Find products that begin with “fr” in the array or trie. Add RubyProf.start to this section of code so that we can monitor the performance.
RubyProf.start
found = d.find(search_text)
foundresult = RubyProf.stop
d) Write the results of each test into their respective files, e.g. results for adding words and results for finding words.
A dictionary_test.rb file was create for step(2). I switched the “require dictionary_array” and “require trie_dictionary” accordingly, depending on what test I’m running.
3) Run the ruby file and you’ll see the results file being produced too.
ruby dictionary_test.rb
Results
Based on the results below, it shows adding to an array is definitely faster. However, searching through a trie yields way better performance! It took 0.069465 seconds to search through an array while it ONLY took 0.000049 seconds to search through a trie!



