How to handle nils in CSV data
Recently I’ve had to work a lot with CSVs, and I’ve learned the hard way that the data in them isn’t always what you expect. Why would a name or part have no number associated with it? Why would a number or part have no name? And so on. Below is the strategy I’ve adopted to handle these cases.
Fetching the data
We want the headers returned as symbols so we can look up each field by name, because I don’t trust the column order to stay the same in the future. The second line confirms that the headers are indeed an array of symbols.
```ruby
require "csv"

parsed_rows = CSV.parse(File.read("data.csv"), headers: true, header_converters: :symbol)
parsed_rows.headers # => [:number, :name, :part]
```
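To see the header conversion without a file on disk, here is a minimal sketch that parses inline strings instead of data.csv. The sample data and the `parse_with_symbol_headers` helper are made up for illustration. Because each field is read by header name, swapping the column order in the input should not change what `row[:name]` returns.

```ruby
require "csv"

# Hypothetical helper: the same options as above, but reading from a string
def parse_with_symbol_headers(data)
  CSV.parse(data, headers: true, header_converters: :symbol)
end

# Made-up sample data: identical columns in two different orders
original  = "Number,Name,Part\n1,foo,bar\n"
reordered = "Part,Name,Number\nbar,foo,1\n"

[original, reordered].each do |data|
  table = parse_with_symbol_headers(data)
  p table.headers       # :symbol downcases the header text and makes it a Symbol
  p table.first[:name]  # lookup by name, wherever the column sits
end
```

This prints `[:number, :name, :part]` and `"foo"` for the first input, then `[:part, :name, :number]` and `"foo"` for the reordered one: the header order changes, the lookup result does not.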
Filter out nils
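```ruby
valid_rows = parsed_rows.reject do |row|
  has_nil = row.to_h.values.any?(&:nil?)
  # Log the skipped row so we can explain later why it wasn't processed
  puts "number: #{row[:number]} | name: #{row[:name]} | part: #{row[:part]}" if has_nil
  has_nil # true means reject drops this row
end
# => [#<CSV::Row number:"1" name:"foo" part:"bar">, #<CSV::Row number:"4" name:"zub" part:"fab">]
```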
Here we use reject to return only the rows that contain no nil values.
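It helps to know which fields actually come back as `nil`. With Ruby's `csv` library, an unquoted empty field parses as `nil`, while a quoted empty field (`""`) parses as an empty string, so a pure nil check lets the second kind through. A small sketch with made-up data; the "stricter rule" at the end is an assumption you may or may not want, not part of the approach above.

```ruby
require "csv"

# Made-up data: row 2 has an unquoted empty name, row 3 a quoted empty one ("")
data = <<~CSV_DATA
  number,name,part
  1,foo,bar
  2,,baz
  3,"",qux
CSV_DATA

rows = CSV.parse(data, headers: true, header_converters: :symbol)
rows.each { |row| p row[:name] }  # prints "foo", then nil, then ""

# The nil-only check drops just row 2
nil_free = rows.reject { |row| row.to_h.values.any?(&:nil?) }
p nil_free.map { |row| row[:number] }  # => ["1", "3"]

# An assumed stricter rule that also treats blank strings as missing
strict = rows.reject { |row| row.fields.any? { |v| v.nil? || v.strip.empty? } }
p strict.map { |row| row[:number] }    # => ["1"]
```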
We go through each row, convert it to a Hash, and use any? to check whether any of its values are nil. If one is, we log the row with the puts statement and return true, so reject drops it. We log these rows because someone will eventually ask “Why didn’t this part get updated?”, and we want to be able to tell them it’s because the data in that row was nil for some reason.
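If you want to hand the skipped rows to someone later instead of only printing them, one option is `Enumerable#partition`, which splits the rows into two arrays in a single pass. This is a swapped-in variation on the reject approach, sketched with made-up data; `split_complete_rows` is a hypothetical helper name.

```ruby
require "csv"

# Hypothetical helper: returns [complete_rows, skipped_rows]
def split_complete_rows(table)
  # partition puts rows where the block is true first, so those are the skipped ones
  skipped, complete = table.partition { |row| row.fields.any?(&:nil?) }
  skipped.each do |row|
    # Same kind of log line as before, so "why wasn't this updated?" has an answer
    puts "skipped -> number: #{row[:number]} | name: #{row[:name]} | part: #{row[:part]}"
  end
  [complete, skipped]
end

# Made-up data: row 2 is missing a name, the third data row is missing a number
data = <<~CSV_DATA
  number,name,part
  1,foo,bar
  2,,baz
  ,qux,quux
  4,zub,fab
CSV_DATA

table = CSV.parse(data, headers: true, header_converters: :symbol)
complete, skipped = split_complete_rows(table)
puts "#{complete.size} complete, #{skipped.size} skipped"
```

With this input the last line prints `2 complete, 2 skipped`, and the `skipped` array can be saved or reported alongside the update results.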