Google Refine: a Few Sample Transformations and Helpful URLs
Google Refine is a powerful tool for cleaning up data from a variety of sources. I’m gradually becoming more familiar with it and the transformation tools it offers. Each example below gives:
- sample data
- a sample transformation expression
- the result (shown after the “=”) once the transformation is applied
The first example provides labels and explanations; the rest of the examples omit them but follow the same layout. I hope these are helpful to you!
NOTE: there are probably much better ways to do some of these, but if you’re a beginner like me, these should at least give you a simple starting point to improve upon. Also, make sure to check out the Google screencasts for Google Refine; they are very helpful. 🙂
Sample data: 1931 ( Model A ) 103.5″
Expression: value[0,5]
=
Result: 1931
Explanation: retrieve the value of the cell, starting at character zero and stopping before character five, so five characters in total (here the fifth character is the space after the year)
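If you want to check this one outside Refine, here is a rough sketch of the same slice in plain Python. The variable names are my own, and the only point is to show that the five characters include the trailing space, which you may want to trim.

```python
# Sample cell value from the example above (variable names are illustrative).
value = "1931 ( Model A ) 103.5\u2033"

# Characters at index 0 through 4; the end index 5 is excluded.
first_five = value[0:5]

# The fifth character is the space after the year, so trim it.
year = first_five.strip()

print(repr(first_five))  # '1931 '
print(repr(year))        # '1931'
```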
1952-1954 : New Yorker & New Yorker Special: 125-1/2; Custom Imperial: 133-1/2; Crown Imperial: 145-1/2
value.replace(/([1-9][1-9][1-9][1-9])/,"")
=
- : New Yorker & New Yorker Special: 125-1/2; Custom Imperial: 133-1/2; Crown Imperial: 145-1/2
1952-1954 : New Yorker & New Yorker Special: 125-1/2; Custom Imperial: 133-1/2; Crown Imperial: 145-1/2
value.replace(/([1-9][1-9][1-9][1-9])*-*([1-9][1-9][1-9][1-9])/,"")
=
: New Yorker & New Yorker Special: 125-1/2; Custom Imperial: 133-1/2; Crown Imperial: 145-1/2
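One thing to watch with the two year expressions above: [1-9] does not match a zero, so a year such as 1960 would slip through. Below is a minimal Python sketch of a slightly more forgiving pattern, assuming the years are always four digits with an optional second year after a hyphen. The sample string is shortened and the names are mine; this is not a Refine expression.

```python
import re

# Shortened version of the sample data above.
value = "1952-1954 : New Yorker & New Yorker Special: 125-1/2"

# \d{4} matches any four digits, including zeros (e.g. 1960),
# which [1-9][1-9][1-9][1-9] would miss. The second year is optional.
year_range = re.compile(r"\d{4}(-\d{4})?")

# Remove the year or year range, then trim the leftover leading space.
cleaned = year_range.sub("", value).strip()
print(cleaned)  # : New Yorker & New Yorker Special: 125-1/2
```

The three-digit measurements such as 125 are left alone because the pattern needs four digits in a row.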
Custom Imperial: 133-1/2
NOTE: the /i flag makes the match case-insensitive in the following expression:
value.replace(/([a-z])/i,"")
=
: 133-1/2
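To see what the case-insensitive flag changes, here is a small Python comparison of the same letter-stripping pattern with and without it. This is only a sketch in plain Python (re.IGNORECASE plays the role of /i here), and the variable names are mine.

```python
import re

value = "Custom Imperial: 133-1/2"

# Without the flag, only lowercase letters are removed.
lower_only = re.sub(r"[a-z]", "", value)

# With the flag, uppercase letters are removed as well.
all_letters = re.sub(r"[a-z]", "", value, flags=re.IGNORECASE)

print(repr(lower_only))   # 'C I: 133-1/2'
print(repr(all_letters))  # ' : 133-1/2'
```

Note the space left over from between the two words; a trim afterwards would tidy that up.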
Crown Imperial: 145-1/2″
value.replace(/([0-9])/,"")
=
Crown Imperial: -/″
Crown Imperial: /
value.replace(/\//,"")
=
Crown Imperial:
More expressions, without example data:
value.split("–")[1]
value.replace("NA", cells["Get to dimensions 6"].value[0,12])
value.replace("*",",")
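Since these last three come without sample data, here is a hedged Python sketch of what each one does. All the sample values are made up for illustration, and the dictionary named row is just my stand-in for looking up another column in the same row.

```python
# All sample values below are hypothetical; the original gives none.

# 1. Split on a separator and keep the second piece (index 1).
value = "Imperial\u2013Crown"          # two words joined by an en dash
second_part = value.split("\u2013")[1]
print(second_part)                     # Crown

# 2. Replace a placeholder with the first 12 characters of another column.
row = {"Get to dimensions 6": "133-1/2 inches long"}  # stand-in for cells[...]
value = "NA"
filled = value.replace("NA", row["Get to dimensions 6"][0:12])
print(filled)                          # 133-1/2 inch

# 3. A plain string (not a /regex/) is matched literally,
#    so "*" here is just an asterisk, not a wildcard.
value = "a*b*c"
commas = value.replace("*", ",")
print(commas)                          # a,b,c
```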