Google Refine: a Few Sample Transformations and Helpful URLs

Google Refine is a powerful tool to help cleanup data from a variety of sources. I’m gradually becoming more familiar with it and the powerful transformation tools that it offers. Here are:

  • a few sample transformation expressions, and…
    • sample data
    • the result “=” after the transformation is applied

The first example provides labels and explanations, the rest of the examples omit the labels and explanations but follow the same syntax. I hope these are helpful to you!

NOTE: there are probably much better ways to do some of these, but hopefully if you’re a beginner like me these will at least provide you a simple starting point from which you can improve upon… also, make sure to checkout the Google screencasts for Google Refine, they are very helpful . 🙂

Sample data: 1931 ( Model A ) 103.5″
Expression: value[0,5]
=
Result: 1931
Explanation: retrieve the value of the cell, starting at character zero and including five characters


1952-1954 : New Yorker & New Yorker Special: 125-1/2; Custom Imperial: 133-1/2; Crown Imperial: 145-1/2
value.replace(/([1-9][1-9][1-9][1-9])/,””)

=
– : New Yorker & New Yorker Special: 125-1/2; Custom Imperial: 133-1/2; Crown Imperial: 145-1/2

1952-1954 : New Yorker & New Yorker Special: 125-1/2; Custom Imperial: 133-1/2; Crown Imperial: 145-1/2
value.replace(/([1-9][1-9][1-9][1-9])*-*([1-9][1-9][1-9][1-9])/,””)
=
: New Yorker & New Yorker Special: 125-1/2; Custom Imperial: 133-1/2; Crown Imperial: 145-1/2

Custom Imperial: 133-1/2
NOTE: /i applies case insensitive match in the following expression…
value.replace(/([a-z])/i,””)
=
: 133-1/2

Crown Imperial: 145-1/2″
value.replace(/([0-9])/,””)
=
Crown Imperial: -/”

Crown Imperial: /
value.replace(/\//,””)
=
Crown Imperial:
More expressions, without example data:

value.split(“–“)[1]
value.replace(“NA”, cells[“Get to dimensions 6”].value[0,12])
value.replace(“*”,”,”)

Some links that provide additional info and help:

Google Refine: adding a new column that gives the local authority, ward or other details based on a postcode – Notes on spreadsheet formulae and CARhttp://excelnotes.posterous.com/google-refine-adding-a-new-column-that-givesThis is how I used Google Refine to add extra columns to a spreadsheet which added information on constituency, local authority or other geographical information:1. Find a…
first_steps_in_google_refine [Underdone but not half-baked]http://dillingham.me.uk/dokuwiki/first_steps_in_google_refineYou will see each line of the wikitext source is a line in the Google Refine table. There are two sets of columns: the first set is labeled “All”, the second “Column”. The…
Recipes – google-refine – Useful recipes for achieving certain tasks in Google Refine – Google Refine, a power tool for working with messy data (formerly Freebase Gridworks) – Google Project Hostinghttp://code.google.com/p/google-refine/wiki/RecipesHere are some examples of possible types of common string manipulation operations that you might encounter and how they can be achieved with the Google Refine Expression Language…
UnderstandingExpressions – google-refine – Understanding expressions. – Google Refine, a power tool for working with messy data (formerly Freebase Gridworks)http://code.google.com/p/google-refine/wiki/UnderstandingExpressionsWhen you invoke the Transform command on, say, column “friend” and enter an expression, Google Refine will go through each row in the data (matching facets and filters,…
UnderstandingRegularExpressions – google-refine – Understanding regular expressions – Google Refine, a power tool for working with messy data (formerly Freebase Gridworks) – Google Project Hostinghttp://code.google.com/p/google-refine/wiki/UnderstandingRegularExpressionsOnce upon a time in the land of King ArthurThe number of spaces between consecutive words vary. In order to change the text so that there is precisely 1 space between any 2…
GRELOtherFunctions – google-refine – Other functions supported by the Google Refine Expression Language (GREL) – Google Refine, a power tool for working with messy data (formerly Freebase Gridworks) – Google Project Hostinghttp://code.google.com/p/google-refine/wiki/GRELOtherFunctionsGRELOtherFunctions   Other functions supported by the Google Refine Expression Language (GREL) Updated Nov 1, 2010 by dfhu…@google.com
Splitting Columns – google-refine | Google Groupshttp://groups.google.com/group/google-refine/browse_thread/thread/13104292b3cff983?pli=1For the caps thing, if there’s only one in each cell, and it is not at the beginning of the text, then this should pick it out value.match(/.*[^A-Z]([A-Z]{2,}).*/)[0] David -…
ColumnEditing – google-refine – Editing columns. – Google Refine, a power tool for working with messy data (formerly Freebase Gridworks) – Google Project Hostinghttp://code.google.com/p/google-refine/wiki/ColumnEditingFor more elaborate rearrangement of columns, invoke the left-most drop-down menu in front of “All” and pick Edit columns > Reorder columns… This command also lets…

Leave a Reply

Your email address will not be published. Required fields are marked *