It's worth mentioning that what you're asking for is a simplified case of Unicode text normalization. Many languages have a function for this in their standard libraries (e.g., Java). One good approach is to insert your text into BigQuery already normalized. If that won't work -- for example, because you need to retain the original text and you're concerned about hitting BigQuery's row size limit -- then you'll need to do the normalization on the fly in your queries.

Some databases have implementations of Unicode normalization of varying completeness (e.g., PostgreSQL's unaccent function, PrestoDB's normalize function) for use in queries. Unfortunately, BigQuery was not one of them for a long time. The implementations in this answer are kind of a "roll your own unaccent." Now that BigQuery has an official function, everyone should use that instead!

If you need to do the normalization in your query, these are some reasonable options.

Approach 1: Use NORMALIZE

Google has now come out with a NORMALIZE function. (Thanks to @WillianFuks in the comments for flagging!) This is now the obvious choice for text normalization. For example:

SELECT REGEXP_REPLACE(NORMALIZE(text, NFD), r"\pM", '') FROM yourtable;

In short: normalizing with NFD decomposes each accented character into its base character plus a separate combining mark, and the REGEXP_REPLACE then strips those marks (\pM matches Unicode mark characters). There is also a brief explanation in the comments.
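The same NFD-then-strip-marks technique can be sketched in Python (using the standard unicodedata module) if you want to test the logic locally before running it in BigQuery:

```python
import unicodedata

def strip_accents(text: str) -> str:
    # Mirror of REGEXP_REPLACE(NORMALIZE(text, NFD), r"\pM", ''):
    # decompose to NFD, then drop combining marks (Unicode category M*).
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )

print(strip_accents("café"))  # -> cafe
```

Note that ligatures like œ have no NFD decomposition, so they pass through unchanged -- which is why they need the special-case handling seen in the approaches below.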

I have left the additional approaches for reference.

Approach 2: Use REGEXP_REPLACE and REPLACE on Content

I implemented the lowercase-only case of text normalization using REGEXP_REPLACE. (The analog in the other SQL dialect is fairly self-evident.) I ran some tests on a text field with average length around 1K in a large table of 28M rows using the query below:

SELECT id, text
FROM (
  SELECT
    id,
    CASE
      WHEN REGEXP_CONTAINS(LOWER(text), r"[àáâäåæçèéêëìíîïòóôöøùúûüÿœ]") THEN
        REGEXP_REPLACE(
          REGEXP_REPLACE(
            REGEXP_REPLACE(
              REGEXP_REPLACE(
                REGEXP_REPLACE(
                  REPLACE(REPLACE(REPLACE(REPLACE(LOWER(text), 'œ', 'oe'), 'ÿ', 'y'), 'ç', 'c'), 'æ', 'ae'),
                  r"[ùúûü]", 'u'),
                r"[òóôöø]", 'o'),
              r"[ìíîï]", 'i'),
            r"[èéêë]", 'e'),
          r"[àáâäå]", 'a')
      ELSE LOWER(text)
    END AS text
  FROM yourtable
  ORDER BY id
  LIMIT 10);

versus:

WITH lookups AS (
  SELECT
    'ç,æ,œ,á,é,í,ó,ú,à,è,ì,ò,ù,ä,ë,ï,ö,ü,ÿ,â,ê,î,ô,û,å,ø,ñ' AS accents,
    'c,ae,oe,a,e,i,o,u,a,e,i,o,u,a,e,i,o,u,y,a,e,i,o,u,a,o,n' AS latins
), pairs AS (
  SELECT accent, latin
  FROM lookups,
    UNNEST(SPLIT(accents)) AS accent WITH OFFSET AS p1,
    UNNEST(SPLIT(latins)) AS latin WITH OFFSET AS p2
  WHERE p1 = p2
)
SELECT foo
FROM (
  SELECT
    id,
    (SELECT STRING_AGG(IFNULL(latin, char), '') AS foo
     FROM UNNEST(SPLIT(LOWER(text), '')) char
     LEFT JOIN pairs ON char = accent) AS foo
  FROM yourtable
  ORDER BY id
  LIMIT 10);

On average, the REGEXP_REPLACE implementation ran in about 2.9s; the array-based implementation ran in about 12.5s.
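The per-character lookup logic of the array-based query can be sketched in Python, which makes it easy to sanity-check the accent-to-Latin pairs before baking them into SQL (the pair strings below are copied from the query above):

```python
# Accent-to-ASCII lookup mirroring the lookups/pairs CTEs in the query.
ACCENTS = 'ç,æ,œ,á,é,í,ó,ú,à,è,ì,ò,ù,ä,ë,ï,ö,ü,ÿ,â,ê,î,ô,û,å,ø,ñ'.split(',')
LATINS  = 'c,ae,oe,a,e,i,o,u,a,e,i,o,u,a,e,i,o,u,y,a,e,i,o,u,a,o,n'.split(',')
PAIRS = dict(zip(ACCENTS, LATINS))

def normalize_text(text: str) -> str:
    # Like the STRING_AGG over the unnested character split:
    # lowercase, then replace each character via the lookup table.
    return "".join(PAIRS.get(ch, ch) for ch in text.lower())

print(normalize_text("Crème Brûlée"))  # -> creme brulee
```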

Approach 3: Use REGEXP_REPLACE on Search Pattern

What brought me to this question was a search use case. For this use case, I can either normalize my corpus text so that it looks more like my query, or I can "denormalize" my query so that it looks more like my text. The above describes an implementation of the first approach. This describes an implementation of the second.

When searching for a single word, one can use the REGEXP_MATCH function and merely rewrite the query word using the following patterns:

a -> [aàáâäãåā]
e -> [eèéêëēėę]
i -> [iîïíīįì]
o -> [oôöòóøōõ]
u -> [uûüùúū]
y -> [yÿ]
s -> [sßśš]
l -> [lł]
z -> [zžźż]
c -> [cçćč]
n -> [nñń]
æ -> (?:æ|ae)
œ -> (?:œ|oe)

So the query "hello" would look like this, as a regexp:

r"h[eèéêëēėę][lł][lł][oôöòóøōõ]"

Transforming the word into this regular expression should be fairly straightforward in any language. This isn't a solution to the posted question -- "How do I remove accents in BigQuery?" -- but is rather a solution to a related use case, which might have brought people (like me!) to this page.
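As a sketch of that transformation, here is one way to do it in Python, with the character classes taken from the mapping table above (the function name is just for illustration):

```python
import re

# One character class (or alternation) per letter, from the mapping above.
CLASSES = {
    'a': '[aàáâäãåā]', 'e': '[eèéêëēėę]', 'i': '[iîïíīįì]',
    'o': '[oôöòóøōõ]', 'u': '[uûüùúū]', 'y': '[yÿ]',
    's': '[sßśš]', 'l': '[lł]', 'z': '[zžźż]',
    'c': '[cçćč]', 'n': '[nñń]',
    'æ': '(?:æ|ae)', 'œ': '(?:œ|oe)',
}

def denormalize(word: str) -> str:
    # Expand a search word into an accent-insensitive regexp;
    # characters without a mapping are escaped and passed through.
    return "".join(CLASSES.get(ch, re.escape(ch)) for ch in word.lower())

print(denormalize("hello"))  # -> h[eèéêëēėę][lł][lł][oôöòóøōõ]
```

The resulting pattern string can then be pasted into (or parameterized within) the BigQuery query.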