-
Notifications
You must be signed in to change notification settings - Fork 534
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for quoted string backslash escaping #1177
Conversation
Pull Request Test Coverage Report for Build 8678508768Details
💛 - Coveralls |
1b9ff2a
to
7273ded
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this PR @iffyio -- I am sorry it took me so long to review it properly. I am a little concerned about the difference between how this PR works and how it works for MySqlDialect
Is there any way we can unify the behavior?
/// ```sql | ||
/// SELECT '\'; | ||
/// ``` | ||
fn supports_string_literal_backslash_escape(&self) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️ and 😍 for the doc comments
src/tokenizer.rs
Outdated
@@ -1235,6 +1245,11 @@ impl<'a> Tokenizer<'a> { | |||
'\\' => { | |||
// consume | |||
chars.next(); | |||
|
|||
if allow_escape { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behavior seems different than how MySqlDialect behaves
Specifically, with MySQL the escape characters are transformed into their literal values (e.g.
'a"b'would be parsed to
a"bwhile this PR would parse it to
a"b`
What do you think about making this consistent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes I'll take a closer look at this to keep the same behavior for Mysql
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb Looking into updating this and turns out I didn't follow the comment entirely and was unable to infer the inconsistency for mysql - could you clarify the problem once more? It seems Github/markdown unfortunately reformatted the expected and desired output in your example so that they became identical 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As in I think the tokenizer for mysql (assuming self.unescape
is true) actually does the unescaping -- so a string with an escape character ("\x20"
) would actually be tokenized as a space (" "
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense! I've updated this have the new logic respect the unescape similar to mysql - let me know if that's what you had in mind!
7273ded
to
a9fdd33
Compare
This adds support for parsing string literals on dialects that treat backslash character as an escape character. As an example, the following previously failed to parse by dialects like BigQuery where the syntax is valid. ```sql SELECT 'a\'b'; ``` Moves the SQL `like` and `similar_to` tests from individual dialects to common since the tests were identical.
a9fdd33
to
0458e4b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me -- thank you @iffyio 🙏
This adds support for parsing string literals on
dialects that treat backslash character as an escape
character. As an example, the following previously failed
to parse by dialects like BigQuery where the syntax is valid.
Moves the SQL
like
andsimilar_to
tests from individualdialects to common since the tests were identical.