Informal & Untested Suggestions for Possible Transformations #75

Open
kaustubhdhole opened this issue Jun 26, 2021 · 6 comments

Comments

@kaustubhdhole
Collaborator
kaustubhdhole commented Jun 26, 2021

Here are some ideas, stated informally, that could be used for perturbations and augmentations. @vgtomahawk is making a formal list in this branch.

Meanwhile, here is an informal list for the benefit of the participants.

  1. Interchange the positions of non-overlapping SRL AM (modifier) arguments:

    • Alex left for Delhi with his wife at 5 pm. --> Alex left for Delhi at 5 pm with his wife.
    • "at 5 pm" (AM-TMP) and "with his wife" (AM-COM) can be exchanged. This is safe only for non-core, non-overlapping arguments. Check what SRL is here.
  2. The ButterFingersPerturbation could be implemented for keyboard layouts other than English, such as Devanagari (Hindi, Marathi, Nepali), Perso-Arabic (Urdu, Persian, Shahmukhi Punjabi), the South Indian scripts (Tamil, Telugu, Kannada, Malayalam), Chinese, etc.

  3. Style transfer approaches could be interesting to look at, e.g. converting informal text to formal text and vice versa. Check this model.

    • What the heck is going on? --> What is going on?
    • What you upto? --> What are you doing?
  4. Word order changes: active to passive and vice versa, topicalisation, extraposition, wh-fronting (and the reverse of each), other operations used in constituency tests, and scrambling (for German and Turkic languages).

    • John went to the store to buy bread. --> To buy bread, John went to the store.
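A minimal Python sketch of idea 1, assuming the SRL spans for a single predicate have already been produced by an external tagger (no SRL model is run here). The function name `swap_am_arguments` and the `(label, start, end)` span format are hypothetical. Only modifier spans (labels starting with `AM-` or `ARGM-`) are moved, and overlapping pairs are skipped, as the comment above requires.

```python
from itertools import combinations
from typing import List, Tuple

# Hypothetical span format: (label, start, end) over token indices, end exclusive.
Span = Tuple[str, int, int]


def swap_am_arguments(tokens: List[str], spans: List[Span]) -> List[List[str]]:
    """Return one variant per pair of non-overlapping modifier spans, with the pair swapped."""
    # Keep only non-core modifier spans; core arguments (ARG0, ARG1, ...) never move.
    am = sorted((s for s in spans if s[0].startswith(("AM-", "ARGM-"))), key=lambda s: s[1])
    variants = []
    # combinations() preserves the sorted order, so s1 <= s2 for every pair.
    for (_, s1, e1), (_, s2, e2) in combinations(am, 2):
        if e1 > s2:  # the spans overlap: swapping them is not safe
            continue
        variants.append(
            tokens[:s1] + tokens[s2:e2] + tokens[e1:s2] + tokens[s1:e1] + tokens[e2:]
        )
    return variants


tokens = "Alex left for Delhi with his wife at 5 pm .".split()
spans = [("A0", 0, 1), ("V", 1, 2), ("AM-COM", 4, 7), ("AM-TMP", 7, 10)]
for variant in swap_am_arguments(tokens, spans):
    print(" ".join(variant))  # Alex left for Delhi at 5 pm with his wife .
```

The labels in the usage example are hand-written for illustration; tokens not covered by a modifier span stay where they are.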

All of the ideas listed above are SentenceOperation transformations. Other transformation types could be explored too.
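For idea 2 (ButterFingers on non-Latin keyboards), the core change is swapping the QWERTY neighbour table for one built from the target layout. The sketch below assumes a hypothetical `butter_finger` helper and a tiny Devanagari neighbour map whose entries are placeholders, not taken from a real InScript or phonetic layout; a real implementation would derive them from the actual key grid.

```python
import random
from typing import Dict, List

# Placeholder neighbour map for a few Devanagari keys. Illustrative only: the
# real neighbours must be read off the target layout (e.g. InScript).
DEVANAGARI_NEIGHBOURS: Dict[str, List[str]] = {
    "क": ["र", "त", "ग"],
    "र": ["प", "क", "ह"],
    "त": ["क", "च", "द"],
    "प": ["र", "ब", "च"],
}


def butter_finger(text: str, neighbours: Dict[str, List[str]],
                  prob: float = 0.1, seed: int = 0) -> str:
    """Replace each mapped character, with probability `prob`, by a random neighbour key."""
    rng = random.Random(seed)  # local RNG so a given seed is reproducible
    out = []
    for ch in text:  # iterates over code points; one key is assumed to type one code point
        if ch in neighbours and rng.random() < prob:
            out.append(rng.choice(neighbours[ch]))
        else:
            out.append(ch)
    return "".join(out)


print(butter_finger("कर", DEVANAGARI_NEIGHBOURS, prob=0.5, seed=1))
```

Iterating over code points works when each key produces a single code point; keys or input methods that emit several code points (or composed Chinese characters) would need a grapheme- or keystroke-level model instead.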

This was referenced Jun 27, 2021
@kaustubhdhole
Collaborator Author
kaustubhdhole commented Jun 28, 2021

Adversarial SQuAD appends wrong but similar-looking facts to the end of the context in a question-answering setting, without changing the answer to the QA pair.
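A minimal sketch of the append step in Python, assuming a SQuAD v1.1 JSON-style example (`context` plus `answers` with `text` and `answer_start`). Writing the distractor sentence itself (Adversarial SQuAD derives it from the question) is out of scope here; it is passed in by the caller, and `append_distractor` is a hypothetical name.

```python
import copy
from typing import Any, Dict


def append_distractor(example: Dict[str, Any], distractor: str) -> Dict[str, Any]:
    """Return a copy of a SQuAD-style example with `distractor` appended to the context."""
    new_example = copy.deepcopy(example)  # leave the caller's example untouched
    new_context = example["context"].rstrip() + " " + distractor.strip()
    # Appending at the end keeps every character offset of the original context,
    # so the gold answers and their answer_start values remain valid.
    for ans in new_example["answers"]:
        start, text = ans["answer_start"], ans["text"]
        if new_context[start:start + len(text)] != text:
            raise ValueError(f"answer {text!r} not found at offset {start}")
    new_example["context"] = new_context
    return new_example


example = {
    "context": "Alex moved to Delhi in 2010.",
    "question": "When did Alex move to Delhi?",
    "answers": [{"text": "2010", "answer_start": 23}],
}
print(append_distractor(example, "Maria moved to Mumbai in 2015.")["context"])
```

Appending rather than inserting is what makes the transformation label-preserving for free: no answer offsets need to be recomputed.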

@kaustubhdhole
Collaborator Author

These two surveys provide a great overview of previous approaches and are a good place to look for ideas:
https://github.com/AgaMiko/data-augmentation-review
https://arxiv.org/pdf/2105.03075.pdf

@kaustubhdhole
Collaborator Author

Another excellent set of paraphrases can be checked here: http://cognet.mit.edu/pdfviewer/journal/coli_a_00166

@vgtomahawk
Collaborator
vgtomahawk commented Jul 21, 2021

Another excellent set of paraphrases can be checked here: http://cognet.mit.edu/pdfviewer/journal/coli_a_00166

In particular, from the lists in this paper, "Converse Substitution", "Manipulator-Device Substitution" and "Metaphor Substitution" are three that I have seldom seen implemented properly in code anywhere.
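As a starting point for "Converse Substitution", here is a rule-based sketch in Python that rewrites "A bought B from C." as "C sold B to A." (and similar pairs). The lexicon `CONVERSES`, the regex and `converse_substitution` are hypothetical and cover only simple declarative sentences; a robust version would need a parser and a much larger list of converse pairs.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical, hand-written converse lexicon: (verb, preposition) -> (converse verb, preposition).
CONVERSES: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("bought", "from"): ("sold", "to"),
    ("sold", "to"): ("bought", "from"),
    ("gave", "to"): ("received", "from"),
    ("received", "from"): ("gave", "to"),
}

# Matches "A <verb> B <prep> C." with lazy groups for A and B.
PATTERN = re.compile(
    r"^(?P<a>.+?) (?P<verb>bought|sold|gave|received) (?P<obj>.+?) (?P<prep>from|to) (?P<c>.+?)\.$"
)


def converse_substitution(sentence: str) -> Optional[str]:
    """Return the converse paraphrase of a matching sentence, or None if no rule applies."""
    m = PATTERN.match(sentence)
    if m is None:
        return None
    key = (m.group("verb"), m.group("prep"))
    if key not in CONVERSES:
        return None  # e.g. "bought ... to" has no entry in the lexicon
    verb, prep = CONVERSES[key]
    return f"{m.group('c')} {verb} {m.group('obj')} {prep} {m.group('a')}."


print(converse_substitution("Alex bought a car from Maria."))  # Maria sold a car to Alex.
```

Swapping the two participants is only safe for names and full noun phrases; pronouns ("He bought ...") would also need their case changed.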

@kaustubhdhole
Collaborator Author

There is interesting work on gapping worth looking at: https://arxiv.org/pdf/1804.06922.pdf
Paul likes coffee and Mary tea. (gapped sentence)
Paul likes coffee and Mary likes tea. (ungapped sentence)
It would be interesting to build rules that convert back and forth between these two forms.
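A toy version of such rules in Python, assuming the simplest case from the example: two conjuncts joined by " and ", where every constituent is a single token (Subject Verb Object, with the gapped conjunct as Subject Object). The functions `ungap` and `gap` are hypothetical; real gapping involves multi-word constituents and would need a parser, as in the linked paper.

```python
from typing import List, Optional, Tuple


def _conjuncts(sentence: str) -> Optional[Tuple[str, List[str], List[str]]]:
    """Split 'X and Y.' into the left string and the token lists of both conjuncts."""
    body = sentence.rstrip(".")
    left, sep, right = body.partition(" and ")
    if not sep:
        return None
    return left, left.split(), right.split()


def ungap(sentence: str) -> Optional[str]:
    """'Paul likes coffee and Mary tea.' -> 'Paul likes coffee and Mary likes tea.'"""
    parts = _conjuncts(sentence)
    if parts is None:
        return None
    left, l_toks, r_toks = parts
    # Assumption: left conjunct is exactly S V O, gapped right conjunct is exactly S O.
    if len(l_toks) != 3 or len(r_toks) != 2:
        return None
    return f"{left} and {r_toks[0]} {l_toks[1]} {r_toks[1]}."


def gap(sentence: str) -> Optional[str]:
    """'Paul likes coffee and Mary likes tea.' -> 'Paul likes coffee and Mary tea.'"""
    parts = _conjuncts(sentence)
    if parts is None:
        return None
    left, l_toks, r_toks = parts
    # Only gap when both conjuncts are S V O and share the same verb.
    if len(l_toks) != 3 or len(r_toks) != 3 or l_toks[1] != r_toks[1]:
        return None
    return f"{left} and {r_toks[0]} {r_toks[2]}."


print(ungap("Paul likes coffee and Mary tea."))
print(gap("Paul likes coffee and Mary likes tea."))
```

Requiring an identical verb before gapping keeps the transformation meaning-preserving; "Paul likes coffee and Mary hates tea." is left untouched.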

@vgtomahawk
Collaborator

This semi-syntactic paraphrasing algorithm by Tanya Goyal et al., based on reordering source word positions (part of the line of work following up on SCPNs, a.k.a. Syntactically Controlled Paraphrase Networks, by Iyyer et al.), is a really interesting augmentation, particularly because of its reduced sensitivity to constituency parses.


2 participants