Evaluating and Ameliorating Data Set Quality for Low-Resource Natural Language Processing