I am building a forum application in Django and I want to make sure that users don't enter certain characters in their forum posts. I need an efficient way to scan the whole post for invalid characters. What I have so far is below, but it does not work correctly and I doubt the approach is very efficient.
    def clean_topic_message(self):
        topic_message = self.cleaned_data['topic_message']
        words = topic_message.split()
        if (topic_message == ""):
            raise forms.ValidationError(_(u'Please provide a message for your topic'))
        for word in words:
            if (re.match(r'[^<>/\{}[]~`]$',topic_message)):
                raise forms.ValidationError(_(u'Topic message cannot contain the following: <>/\{}[]~`'))
        return topic_message
Thanks for any help.
For a regex solution, there are two ways to go here:
Here is a script that implements both:
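A minimal sketch of what two such regex checks might look like. The split is an assumption: option 1 searches for a single invalid character, while option 2 verifies that every character is allowed. The function names are hypothetical, and the backslash is left out of the forbidden set because the question does not make clear whether it is meant to be included.

```python
import re

# Forbidden characters from the question: < > / { } [ ] ~ `
# The right square bracket is escaped so it does not close the class early.
RE_ANY_INVALID = r'[<>/{}[\]~`]'
RE_ALL_VALID = r'[^<>/{}[\]~`]*'


def is_valid_option1(text):
    """Assumed option 1: the text is valid if no invalid character is found."""
    return re.search(RE_ANY_INVALID, text) is None


def is_valid_option2(text):
    """Assumed option 2: the text is valid if every character is allowed."""
    return re.fullmatch(RE_ALL_VALID, text) is not None


if __name__ == "__main__":
    for sample in ("A perfectly normal post.", "Some <b>markup</b> here"):
        print(repr(sample), is_valid_option1(sample), is_valid_option2(sample))
```

Both functions should agree on every input; they differ only in whether the regex hunts for one offending character or consumes the whole string.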
Take your pick.
Note: the original regex erroneously has a right square bracket in the character class which needs to be escaped.
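To see why the unescaped bracket matters, the sketch below compares the pattern from the question with a corrected one. In the original, the character class ends at the first `]`, so the remaining `~`, backtick and `]` become literal text that must appear at the end of the string. The helper names are hypothetical.

```python
import re

# Pattern exactly as written in the question: the class is [^<>/\{}[]
# and the rest (~ ` ] then end of string) is matched literally.
BROKEN = r'[^<>/\{}[]~`]$'

# Corrected class with the right square bracket escaped.
FIXED = r'[<>/{}[\]~`]'


def broken_flags(text):
    """Mimics the question's check: re.match against the broken pattern."""
    return re.match(BROKEN, text) is not None


def fixed_flags(text):
    """Flags the text if any forbidden character occurs anywhere in it."""
    return re.search(FIXED, text) is not None


print(broken_flags('<b>bold</b>'))  # False: the invalid post slips through
print(broken_flags('x~`]'))         # True: only this odd shape ever matches
print(fixed_flags('<b>bold</b>'))   # True
print(fixed_flags('plain text'))    # False
```

Note that `re.match` only anchors at the start of the string, which is the other reason the original check misses invalid characters further into the post.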
Benchmarks: After seeing gnibbler's interesting solution using set(), I was curious to find out which of these methods would actually be fastest, so I decided to measure them. Here are the benchmark data, the statements measured, and the timeit result values:
Test data:
Results:
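A measurement of this kind might be set up as follows. This is a sketch rather than the original benchmark: the sample strings, the function names and the repetition count are all assumptions, and no timings are claimed here since they depend on the machine and Python version.

```python
import re
import timeit

INVALID = '<>/{}[]~`'
RE_ANY_INVALID = re.compile(r'[<>/{}[\]~`]')
RE_ALL_VALID = re.compile(r'[^<>/{}[\]~`]*')


def check_search(text):
    # Look for a single invalid character.
    return RE_ANY_INVALID.search(text) is None


def check_fullmatch(text):
    # Require every character to be valid.
    return RE_ALL_VALID.fullmatch(text) is not None


def check_set(text):
    # set().intersection() approach.
    return not set(INVALID).intersection(text)


# Hypothetical test data: one valid post and one with an invalid
# character near the end.
SAMPLES = {
    'valid': 'The quick brown fox jumps over the lazy dog. ' * 20,
    'invalid': 'The quick brown fox jumps over the lazy dog. ' * 20 + '<',
}


def run_benchmark(number=10000):
    """Return {(sample label, function name): seconds for `number` calls}."""
    results = {}
    for label, text in SAMPLES.items():
        for func in (check_search, check_fullmatch, check_set):
            elapsed = timeit.timeit(lambda: func(text), number=number)
            results[(label, func.__name__)] = elapsed
    return results


if __name__ == "__main__":
    for (label, name), seconds in sorted(run_benchmark().items()):
        print(f'{label:8} {name:16} {seconds:.4f}s')
```

Measuring both a valid and an invalid string matters because a search can stop at the first offending character, whereas a valid string forces every method to scan the full text.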
The benchmark tests show that Option 1 is slightly faster than Option 2, and both are much faster than the set().intersection() method. This is true for strings that match and for strings that don't.

    is_valid = not any(k in text for k in '<>/{}[]~`')
If efficiency is a major concern I would re.compile() the re string, since you're going to use the same regex many times.
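A minimal sketch of the precompiled approach, written as a plain function so it runs without Django. The names `INVALID_CHARS_RE` and `validate_topic_message` are hypothetical, and `ValueError` stands in for `forms.ValidationError`; inside a form's `clean_topic_message` method the same checks would raise the Django exception instead.

```python
import re

# Compiled once at import time and reused for every post.
INVALID_CHARS_RE = re.compile(r'[<>/{}[\]~`]')


def validate_topic_message(message):
    """Return the message unchanged, or raise ValueError if it is unacceptable."""
    if not message:
        raise ValueError('Please provide a message for your topic')
    if INVALID_CHARS_RE.search(message):
        raise ValueError('Topic message cannot contain the following: <>/{}[]~`')
    return message


if __name__ == "__main__":
    print(validate_topic_message('Hello, forum!'))
    try:
        validate_topic_message('Hello <script>')
    except ValueError as exc:
        print('rejected:', exc)
```

The `re` module also keeps an internal cache of recently compiled patterns, so the gain from compiling explicitly is likely to be modest; the module-level constant mainly keeps the pattern in one named place.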