Pythonic String Testing
Solution 1:
You can start by simplifying content_test():
def content_test(term):
    return any(c.isalpha() for c in term)
In fact, that's simple enough that you don't really need a separate function for it anymore.
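A quick check of the simplified function on a few made-up inputs. any() stops at the first alphabetic character it finds, and any() over an empty sequence is False, so an empty term is rejected:

```python
def content_test(term):
    # True as soon as one character of term is alphabetic; any() short-circuits
    return any(c.isalpha() for c in term)

# Illustrative inputs only
print(content_test("abc22"))  # True: contains letters
print(content_test("1234"))   # False: digits only
print(content_test(""))       # False: nothing to test
```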
What I'd do in this case is write a generator that yields only valid terms from the file, then convert that to a list using the list() constructor. This way you read just one line at a time instead of the whole file at once, which will save you a good bit of memory if the files are large.
def read_valid_terms(filename):
    with open(filename) as f:
        for line in f:
            for term in line.split():
                if any(c.isalpha() for c in term):
                    yield term

terms = list(read_valid_terms("terms.txt"))
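To see the generator in action without a real terms.txt, this sketch writes some made-up sample lines to a temporary file and reads them back. The file contents are purely illustrative:

```python
import os
import tempfile

def read_valid_terms(filename):
    # Lazily yield whitespace-separated terms that contain at least one letter
    with open(filename) as f:
        for line in f:
            for term in line.split():
                if any(c.isalpha() for c in term):
                    yield term

# Hypothetical sample contents, written to a temporary file for the demo
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("apple 42 b2\n--- 7 cherry\n")
    path = tmp.name

try:
    terms = list(read_valid_terms(path))
    print(terms)  # ['apple', 'b2', 'cherry']
finally:
    os.remove(path)  # clean up the temporary file
```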
Or, if you are only going to iterate over the terms once, just do that directly rather than building a list:
for term in read_valid_terms("terms.txt"):
    print(term, end=" ")
print()
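The "only once" caveat matters because a generator is exhausted after a single pass: looping over the same generator object a second time yields nothing. The sketch below uses a hypothetical valid_terms helper that takes an iterable of lines instead of a filename, so it runs without a file:

```python
def valid_terms(lines):
    # Hypothetical variant of read_valid_terms that accepts any iterable of lines
    for line in lines:
        for term in line.split():
            if any(c.isalpha() for c in term):
                yield term

gen = valid_terms(["x1 22", "yz"])
first_pass = list(gen)   # consumes the generator
second_pass = list(gen)  # generator is already exhausted
print(first_pass)   # ['x1', 'yz']
print(second_pass)  # []
```

If you need to walk the terms more than once, build the list up front instead.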
Solution 2:
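In Python, string objects already have a method for testing alphabetic characters, str.isalpha(). Note that it returns True only if every character is alphabetic, so it rejects mixed terms such as "abc22":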
>>> "abc".isalpha()
True
>>> "abc22".isalpha()
False
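To make the difference concrete, here is isalpha() next to the any()-based test from Solution 1 on a few sample terms (the samples are illustrative). isalpha() asks "is every character a letter?", while the any() version asks "is at least one character a letter?":

```python
def content_test(term):
    # True if at least one character is alphabetic (the any()-based test)
    return any(c.isalpha() for c in term)

samples = ["abc", "abc22", "22", ""]
# Map each sample to (str.isalpha(), content_test())
results = {s: (s.isalpha(), content_test(s)) for s in samples}
for s, (all_alpha, has_alpha) in results.items():
    print(repr(s), all_alpha, has_alpha)
```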
Solution 3:
While you could use a regular expression, a Pythonic way would be to use any():
import string

def content_test(term):
    return any(c in string.ascii_lowercase for c in term)
If you also want to allow upper-case and locale-dependent characters, you can use str.isalpha.
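The character tests differ on upper-case and non-ASCII input. This sketch compares string.ascii_lowercase, string.ascii_letters and str.isalpha; the three function names are hypothetical, and it assumes Python 3, where str.isalpha is Unicode-aware rather than locale-dependent:

```python
import string

def has_lowercase_ascii(term):
    # The check as written above: only a-z count
    return any(c in string.ascii_lowercase for c in term)

def has_ascii_letter(term):
    # Widened to accept A-Z as well
    return any(c in string.ascii_letters for c in term)

def has_alpha(term):
    # Python 3 str.isalpha: any Unicode letter counts
    return any(c.isalpha() for c in term)

# "\u00c91" is "É1": a non-ASCII letter followed by a digit
checks = {}
for word in ["abc", "ABC", "\u00c91"]:
    checks[word] = (has_lowercase_ascii(word), has_ascii_letter(word), has_alpha(word))
    print(repr(word), checks[word])
```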
A couple of additional notes:
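- FileRead should inherit from object, to make sure it's a new-style class (this matters in Python 2; in Python 3 every class is new-style).
- Instead of writing if content_test(term) is False:, you can simply write if not content_test(term):.
- clean can be written a lot, ahem, cleaner, by using filter:

    def clean(self):
        # list() keeps self.terms a list in Python 3, where filter() returns an iterator
        self.terms = list(filter(content_test, self.terms))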
- You're not closing the file f, and may therefore leak the handle. Use the with statement to close it automatically, like this:

    with open(filename, 'r') as f:
        content = f.read()
        self.terms = content.split()
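Putting these notes together, here is a sketch of what the class might look like. The original FileRead body isn't shown, so this reconstruction is hypothetical: it assumes the constructor takes a filename, stores the split words in self.terms, and that clean() keeps only terms containing a letter. A temporary file with made-up contents stands in for real input:

```python
import os
import tempfile

def content_test(term):
    # True if the term contains at least one alphabetic character
    return any(c.isalpha() for c in term)

class FileRead(object):  # explicit object base: new-style in Python 2, harmless in 3
    # Hypothetical reconstruction; the original class body is not shown
    def __init__(self, filename):
        # 'with' closes the file even if reading raises
        with open(filename, 'r') as f:
            content = f.read()
        self.terms = content.split()

    def clean(self):
        # list() because filter() is lazy in Python 3
        self.terms = list(filter(content_test, self.terms))

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("one 2 three4 ...\n")
    path = tmp.name

try:
    reader = FileRead(path)
    reader.clean()
    print(reader.terms)  # ['one', 'three4']
finally:
    os.remove(path)
```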
Solution 4:
Using regular expressions:
import re
# Match any number of non-whitespace characters, with an alpha char in it.
terms = re.findall(r'\S*[a-zA-Z]\S*', content)
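To check what the pattern captures, here is a run on a made-up string, compared against a split()-based filter that uses string.ascii_letters to mirror the regex's [a-zA-Z] class. Because \S* is greedy on both sides of the letter, each match spans a whole whitespace-delimited token rather than a fragment of one:

```python
import re
import string

content = "foo 123 a1b2 ... 42x"  # illustrative input
# Raw string so the backslash in \S reaches the regex engine unchanged
terms = re.findall(r'\S*[a-zA-Z]\S*', content)
print(terms)  # ['foo', 'a1b2', '42x']

# Equivalent split-based filter, restricted to ASCII letters like the regex
by_split = [t for t in content.split() if any(c in string.ascii_letters for c in t)]
print(terms == by_split)  # True
```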