Find N Largest Lines From A File: How Would You Make This Better?
Solution 1:
from heapq import nlargest

def longest_lines(n, filename):
    with open(filename) as input:
        return nlargest(n, input, key=len)
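For illustration only, a hypothetical call (the filename is just a placeholder) might look like:

# Print the 5 longest lines of a (hypothetical) log file.
for line in longest_lines(5, "server.log"):
    print(line.rstrip("\n"))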
Alright, addressing the comments below:
def longest_lines(n, filename):
    heap = []
    with open(filename) as input:
        for ln in input:
            # push/pop: the length-ordered min-heap helpers described below
            push(heap, ln)
            if len(heap) > n:
                pop(heap)
    return heap
where push and pop are the good old min-heap insert and delete-min algorithms that can be found in any textbook (and that I never get right in one go, so I'm not posting them here), comparing lines by their length. This runs in O(N×lg(n)) time, where N is the number of lines in the file, and uses O(n) temporary space.
Note that the resulting list is not sorted by length; if you want it sorted, pop elements until the heap is empty and reverse the result.
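If it helps, here is one way the sketch above could be fleshed out with the standard heapq functions; treat it as an illustrative sketch rather than the original answer's code. It keeps (length, line) pairs on the heap so comparisons go by length, and the name longest_lines_sorted is mine:

from heapq import heappush, heappop

def longest_lines_sorted(n, filename):
    heap = []
    with open(filename) as f:
        for ln in f:
            # The heap holds (length, line) pairs, so the smallest entry
            # is always the shortest line kept so far.
            heappush(heap, (len(ln), ln))
            if len(heap) > n:
                heappop(heap)
    # Pop until empty (shortest first), then reverse to get longest first.
    result = []
    while heap:
        result.append(heappop(heap)[1])
    result.reverse()
    return result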
Solution 2:
I would use a heap, but a min-heap, not a max-heap, which may seem counterintuitive.
Create an empty heap. For each line, if there are less than N lines in the heap, add the current line; otherwise, if the current line is longer than the minimum element in the heap, remove the minimum element from the heap, and add the current line to the heap. Return the contents of the heap.
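A minimal sketch of that procedure, assuming Python's heapq with (length, line) pairs for the ordering (the function name n_longest is my own, not the original poster's):

from heapq import heappush, heappushpop

def n_longest(n, filename):
    heap = []  # min-heap of (length, line); heap[0] is the shortest line kept
    with open(filename) as f:
        for line in f:
            if len(heap) < n:
                heappush(heap, (len(line), line))
            elif len(line) > heap[0][0]:
                # Current line is longer than the shortest kept line:
                # drop that minimum and add this line in one combined step.
                heappushpop(heap, (len(line), line))
    return [line for _, line in heap]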
Solution 3:
Does this boil down to sorting? If you have a list that can hold N lines, you could keep a couple of ints holding the length of the shortest line found so far and its index in the list. Whenever a line is read that is longer than that shortest line, you only need to scan the list once to replace the old shortest line and find the new one.
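A rough sketch of that idea (the names here are mine, not the original poster's):

def n_longest_list(n, filename):
    kept = []            # up to n longest lines seen so far
    shortest_idx = None  # index of the shortest line in `kept`
    with open(filename) as f:
        for line in f:
            if len(kept) < n:
                kept.append(line)
                # Once the buffer is full, find the shortest entry.
                if len(kept) == n:
                    shortest_idx = min(range(n), key=lambda i: len(kept[i]))
            elif len(line) > len(kept[shortest_idx]):
                # Replace the old shortest line, then rescan the list once
                # to find the new shortest one.
                kept[shortest_idx] = line
                shortest_idx = min(range(n), key=lambda i: len(kept[i]))
    return kept

This keeps O(n) space like the heap solutions, but the rescan makes the worst case O(N×n) rather than O(N×lg(n)).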