Skip to content Skip to sidebar Skip to footer

Find N Largest Lines From A File: How Would You Make This Better?

I was recently rejected from a potential employer after submitting this code. They suggested I wasn't technically capable enough. I'm wondering if someone could shed light on to ho

Solution 1:

from heapq import nlargest

deflongest_lines(n, filename):
    withopen(filename) asinput:
        return nlargest(n, input, key=len)

Alright, addressing the comments below:

deflongest_lines(n, filename):
    heap = []
    withopen(filename) asinput:
        for ln in filename:
            push(heap, ln)
            iflen(heap) > n:
                pop(heap)
    return heap

where push and pop are the good old min-heap insert and delete-min algorithms that can be found in any textbook (and that I never get right in one go, so I'm not posting them now), comparing lines by their length. This runs in O(N×lg(n)) time where N is the number of lines in the file, consuming O(n) temporary space.

Note that the resulting list is not sorted by length, but adding that can be done by popping the elements until the heap is empty and reversing the result of that.

Solution 2:

I would use a heap, but a min-heap, not a max-heap, which may seem counterintuitive.

Create an empty heap.
For each line, 
  if there are less than N lines in the heap, add the current line;
  otherwise,
    if the current line is longer than the minimum element in the heap,
      remove the minimum element from the heap, and
      add the current line to the heap.
Return the contents of the heap.

Solution 3:

Does this boil down to sorting? If you have a list that can hold N lines, you could keep a couple of ints that hold the length of the shortest line so far found and its index in the list. If any line is read that is longer than this line, you only need to iterate the list once to find the new shortest line and also to replace the old shortest line.

Post a Comment for "Find N Largest Lines From A File: How Would You Make This Better?"