Get Element With A Randomized Class Name

It looks like the class name for the <img> elements on Instagram's web page changes every day. Right now it is FFVAD, and tomorrow it will be something else. For example (I…

Solution 1:

You're currently searching for the element by a hardcoded class name.

If the class name is randomized, you cannot hardcode it any longer. You have to either:

  • Search for the element by some other characteristic (e.g. element hierarchy or other attributes; XPath can do that)

    In [10]: driver.find_elements_by_xpath('//article//img')
    Out[10]:
    [<selenium.webdriver.firefox.webelement.FirefoxWebElement (session="1ab4eeb4-10c4-4da4-996c-ee6744445dcc", element="55c48964-8cd0-4472-b35b-214a5a9bfbf7")>,
     <selenium.webdriver.firefox.webelement.FirefoxWebElement (session="1ab4eeb4-10c4-4da4-996c-ee6744445dcc", element="b7f7c8a4-e343-49ca-b416-49f72e67ae07")>,
     <selenium.webdriver.firefox.webelement.FirefoxWebElement (session="1ab4eeb4-10c4-4da4-996c-ee6744445dcc", element="728f6148-6a03-4c9a-9933-36859d65eb51")>]
    
    • You can also search by the element's visual characteristics: size, visibility, position. This cannot be done by XPath alone, though: you have to get all <img> tags and inspect each one with JS by hand. (See the example below; it's too long to fit here.)
  • Learn the class name from the page's own logic (it must be present somewhere if the page's code can find and use it, and that code must in turn be referenced from somewhere else, and so on)

    In this case, the class name is part of a local variable in the renderImage function, so the only way to recover it from within the page is to explore the function's AST. The function itself is buried somewhere inside the webpack machinery (which seems to pack all resources into a few global objects with one-letter names). Alternatively, you can read all the included JS files as raw data and look for the definition of renderImage in them. So, in this case, it's disproportionately hard, though still theoretically possible.
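
The first option (hierarchy plus visual characteristics) can also be sketched on the Python side, without any hardcoded class name. This is only a rough sketch under some assumptions: it uses the Selenium 4 style `find_elements(by, value)` call, the helper name `find_image_row` is made up for illustration, and the locator is passed as the plain string `"xpath"` (the value of `By.XPATH`) so the snippet needs no imports beyond the standard library. It relies on `WebElement.size`, `WebElement.location` and `WebElement.is_displayed()`.

```python
from collections import defaultdict

XPATH = "xpath"  # same string as selenium.webdriver.common.by.By.XPATH


def find_image_row(driver, expected=3, xpath="//article//img"):
    """Return `expected` visible images of equal size that sit on one row.

    Hypothetical helper: `driver` is assumed to be a Selenium 4 WebDriver.
    """
    # Locate candidates by hierarchy only, then drop the invisible ones.
    candidates = [
        el for el in driver.find_elements(XPATH, xpath) if el.is_displayed()
    ]

    # Group by rendered size and vertical position instead of by class name.
    rows = defaultdict(list)
    for el in candidates:
        key = (el.size["width"], el.size["height"], el.location["y"])
        rows[key].append(el)

    matches = [group for group in rows.values() if len(group) == expected]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one row of {expected} images, found {len(matches)}"
        )
    return matches[0]
```

Each `size` and `location` access is a separate round trip to the browser, so on pages with many images this is likely slower than doing the filtering in a single script.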


Example of getting elements by visual characteristics

On any page whatsoever, this would find 3 images of the same size, located side by side (this is the way they are at https://www.instagram.com/kitties).

The script below returns a unique XPath for each element rather than the element itself (I couldn't find a way to pass HTMLElements to Python directly), and Python then locates the elements by those XPaths.

(The JS code could probably be more elegant; I don't have much experience with the language.)

In [22]: script = """
  //https://stackoverflow.com/questions/2661818/javascript-get-xpath-of-a-node/43688599#43688599
  function getXPathForElement(element) {
      const idx = (sib, name) => sib 
          ? idx(sib.previousElementSibling, name||sib.localName) + (sib.localName == name)
          : 1;
      const segs = elm => !elm || elm.nodeType !== 1 
          ? ['']
          : elm.id && document.querySelector(`#${elm.id}`) === elm
              ? [`id("${elm.id}")`]
              : [...segs(elm.parentNode), `${elm.localName.toLowerCase()}[${idx(elm)}]`];
      return segs(element).join('/');
  }

  //https://plainjs.com/javascript/styles/get-the-position-of-an-element-relative-to-the-document-24/
  function offsetTop(el){
    return window.pageYOffset + el.getBoundingClientRect().top;
  }

  // Group all <img> elements by their rendered size.
  const expected_images = 3;
  const found_groups = new Map();
  for (const e of document.getElementsByTagName('img')) {
    const group_id = e.offsetWidth + "x" + e.offsetHeight;
    if (!found_groups.has(group_id)) found_groups.set(group_id, []);
    found_groups.get(group_id).push(e);
  }
  // Keep only groups of the expected length whose images share the same top offset.
  for (const [k, v] of found_groups) {
    if (v.length != expected_images) { found_groups.delete(k); continue; }
    const offset_top = offsetTop(v[0]);
    for (const e of v) {
      if (offsetTop(e) !== offset_top) {
        found_groups.delete(k);
        break;
      }
    }
  }
  if (found_groups.size != 1) {
    console.log(found_groups);
    throw 'Unexpected pattern of images after filtering';
  }

  const found_group = found_groups.values().next().value;

  // Return XPaths rather than the elements themselves.
  const result = [];
  for (const e of found_group) {
    result.push(getXPathForElement(e));
  }
  return result;
"""

In [23]: d.execute_script(script)
Out[23]:
[u'id("react-root")/section[1]/main[1]/div[1]/article[1]/div[1]/div[1]/div[1]/div[1]/a[1]/div[1]/div[1]/img[1]',
 u'id("react-root")/section[1]/main[1]/div[1]/article[1]/div[1]/div[1]/div[1]/div[2]/a[1]/div[1]/div[1]/img[1]',
 u'id("react-root")/section[1]/main[1]/div[1]/article[1]/div[1]/div[1]/div[1]/div[3]/a[1]/div[1]/div[1]/img[1]']

In [27]: [d.find_element_by_xpath(xp) for xp in _]
Out[27]:
[<selenium.webdriver.firefox.webelement.FirefoxWebElement (session="1ab4eeb4-10c4-4da4-996c-ee6744445dcc", element="55c48964-8cd0-4472-b35b-214a5a9bfbf7")>,
 <selenium.webdriver.firefox.webelement.FirefoxWebElement (session="1ab4eeb4-10c4-4da4-996c-ee6744445dcc", element="b7f7c8a4-e343-49ca-b416-49f72e67ae07")>,
 <selenium.webdriver.firefox.webelement.FirefoxWebElement (session="1ab4eeb4-10c4-4da4-996c-ee6744445dcc", element="728f6148-6a03-4c9a-9933-36859d65eb51")>]
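
As a possible shortcut: as far as I know, Selenium's `execute_script` converts DOM elements returned by a script (including elements inside a returned array) into WebElement objects, and passes extra Python arguments to the script as `arguments[0]`, `arguments[1]`, and so on. If that holds for your driver, the XPath round trip can be dropped. The sketch below is a shortened variant of the filter, not the exact script from the session; the names `SAME_ROW_IMAGES_JS` and `find_image_row_via_js` are made up for illustration.

```python
# JS that groups <img> elements by rendered size and keeps the single group
# of `expected` images sharing the same top offset. It returns the elements
# themselves rather than their XPaths.
SAME_ROW_IMAGES_JS = """
const expected = arguments[0];
const groups = new Map();
for (const img of document.getElementsByTagName('img')) {
  const key = img.offsetWidth + 'x' + img.offsetHeight;
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(img);
}
const rows = [...groups.values()].filter(group => {
  if (group.length !== expected) return false;
  const top = group[0].getBoundingClientRect().top;
  return group.every(el => el.getBoundingClientRect().top === top);
});
if (rows.length !== 1) {
  throw new Error('Unexpected pattern of images after filtering');
}
return rows[0];
"""


def find_image_row_via_js(driver, expected=3):
    """Run the filter in the browser; `driver` is assumed to be a Selenium WebDriver."""
    return driver.execute_script(SAME_ROW_IMAGES_JS, expected)
```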

Solution 2:

I managed to get it with the following (outside of the loop, of course):

# Look the images up by tag name: the class name is the unknown we want to read.
get_img_class = driver.find_elements_by_tag_name('img')[1].get_attribute('class')

Just like that, I am able to read the class name and store it for later use. Thanks so much for everyone's help. All the ideas are great and noted for later.
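
For anyone on Selenium 4, where the `find_elements_by_*` helpers are no longer available, the same idea might look roughly like the sketch below. It is not the exact code I ran: the helper names are made up, the image is looked up by tag name (the class is the unknown), the index `1` is carried over from the line above, and the locator strings `"tag name"` and `"css selector"` are the values of `By.TAG_NAME` and `By.CSS_SELECTOR`, used here so the snippet needs no imports. Since the `class` attribute may hold several names, the later lookup goes through a CSS selector instead of a single class name.

```python
TAG_NAME = "tag name"          # same string as By.TAG_NAME
CSS_SELECTOR = "css selector"  # same string as By.CSS_SELECTOR


def learn_image_classes(driver, index=1):
    """Read today's class name(s) from one known <img> on the page."""
    images = driver.find_elements(TAG_NAME, "img")
    # The attribute can be missing (None) or contain several names.
    return (images[index].get_attribute("class") or "").split()


def find_images_with_classes(driver, class_names):
    """Find every <img> carrying all of the learned class names.

    Assumes the names are plain identifiers (like FFVAD) that need no
    CSS escaping.
    """
    if not class_names:
        raise ValueError("no class names to search for")
    selector = "img" + "".join(f".{name}" for name in class_names)
    return driver.find_elements(CSS_SELECTOR, selector)
```

Call `learn_image_classes` once per session, before the loop, and reuse the result inside it.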
