Starlet #19 Skyvern: Browser automation using LLMs and computer vision

Kerem Yilmaz 3 min read

This is the nineteenth issue of The Starlet List. If you want to promote your open source project on star-history.com for free, please check out our announcement.


Hello Star History readers! We're the team behind Skyvern, an open-source tool that uses LLMs and computer vision to help companies automate and scale browser-based workflows.

The Birth of Skyvern

We talked to hundreds of companies bogged down by repetitive manual workflows. The pain points were clear: hiring more people to do the work by hand doesn’t scale well, and traditional automation tools like Selenium are rigid and maintenance-heavy. We felt there was a way to get the best of both worlds: use LLMs to reason through a website’s layout while keeping the main advantage of traditional browser automation, the ability to scale alongside demand. This led us to build Skyvern.

Core Features of Skyvern

  1. Natural Web Navigation: Skyvern can operate on websites it’s never seen before by connecting visible elements with the natural language instructions it is given. We use a blend of computer vision and DOM parsing to identify the set of possible actions on a page, and multi-modal LLMs to map the instructions to those actions.

  2. AI-driven Adaptability: Skyvern is resistant to website layout changes, as it doesn’t depend on any predetermined XPaths or other selectors. If a layout ever changes, we can leverage the methodology in #1 to complete the user-specified goal.

  3. Contextual Information Matching: Skyvern accepts a JSON blob of whatever information you want to provide when navigating workflows, and uses LLMs to map it to the information requested on screen. For example, auto insurance quote forms in the US commonly ask “Were you eligible to drive at 21?”. The answer can be inferred from a birth date in 1996 and a license received in 2012: the driver was licensed at around 16, so the answer is yes.
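
To make the insurance example in #3 concrete, here is a minimal sketch of the inference involved. It is not Skyvern's API: the field names `birth_year` and `license_year`, and the helpers `licensed_by_age` and `build_prompt`, are hypothetical and exist only for illustration. In Skyvern the reasoning is done by an LLM; the hand-written check below just spells out the arithmetic such a model would need to perform, at year granularity.

```python
import json

# Hypothetical context blob. The field names are illustrative only and are
# not a Skyvern schema; the years come from the example above.
payload = json.loads('{"birth_year": 1996, "license_year": 2012}')


def licensed_by_age(context: dict, age: int) -> bool:
    """Year-level check: was the license issued no later than the calendar
    year in which the driver turned `age`?"""
    year_turned_age = context["birth_year"] + age
    return context["license_year"] <= year_turned_age


def build_prompt(question: str, context: dict) -> str:
    """Combine an on-screen question with the context blob into one prompt
    string. Sending it to a model is deliberately left out of this sketch."""
    return (
        "Answer the form question using only the context below.\n"
        f"Context: {json.dumps(context, sort_keys=True)}\n"
        f"Question: {question}"
    )


question = "Were you eligible to drive at 21?"
print(build_prompt(question, payload))
print(licensed_by_age(payload, 21))  # True, since 2012 <= 1996 + 21
```

Because the check only looks at years, it can be off by one at the boundary (for example, a license issued early in the year after the driver turned 21, but before their next birthday). A real workflow would pass full dates in the blob if that precision matters.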

Skyvern in Action

Our project has seen exciting use cases, such as:

  • Automating material procurement for companies.

[image: finditparts_recording_crop]

  • Streamlining interactions with government websites for administrative tasks. [demo]

[image: edd_services]

  • Facilitating insurance quote generation through dynamic form navigation. [demo]

[image: geico_shu_recording_cropped]

[image: bci_seguros_recording]

Learn more about Skyvern

We had a great open source launch on Hacker News, and the Skyvern repository reached 2.7K ⭐ in less than a week afterward.

[image: Star History Chart]

If you’d like to try out Skyvern and see how it works yourself, visit our GitHub. Contributions and feedback in any form are much appreciated.

If you’d like to learn more about how other people are using Skyvern and the initial problems they are facing, check out our Discord or reach out to us directly at founders@skyvern.com.