On Firefox, web accessible resources are available at "moz-extension://<extension-UUID>/myfile.png" <extension-UUID> is not your extension's ID. This ID is randomly generated for every browser instance. This prevents websites from fingerprinting a browser by examining the extensions it has installed. https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/Web...
Doesn't the idea of swapping extension specific IDs to your browser specific extension IDs mean that instead of your browser being identifiable, you become identifiable?
I mean, it goes from "Oh they have X, Y , and Z installed" to "Oh, it's jim bob, only he has that unique set of IDs for extensions"
Skimming the list, looks like most extensions are for scraping or automating LinkedIn usage. Not surprising as there's money to be made with LinkedIn data. Scraping was a problem when I worked there, the abuse teams built some reasonably sophisticated detection & prevention, and it was a constant battle.
In order to create the data source that LinkedIn's extension-fingerprinting relies on to work, someone (at LinkedIn*?) almost certainly violated the Chrome Web Store TOS—by (perversely*) scraping it.
* if LinkedIn didn't get it from an existing data source
"The code" here you're referring to (fetch_extension_names.js[1]) isn't and doesn't claim to be LinkedIn's fingerprinting code. It's a scraper that the researcher behind this repo wrote themselves in order to create the CSV of the data that they're publishing here.
LinkedIn's fingerprinting code, as the README explains, is found in fingerprint.js[2], which embeds a big JSON literal with the IDs of the extensions it probes for. (Sickeningly enough, this data starts about two-thirds of the way through the file* and isn't the culprit behind the bulk of its 2.15 MB size…)
* On line 34394; the one starting:
const r = [{
id: "aacbpggdjcblgnmgjgpkpddliddineni",
file: "sidebar.html"
Do they respect my data? Why do they get to track me across sites when I clearly don't want them to but someone can't scrape their data when they don't want them to. Why should big companies get the pass but individuals not? They clearly consider internet traffic fair game and are invasive and abusive about it so it is not only fair to be invasive and abusive back, it is self defense at this point.
I think they framed it this way because they don't consider scraping abuse (to be fair, neither do I, as long as it doesn't overload the site). Botting accounts for spam is clear abuse, however, so that's fair game.
No, I consider all data collection and scraping egregious. From that perspective, LinkedIn is hypocritical when Microsoft discloses every filesystem search I do locally to bing.
I'm sure there are issues with fake accounts for scraping, but the core issue is that LinkedIn considers the data valuable. LinkedIn wants to be able to sell the data, or access to it at least, and the scrapers undermine that.
They could stop all the scraping by providing a downloadable data bundle like Wikipedia.
We enjoy the fruits of an LLM or two from time to time, derived from hoards of ill gotten data. Linkedin has the resourses to attempt to block scraping, but even at the resource scale of LI I doubt the effort is effective.
I am not denying that scraping is useful. If it wasn't people wouldn't do it. But if the site rules say you aren't allowed to scrape, then I don't think people should be hostile towards the people enforcing the rules.
Well, they can try to enforce the rules; that's perfectly fair. At the same time, there are many methods of "trying" which I would not consider valid or acceptable ones. "Enforcing the rules" does not give a carte blanche right to snoop and do "whatever's necessary." Sony tried that with their CD rootkits and got multiple lawsuits.
The big social media businesses deserve a Teddy Roosevelt character swooping in and busting their trusts, forcing them to play ball with others even if it destroys their moats. Boo hoo! Good riddance. World's tiniest violin.
This is a popular position across the aisle. Here's hoping the next guy can't be bought, or at least asks for more than a $400M tacky gold ballroom!
I mean, regardless of who they are or even if you don’t like what LinkedIn does themselves with the data people have given them, the random third parties with the extensions don’t additionally deserve to just grab all that data too, do they?
Eh. I worked at a company which made an extension which scraped LinkedIn. We provided a service to recruiters, who would start a hiring process by putting candidates into our system.
The recruiters all had LinkedIn paid accounts, and could access all of this data on the web. We made a browser extension so they wouldn’t need to do any manual data entry. Recruiters loved the extension because it saved them time.
I think it was a legitimate use. We were making LinkedIn more useful to some of their actual customers (recruiters) by adding a somewhat cursed api integration via a chrome extension. Forcing recruiters to copy and paste did’t help anyone. Our extension only grabbed content on the page the recruiter had open. It was purely read only and scoped by the user.
Chrome is the new IE6. Google set themselves up to be the next Microsoft and is "ad friendly" in all the creepy ways because that's what Google IS an ad company. All they've contributed to security is diminishing the capability of adblockers and letting malware to do bad things to you as consumers.
> This repository documents every extension LinkedIn checks for and provides tools to identify them.
I get that the CSV lists the extensions, and the tools are provided in order to show work (mapping IDs to actual software). But how was it determined that LinkedIn checks for extensions with these IDs?
Technical writeup from a few weeks ago by a vendor that explains how LinkedIn does it, then boasts that their approach is "quieter, harder to notice, and easier to run at scale":
Fingerprinting. There are a few reasons you'd do it:
1. Bot prevention. If the bots don't know that you're doing this, you might have a reliable bot detector for a while. The bots will quite possibly have no extensions at all, or even better specific exact combination they always use. Noticing bots means you can block them from scraping your site or spamming your users. If you wanna be very fancy, you could provide fake data or quietly ignore the stuff they create on the site.
2. Spamming/misuse evasion. Imagine an extension called "Send Messages to everybody with a given job role at this company." LinkedIn would prefer not to allow that, probably because they'd want to sell that feature.
I wrote some automation scripts that are not triggered via browser extensions (e.g., open all my sales colleagues’ profiles and like their 4 most recent unliked posts to boost their SSI[1], which is probably the most ‘innocent’ of my use-cases). It has random sleep intervals. I’ve done this for years and never faced a ban hammer.
Wonder if with things like Moltbot taking the scene, a form of “undetectable LinkedIn automation” will start to manifest. At some point they won’t be able to distinguish between a chronically online seller adding 100 people per day with personalized messages, or an AI doing it with the same mannerisms.
Another thing... they alter the localStorage & sessionStorage prototype, by wrapping the native ones with a wrapper that prevent keys that not in their whitelist from being set.
I didn't find popular extensions like uBlock or other ad blockers.
The list is full of scammy looking data collection and AI tools, though. Some random names from scrolling through the list:
- LinkedGPT: ChatGPT for LinkedIn
- Apollo Scraper - Extract & Export Apollo B2B Leads
- AI Social Media Assistant
- LinkedIn Engagement Assistant
- LinkedIn Lead Magnet
- LinkedIn Extraction Tool - OutreachSheet
- Highperformr AI - Phone Number and Email Finder
- AI Agent For Jobs
These look like the kind of tools scummy recruiters and sales people use to identify targets for mass spamming. I see several AI auto-application tools in there too.
LinkedIn itself provides tools for scummy recruiters to mass spam, so this is just them protecting their business.
Also, not all of them are data collection tools. There are ad blockers listed (Hide LinkedIn Ads, SBlock - Super Ad Blocker) and just general extensions (Ground News - Bias Checker, Jigit Studio - Screen Recorder, RealEyes.ai — Detect Deepfakes Across Online Platforms, Airtable Clipper).
I’m probably on the list. I made a LinkedIn Redactor that allowed you to add keywords and remove posts from your thread that included such words. It’s the X feature but for LinkedIn. Anyway, got a cease and desist from those lame fucks at LI. So I removed from the chrome store but it’s still available on GitHub.
This works by looking for web accessible resources that are provided by the extensions. For Chrome, these are are available in a webpage via the URL chrome-extension://[PACKAGE ID]/[PATH] https://developer.chrome.com/docs/extensions/reference/manif...
On Firefox, web accessible resources are available at "moz-extension://<extension-UUID>/myfile.png" <extension-UUID> is not your extension's ID. This ID is randomly generated for every browser instance. This prevents websites from fingerprinting a browser by examining the extensions it has installed. https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/Web...
Doesn't the idea of swapping extension specific IDs to your browser specific extension IDs mean that instead of your browser being identifiable, you become identifiable?
I mean, it goes from "Oh they have X, Y , and Z installed" to "Oh, it's jim bob, only he has that unique set of IDs for extensions"
edit: er, I think that that also suggests that I need to restart firefox more often...
I presume the extension knows when it wants to access resources of its own. But random javascript, doesn't.
It won't disclose how, as it says it has had several users report it. And that it expects 50% of the bounty, and will use it for GPU upgrades.
* if LinkedIn didn't get it from an existing data source
LinkedIn's fingerprinting code, as the README explains, is found in fingerprint.js[2], which embeds a big JSON literal with the IDs of the extensions it probes for. (Sickeningly enough, this data starts about two-thirds of the way through the file* and isn't the culprit behind the bulk of its 2.15 MB size…)
* On line 34394; the one starting:
1. <https://github.com/mdp/linkedin-extension-fingerprinting/blo...>2. <https://github.com/mdp/linkedin-extension-fingerprinting/blo...>
They could stop all the scraping by providing a downloadable data bundle like Wikipedia.
The data bundle doesn't help that at all.
This is a popular position across the aisle. Here's hoping the next guy can't be bought, or at least asks for more than a $400M tacky gold ballroom!
The recruiters all had LinkedIn paid accounts, and could access all of this data on the web. We made a browser extension so they wouldn’t need to do any manual data entry. Recruiters loved the extension because it saved them time.
I think it was a legitimate use. We were making LinkedIn more useful to some of their actual customers (recruiters) by adding a somewhat cursed api integration via a chrome extension. Forcing recruiters to copy and paste did’t help anyone. Our extension only grabbed content on the page the recruiter had open. It was purely read only and scoped by the user.
Screenshots found here https://x.com/DenisGobo/status/2018334684879438150
https://www.nymeria.io/blog/linkedins-war-on-email-finder-ex...
https://blog.castle.io/detecting-browser-extensions-for-bot-...
Google became a monopoly. All monopolies do this.
https://javascript.plainenglish.io/the-extensions-you-use-ar...
I get that the CSV lists the extensions, and the tools are provided in order to show work (mapping IDs to actual software). But how was it determined that LinkedIn checks for extensions with these IDs?
And is this relevant for non-Chrome users?
https://blog.castle.io/detecting-browser-extensions-for-bot-...
Typical early hooks: • fetch wrapper • XMLHttpRequest.prototype.open/send wrapper • WebSocket constructor wrapper • history.pushState/replaceState wrapper • EventTarget.addEventListener wrapper (optional, heavy) • MutationObserver for DOM diffs • Error + unhandledrejection capture
1. Bot prevention. If the bots don't know that you're doing this, you might have a reliable bot detector for a while. The bots will quite possibly have no extensions at all, or even better specific exact combination they always use. Noticing bots means you can block them from scraping your site or spamming your users. If you wanna be very fancy, you could provide fake data or quietly ignore the stuff they create on the site.
2. Spamming/misuse evasion. Imagine an extension called "Send Messages to everybody with a given job role at this company." LinkedIn would prefer not to allow that, probably because they'd want to sell that feature.
3. User tracking.
Wonder if with things like Moltbot taking the scene, a form of “undetectable LinkedIn automation” will start to manifest. At some point they won’t be able to distinguish between a chronically online seller adding 100 people per day with personalized messages, or an AI doing it with the same mannerisms.
[1] https://business.linkedin.com/sales-solutions/social-selling...
I would guess this is for rate limiting and abuse detection.
const msg = createDoneMessage(); msg.style.opacity = '1';
You can try this by opening devtools and setting
I didn't find popular extensions like uBlock or other ad blockers.
The list is full of scammy looking data collection and AI tools, though. Some random names from scrolling through the list:
- LinkedGPT: ChatGPT for LinkedIn
- Apollo Scraper - Extract & Export Apollo B2B Leads
- AI Social Media Assistant
- LinkedIn Engagement Assistant
- LinkedIn Lead Magnet
- LinkedIn Extraction Tool - OutreachSheet
- Highperformr AI - Phone Number and Email Finder
- AI Agent For Jobs
These look like the kind of tools scummy recruiters and sales people use to identify targets for mass spamming. I see several AI auto-application tools in there too.
Also, not all of them are data collection tools. There are ad blockers listed (Hide LinkedIn Ads, SBlock - Super Ad Blocker) and just general extensions (Ground News - Bias Checker, Jigit Studio - Screen Recorder, RealEyes.ai — Detect Deepfakes Across Online Platforms, Airtable Clipper).