Martin from GitHub here. This type of behaviour is explicitly against the GitHub terms of service, when we catch the accounts doing this we can (and do) take action against those accounts including banning the accounts. It's a game of whack-a-mole for sure, and it's not just start-ups that take part in this sketchy behaviour to be honest. I've been plenty of examples in my time across the board.
The fundamental nature of Git makes this pretty easy for folks to scrape data from open source repositories. It's against our terms of service and those folks might want to talk with some lawyers about doing it - but as every Git commit contains your name and email address in the commit data it's not technically difficult even if it is unethical.
From the early days we've added features to help users anonymise their email addresses for commits posted to GitHub. Basically, you configure your local Git client to use your 'no-reply' email address in commits and that still links back to your GitHub account when you push: https://docs.github.com/en/account-and-profile/reference/ema...
I think that's still probably the best route. We want to keep open source data as open as possible, so I don't think locking down API's etc is the right route. We do throttle API requests and scraping traffic, but then again there have been plenty of posts here over the years from people annoyed at hitting those limits so it's definitely a balancing act. Love to know what folks here think though.
I’ve made over five reports for this exact spam scenario, and never once have y’all acted on them. I have a hard time believing you ban spam accounts that clearly violate your ToS.
I don't have any specific suggestions, but I do want to give thanks for implementing functionality to block pushes if the email field is *not* using an anonymized mail address.
It's one thing to offer anonymous e-mail addresses, but it's also awesome that GitHub can help prevent mistakes that would otherwise leak a user's e-mail address. I am not sure how many people try to be privacy conscious on GitHub, but I assume most users don't, so it's nice seeing this little feature exist.
I am also getting constant spam because apparently they can see who starred a repo (i.e. I see you starred repo x and we are doing something similar). I am not starring anything anymore.
Mind fixing lucidrains account? Something happened without notice or recourse. He's one of, if not the most well known open source AI researchers on the planet, with implementations and explanations of papers and ideas that are wonderful. If you could bring some sanity to that situation and take it out of whatever kafkaesque account purgatory it fell into, you'd be doing the work of angels.
What was happening with this account? I was often seeing popular but empty (only title of the paper and maybe a short readme) repositories that were created directly after a paper was published?
I know it is against the ToS. I've reported multiple organisations doing this. Last time I reported one, support closed the ticket saying the activity is off platform so they can't do anything.
Maybe I am missing something, but can’t you simply not show the email address in a git commit? (Sincere question, not saying this is trivial. i am dumb and like to ask dumb questions even if might be embarassing)
If someone wants to message someone, it goes through github notifications or github emails them
Also banning an account doesnt seem like a heavy punishment, given they can simply move to gitlab, bitbucket etc
That would be a fundamental change to how Git works, not just GitHub. Even if the web UI didn't show it, a simple `git log` would reveal it.
You can mask your email address in git commits but a lot of open source projects won't accept that. And some pseudo-open-source ones insist on sending you an email to authenticate before they'll give you access to the GitHub repo (looking at you Unreal Engine!)
So, no, I don't think they could simply "not show the email address".
Git commits have a email address as a required field[0], although some people put something bogus in there. And then it's in the data provided when you clone the repo onto your machine even if you aren't using the GitHub APIs.
To his point, you can set that to the no-reply email address GitHub gives you if you don't want mail but do want the commit to be linked to your GitHub account.
I kinda wish I had that much power. There would certainly be less people in the world listening to their phones without headphones..
Usually starts with contacting them over email reminding them of the terms of service and warning them to stop. Then their account might get deactivated and they need to write and promise to not be naughty again. If they ignore that then the account gets removed.
There are a bunch of automated checks that are running all the time as well and will take automated action that then gets later reviewed by humans. At lot of times the process is fast-tracked.
The off-platform 'let's scrape a bunch of data and then spam nice people' is the hardest to police. Linking those mails to an offending GitHub account is hard and very manual, also anyone can send emails saying they are someone they are not and because of that anyone can deny they sent the mail and they'll usually blame a rogue agency they where working with etc.
I probably shouldn't say it, but the public shame that comes from being mentioned on social, in hacker news etc. That stops people who want to be treated as legitimate from doing that sort of thing and helps educate the wider community around what is and isn't acceptable behaviour - that is why it's good to see this thread and see the issue getting attention.
This would be a gross miscarriage of justice and bringing successful action under this theory would do widespread harm by expanding the definition of the CFAA.
Just because a company can take some nuclear action, doesn't mean they should.
I've spent a lot of my career marketing to developers, and spamming their GitHub account might be top 1 or 2 worst marketing tactics you can use.
Cold emailing rarely works by itself. Cold emailing developers via emails you pulled from their GitHub accounts? At that point, you're actively harming your brand, and may as well just send them spam diet pill ads.
I also had unsolicited spam from Vincent Jiang of Aden, another YC company.
Hi Daniel,
I just came across your profile on social media and wondered if you'd be interested in joining our Discord community for AI agent development. Currently, we see that agents break, loop, get lost, hallucinate, and cost a fortune, and therefore built a space where developers can share challenges and insights.
I had a similar one from that guy asking me to make open source PRs to some repo of theirs for, err, $25-50/hour. I replied explaining that senior software engineers in the UK aren’t quite as desperately poor as that, and got a canned response saying that they were looking forward to reviewing my PRs :D
I wish github could ammend the email of my commits to the private noreply address during push so they _never_ have any other email associated to them. May not be feasible due to the commit changing, confusing local branch and such?
They have this other thing where they reject pushes for the 'known' emails you've told them you have, but kinda seems there should be a setting to do that for any email that is not your noreply private one.
is that a feasible thing to ask for?
If you change the email address, you change the commit hash. And yes, suddenly your local branches are orphaned.
Of course, there's nothing stopping you from using a git-only email address (nospam-6thbit@yourdomain) and routing that to /dev/null. GitHub can't change email addresses, but you can.
YC is basically advising their startups to engage in shitty business practices, like trying to hire UK staff for half the salary and expecting 7 day weeks.
This happens all the time, not really surprised as the GitHub API makes it pretty easy to extract valuable leads with real and confirmed email addresses.
I don't like this way of putting it, it's good the github API makes this easy as that makes it an useful. Should not try to imply this should be restricted just because of some bad actors. It's just going to annoy legit users and the bad ones will scrape anyway.
But that is someone pretending to be YC which is sort of less interesting than a YC company doing something bad. Because phishers imitate legit companies all the time. Easy to get roped in and I sympathise, anyone is suseptable (today I almost clicked the phishing training email as it looked urgent and pushed the right buttons)
I have received over the years so much spam of this kind by multiple YC-funded companies that I now reflexively send to spam any email that mentions being YC-funded, regardless of how legitimate the email is.
I'm always interested to understand - what constitutes a basic ChatGPT wrapper?
Is Legora, which is doing very well, a basic ChatGPT wrapper? Because if you don't view it as one, it certainly started as one.
Doesn't YC have some code of conduct or legal/ethical guidelines? I would assume a legal and compliance department would have some major headache if documented cases of misconduct jeopardize later due diligence. I would not fund or aquire a company on the radar of national regulatory bodies for something as stupid as this.
Like every other VC firm, the only thing they care about is money. They can pretend to morals, but they will never sacrifice one for the other in any meaningful way.
Only free individual can have strong ethics. There are no free people in capitalism, money is debt after all. Think of applied pressure once you sign under VC money and amount of brainwashing / gaslighting. I sincerely hope my observation is wrong.
If you are going to go down that road: life is debt, and there is no true freedom. We are bound by the needs of our meat-containers, after all.
I don't like unfettered capitalism, but when I consider economies that have existed over time, it certainly looks like constrained capitalism affords the most freedom.
Didn't AirBnB famously spam people in the Bay Area as a "guerilla tactic" to build the business in its early days? This kind of fast and loose behaviour seems standard.
My solution to this is to use a Github-specific email address. All emails sent to that address which do not originate from GitHub are immediately reported as spam, marked read and deleted.
I sometimes use different git/GitHub addresses depending on who I'm working for or specific projects so I can more accurately detect where data is being scraped from.
N.B. Using service-specific emails is trivial - you don't need separate email accounts. Just use email aliases, e.g. "john.smith+github@gmail.com" -- which is an alias called "github" for "john.smith@gmail.com"
A simple regex filter will get rid of that. Now, if you use your own domain and have it configured as a catch-all, then you could do github@domain.tld.
I'm not saying I do this but if I were as smart as I think I am I would have given a Gmail example rather than the example you've given to avoid bots just looking up my website and starting to bypass my setup... ;) ;) ;)
Also, spammers generally don't seem to be going to the effort to apply regex filters to the data they've scraped...
You'd have thought so, but no, in my experience this works very well. People doing this kind of spamming don't seem to be particularly bright, nor do they seem to spend any time/effort to clean up their scraped database.
Perhaps, but it doesn't change the fact that this is bad behavior for the company sending the email. Since YCombinator funded this company it makes sense that YC would want to know about how they are conducting business.
General advice would be to mark the email as spam or junk and hopefully their email platform penalizes them, but this has been working less and less. Email has truly become pay to play now.
You mention GDPR, which also "applies" to me, though I wonder if what they're doing is actually illegal. I mean, after all, I'm putting my email on GitHub precisely to give people a way to contact me.
Of course, I do that naïvely, assuming good faith, not expecting _companies_ to use it to spam me. So definitely what they're doing is, at the very least, in poor taste.
> I'm putting my email on GitHub precisely to give people a way to contact me.
They’re not only looking at the public email in your profile, they’re also looking at your committer email (git config user.email). You could argue that you’re not putting that out for people to contact you.
(I’ve used that trick a couple times to reach out to people, too, but never mass emailing.)
Is there any company that will take my money to solve GDPR issues? And by solve I mean sue the spammers? For last few years I saw they "try" to look legit, by claiming addresses are managed by some Hungarian/Spanish shell company, hoping no one will be able to afford pursuing infractions over borders.
There's probably a law against it, but I've always thought a legal company could make decent money taking cases like this in bulk for free, on the condition that they get to keep all the compensation, while the "client" still gets the satisfaction of punishing the offending party.
I’m not especially bothered by this [yet -AI is likely to make this worse]. It’s a fairly insignificant component of my spam catcher. At least, it’s a bit focused.
Every day, I get deluged with hundreds of spam and scam emails, often because some knucklehead entered my email in a form (either accidentally, or as a throwaway red herring).
> Some examples of ethical behavior we expect from founders are:
> - Not spamming members of the community
> To maintain our community, if we determine (in our sole discretion) that a founder has behaved unethically during or after YC, we will revoke their YC founder status. This includes access to all Y Combinator spaces, software, lists and events. All founders in a company may be held responsible for the unethical actions of a single co-founder or a company employee, depending on the circumstances.
Do random GH accounts count as "members of the YC community"?
Sorry, but unsolicited contact, much as I hates, HATESSSS it, is a classic component of any business, and has been, for many decades. I don't think it would be appropriate for a business organization to prohibit its members from engaging in "cold calling," of which, UCE is really an example.
Using the YC branding/name, however, is a different matter.
This sounded familiar, so I checked my inbox and I did indeed receive a similar email from sanchitmonga@runanywheresdk.com earlier this month:
> I came across your GitHub profile and thought you might be interested in what my team and I are building. We're developing an open source SDK that runs LLMs directly on-device.
What's even more interesting is that both buildrunanywhere.org and runanywheresdk.com show a stock hostinger parking page when accessed in a browser. Something tells me they're intentionally registering these "alternate" domains specifically for spam, to avoid tanking the email reputation of their main runanywhere.ai domain.
I guess I shouldn't be surprised given YC is going all in on AI and most AI companies are no better than the crypto scammers of yesteryear, but still.
I observed the same thing and it was only when you told me the main domain that I found their website.
> Something tells me they're intentionally registering these "alternate" domains specifically for spam, to avoid tanking the email reputation of their main runanywhere.ai domain
(I also couldn't see any post created by them on YC checking algolia from their website fwiw)
Seeing their star history on their product, I see some few interesting observations[0] Their star history was almost horizontal between december and february until it got vertical all of a sudden.
I looked through their linkedin and found this website owned by them as well https://www.openclawpi.com/ and using the YC brand here as well. (registerered 26 days ago)
This website looks fairly AI generated to me as well and there are some bugs within the original website as well which I am now incredibly more unsure of if generated by AI or not given the similarities between the two websites UI/UX as well.
I've received several similar ones over the years. At this point, if I get an email from someone I don't know and it contains a link, chances are it's spam. I genuinely doubt github(or any other company for that matter) would do something about it. While I fully support GDPR, the truth is, few people are willing to take action knowing how much bureaucracy would be involved...
I usually check the "Received" header and report to the email service provider. Once in a while I receive a response saying the case is properly handled.
These providers are the only ones that care about their reputation and thus may take some action. Investors? Nope.
the problem is that the emails arent typically sent from the main domain.
in this example, the email came from buildrunanywhere.org, which is just a parked domain. the real domain is runanywhere.ai, which they arent using for spam.
so, once buildrunanywhere.org has their reputation burned from reports, they will simply register buildrunanywheres.org and start spamming again.
HN and YC walk a thin line between hacker culture and venture capitalist culture. I know it’s easy to think that because HN comes from YC them too are aligned with hacker culture, but no. YC is all cutthroat business.
There's no reason to put your real email in git config unless you're signing, in which case repos should be private. I would have thought that was obvious.
I have been having the same experience. If you starred a GitHub repo, and they think that their product is similar, they will send you their spam. I condemn this! They should be ashamed!
After 25 years on the internet dealing with spam, it would never even occur to me to invest the energy to write a letter to the offending companies investor. But more power to them I'd say!
> These emails indicate that those companies scrape people's Github activity, and if they notice users contributing to repos in their field of business, send marketing emails to those users without receiving their consent. My guess is that they use commit metadata for this purpose.
There are likely marketing email datasets floating around the internet that contain email addresses scraped from commit metadata.
I use a catchall with a specific Git client (not GitHub) email address, and found spam and phishing emails being sent there quite a few times.
May not necessarily be from commit messages, there's at least one way simpler way: simply adding .gpg to the end of any user URL will return that user's public GPG key.
The fundamental nature of Git makes this pretty easy for folks to scrape data from open source repositories. It's against our terms of service and those folks might want to talk with some lawyers about doing it - but as every Git commit contains your name and email address in the commit data it's not technically difficult even if it is unethical.
From the early days we've added features to help users anonymise their email addresses for commits posted to GitHub. Basically, you configure your local Git client to use your 'no-reply' email address in commits and that still links back to your GitHub account when you push: https://docs.github.com/en/account-and-profile/reference/ema...
I think that's still probably the best route. We want to keep open source data as open as possible, so I don't think locking down API's etc is the right route. We do throttle API requests and scraping traffic, but then again there have been plenty of posts here over the years from people annoyed at hitting those limits so it's definitely a balancing act. Love to know what folks here think though.
I even wrote about a specific example of a YC company spamming me from my GitHub email at https://benword.com/dont-tolerate-unsolicited-spam
It's one thing to offer anonymous e-mail addresses, but it's also awesome that GitHub can help prevent mistakes that would otherwise leak a user's e-mail address. I am not sure how many people try to be privacy conscious on GitHub, but I assume most users don't, so it's nice seeing this little feature exist.
Mind fixing lucidrains account? Something happened without notice or recourse. He's one of, if not the most well known open source AI researchers on the planet, with implementations and explanations of papers and ideas that are wonderful. If you could bring some sanity to that situation and take it out of whatever kafkaesque account purgatory it fell into, you'd be doing the work of angels.
Thanks!
How do I report that person, though? Your support page about reporting abuse assumes I know the person's Github account: https://docs.github.com/en/communities/maintaining-your-safe...
If someone wants to message someone, it goes through github notifications or github emails them
Also banning an account doesnt seem like a heavy punishment, given they can simply move to gitlab, bitbucket etc
You can mask your email address in git commits but a lot of open source projects won't accept that. And some pseudo-open-source ones insist on sending you an email to authenticate before they'll give you access to the GitHub repo (looking at you Unreal Engine!)
So, no, I don't think they could simply "not show the email address".
To his point, you can set that to the no-reply email address GitHub gives you if you don't want mail but do want the commit to be linked to your GitHub account.
[0]: https://git-scm.com/docs/git-commit#_commit_information
"What you are doing is against Github's TOS"
Usually starts with contacting them over email reminding them of the terms of service and warning them to stop. Then their account might get deactivated and they need to write and promise to not be naughty again. If they ignore that then the account gets removed.
There are a bunch of automated checks that are running all the time as well and will take automated action that then gets later reviewed by humans. At lot of times the process is fast-tracked.
The off-platform 'let's scrape a bunch of data and then spam nice people' is the hardest to police. Linking those mails to an offending GitHub account is hard and very manual, also anyone can send emails saying they are someone they are not and because of that anyone can deny they sent the mail and they'll usually blame a rogue agency they where working with etc.
I probably shouldn't say it, but the public shame that comes from being mentioned on social, in hacker news etc. That stops people who want to be treated as legitimate from doing that sort of thing and helps educate the wider community around what is and isn't acceptable behaviour - that is why it's good to see this thread and see the issue getting attention.
This would be a gross miscarriage of justice and bringing successful action under this theory would do widespread harm by expanding the definition of the CFAA.
Just because a company can take some nuclear action, doesn't mean they should.
I will pay more for GitHub if you go hard on these mfs.
Cold emailing rarely works by itself. Cold emailing developers via emails you pulled from their GitHub accounts? At that point, you're actively harming your brand, and may as well just send them spam diet pill ads.
From: henry@joincactuscompute.com
Hey,
I hope all is well with you, just reaching out as you seem to be interested in on-device speech models.
Cactus is a low-latency AI engine for consumer devices like phones, Macs, wearables, Raspberry Pis, etc.
We support transcription models like Whisper & Parakeet, benchmarks available in the attached GitHub repo.
GitHub: https://github.com/cactus-compute/cactus
We are keen to get your feedback, and star if feeling generous.
Thanks a million
A 419 scam?
https://news.ycombinator.com/item?id=9332418 (11 years ago)
https://news.ycombinator.com/item?id=20660624 (7 years ago)
https://news.ycombinator.com/item?id=27855152 (5 years ago)
https://news.ycombinator.com/item?id=30900237 (4 years ago)
Seems it’s a reoccurring issue
They have this other thing where they reject pushes for the 'known' emails you've told them you have, but kinda seems there should be a setting to do that for any email that is not your noreply private one. is that a feasible thing to ask for?
Of course, there's nothing stopping you from using a git-only email address (nospam-6thbit@yourdomain) and routing that to /dev/null. GitHub can't change email addresses, but you can.
https://news.ycombinator.com/item?id=45357205
Hope they didn’t get too many folks.
Sorry but lol you must be new here.
I don't like unfettered capitalism, but when I consider economies that have existed over time, it certainly looks like constrained capitalism affords the most freedom.
And they are using a different domain for the emails so the spam markers don’t hit their primary domain.
I sometimes use different git/GitHub addresses depending on who I'm working for or specific projects so I can more accurately detect where data is being scraped from.
Also, spammers generally don't seem to be going to the effort to apply regex filters to the data they've scraped...
Kernel guidelines now have a more verbose section about tagging: https://www.kernel.org/doc/html/latest/process/submitting-pa...
You mention GDPR, which also "applies" to me, though I wonder if what they're doing is actually illegal. I mean, after all, I'm putting my email on GitHub precisely to give people a way to contact me.
Of course, I do that naïvely, assuming good faith, not expecting _companies_ to use it to spam me. So definitely what they're doing is, at the very least, in poor taste.
They’re not only looking at the public email in your profile, they’re also looking at your committer email (git config user.email). You could argue that you’re not putting that out for people to contact you.
(I’ve used that trick a couple times to reach out to people, too, but never mass emailing.)
It needs to be modified like how individuals can go after telemarketers.
A lawyer
Every day, I get deluged with hundreds of spam and scam emails, often because some knucklehead entered my email in a form (either accidentally, or as a throwaway red herring).
> Some examples of ethical behavior we expect from founders are:
> - Not spamming members of the community
> To maintain our community, if we determine (in our sole discretion) that a founder has behaved unethically during or after YC, we will revoke their YC founder status. This includes access to all Y Combinator spaces, software, lists and events. All founders in a company may be held responsible for the unethical actions of a single co-founder or a company employee, depending on the circumstances.
Edit: Apparently "about a dozen companies"[0] have been booted for ethics violations.
0: https://techcrunch.com/2021/06/09/does-what-happens-at-yc-st...
Ah... but there's the rub.
Define "the community."
Do random GH accounts count as "members of the YC community"?
Sorry, but unsolicited contact, much as I hates, HATESSSS it, is a classic component of any business, and has been, for many decades. I don't think it would be appropriate for a business organization to prohibit its members from engaging in "cold calling," of which, UCE is really an example.
Using the YC branding/name, however, is a different matter.
This is not GitHub only, I have got a survey on how my experience interacting with folks on lkml
> I came across your GitHub profile and thought you might be interested in what my team and I are building. We're developing an open source SDK that runs LLMs directly on-device.
What's even more interesting is that both buildrunanywhere.org and runanywheresdk.com show a stock hostinger parking page when accessed in a browser. Something tells me they're intentionally registering these "alternate" domains specifically for spam, to avoid tanking the email reputation of their main runanywhere.ai domain.
I guess I shouldn't be surprised given YC is going all in on AI and most AI companies are no better than the crypto scammers of yesteryear, but still.
> Something tells me they're intentionally registering these "alternate" domains specifically for spam, to avoid tanking the email reputation of their main runanywhere.ai domain
This is a really bad look on them.
https://www.whatsmydns.net/domain-age?q=buildrunanywhere.org and https://www.whatsmydns.net/domain-age?q=runanywheresdk.com
Both these domain were registered only 36 days ago
Their main domain had been around for 6 month (216 days) tho:- https://www.whatsmydns.net/domain-age?q=runanywhere.ai
(I also couldn't see any post created by them on YC checking algolia from their website fwiw)
Seeing their star history on their product, I see some few interesting observations[0] Their star history was almost horizontal between december and february until it got vertical all of a sudden.
[0]:https://www.star-history.com/#runanywhere.ai/runanywhere.ai&...
I looked through their linkedin and found this website owned by them as well https://www.openclawpi.com/ and using the YC brand here as well. (registerered 26 days ago)
This website looks fairly AI generated to me as well and there are some bugs within the original website as well which I am now incredibly more unsure of if generated by AI or not given the similarities between the two websites UI/UX as well.
These providers are the only ones that care about their reputation and thus may take some action. Investors? Nope.
in this example, the email came from buildrunanywhere.org, which is just a parked domain. the real domain is runanywhere.ai, which they arent using for spam.
so, once buildrunanywhere.org has their reputation burned from reports, they will simply register buildrunanywheres.org and start spamming again.
And I use a different email fromy priority email for GitHub commits since 4 years ago.
So just stop with marketing slop please.
Yes, I work with AI, and I'm becoming pretty good at it.
But this doesn't mean I'm comfortable pushing AI slop into potential users and customers.
I (and they) want to use AI to facilitate their processes, not to ingest slop content.
There are likely marketing email datasets floating around the internet that contain email addresses scraped from commit metadata.
I use a catchall with a specific Git client (not GitHub) email address, and found spam and phishing emails being sent there quite a few times.