MTA Open Data Challenge (new.mta.info)
207 points by oftenwrong 19 hours ago | 13 comments
chaps 14 hours ago
I do work with "open data" on a near-obsessive basis and -- friends, please do not trust "open data" portals to reflect reality accurately. The datasets are often curated, categories changed during the ETL processes, rows missing, and things like that. For example, Chicago's "crimes" dataset intentionally doesn't include all homicides. Can't remember the exact dataset, but I once had a conversation with Chicago's head of open data who told me that they intentionally removed many rows because they were concerned that the public was going to misinterpret the results... but didn't make it clear that rows were missing. So I guess everybody gets the opportunity to misinterpret the results!

FOIA is the better alternative because it gives you the original, pre-cleaned data. Open data is a lie.

pjot 9 hours ago
This is super true. For my city’s portal as well. I’ve found one way around this by versioning the dataset - that is, committing the diffs in git. Credit to Simon Willison’s git-scraping technique.

I do this with my power company’s outage map: https://github.com/patricktrainer/entergy-outages

67k commits!

https://simonwillison.net/2020/Oct/9/git-scraping/
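
For anyone curious what that looks like in practice, here is a minimal sketch of the idea. It is not the code from the repo above. It assumes the dataset is served as JSON, that `repo` is an already-initialized git repository, and that the function names are just illustrative. The snapshot is normalized before writing so that key order and whitespace don't produce noisy diffs.

```python
import json
import subprocess
from pathlib import Path


def normalize_json(raw: str) -> str:
    """Re-serialize JSON with sorted keys and fixed indentation so diffs are stable."""
    return json.dumps(json.loads(raw), indent=2, sort_keys=True) + "\n"


def write_if_changed(path: Path, text: str) -> bool:
    """Write text to path; return True only if the content actually changed."""
    if path.exists() and path.read_text(encoding="utf-8") == text:
        return False
    path.write_text(text, encoding="utf-8")
    return True


def commit_snapshot(repo: Path, filename: str, raw: str) -> bool:
    """Store one snapshot in an existing git repo and commit it if it differs.

    Assumes git is installed and `repo` was already set up with `git init`.
    """
    changed = write_if_changed(repo / filename, normalize_json(raw))
    if changed:
        subprocess.run(["git", "-C", str(repo), "add", filename], check=True)
        subprocess.run(
            ["git", "-C", str(repo), "commit", "-m", f"Update {filename}"],
            check=True,
        )
    return changed
```

Run `commit_snapshot` on a schedule (cron or a CI job) with whatever the portal returns, and `git log -p` on the file then shows when rows appeared, changed, or quietly disappeared.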

chaps 5 hours ago
That's a really freaking neat trick. Thanks!
stevage 2 hours ago
I worked in open data for quite a few years. This is a very weird take.

Open data portals generally have data in a useful form. FOI probably gives you PDFs.

bshep 9 hours ago
Where I grew up the data for murders is curated in such a way that anybody who dies 24 after being attacked is not considered a ‘murder’. They do this to reduce the statistical murder rate.
chaps 8 hours ago
Can you say more about this?
whoiscroberts 8 hours ago
Well now we know why crime is down
kalendos 12 hours ago
I can only imagine. Many ETLs are already messy in companies with better tooling and processes.

Would love to read more about your experience with Open Data. Any place where I can reach out?

chaps 12 hours ago
Here's something about shotspotter data in Chicago: https://x.com/foiachap/status/1775296597850480663

And this one makes some rounds: https://mchap.io/that-time-the-city-of-seattle-accidentally-...

Feel free to reach out!

gordon_freeman 5 hours ago
But even if a dataset is incomplete or inaccurate, do you think we could at least get directionally right insights from such datasets?
chaps 5 hours ago
Yes, of course we can. But I cannot ignore the harms in doing so, by misrepresenting the data in a way that keeps others from understanding what is or isn't there -- it happens regularly. These datasets are often used as a political tool and contracted with local universities to show that they're providing data... though not actually providing accurate data. Simultaneously though, people who don't know data will champion the data as accurate because it comes from a university program.

Sometimes what can happen is that somebody inexperienced will try to make some assessment of the data and come to the exact wrong conclusion because they didn't know what not to trust. But it gets on the news anyway and damage is done.

We can do better than that.

IanCal 9 hours ago
Although pre-cleaned data is often not reflective of reality and requires careful work to use, often requiring a lot more knowledge of the field.
whitej125 15 hours ago
Would be neat if, instead of an open-ended challenge ("here's some data, do something cool"), the MTA shared a list of hypothetical or real problems to solve and provided data that could be useful in exploring or solving them.
maxverse 13 hours ago
Also, considering they just got a 68 billion dollar budget approved [1] over the next 5 years, even a small monetary reward would be nice for this. It doesn't need to be a ton of money, but something other than "here's a piece of memorabilia and we'll write a blog post" would be a good incentive.

[1] https://ny1.com/nyc/all-boroughs/news/2024/09/25/mta-board-a...

exegete 9 hours ago
I think you are misinterpreting that article. The MTA board approved the plan to spend $68B but they depend on the state to give them funds. That’s the amount of money they are asking for based on the projects they want to complete. The state government has to pass a budget to fund that plan (or do something else). Additionally several current, already started projects are on hold due to the “pause” of congestion pricing which was going to be a funding source.
doctorpangloss 12 hours ago
Why would a cost center political institution enumerate all its problems? It is kind of miraculous they can engage with the public this way at all.
slt2021 11 hours ago
I could not find a dataset with payroll hours reported and overtime reimbursed for each MTA employee.

I wanted to investigate how well the MTA is managing its workforce and compensation (as to require an additional tax in the form of Congestion Pricing to fix its budget hole), but there seems to be no dataset for that.

Does anyone have links to an MTA payroll/hours/overtime related dataset?

Or alternatively, I need a dataset to study each and every subway improvement project, and the components of each project in materials, labor, etc.

WUMBOWUMBO 10 hours ago
perhaps this could be covered in a FOIA request
stevage 2 hours ago
Interesting, these open data challenges were all the rage 10 years ago. Wonder why the sudden trip down memory lane.
thecosas 14 hours ago
Time for someone to crack their knuckles and do a Power Broker-style MTA Open Data mashup :-)

https://en.wikipedia.org/wiki/The_Power_Broker

krebby 14 hours ago
nocman 13 hours ago
I keep clicking on these 'MTA' articles expecting them to be about a "message transfer agent".

Then I think, oh, right, wrong MTA. Guess I've spent too much time dealing with email servers.

rayrrr 15 hours ago
Hold my Metrocard.
onemoresoop 11 hours ago
Hold my bus transfer card.
asjfkdlf 18 hours ago
The prize is very underwhelming. If they really want people to spend effort on it, they need to make the prize worth it.
noitpmeder 17 hours ago
Seems perfect actually! Attracts people that are interested in the subject matter, not just a proposed reward.
maxverse 13 hours ago
"we're hiring people that really love programming and aren't just in it for the money"
0cf8612b2e1e 12 hours ago
It will look great in your portfolio.
xtiansimon 18 hours ago
> “The winner will receive a vintage New York City Transit item from our memorabilia collection.”

Depends what it is. Long as it’s not something you could steal yourself. Ha!

jesterman 17 hours ago
One of the options is literally a trash can! https://new.mta.info/document/85441

Or perhaps... a subway seat? https://new.mta.info/document/85661

erikaww 17 hours ago
I’d give multiple weeks of time for a city trash can lol
mannyv 16 hours ago
Their collection of vintage gum scrapings perhaps?
nxobject 16 hours ago
Never underestimate the value of surplus NYC subway memorabilia to a transit enthusiast. Especially signage from retired rolling stock.
zeroxfe 16 hours ago
If you're doing it for the prize, then you're not the targeted audience :-)
afavour 16 hours ago
IMO it deliberately establishes a tone. This challenge is for rail fans, it’s not a generalised “use our API” hackathon type thing.

Plus the MTA has a huge budget crunch. I really don’t think they could justify spending money on something with such an unclear outcome.

stevage 2 hours ago
Even still it probably cost tens of thousands of dollars of staff time.
IncreasePosts 13 hours ago
The prize is being able to say you won the prize on your resume. I assume a lot of college kids in data science are going to be going at this.
corytheboyd 12 hours ago
I think it actually sounds kinda cool, if it’s something unique that couldn’t just be purchased!
mcfedr 17 hours ago
Why would you region block a webpage like this
JumpCrisscross 16 hours ago
> Why would you region block a webpage like this

As a part-time New York City taxpayer, I'd rather we not be paying EU lawyers to make sure the MTA's open data complies with European law.

pc86 12 hours ago
Good news, the EU doesn't have any jurisdiction in NYC (or anywhere else outside of the EU) so they don't have the ability to enforce anything outside of their borders, as much as they would like you to believe otherwise.

You can enforce what people and companies do within your borders. You cannot enforce what companies or people outside of your borders do.

alwa 9 hours ago
That may come as news to sanctioned Russians and various motley crypto types…

Isn’t the GDPR’s basic theory about jurisdiction that, if I’m sitting in New York City but routinely serving my web content to people in France, that service I’m providing relies on browsing intentions and tracking functions being executed by a user and on a machine in France, and therefore the meat of the “wrongdoing” is happening within their borders?

You can choose to do that the European way or not at all. And the local contests division of the NYC local transit authority is choosing “not at all.”

Isn’t this then a case of NYC complying with the EU’s express wishes for privacy by not “exporting” code they don’t want there?

remram 8 hours ago
In what circumstance do you imagine NYC tax money would go towards EU lawyers?
safeimp 17 hours ago
Reading their terms, I'm guessing it's due to:

> 3. Eligibility: The Challenge is open to legal residents of the United States. Entrants must be 18 years of age or older as of their date of entry. The Challenge is subject to federal, state, and local laws and regulations and is void where prohibited by law. Employees and contractors of the MTA, its subsidiaries, affiliates, and directors (collectively the “Employees”), as well as members of an Employee’s immediate family and/or those living in the same household, are ineligible to participate in the Challenge.

n_plus_1_acc 11 hours ago
You can be a resident of the US and be on vacation for a couple weeks
ratedgene 16 hours ago
yeah but wouldn't you want to create enough buzz globally so word of mouth can spread to more US entrants?
safeimp 16 hours ago
I don't disagree with you at all, I'm just speculating over why they'd block it.
kassner 14 hours ago
https://web.archive.org/web/20240927144204/https://new.mta.i...

I can access it just fine from Sweden :shrug:

nemo44x 16 hours ago
Because the next thing you know the EU is suing you for billions of Euros.
deathanatos 16 hours ago
"Doctor it hurts…", IANAL.

I mean … as I understand the Europeans' law, only if you're doing dumb things to begin with, like giving users' data away to random 3rd parties hellbent on shoving "ads" down one's throat. If you had just made this site a simple HTML page that just had the information the MTA wanted to convey on it, AIUI the EU doesn't have a problem.

Which … the MTA does appear to be, sending requests to Google, LinkedIn, and some other CDNs.

I also don't think the MTA has any EU presence, so what are they going to do?

JumpCrisscross 16 hours ago
> as I understand the Europeans' law, only if you're doing dumb things to begin with, like giving users' data away to random 3rd parties hellbent on shoving "ads" down one's throat

There is a massive difference between complying with the law and proving you comply. (Think: IRS audit.)

> don't think the MTA has any EU presence, so what are they going to do?

Send letters. The MTA would be obligated to respond to them, which means legal bills.

deathanatos 12 hours ago
> There is a massive difference between complying with the law and proving you comply. (Think: IRS audit.)

> The MTA would be obligated to respond to them, which means legal bills.

…why would the MTA be obligated to respond to them? They've no jurisdiction/sovereignty over an American transit agency.

Why would they audit themselves against laws that don't apply to them? (Again, jurisdiction?) I've never worked for a company that audited itself against every law from every nation on Earth; we complied with the laws where we had a presence and did business.

returningfory2 15 hours ago
> ...AIUI the EU doesn't have a problem

We're talking about a US transit agency. Even thinking about whether the EU has a problem with the agency's website is sort of absurd to begin with.

warkdarrior 15 hours ago
Did this US transit agency, MTA, obtain permission from all EU citizens who traveled on the MTA to share their data with the whole world?
JumpCrisscross 15 hours ago
> Did this US transit agency, MTA, obtain permission from all EU citizens who traveled on the MTA

Not how jurisdiction works.

returningfory2 14 hours ago
Eh this conversation has nothing to do with people traveling on MTA services. We’re talking about people accessing the MTA website. Two different things.
cddotdotslash 16 hours ago
Expect to see more of this, especially when the audience is local/US. IIRC, some newspapers are already doing region blocks. Why should website owners targeting US visitors spend _any_ amount of money making their content comply with asinine regulations (like cookie banners)?
cinntaile 15 hours ago
Cookie banners are not a regulation requirement.

Contrary to what you seem to believe...There were more geoblocks when the EU law went into action a couple of years ago. There are less now.

cddotdotslash 8 hours ago
> There were more geoblocks when the EU law went into action a couple of years ago. There are less now.

Source for that?

cinntaile 2 hours ago
My personal experience.
kevin_thibedeau 13 hours ago
EU cookie directive predates GDPR. Notices have long been required by that regulation for use of non-essential cookies.
sgtbr1 15 hours ago
can someone share the data?
manvillej 14 hours ago
what a tragedy, this person never learned how to read.
leanthonyrn 15 hours ago
Interesting challenge. Here is the NotebookLM Audio: MTA's Open Data program https://notebooklm.google.com/notebook/286a30b9-b17f-4dac-9e...