[AIRFLOW-526] pin all dependencies in setup.py #1809
Conversation
Pinning all dependencies in setup.py will help in having more homogeneous installs across environments. The current approach of allowing version ranges based on version notation is risky at best and results in bad surprises. The downside is that we won't get bug fixes and security fixes for free, but those free upgrades come at a high cost, where a single bad version of a package may take your Airflow environment down without warning.
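To make the two styles concrete, here is a small illustrative sketch (package names and versions are examples, not the actual Airflow dependency set) contrasting ranged specifiers with exact pins as they would appear in `install_requires`:

```python
# Illustrative only: neither list is the actual Airflow dependency set.
ranged = [
    "pandas>=0.17.1, <1.0.0",   # any release in the range may install
    "flask>=0.11, <0.12",
]
pinned = [
    "pandas==0.18.1",           # exactly one version: reproducible
    "flask==0.11.1",
]

def is_pinned(spec):
    """True if a requirement specifier names exactly one version."""
    return "==" in spec and ">" not in spec and "<" not in spec

assert all(is_pinned(s) for s in pinned)
assert not any(is_pinned(s) for s in ranged)
```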
A couple of questions: could this break Airflow for some users who need their own specific versions (i.e. should this change wait for Airflow 2.0)? Was this tested? I am a big +1 on this idea.
I am also +1
@aoen I decided on versions based on the most recent within the previously specified range, using this very useful service: I ran tests locally and on Travis. I did run
That is nifty indeed! What about the Airflow 2.0 comment? I think this will likely break some airflow installations out there.
As discussed, I'd say we collectively bite the bullet and get uniformity from the next version (
LGTM
@mistercrunch @bolkedebruin @aoen
@mistercrunch do you want to go ahead with this? It looks like it might have stagnated. Worth going forward IMO.
+1 Same. Got bit by a pandas release over the weekend (0.19.1).
I forced a version of numpy in setup_requires (runs before install_requires) to fix issues with pandas installation and documented the fact that people should update to the latest
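A minimal sketch of that workaround, with illustrative versions (setuptools resolves `setup_requires` at build time, before `install_requires` dependencies are installed):

```python
# Hedged sketch of the setup_requires workaround described above;
# the package name and versions are illustrative, not from this PR.
# setuptools resolves setup_requires before install_requires, so
# numpy is guaranteed to be present when pandas builds against it.
setup_kwargs = dict(
    name="example-app",                   # hypothetical package name
    setup_requires=["numpy==1.11.2"],     # fetched first, at build time
    install_requires=[
        "numpy==1.11.2",                  # still pinned for runtime
        "pandas==0.18.1",
    ],
)
# A real setup.py would end with:
#   from setuptools import setup
#   setup(**setup_kwargs)
```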
Note: I have the same issue as @r39132:
The commit I just added: 4b6aa14. The problem was that
Note that somewhere along the way
Thanks!
Without the
Even after running:
I still get it.
@criccomini OK, I was able to reproduce and moved python-daemon to setup_requires, and things seem to work now.
:( Still giving me the same thing. I'm starting to get paranoid.
Also:
I applied the diff for this PR to it. It appears to have your latest commit in it.
Was that from a fresh venv? If not, I'm guessing that
It was from a fresh venv. Completely new. The very first line is where I create it.
FYI, I was able to reproduce what @criccomini saw as well.
Seems like this change may be a little too rigid and may increase friction when installing airflow alongside someone's existing packages. Does it not make sense to assume people will follow proper patterns with major vs. minor version changes, and then become more strict on a case-by-case basis if we see issues with individual packages? E.g. with pandas; flask just shipped a minor backwards-compatibility break in a minor release.
We've been bitten 3 times in the past by packages upgrading within the strict-ish range we had set up before (within big-number release after 1.0, and within minor release before 1.0). That's not mentioning the dozens of installation-, upgrade- and configuration-specific bugs that are hard to reproduce. In some cases you have to ask people to share the output of their

Maybe some packages (pandas, numpy) can be set within a range, and eventually for Airflow 2.0 we can break down the airflow package into smaller packages with fewer dependencies in each. We should pin as much as possible though.
Waiting on @bolkedebruin to confirm python-daemon 1.6.1 is ok.
Is it our job to set other people's dependencies? The piece that I worry about is someone else setting a requirements file with airflow and some other things and having issues controlling the specific versions of libraries used. As a library, it should be our job to specify the minimum that we need and let our users specify the exact versions they care about.
@mistercrunch let me know what you would like to do with this. This is an important change that I would like to see committed, so please prioritize it, else we will need to close.
@mistercrunch 1.6.1 seems ok; however, the patch does not always "work" cleanly (`python setup.py install` complaining):
@mistercrunch Can you get this PR across the line? I've "lightened" your load by closing a bunch of your other PRs. This one is def needed.
Can I request a bump of the google api to 1.5.4? (Or shall I do it after this one is merged?)
I think someone needs to take this on - the PR is more than a month old! This is an important PR and someone needs to carry this across the line. If @mistercrunch can get to this today, I will merge it, else I'd like to see if there are other takers as he seems crunched for time!

BTW, as part of testing release candidates for 1.8, we will need to run setup.py. I have had to stop using setup.py because it's non-deterministic and has created a ton of problems for us - so much so that we are just pip installing releases now in prod and staging envs. Before we can test 1.8 rc.x, we will need this PR to get across the line.
From http://stackoverflow.com/questions/28509481/should-i-pin-my-python-dependencies-versions
I tend to agree, but have read other articles that claim the opposite. I'm not sure what to do here. Python doesn't deal nicely with diamond dependencies where two different versions are required. I wish Python did what npm does in that regard: building a tree where different libs can each use different versions of the same lib.
Having done some reading and performed a bit of a survey of some other (biggish) packages, my recommendation is that we pin to a known, working state in

This is based on the following findings:

ipython - not pinned: https://github.com/ipython/ipython/blob/master/setup.py#L200
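The "pin in requirements.txt, range in setup.py" split can be sketched like this (the names, versions, and tuple encoding are mine, for illustration); a release check can then verify that the pinned set actually satisfies the declared ranges:

```python
# Sketch only: the tuple encoding and versions are illustrative, to
# keep the example dependency-free. Ranges are (min_incl, max_excl).
setup_py_ranges = {
    "pandas": ((0, 17, 1), (1, 0, 0)),
}
requirements_txt_pins = {
    "pandas": (0, 18, 1),
    "numpy": (1, 11, 2),    # extra pins beyond setup.py are fine
}

def pins_satisfy_ranges(ranges, pins):
    """Check that every ranged dependency has a pin inside its range."""
    for name, (lo, hi) in ranges.items():
        pinned = pins.get(name)
        if pinned is None or not (lo <= pinned < hi):
            return False
    return True

assert pins_satisfy_ranges(setup_py_ranges, requirements_txt_pins)
```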
@gwax thanks for gathering the evidence! You have me convinced.
we don't seem to have a
-1000. You can pin minimum requirements in setup.py. BTW, major-version changes do not necessarily imply breaking compatibility, and minor-version changes do not necessarily imply the opposite. They should in theory, but in practice they don't. setup.py should just list its dependencies; packages listed in setup.py should never be pinned. Pinning requirements in setup.py immediately creates a dependency hell in larger projects.
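A toy illustration of that dependency hell (the package names and versions are made up): if two libraries in the same environment hard-pin the same transitive dependency to different exact versions, no single install can satisfy both.

```python
# Made-up packages illustrating exact-pin conflicts in one environment.
lib_a_pins = {"requests": "2.9.1"}    # pinned by hypothetical lib A
lib_b_pins = {"requests": "2.11.0"}   # pinned by hypothetical lib B

def can_coexist(*pin_sets):
    """False if any two pin sets demand different exact versions."""
    merged = {}
    for pins in pin_sets:
        for name, version in pins.items():
            if merged.setdefault(name, version) != version:
                return False   # conflicting exact pins
    return True

assert not can_coexist(lib_a_pins, lib_b_pins)
```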
@woutervh Please look at the current
You only need to pin minimum versions if you know (not assume or think) that your application is not going to work with older versions of that dependency, AND you know you are not going to support those older versions (which makes perfect sense). Similar with pinning maximum versions.

For example: is airflow broken when using lxml > 4.0? We were assuming it would be broken. If airflow were broken, users should file a bug report here.

When I run pipdeptree in our stack, we get these warnings about potential conflicts, but airflow runs fine:
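For reference, the kind of check pipdeptree performs can be approximated with `pkg_resources` (this is a rough sketch, not pipdeptree's actual implementation): walk the installed distributions and flag any declared requirement that the installed version fails to satisfy.

```python
# Rough approximation of pipdeptree's conflict warnings using
# pkg_resources (ships with setuptools). Not pipdeptree's actual code.
import pkg_resources

def find_conflicts():
    """Return (package, unmet_requirement) pairs for the current env."""
    conflicts = []
    for dist in pkg_resources.working_set:
        for req in dist.requires():
            # Look up whatever version of the dependency is installed.
            installed = pkg_resources.working_set.find(
                pkg_resources.Requirement.parse(req.project_name))
            if installed is not None and installed not in req:
                conflicts.append((dist.project_name, str(req)))
    return conflicts

for package, unmet in find_conflicts():
    print("%s requires %s, which is not satisfied" % (package, unmet))
```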
Agreed, for Apache Superset we recently changed our

On top of that, we provide an optional
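A sketch of that layout (the package name, extra name, and versions here are hypothetical, not Superset's actual ones): loose ranges in `install_requires`, with an optional feature behind `extras_require`:

```python
# Hypothetical setup.py fragment: loose ranges for consumers, plus an
# optional extra installable with `pip install example-app[ldap]`.
# All names and versions are illustrative.
setup_kwargs = dict(
    name="example-app",
    install_requires=[
        "flask>=0.11, <1.0",          # range, not an exact pin
    ],
    extras_require={
        "ldap": ["python-ldap>=2.4"], # only pulled in when requested
    },
)
# A real setup.py would call: setup(**setup_kwargs)
```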
Yep, I think that's the way forward. This is how dependencies are handled in our Big Data environment as well. Don't pin unless absolutely necessary; this way we can test with the latest, and maybe package the pinned versions with a tried and tested requirements.txt in the release.
Without the

I think that there can be some challenges around
Sure, but then we get an early warning that things will break; once we see the issue, we can pin and people can rebase. The Java install broke recently, and that's how it was handled; I think that's fine.