Making development easier in Databricks
- dazfuller
- Feb 6, 2021
I deliberately didn't start with a title that included the word "documentation", but stay with me for a second.
If you haven't heard, there is a project in the world of Spark called Project Zen. This isn't breaking news, and it's been going since at least May 2020, but it's surprising how many people still haven't heard of it.
According to Databricks in June 2020, PySpark accounted for 68% of notebook commands, and PySpark has over 5 million monthly downloads on PyPI. That's a lot of people using Python with Spark, so it's not surprising that there is now a move to make Python more usable with Spark, from type hinting support to making errors more understandable.
At the same time as these changes are being implemented, Databricks is enhancing its user interface to not only take advantage of these new features, but also to bring in its own support for Python users.
So, why did I mention documentation? Well, for two reasons. First, the PySpark documentation is being rewritten to make it easier to navigate and generally more user friendly. Second, to help drive this, the PySpark docstrings are being rewritten in the numpydoc style, which makes the generated API docs easier to read.
Databricks are helping to make this available by allowing data engineers, analysts, and data scientists to show these docstrings when working in notebooks. This means it's now easier to see the documentation directly in the notebook, rather than having to call "help" or go searching the Internet for it. And everyone can make life easier for everyone else by documenting their own code (see, I got to the reason in the end).
For example, the following is a simple Pandas UDF that takes a column containing a string and returns the initials of that string.
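Below is a minimal sketch of what such a UDF could look like. It assumes the input column holds whitespace-separated words. The name `get_initials` is hypothetical and is not taken from the original code.

The logic is written as a plain function over a `pandas.Series` with `pd.Series -> pd.Series` type hints, so it can be tested without a cluster. The numpydoc sections (`Parameters`, `Returns`, `Examples`) are the kind of docstring content a notebook tooltip would display.

```python
import pandas as pd


def get_initials(names: pd.Series) -> pd.Series:
    """
    Return the upper-case initials of each string in a Series.

    Parameters
    ----------
    names : pandas.Series
        Strings made up of whitespace-separated words, such as
        ``"Ada Lovelace"``. Missing values are allowed.

    Returns
    -------
    pandas.Series
        The first letter of each word, upper-cased and joined together,
        such as ``"AL"``. Missing inputs give missing outputs.

    Examples
    --------
    >>> get_initials(pd.Series(["Ada Lovelace", "grace brewster hopper"])).tolist()
    ['AL', 'GBH']
    """
    # Split each string on runs of whitespace; missing values stay missing.
    words = names.str.split()
    # Join the upper-cased first character of every word. Anything that is
    # not a list (i.e. a missing value) is returned as None.
    return words.apply(
        lambda ws: "".join(w[0].upper() for w in ws) if isinstance(ws, list) else None
    )
```

In a notebook, the function could then be wrapped as a Pandas UDF with `initials_udf = pandas_udf(get_initials, "string")`, using `pandas_udf` from `pyspark.sql.functions`. Spark 3.0 and later infer a scalar Pandas UDF from the Series-to-Series type hints. The UDF would then be applied with something like `df.select(initials_udf("name"))`, where `initials_udf` and the `name` column are hypothetical.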

Using numpydoc makes the docstring readable when we're looking at the code directly, and it also looks nice when we use the new SHIFT+TAB keyboard shortcut in Databricks Runtime 7.4 and above.

People often forget about documentation, or just write something short and simple. But adding good documentation helps not only when you revisit the code later, but also anyone else who might be using your code.
This is just a small part of what Project Zen is bringing to the Python community for Spark, and of what Databricks is doing on top of that. So keep an eye out for new and needed features as they arrive in new releases.