Version control is an essential tool for any software developer. Hence, any respectable data scientist has to make sure his/her analysis programs and machine learning pipelines are reproducible and maintainable through version control.
Often, we use git for version control. If you don’t know what git is yet, I advise you begin here. If you work in R, start here and here. If you work in Python, start here.
This blog is intended for those already familiar working with git, but who want to learn how to write better, more informative git commit messages. Actually, this blog is just a summary fragment of this original blog by Chris Beams, which I thought deserved a wider audience.
Summarize changes in around 50 characters or less
More detailed explanatory text, if necessary. Wrap it to about 72
characters or so. In some contexts, the first line is treated as the
subject of the commit and the rest of the text as the body. The
blank line separating the summary from the body is critical (unless
you omit the body entirely); various tools like `log`, `shortlog`
and `rebase` can get confused if you run the two together.
Explain the problem that this commit is solving. Focus on why you
are making this change as opposed to how (the code explains that).
Are there side effects or other unintuitive consequences of this
change? Here's the place to explain them.
Further paragraphs come after blank lines.
- Bullet points are okay, too
- Typically a hyphen or asterisk is used for the bullet, preceded
by a single space, with blank lines in between, but conventions
If you use an issue tracker, put references to them at the bottom,
See also: #456, #789
If you’re having a hard time summarizing your commits in a single line or message, you might be committing too many changes at once. Instead, you should try to aim for what’s called atomic commits.
I stumbled across this TED Ed YouTube playlist called Think Like A Coder. It’s an amusing 10-episode video introduction for those new to programming and coding.
The series follows Ethic, a girl who wakes up in a prison, struck by amnesia, and thus without a clue how she got there. She meets Hedge, a robot she can program to help her escape and, later, save the world. However, she needs to learn how to code the Hedge’s instructions, and write efficient computer programs. Ethic and Hedge embark on a quest to collect three artifacts and must solve their way through a series of programming puzzles.
Episode 1 covers loops.
The adventure begins!
Episode 1: Ethic awakens in a mysterious cell. Can she and robot Hedge solve the programming puzzles blocking their escape?
Regular expression (also abbreviated to regex) really is a powertool any programmer should know. It was and is one of the things I most liked learning, as it provides you with immediate, godlike powers that can speed up your (data science) workflow tenfold.
I’ve covered many regex related topics on this blog already, but thought I’d combine them and others in a nice curated overview — for myself, and for you of course, to use.
If you have any materials you liked, but are missing, please let me know!