  • I find most people don’t create good unit tests. They make them too small, which results in tests that are fragile to change or near useless. They end up being a trap for future you, not a love letter.

    The number of times I have refactored some code and broken a whole bunch of unit tests is unreal. These types of tests are less than useless. If you cannot refactor code, then what is the point of a unit test? I don’t need to know that the code doesn’t break when I don’t change it… I need to know that when I change code my assumptions still hold true.

    We should be testing modules as a whole using their external API (that is external to the module, not necessarily the project as a whole), not individual functions. Most people seem to call these integration tests though…
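    As a rough illustration of the difference (using a hypothetical `orders` module and made-up function names), the first test below pins an internal helper and breaks on any refactor, while the second only goes through the module’s public API:

    ```python
    # Minimal sketch, assuming a hypothetical `orders` module.
    # The fragile style pins an internal helper; the module-level style only
    # exercises the public API, so internals can be refactored freely.

    import unittest
    import orders  # hypothetical module under test


    class FragileTests(unittest.TestCase):
        def test_internal_rounding_helper(self):
            # Breaks the moment _round_price is renamed, inlined or split up,
            # even though the module's observable behaviour is unchanged.
            self.assertEqual(orders._round_price(9.999), 10.0)


    class ModuleLevelTests(unittest.TestCase):
        def test_total_includes_tax(self):
            # Only the external API is used, so any internal refactor that
            # preserves behaviour keeps this test green.
            order = orders.create_order(items=[("widget", 9.999)], tax_rate=0.2)
            self.assertEqual(orders.total(order), 12.0)


    if __name__ == "__main__":
        unittest.main()
    ```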





  • Yes, they can. But they do not mix well with required checks. From GitHub’s own documentation:

    If a workflow is skipped due to path filtering, branch filtering or a commit message, then checks associated with that workflow will remain in a “Pending” state. A pull request that requires those checks to be successful will be blocked from merging.

    If, however, a job within a workflow is skipped due to a conditional, it will report its status as “Success”. For more information, see Using conditions to control job execution.

    So even with GitHub Actions you cannot mix a required check with path/branch (or any other) filtering on a workflow, as the checks will stay pending forever and you will never be able to merge the branch. You can do either one, but not both at once, and for larger, complex projects you tend to want both. Instead you need complex workflows, or workflows that always start and do internal checks to detect whether they actually need to run. And this is with GitHub Actions - it is worse for external CI/CD tooling.
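    A sketch of the two behaviours that documentation describes (workflow names, paths and the test script are made up, and `dorny/paths-filter` is just one common third-party action for job-level path detection) - these would be two separate workflow files:

    ```yaml
    # Pattern 1: workflow-level path filter. If a PR touches no matching path,
    # the whole workflow is skipped and its checks stay "Pending", which blocks
    # merging when they are marked as required.
    name: unit-tests-filtered
    on:
      pull_request:
        paths:
          - 'src/**'
    jobs:
      unit-tests:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - run: ./run-unit-tests.sh   # hypothetical test script

    ---
    # Pattern 2: always trigger the workflow, skip at the job level instead.
    # A job skipped by a conditional reports "Success", so a required check
    # still passes.
    name: unit-tests-conditional
    on:
      pull_request:
    jobs:
      changes:
        runs-on: ubuntu-latest
        outputs:
          src: ${{ steps.filter.outputs.src }}
        steps:
          - uses: dorny/paths-filter@v3   # third-party action to detect changed paths
            id: filter
            with:
              filters: |
                src:
                  - 'src/**'
      unit-tests:
        needs: changes
        if: needs.changes.outputs.src == 'true'
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - run: ./run-unit-tests.sh   # hypothetical test script
    ```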


  • If you have folderA and folderB, each with their own set of tests, you don’t need folderA’s tests to run for a change to folderB. Most CI/CD systems can do this easily enough with two different reports. But you cannot mark them both as required, as they won’t both always run. Instead you need complicated fan-out pipelines in your CI/CD system so that only one report goes back to GH, or you need to always spawn a job for both folders and have the ones that don’t need to run return success. Neither of these is very good, and it becomes very complex when you are working with large monorepos.

    It would be much better if the CI/CD system that knows which pipelines it needs to run for a given PR could tell GH which checks are required for that PR, and if you could configure GH to wait for that report from the CI/CD system. Or, at the very least, if auto-merge were blocked by any failed check while the manual merge button was only blocked by required checks.




  • We have a few non-required checks here and there - mostly because you need an admin to list a check as required, and that can be annoying to do. And we still get code merged in occasionally that fails those checks. Hell, I have merged in code that fails the checks. Sometimes checks take a while to run, and there is this nice “merge when ready” button in GH. But it will gladly merge your code in once all the required checks have passed, ignoring any non-required checks.

    And it is such a useful button to have, especially in a large codebase with lots of developers - just merge in the code when it is ready and avoid forgetting about things for a few hours and possibly having to rebase and run all the checks again because of some minor merge conflict…

    But GH required checks are just broken for large code bases as well. We don’t always want to run every check on every code change. We don’t need to run all the unit tests when only documentation has changed. But required checks are all or nothing: they need to report something or else you cannot merge at all (though this might apply more to external checks than to GH Actions). I really wish there were a “require all checks that run to pass” option alongside an “at least one check must run” one. Or if external checks could tell GH when they are required or not. Either way, there is a lot of room for improvement in GH PR checks.




  • Never said it had to be a text file. There are many binary serialization formats that could be used. But in a lot of situations the overhead you save is not worth the debugging effort of working with binary data. For something like this, which is likely not going to be more than a GB or so (probably much less), it really does not matter that much whether you use a binary or a text format. This is an export format that will likely just have one batch processing layer on top. This type of thing is generally easiest for most people to work with in a plain text format. If you really need efficient querying of the data, then it is trivial and quick to load it into a DB of your choice rather than being stuck with sqlite.
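    As a minimal sketch of that last point, assuming the export is newline-delimited JSON with made-up field names, loading it into SQLite for querying needs nothing beyond the standard library:

    ```python
    # Sketch: load a hypothetical newline-delimited JSON tracking export into
    # SQLite so it can be queried, using only the Python standard library.

    import json
    import sqlite3


    def load_export(path: str, db_path: str = "tracking.db") -> None:
        conn = sqlite3.connect(db_path)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS events (timestamp TEXT, name TEXT, value REAL)"
        )
        with open(path, encoding="utf-8") as f:
            # Field names here are assumptions about the export format.
            rows = (
                (e.get("timestamp"), e.get("name"), e.get("value"))
                for e in (json.loads(line) for line in f if line.strip())
            )
            conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
        conn.commit()
        conn.close()


    if __name__ == "__main__":
        load_export("tracking-export.jsonl")  # hypothetical file name
    ```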


  • export tracking data to analyze later on

    That is essentially log data, or something equivalent. Log data does not have to be human readable; it is just a series of events that happen over time. Most log data, even what you would think of as traditional messages from a program, is not parsed by humans manually but analyzed by code later on. It is really not that hard or slow to process log data line by line. I have done this with TBs of data before, which does require a lot more effort. A simple file like this would take seconds to process at most, even if you were not very efficient about it. I also never said it needed to be stored as text, just that a simple file is enough - no need for a full database. That file could be binary if you really need it to be, but text serialization would also be good enough. Most of the web world is processed via text serialization.

    The biggest problem with YAML like in the OP is the need to decode the whole file at once, since it is a single list. Line-by-line processing would be a lot easier to work with. But even then, if it is only a few hundred MBs, loading it all into memory at once and analyzing it there would not take long at all - it just does not scale very well.
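    A rough sketch of that difference, with assumed file names and record fields - a single YAML list has to be parsed in one go, while a line-delimited format can be streamed record by record:

    ```python
    # Sketch: whole-file YAML decode vs. streaming a line-delimited format.
    # File names and the "value" field are assumptions for illustration.

    import json

    import yaml  # PyYAML


    def total_from_yaml(path: str) -> float:
        # The whole file must fit in memory and be parsed at once.
        with open(path, encoding="utf-8") as f:
            events = yaml.safe_load(f)  # one big list of records
        return sum(e.get("value", 0) for e in events)


    def total_from_jsonl(path: str) -> float:
        # One record per line: constant memory, easy to resume or parallelise.
        total = 0.0
        with open(path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    total += json.loads(line).get("value", 0)
        return total
    ```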