Danny's blog

Interviews: Q&A

Danny Crasto — Sun, 25 Feb 2024 16:48:22 GMT

At the end of an interview you'll usually be asked whether you have any questions. Interviewers use this to figure out what drives you and whether you're a fit for the team.

But use this opportunity to glean insights into the organization where you'll be spending most of your valuable waking hours. Look for red flags that signal it isn't a fit for you.

Don't squander this chance by asking superficial, self-serving questions about work-life balance or remote-working options, which will be clearly stated in the contract anyway. Such questions could also be held against you.

Below is a list of questions that you could ask:

- Can you describe the current tech org and its stack?
- Can you describe a typical day?
- How do you get features defined and released? If possible, can you give me examples of the last few releases?
- How autonomous are the engineers, and what are the biggest challenges they face?
- Can you describe the role and responsibilities?
- How are people evaluated?
- How do people grow and learn?
- What's the best and worst thing about working at ***?

A question you will typically be asked is:

Where do you see yourself in 5 years?

Don't say "not at this company". Instead, focus on growing with the company and the challenges you'll help them solve.

I'll keep revising this, so do comment with things I could add. Good luck!

Record your work

Danny Crasto — Sun, 18 Feb 2024 17:45:06 GMT

I saw Chris Albon's pertinent video, "Don't do invisible work", from normconf '22. He evangelizes broadcasting your work so that others know the value you provide, which helps with career progression. Given that we are forgetful animals, we need to physically log our work so that we know what to broadcast.

The following snippet (via this link) is a command-line (zsh) utility that lets you record work items:

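rw() {
    WORK_FILE="${WORK_FILE:-$HOME/.work.log}"
    BACKUP_FILE="$WORK_FILE.bck"
    # create the log on first use so cat doesn't fail
    [ -e "$WORK_FILE" ] || touch "$WORK_FILE"
    # head insert: new dated entry first, then the existing log, into the backup file
    echo "$(date +'%Y-%m-%d') $*" | cat - "$WORK_FILE" > "$BACKUP_FILE"
    # write back through the path (instead of mv) to preserve the symlink
    cat "$BACKUP_FILE" > "$WORK_FILE"
}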

~/.work.log is a symlink to ~/Google Drive/.work.log, which keeps the log off the machine.

Improvements:

Have sections to log multiple work streams
SQLite instead of POT
Optionally specify time spent, in fractional hours, from the time of insertion
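A sketch of those improvements in Python, using the standard library's sqlite3 module: one table with a stream column for separate work streams and an optional hours column for fractional time spent. The schema and the function names (open_log, record_work, recent) are hypothetical, and the default in-memory database is only for trying it out; point open_log at a file (e.g. on the synced drive) to keep a real log.

```python
import sqlite3
from datetime import date
from typing import List, Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS work_items (
    id     INTEGER PRIMARY KEY AUTOINCREMENT,
    day    TEXT NOT NULL,  -- ISO date, like the zsh version's date prefix
    stream TEXT NOT NULL,  -- work stream ("section") the item belongs to
    hours  REAL,           -- optional time spent, in fractional hours
    note   TEXT NOT NULL
)
"""


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_work(
    conn: sqlite3.Connection,
    note: str,
    stream: str = "general",
    hours: Optional[float] = None,
    day: Optional[date] = None,
) -> None:
    day = day or date.today()
    conn.execute(
        "INSERT INTO work_items (day, stream, hours, note) VALUES (?, ?, ?, ?)",
        (day.isoformat(), stream, hours, note),
    )
    conn.commit()


def recent(
    conn: sqlite3.Connection, stream: Optional[str] = None, limit: int = 10
) -> List[Tuple]:
    """Newest first, like the head-inserted text log; optionally one stream only."""
    sql = "SELECT day, stream, hours, note FROM work_items"
    params: list = []
    if stream is not None:
        sql += " WHERE stream = ?"
        params.append(stream)
    sql += " ORDER BY day DESC, id DESC LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()
```

Keeping the date as an ISO string means ORDER BY day sorts chronologically, and the stream filter replaces grepping through a single text file.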

VSCode Bridge to (EKS) kubernetes

Danny Crasto — Tue, 30 Jan 2024 18:11:01 GMT

Getting Bridge to Kubernetes to work with AWS's EKS needs some fussin'.

If you are authenticated via SSO managed by aws-vault, this should help.

You need to have an aws-vault exec session loaded before running k-creds. The relevant AWS environment variables are written to ~/.envs_temp, which is rigged to be loaded automatically in new (zsh) shells, making them available to VSCode.

function k-creds () {
    RC_FILE="${RC_FILE:-$HOME/.zshrc}"
    DEV_PROFILE="${DEV_PROFILE:-dev-profile}"
    # make new shells load ~/.envs_temp (appended to the rc file only once)
    grep -q envs_temp "$RC_FILE" 2>/dev/null || (echo "[ -f ~/.envs_temp ] && source ~/.envs_temp" >> "$RC_FILE"; echo "Updated $RC_FILE")
    # prefix each KEY=VALUE line printed by aws-vault with "export"
    aws-vault export "$DEV_PROFILE" | sed 's/^/export /' > ~/.envs_temp && echo "Got AWS '$DEV_PROFILE' credentials, please restart VSCode"
}

You should now be able to see the nodes in your cluster.
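If a credential value could contain characters the shell treats specially, a slightly more defensive version of the sed step quotes each value. This Python sketch assumes aws-vault export prints one KEY=VALUE pair per line (which the sed step also relies on); to_export_lines is a hypothetical name.

```python
import shlex
import sys


def to_export_lines(env_output: str) -> str:
    """Turn KEY=VALUE lines into `export KEY=value` lines with shell-safe quoting."""
    out = []
    for raw in env_output.splitlines():
        raw = raw.strip()
        if not raw or "=" not in raw:
            continue  # skip blank lines and anything that isn't KEY=VALUE
        key, value = raw.split("=", 1)  # split once: values (e.g. tokens) may contain '='
        out.append(f"export {key}={shlex.quote(value)}\n")
    return "".join(out)


if __name__ == "__main__":
    # read aws-vault's output on stdin, write export lines to stdout
    sys.stdout.write(to_export_lines(sys.stdin.read()))
```

Saved as a (hypothetical) to_exports.py, it would stand in for sed: aws-vault export "$DEV_PROFILE" | python3 to_exports.py > ~/.envs_temp.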

Django cache for APIs with invalidation

Danny Crasto — Fri, 05 Jan 2024 02:32:02 GMT

APIs, especially those for dashboards, are predominantly read-heavy. For a user-friendly experience, we should reduce the APIs' response times.

The tried-and-trusted way is a distributed cache-aside strategy (or something more exotic). The expensive computation is done once and the serialized result is stored under a lookup key, which the API then reads to return the serialized data. Typically, the key is derived from the API's URL, segmented by user.
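Stripped of Django, cache-aside boils down to: look the key up, and only on a miss run the expensive computation and store the result with a TTL. In this minimal Python sketch, TTLCache is a toy in-process stand-in for a distributed cache, and the key format in api_cache_key is a hypothetical example of a URL segmented by user.

```python
import time
from typing import Any, Callable, Dict, Tuple

_MISS = object()  # sentinel so cached falsy values (None, [], 0) still count as hits


class TTLCache:
    """Toy in-process stand-in for a distributed cache such as Redis or Memcached."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._store: Dict[str, Tuple[float, Any]] = {}
        self._clock = clock

    def get(self, key: str, default: Any = None) -> Any:
        entry = self._store.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired entries behave like misses
            return default
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._store[key] = (self._clock() + ttl, value)


def api_cache_key(url: str, user_id: str) -> str:
    """Key derived from the API's URL, segmented by user (hypothetical format)."""
    return f"api:{user_id}:{url}"


def cache_aside(cache: TTLCache, key: str, compute: Callable[[], Any], ttl: float = 60) -> Any:
    """Serve from the cache; on a miss, compute once and store the result."""
    value = cache.get(key, _MISS)
    if value is _MISS:
        value = compute()  # the expensive work only runs on a miss
        cache.set(key, value, ttl)
    return value
```

Until the TTL runs out, every request for the same URL and user is served from the cache without calling compute again.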

With Django REST Framework we can leverage Django's decorators

@cache_page(time_in_seconds)
@vary_on_headers(user_identifier_header)

on the GET API views (wrapped in method_decorator on class-based views) to apply cache-aside to their content.

If the data is static or your use case tolerates some staleness, you can stop reading here.

But if the content changes periodically, you could serve stale data until time_in_seconds expires.

If you know the set of URLs that need to be invalidated, you could just slap their invalidation into the respective post_save/post_delete handlers and you'd be done, though it's fragile.

However, if you can't know all the URLs (because the user identifier isn't present in headers/cookies, or pagination parameters get appended to the URL), we would need to

generate a dynamic prefix for all possible URLs to cache content against
use a caching backend that allows a pattern search over the key space for invalidation

The latter can be resolved via django-redis (e.g. its cache.delete_pattern).

The former can be resolved via the decorator* below, where the key suffix and the user identifier are passed as callables at declaration and computed from the decorated method's args/kwargs at call time.

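import logging
from functools import wraps

from django.core.cache import cache

logger = logging.getLogger(__name__)

_MISSING = object()  # sentinel so falsy cached results (e.g. []) still count as hits


def cached_api_response(
        key_suffix_or_callable,
        user_identifier_or_callable=None,
        timeout=60*60,
    ):
    """
    Cache response of the decorated method
    `key_suffix_or_callable: callable(*args, **kwargs) | str`
    - ex: `key_suffix_or_callable=lambda _, request: request.get_full_path()`
    `user_identifier_or_callable: callable(*args, **kwargs) | str`
    - ex: `user_identifier_or_callable=lambda obj, _: obj.user_id`
    `timeout: int` TTL for cache in seconds
    The decorated method should return picklable data (e.g. serializer.data).
    """
    def decorator(func):
        @wraps(func)
        def func_wrapper(*args, **kwargs):
            try:
                key = (
                    key_suffix_or_callable(*args, **kwargs)
                    if callable(key_suffix_or_callable)
                    else key_suffix_or_callable
                )
                user_id = (
                    user_identifier_or_callable(*args, **kwargs)
                    if callable(user_identifier_or_callable)
                    else user_identifier_or_callable
                )
                prefixed_key = f'foobar_{user_id}_{key}'
                cached_data = cache.get(prefixed_key, _MISSING)
            except Exception:
                # key/cache failure: serve uncached rather than erroring
                logger.exception('Error while reading cache for %s', func.__name__)
                return func(*args, **kwargs)
            if cached_data is not _MISSING:
                return cached_data
            # compute outside the try so a failing func isn't called twice
            result = func(*args, **kwargs)
            try:
                cache.set(prefixed_key, result, timeout=timeout)
            except Exception:
                logger.exception('Error while caching %s', func.__name__)
            return result
        return func_wrapper
    return decorator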

*this generalized, concise code wasn't crafted by me
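With keys shaped like the decorator's foobar_{user_id}_{key}, every cached response for one user can be invalidated with a single glob pattern, whatever URL or pagination suffix it was stored under. django-redis provides cache.delete_pattern for this. The sketch below instead uses a toy dict-backed cache with an fnmatch-based delete_pattern so it runs without Redis; PatternCache and invalidate_user are hypothetical names, and invalidate_user is the kind of call you'd make from post_save/post_delete handlers.

```python
import fnmatch
from typing import Any, Dict


class PatternCache:
    """Toy dict-backed cache with a glob-style delete_pattern, loosely
    mimicking django-redis's cache.delete_pattern (no TTLs, for brevity)."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def get(self, key: str, default: Any = None) -> Any:
        return self._store.get(key, default)

    def set(self, key: str, value: Any, timeout: Any = None) -> None:
        self._store[key] = value  # timeout ignored in this toy

    def delete_pattern(self, pattern: str) -> int:
        doomed = [k for k in self._store if fnmatch.fnmatchcase(k, pattern)]
        for k in doomed:
            del self._store[k]
        return len(doomed)


def invalidate_user(cache: PatternCache, user_id: Any) -> int:
    """Drop every cached response for one user, whatever URL suffix it had."""
    # the "_" after user_id stops user 1 from matching user 12's keys
    return cache.delete_pattern(f"foobar_{user_id}_*")
```

The trailing underscore in the pattern matters: without it, invalidating user 1 would also wipe user 12's entries.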

Prod DB CPU spike via test code

Danny Crasto — Tue, 12 Dec 2023 17:28:33 GMT

The test-code APIs used by our end-to-end tests can't be accessed in production.

Before the spike, there was a release that added more unit tests. Looking at the PR, we thought nothing of it.

We later found out that it had resulted in 100% CPU utilization.

This was due to ...

An existing bug in a factory that eagerly created records when its module was loaded.
Through an import dependency, the new code caused the buggy module to be loaded when the application started.
We run hundreds of processes (pods × processes per pod) for the application, so the table was filling up with test data.
These extra records amplified the cost of the inefficient data-integrity queries that ran whenever related entities were saved, which was often.
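A toy reproduction of this failure mode in Python: module-level code runs once in every process that imports the module, so a factory call at import time creates one record per process. The module source, TABLE and the helper names are made up, and exec of a source string stands in for a real import in each pod's process.

```python
import types

TABLE = []  # stand-in for the production table

BUGGY_FACTORY_SRC = """
def create_test_record(table):
    table.append({"name": "e2e-test-user"})

# BUG: executes as a side effect of importing the module
create_test_record(TABLE)
"""

FIXED_FACTORY_SRC = """
def create_test_record(table):
    table.append({"name": "e2e-test-user"})
# no module-level call: records are only created when a test asks for one
"""


def simulate_process_start(factory_src: str) -> types.ModuleType:
    """Mimic one app process importing the factory module via an import dependency."""
    module = types.ModuleType("factories")
    module.TABLE = TABLE
    exec(factory_src, module.__dict__)  # runs the module body, like an import would
    return module


def start_processes(factory_src: str, n_processes: int) -> int:
    """Start n processes from a clean table and return how many records exist."""
    TABLE.clear()
    for _ in range(n_processes):
        simulate_process_start(factory_src)
    return len(TABLE)


print(start_processes(BUGGY_FACTORY_SRC, 300))  # 300
print(start_processes(FIXED_FACTORY_SRC, 300))  # 0
```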

Mitigation/Fixes:

We stopped incoming updates to the DB (blacklisted APIs, stopped background updates); however, the load didn't subside.
We noticed that the table hit by the slow, high-throughput queries was growing when it shouldn't have been.
We validated that it wasn't a security concern, and our ELK logs showed that the test code was being executed.
We reverted the last change and bulk-deleted the test records it had introduced.

Takeaways

Occam's razor applies: the last change was the culprit, a conclusion we were slow to reach.
The call was chaotic and lacked leadership.
- We should have orchestrated the engineering efforts more effectively and methodically.
- We should have facilitated clearer reporting of findings to improve visibility.
We have (almost) isolated test code from production deployments.

Postgres database restore via dumps

Danny Crasto — Wed, 25 Oct 2023 03:42:52 GMT

Create the database dump (you'll be prompted for the password)

pg_dump -U <user> -h <host> <dbname> > dumpfile

Copy the dump file over to the new machine that has PostgreSQL installed.

Recreate the DB, or drop the individual tables being restored.

Apply the dump

psql -U <user> -h <host> -d <dbname> < dumpfile
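To script the round trip, the same two tools can be driven from Python's subprocess module. A sketch, assuming pg_dump and psql are on the PATH and that the user, host and database names are yours to fill in; the helper names are hypothetical. It uses each tool's -f option instead of shell redirection, so no shell is involved.

```python
import subprocess
from typing import List


def dump_cmd(user: str, host: str, dbname: str, dumpfile: str) -> List[str]:
    # pg_dump -f writes the (plain SQL) dump to dumpfile instead of stdout
    return ["pg_dump", "-U", user, "-h", host, "-f", dumpfile, dbname]


def restore_cmd(user: str, host: str, dbname: str, dumpfile: str) -> List[str]:
    # psql -f executes the SQL statements in dumpfile against dbname
    return ["psql", "-U", user, "-h", host, "-d", dbname, "-f", dumpfile]


def run(cmd: List[str]) -> None:
    """Run a command, inheriting the terminal so password prompts still work."""
    subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

Note that psql exits 0 even when statements in the file fail; adding "-v", "ON_ERROR_STOP=1" to restore_cmd makes it stop at the first failing statement and exit non-zero, which is handy when the dump needs edits for a newer version.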

The dump file might need to be altered before it can be ingested into a newer PostgreSQL version.

vim -> neovim

Danny Crasto — Sat, 26 Aug 2023 10:07:28 GMT

I've finally made the move, after being introduced to vim at the University of Kentucky (Go Big Blue!) back in the last millennium.

Sluggishness is my pet peeve and was the motivation for moving: vim loaded with all my plugins was painfully slow on some of the 10k+ line source files I had to work on, and tabnine's AI completion support was deprecated.

Another source of reluctance was not knowing the effort involved in supporting the plugins and language servers I relied on, which had become ingrained. I had no clue what changes I would need to make to my current .vimrc to get it working.

However, I got it mostly running in a little over an hour, with minor tweaks to my original .vimrc (still using vimscript) and a move to the Plug manager, which the example I followed used.

It's night-and-day faster, with no discernible lag on the huge files that used to be slow, and mostly everything works as it did before (the git blame popup script complains with an unknown function error). See the recommendations for some nifty new things.

Steps involved:

Copy init.vim to $HOME/.config/nvim/init.vim, which lets you keep using your current vimscript .vimrc
Install the Plug plugin manager
Wrap any Lua-related config (language servers/tabnine) in your .vimrc in a block:

lua << EOF
...
EOF

Install your plugins via nvim +PlugInstall +qall
Update your aliases: alias vim=nvim and alias vi=nvim

And that's pretty much it. I hope this helps. Onwards and upwards.

Recommendations:

Run :checkhealth in nvim and fix the issues it reports
Use mason to manage nvim packages such as language servers

Future improvements:

I plan on porting my .vimrc to Lua, so keep tabs on this.

Variable naming

Danny Crasto — Wed, 09 Aug 2023 17:47:14 GMT

Name your variables to improve readability, such that they are hard to use incorrectly.

Below is a subtle bug where I wanted to archive old records:

records = model.objects.filter(status='archived', added__gt=older)

It reads:

get records with status 'archived' whose added date is greater than the 'older' date

It sounds right; however, 'greater than' for dates is the opposite of what's needed: it gets records added after older, i.e. newer records.

If the variable is renamed to cut_off, which denotes a particular date without implying any tense (past/future) and leaves the direction of the comparison to the author's discretion:

records = model.objects.filter(status='archived', added__lt=cut_off)

It would read:

get records with status 'archived' whose added date is less than the `cut_off` date

which is what was intended.
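The same comparison can be checked with plain Python dates, independent of Django; the records and dates below are made up for illustration. Django's __lt and __gt lookups are strict < and > comparisons, mirrored here:

```python
from datetime import date

# made-up records for illustration
records = [
    {"id": 1, "status": "archived", "added": date(2021, 1, 5)},
    {"id": 2, "status": "archived", "added": date(2023, 7, 1)},
    {"id": 3, "status": "active", "added": date(2020, 3, 9)},
]

cut_off = date(2022, 1, 1)

# like filter(status='archived', added__lt=cut_off): archived records added before the cut-off
old_ids = [r["id"] for r in records if r["status"] == "archived" and r["added"] < cut_off]

# like the buggy added__gt version: archived records added after the cut-off, i.e. newer ones
new_ids = [r["id"] for r in records if r["status"] == "archived" and r["added"] > cut_off]

print(old_ids, new_ids)  # [1] [2]
```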

Application Load Balancer service upgrades

Danny Crasto — Sun, 04 Jun 2023 17:26:56 GMT

There were three servers under an application load balancer (ALB) that needed to be vertically scaled by our devops team. These servers were responsible for routing traffic to the website.

The servers were to be upgraded one at a time: provision a new instance, add it to the ALB, and pull out the old one (leaving it alive long enough for its connections to close naturally).

However, this approach produced some 502s: as soon as instances were added to the ALB, they received traffic while they were still initializing and not really ready for it.

To avoid this, you could wait out a warm-up period long enough to ensure the server is ready before adding it, or enable slow start mode on the target group.
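A minimal Python sketch of the warm-up option: poll a readiness check and only register the instance once it passes. is_ready and register are hypothetical callables you'd supply (e.g. an HTTP check against the app's health endpoint and a boto3 elbv2 register_targets call); the clock and sleep are injectable so the logic can be tested without actually waiting.

```python
import time
from typing import Callable


def register_when_ready(
    is_ready: Callable[[], bool],
    register: Callable[[], None],
    timeout_s: float = 300.0,
    poll_interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Register a new instance with the load balancer only after it reports ready.

    Returns True if it was registered, False if it never became ready in time.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if is_ready():
            register()  # the instance starts receiving traffic only now
            return True
        sleep(poll_interval_s)
    return False
```

If it returns False, the old instance is still serving traffic, so you can investigate before pulling it out.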

Python's attribute look-up

Danny Crasto — Wed, 10 May 2023 19:01:44 GMT

__getattribute__ is always called for attribute lookup. If it raises AttributeError (the attribute wasn't found), __getattr__ is called as a fallback.

We can leverage this mechanism to fall back to fetching attributes from anything else.

Below, we use a dict for look-ups and, if the key isn't found, raise the expected AttributeError:

import dataclasses
import logging

logging.basicConfig(level='INFO')
logger = logging.getLogger(__name__)


@dataclasses.dataclass
class DictAttrib:
    # regular dataclass field (an instance attribute)
    name: str
    # runtime user defined attributes
    data: dict = dataclasses.field(default_factory=dict)

    def __str__(self):
        return f'{self.name}| {self.data}'

    def __getattr__(self, name):
        """Called only when normal lookup via __getattribute__ fails."""
        # read data via __dict__: plain self.data would recurse back into
        # __getattr__ if data isn't set yet (e.g. during copy/unpickling)
        data = self.__dict__.get('data', {})
        try:
            return data[name]
        except KeyError:
            # `from None` hides the KeyError (no nested exception in the traceback)
            raise AttributeError(name) from None


if __name__ == '__main__':
    foo = DictAttrib('foo')
    bar = DictAttrib('bar', data={'bar': 'hello'})
    try:
        logger.info(f'{foo}')
        logger.info(f'{foo.bar}')  # not in foo.data -> AttributeError
    except AttributeError:
        logger.exception('failure')
    try:
        logger.info(f'{bar}')
        logger.info(f'{bar.bar}')  # found in bar.data -> 'hello'
        logger.info(f'{bar.a}')  # not in bar.data -> AttributeError
    except AttributeError:
        logger.exception('failure')
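Why raise AttributeError rather than letting the KeyError escape? Built-ins such as hasattr() and getattr() with a default only catch AttributeError. A small Python sketch with two hypothetical classes shows the difference:

```python
class DictAttrs:
    """Falls back to a dict and raises AttributeError on a miss."""

    def __init__(self, data):
        self.data = data

    def __getattr__(self, name):
        try:
            return self.__dict__["data"][name]
        except KeyError:
            raise AttributeError(name) from None


class LeakyDictAttrs(DictAttrs):
    """Anti-example: lets the KeyError escape from __getattr__."""

    def __getattr__(self, name):
        return self.__dict__["data"][name]


ok = DictAttrs({"a": 1})
print(ok.a, hasattr(ok, "missing"), getattr(ok, "missing", "fallback"))  # 1 False fallback

leaky = LeakyDictAttrs({"a": 1})
try:
    hasattr(leaky, "missing")
except KeyError as exc:
    print("hasattr blew up with KeyError:", exc)
```

With the leaky version, even a harmless hasattr() check crashes, which is why the fallback must translate misses into AttributeError.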

Commit messages

Danny Crasto — Sat, 06 May 2023 07:06:28 GMT

Follow the best practices outlined in git's documentation for your commits:

Subject line: capitalized, under 50 characters, without a trailing full stop
Messages should be meaningful, written in the present tense and the imperative mood, reading as instructions
Have complete, atomic commits that logically separate the changes
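The mechanical part of the subject-line rules is easy to check automatically. A small Python sketch (subject_problems is a hypothetical helper); it can't judge the imperative mood or atomicity, and it treats any first character that isn't an uppercase letter, digits included, as uncapitalized.

```python
from typing import List

MAX_SUBJECT_LEN = 50


def subject_problems(message: str) -> List[str]:
    """Check a commit message's subject (first) line against the formatting rules."""
    subject = message.splitlines()[0] if message else ""
    if not subject:
        return ["empty subject line"]
    problems = []
    if not subject[0].isupper():
        problems.append("subject should be capitalized")
    if len(subject) >= MAX_SUBJECT_LEN:
        problems.append(f"subject should be under {MAX_SUBJECT_LEN} characters")
    if subject.endswith("."):
        problems.append("subject should not end with a full stop")
    return problems
```

Git passes the path of the message file as the first argument to a commit-msg hook, so a hook script could read that file and exit non-zero whenever subject_problems returns anything.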

Be a considerate (tech) Interviewer

Danny Crasto — Sat, 15 Apr 2023 11:13:59 GMT

Interviews, in-person or remote, are stressful for you, but they're 100x worse for the interviewee. Stress (and time) makes diamonds, but it makes most of us perform sub-optimally.

To get an objective and fair view of your candidate, you have to reduce stress during the interview, making it a pleasant learning experience for the candidate regardless of the outcome. This also promotes the company's reputation, which pays future dividends.

Before

Go through the candidate's current progress and prepare some questions about the anomalies, curiosities or interesting things you find in their history. It shows you've taken the time and are interested in them.
Be early (arriving at the exact time of the interview counts) and, if you're late, apologize and give a reason.
Put them at ease with some casual conversation (location, weather, etc.) and relate to things they say with anecdotes of your own; be warm, and where it doesn't hurt, be a little personal too.
Provide an agenda with time boundaries and expectations, and indicate that you will leave the last 5 minutes for them to ask questions.
Remind them to relax and not worry about the outcome. Reassure them that it will be a learning experience and to think of it as a conversation between a bunch of nerds.

During

Take control of the situation and provide specific directions if things don't work as expected (missing tests, slow connections or errors).
Always provide context with the questions you ask. Ask probing questions to steer them in the right direction without giving them the answer. You want to know if they can think on their feet.
Use affirming body language when they are saying the right things.
Make it clear to them that they are in control: they have the option to change the environment (their own IDE or language, if applicable), search for things, or even ask you, so that they can solve the problem.
Emphasize that they can run their code frequently and fix syntax or logic bugs, which won't be counted against them. Ask them to focus on getting to a working solution as a first step.
For scenario-based questions, walk through all components of the problem explicitly. Example: highlight the application's layers: database <> API server(s) <> load balancer <> client.

After

Genuinely thank them for the time they've spent so far on take-home tests, interviews, etc.
Walk them through the next steps and when they can expect a response, allowing the team a generous timeline.
In the event you have to end the interview early because of time or a bad solution, highlight that the 'position' requires a more experienced skill set, which deflects blame from them. Do remind them to look at it as a learning experience, and indicate that you are happy to answer any questions they have about the environment or the company, effectively steering them away from the result. If they insist on knowing why the interview didn't go favorably, list points from your notes on what didn't meet the role's expectations and what they should consider working on for future success.

Minimal Viable Solution

Danny Crasto — Sat, 08 Apr 2023 11:53:15 GMT

One of our Django admin pages on a service kept triggering a latency alert. The engineer determined that the culprit was a dynamic field that computed a hierarchy for each record.

The proposal to remove the field was shot down by the operations team as it was vital to their work.

I asked for alternatives, and she listed: use an external cache; denormalize the field onto the entity; or make the field lazy in the view, though that needed investigation.

I asked how many records were in the view; she said 100. I asked: can we simply reduce the number of records per page?

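That afternoon, a config update was released that halved the number of records per page in the offending view (in the Django admin, that setting is ModelAdmin.list_per_page, which defaults to 100).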

Most of the time, quick is better than perfect, and simple is better (and cheaper) than exotic.

502s? 503s? oh my!

Danny Crasto — Sat, 18 Mar 2023 07:10:09 GMT

Above (via excalidraw) is our current architecture in the cloud.

When we built our dashboards to track http_status via our rev(erse)Proxies, we saw periodic spikes of 5XX errors, predominantly 502s with a smattering of 503s.

These relate to connectivity to, and load on, the running apps:

502s: the respective revProxy couldn't connect to any valid (upstream) app
503s: the apps were temporarily overloaded and unavailable (but connectable)

502s shouldn't happen with our current elastic (scaling on latency thresholds) k8s application setup, which is configured for HA (high availability) with a minimum of 1-2 pods per app.

503s shouldn't happen because the pods were set up with generously over-provisioned CPU/memory.

But they did.

502s

Diving deeper into the logs, we noticed that the error responses came from different proxies across the system but were consistent within each application. This helped us group issues by their distance from the app.

Closest (apps): within the pod, there were a few configuration issues with the applications' dependent server.
Further: issues with the k8s configuration
1. Overly sensitive pod scaling was dampened via stabilizationWindowSeconds
2. (2.1) Pod scaling led to scaling of the k8s cluster nodes, which implicitly caused churn as application pods were moved between nodes. (2.2) To minimize this, maxDisruptionBudget was reduced from the default 50% and the minimum number of pods per app was set to 4

503s

(2.2) ensured that the minimum number of pods at any given time (even during node scaling) would be 2, at the expense of running a slightly underutilized cluster. This, along with asymmetric up/down stabilizationWindowSeconds, helped reduce these types of errors.
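The effect of asymmetric windows can be sketched with a simplified Python model of how the HorizontalPodAutoscaler stabilizes its recommendation: scale-up goes only as far as the lowest recommendation within the scale-up window, scale-down only as far as the highest within the scale-down window, and the current replica count is clamped between the two. This is our simplified reading of the Kubernetes HPA behavior, not the controller's code; Recommendation and stabilize are hypothetical names. By default the scale-down window is 300 seconds and the scale-up window 0.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Recommendation:
    t: float  # seconds, when the HPA computed this recommendation
    replicas: int  # desired replica count at that time


def stabilize(
    history: List[Recommendation],
    now: float,
    desired: int,
    current: int,
    up_window_s: float = 0,
    down_window_s: float = 300,
) -> int:
    """Simplified model of HPA stabilization windows."""
    up_limit = desired  # scale up only as far as the lowest recent recommendation
    down_limit = desired  # scale down only as far as the highest recent recommendation
    for rec in history:
        if rec.t > now - up_window_s:
            up_limit = min(up_limit, rec.replicas)
        if rec.t > now - down_window_s:
            down_limit = max(down_limit, rec.replicas)
    # clamp the current replica count between the two limits
    replicas = current
    if replicas < up_limit:
        replicas = up_limit
    if replicas > down_limit:
        replicas = down_limit
    return replicas
```

With a long scale-down window and a short scale-up window, the app reacts quickly to load spikes but doesn't shed pods (and trigger node churn) on brief dips.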

The powers that be are appeased, for now.

through the eyes of a reviewer (resume advice)

Danny Crasto — Fri, 09 Jul 2021 05:06:25 GMT

A resume is supposed to be a generalized single-page view of your experience.

Have

Someone to look over your resume so they can catch mistakes
Descending order of experience
A link to a more detailed Curriculum Vitae
Publications
External links which show you are productive
A brief cover letter, only if the position differs from your experience or you've been out of the game for a while, explaining why you are relevant or what you've been doing to stay up to date.

Have Not

More than one page
Experience older than 10 years
References
Images
Misspellingas*
All specific versions of operating systems or hardware brands

*intentional

hello (blog) world

Danny Crasto — Wed, 30 Jun 2021 15:20:42 GMT

First time blogging ... let's see if I stick to it