Thursday, January 7, 2016

Checking "Star Wars - The Force Awakens" tickets availability with Azure WebJobs, scriptcs and SendGrid

This user story is quite simple: there is a guy (me) who likes Star Wars. This guy wants to buy the best tickets available in an IMAX cinema. The premiere was not so long ago, so within a day of the showtimes being updated, the best seats are already booked. This guy (also me) is quite lazy, so he doesn't like checking the showtimes manually.

Hm... let's do it like pro developers using cutting-edge technologies!

How does the booking system work?

There is a whole UI for selecting seats and so on; however, there is one interesting request which I can use to check the showtimes. It looks like this:
POST http://www.cinema-city.pl/scheduleInfoRows HTTP/1.1
Host: www.cinema-city.pl
Connection: keep-alive
Content-Length: 52
Accept: */*
Origin: http://www.cinema-city.pl
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Referer: http://www.cinema-city.pl/imax
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.8
Cookie: bla bla bla

locationId=1010304&date=09%2F01%2F2016&venueTypeId=2

When there is a showing on that date, the endpoint returns an HTML table with booking links; otherwise it returns an empty HTML table.
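Before scripting anything, the request can be replayed by hand. A curl call along these lines should reproduce it (a sketch using the captured form data; I haven't verified which of the browser headers are actually required, so the important ones are included):

curl "http://www.cinema-city.pl/scheduleInfoRows" \
  -H "X-Requested-With: XMLHttpRequest" \
  -H "Referer: http://www.cinema-city.pl/imax" \
  -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36" \
  --data "locationId=1010304&date=09%2F01%2F2016&venueTypeId=2"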

scriptcs to get the job done

I've written a simple scriptcs script which makes a POST request with the appropriate headers and checks whether an HTML link opening tag is present in the response. If it is, I send myself an email using a fresh, free SendGrid account.
using System.Net;
using System.Net.Http;
using System.Net.Mail;
using System.Text;
using SendGrid;

// Sends me a notification email through the SendGrid Web API
public void SendMeEmail()
{
 var myMail = new SendGridMessage();
 myMail.From = new MailAddress("Yoda@gmail.com");
 myMail.AddTo("me@gmail.com");
 myMail.Subject = "StarWars tickets are available!!";
 myMail.Text = "Go to CinemaCity IMAX to book them.";

 var credentials = new NetworkCredential("user-sendgrid@azure.com", "sendgrid-password");
 var transportWeb = new Web(credentials);
 transportWeb.DeliverAsync(myMail).Wait();
}

// Query the schedule endpoint with the same headers the cinema's site sends
var httpClient = new HttpClient();
httpClient.DefaultRequestHeaders.Add("Host", "www.cinema-city.pl");
httpClient.DefaultRequestHeaders.Add("X-Requested-With", "XMLHttpRequest");
httpClient.DefaultRequestHeaders.Add("Referer", "http://www.cinema-city.pl/imax");
httpClient.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36");

var content = new StringContent(@"locationId=1010304&date=09%2F01%2F2016&venueTypeId=2", Encoding.UTF8, @"application/x-www-form-urlencoded");
var response = httpClient.PostAsync("http://www.cinema-city.pl/scheduleInfoRows", content).Result;
var responseAsString = response.Content.ReadAsStringAsync().Result;

// A non-empty schedule table contains at least one link opening tag
var isMovieAvailable = responseAsString.Contains("<a");
if(isMovieAvailable)
{
 Console.WriteLine("Movie is available, sending email");
 SendMeEmail();
 Console.WriteLine("Movie is available, email sent");
}
else
{
 Console.WriteLine("Movie is not available.");
}

Environment setup is quite simple (see the consolidated sketch after this list):
  • create a new SendGrid account
  • download scriptcs as zip (link) and unzip it to a folder StarWarsCheck
  • save the code as checkmovie.csx to the StarWarsCheck folder
  • update checkmovie.csx with your SendGrid credentials
  • add a reference to the SendGrid DLLs by invoking scriptcs.exe -Install Sendgrid
  • now you can run the script locally. In a console, run: scriptcs.exe checkmovie.csx
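Roughly, the local part boils down to these commands (a sketch, assuming scriptcs.exe and checkmovie.csx both sit in the StarWarsCheck folder):

cd StarWarsCheck
scriptcs.exe -Install Sendgrid
scriptcs.exe checkmovie.csx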
If you are interested in other options for preparing standalone, portable scriptcs scripts, check my question on Stack Overflow: How to run scriptcs without installation? Make portable/standalone scriptcs (csx)

Note: of course, once the showtimes are updated, I will get an email every hour. But that's good, isn't it? This way there is a better chance I won't miss it.

Create an Azure WebJob to run the scriptcs file every hour

You can use Azure WebJobs by simply uploading a .zip file with the job and configuring in Azure how it should be scheduled. The job entry point is based on a naming convention. According to the documentation, the recommended script file to have in your job directory is run.cmd. Therefore, my run.cmd looks like this:
call scriptcs.exe checkmovie.csx

Next, pack the whole StarWarsCheck folder as a zip file and upload it as an Azure WebJob. Instructions are here.
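The hourly schedule itself is configured on the Azure side when creating the WebJob (see the instructions linked above). As a side note, and not how the job above was set up, a triggered WebJob can also carry its own settings.job file with a CRON expression next to run.cmd, which is honoured as long as Always On is enabled for the site; an hourly schedule would look like this:

{
  "schedule": "0 0 * * * *"
}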

Effects

It started with...

But then...

The email arrived without problems:

However, most importantly, I've booked the best seats for my friends & me:

Summary

It was fun, easy and profitable to play with Azure WebJobs and scriptcs. I liked the scriptcs sleekness and Azure WebJobs simplicity. For sure I'll use them for something else in the future.

Sunday, January 3, 2016

Merging multiple git repositories into one and purging sensitive data

Git is a very powerful, distributed version control system. It's based on a simple concept - a directed acyclic graph pointing to four types of objects in its database. I love git and its brilliant design. Therefore, when I saw how misused it was in the company which I joined, I had to fix it.

The state before

Due to multiple factors, there were around 4 repositories which had to be cloned into one directory. Each repository was using or was being used by another repository. In other words, projects in Visual Studio had dependencies on other projects or, worse, on compiled DLLs in other repositories. Therefore, sometimes one change required 4 commits (including rebuilding projects and adding the compiled DLLs to the commit). In one month, around 2 man-days were lost on checking changes in and out of multiple repositories and on false (or true...) alarms that somebody had forgotten to check something in or out. What's more, this was with only one branch - master - because only one existed back then. In the future, to support multiple environments or development of fine-grained features/stories, multiply those problems by the number of branches and the number of new developers, at least. As always in IT, there wasn't much time, so setting up an internal company NuGet server wasn't the best thing to do. It isn't that setting up a NuGet server takes long, but training all developers requires a great amount of time. Instead, I've decided to create one repository.
The state before was like this:
\Repo1
  \src
    \project1
      project1.csproj with dll reference to project 2
    \ExternalDlls
      project2.dll
    Solution1.sln
\Repo2
  \project2
    project2.csproj with project reference to project 3
  \project3
    project3.csproj with project reference to project 4
  Solution2.sln
\Repo3
  \project4
    project4.csproj
  Solution3.sln

One to rule them all

In those repositories, passwords and MachineKeys for the production environment were stored in plain text. Therefore I've decided to create a new repository. Side note: remember, passwords pushed to a git repository always there will be, Yoda said. Therefore the new repository will have an entirely rewritten history (with passwords removed). Naturally, all branches (masters in this case) from all repositories, with their history, must be included in the new repository. It will look like this:

new repo HEAD
|
M
|  \
M    \
| \    \
x' \     \
|   y'    z'
x'  |     |
|   y'    z'
x'  |     |
.   y'    z'
.   .     .
.   .     .
    .     .
 
Legend:
x' - commits from repository 1 with removed sensitive data
y' - commits from repository 2 with removed sensitive data
z' - commits from repository 3 with removed sensitive data
M  - merges in the new repository
new repo HEAD - the brand new, future repo HEAD (master)

Migration scripts

The migration must be done in an "atomic" way - at least it must be seen from the developers' perspective as an atomic operation: they commit to the old repos and, from some point in time, they commit to the new repo (note: stashes will have to be discarded). Therefore, I've decided to run the migration during the weekend, when the repositories are inactive. However, I don't like to work during weekends, so I wrote a script or two to automate the majority of the work. The git filter-branch command which I will be using is painfully slow, so I additionally used a powerful Amazon EC2 instance to make things a little faster.

Step 1 - fetch all repos and form a nice repository structure

Note that in the state before, not all repos had their code in a src folder. To fix that, I'll use the git filter-branch command to entirely rewrite the history. Each commit in the history, blame etc. will look as if it was committed to the right src folder. Additionally, I've noticed that someone was committing the packages folder to git (possibly due to a poor .gitignore file), so now is the chance to remove that bloat permanently. Here is the bash script. Save it as mergerepos.sh and run it from the Git Bash console like a normal Linux script (./mergerepos.sh):
#!/bin/bash 
FinalRepo="main" 
echo $FinalRepo

mkdir $FinalRepo
cd $FinalRepo
git init

# create a dummy initial commit so the new master has a root to merge into
touch tmp
git add -A
git commit -m 'merge all repositories'


declare -a reponames=("repo1" "repo2" "repo3")
declare -a repourls=("https://user@bitbucket.org/Company/repo1.git" "https://user@bitbucket.org/Company/repo2.git" "https://user@bitbucket.org/Company/repo3.git")
numberofrepos=${#reponames[@]}

# Rewrite the fetched repository's master so that all of its files live under src/
function rewriterepo {
 git checkout $1/master
 git checkout -b "$1master"
 # for every commit: drop the packages folder, then move everything else into src/
 git filter-branch -f --tree-filter 'rm -rf packages
 mkdir -p "src"
 rm -rf src/packages
 ls -A | grep -v ^[Ss]rc | grep -v \.git | while read filename
 do
 mv "$filename" "src/"
 done' HEAD
}

for (( i=0; i<${numberofrepos}; i++ ));
do
  echo $i " -> " ${reponames[$i]} $(date) "-" ${repourls[$i]} " STARTED"
  git remote add ${reponames[$i]} ${repourls[$i]}
  git fetch ${reponames[$i]}
  rewriterepo ${reponames[$i]}
  git remote rm ${reponames[$i]}
  echo $i " -> " ${reponames[$i]} $(date) "-" ${repourls[$i]} " FINISHED"
done
The script will:
  • set up a new repository
  • make a dummy commit
  • go through the list of given repositories and, for each:
    • add it as a remote, fetch it, and check it out to a repoXmaster branch
    • clean each commit as follows:
      • create the src folder, remove the src/packages folder
      • move each file/directory from the root, except the src and .git folders, to the src folder
    • remove the added remote
So far so good.

Step 2 - merge all branches (repositories)

As all repositories now have the right structure and live in our one "chosen" repository, merging them is just a normal merge operation.
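For illustration, a minimal sketch of the merges, assuming the repoXmaster branches created by the script above (on git 2.9 or newer you would also need the --allow-unrelated-histories flag, which didn't exist back then):

# back to the dummy-commit master of the new repository
git checkout master
# bring in each rewritten history, one merge commit per old repository
git merge repo1master -m "merge repo1"
git merge repo2master -m "merge repo2"
git merge repo3master -m "merge repo3"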

Step 3 - delete sensitive data (passwords etc)

This can be done with the painfully slow git filter-branch or... the fast and easy to use BFG Repo Cleaner. Check the project website, it's self-explanatory.
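For illustration only, a typical BFG run over the new repository could look roughly like this (the clone URL and passwords.txt - a file listing the secrets to strip - are hypothetical; the steps follow the usage described on the BFG website):

# work on a fresh mirror clone, as the BFG documentation recommends
git clone --mirror https://user@bitbucket.org/Company/main.git
# replace every occurrence of the strings listed in passwords.txt with ***REMOVED***
java -jar bfg.jar --replace-text passwords.txt main.git
cd main.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive
git push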

Step 4 - add a nice, root .gitignore

All my work of removing the redundant packages folder could be destroyed by a single commit. Therefore I've merged all existing .gitignore files and added those rules to the well-known github/gitignore file for Visual Studio.
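The packages-related part of the merged .gitignore ends up along these lines (an excerpt in the spirit of the standard Visual Studio template, not the literal file):

# NuGet packages are restored at build time, never committed
packages/
# build output
[Bb]in/
[Oo]bj/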

Further steps

I now have one repository with the right structure and a good history. Further steps?
Taking the chance, I've introduced one solution for all projects in the new VS 2015, migrated to Automatic NuGet Package Restore (check all those scripts - one also fixes project hint paths), changed all DLL references to project references and upgraded the projects to the new VS version. This is how I did the csproj update:
# regular expressions matching the old tool-version markers in project files
$listOfBadStuff = @(
 "<Project DefaultTargets=""Build"" xmlns=""http://schemas.microsoft.com/developer/msbuild/2003"" ToolsVersion=""4.0"">",
 "<OldToolsVersion>[0-9]\.0</OldToolsVersion>",
 "<Project ToolsVersion=""12.0"""
)
$listOfGoodStuff = @(
 "<Project DefaultTargets=""Build"" xmlns=""http://schemas.microsoft.com/developer/msbuild/2003"" ToolsVersion=""14.0"">",
 "<OldToolsVersion>14.0</OldToolsVersion>",
 "<Project ToolsVersion=""14.0"""
)

# update every project/solution file, writing back only the ones that actually changed
ls -Recurse -include *.csproj, *.sln, *.fsproj, *.vbproj, *.wixproj |
  foreach {
    $content = cat $_.FullName | Out-String
    $origContent = $content
    For ($i=0; $i -lt $listOfBadStuff.Length; $i++) {
      $content = $content -replace $listOfBadStuff[$i], $listOfGoodStuff[$i]
    }
    if ($origContent -ne $content)
    {
        $content | out-file -encoding "UTF8" $_.FullName
        write-host messed with $_.Name
    }
  }

Summary

It was relatively easy to get from a nightmare to a reasonable repository environment. Those one or two days of merging repositories will pay off very quickly. Not to mention the removal of sensitive data from the repository - that can be priceless.