Although we have migrated to .NET Core for the most part, we all still have to support older applications that use the .NET Framework. As part of routine maintenance, you may update the .NET Framework version in your application, only to find out that your test or even prod (yikes!) environment does not have the latest framework installed.

One way to find the installed version is to follow Microsoft’s documentation here:

https://docs.microsoft.com/en-us/dotnet/framework/migration-guide/how-to-determine-which-versions-are-installed

You look at the complexity and may just give up. I’ve found an easier place to check the same thing, and it has worked for me every time.

All you need to do is browse to this folder path:

C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework

Here’s the structure:

And that’s it. You will see all the framework versions installed on this machine; even each minor version has its own folder. Easy on the eye indeed.
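
If you’d rather script the check, here’s a minimal sketch (assuming the reference assemblies live at the path above; on 32-bit Windows drop the " (x86)"):

    # Minimal sketch: list installed .NET Framework versions by enumerating
    # the reference assembly folders. The path is assumed from above.
    from pathlib import Path

    ref_dir = Path(r"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework")
    for version_dir in sorted(p for p in ref_dir.iterdir() if p.is_dir()):
        print(version_dir.name)  # e.g. v4.5.2, v4.6.1, v4.7.2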

How to get multiple TeamCity build agents running on one server.

I found this out the hard way: there is a key thing that needs to be changed before finishing the setup described in TeamCity’s official instructions.

First, log in to the server where you want the agents to run, then open TeamCity in a browser on that box.

Go to the Agents tab.

From the top right of the page choose Install Build Agents then MS Windows Installer.

When prompted, choose to run the agent installer. You may have to be an administrator since this will be installing Windows services.

Choose the directory where you want the agent configuration and working directories to live. I put them under the TeamCity home directory.

Take the defaults as you work your way through the installer.

The agent directory is being configured.

Here it is important that you choose a unique name and port. The directories should be consistent with your previous choices. You may need to change the server URL and port.

When this popup appears do not click OK yet.

Open an Explorer window and navigate to the launcher/conf directory under the build agent directory you configured above.

Edit wrapper.conf and change the name of the ntservice to match your agent name as appropriate. This is important because each agent runs under a different service and the services must have unique names; otherwise only one will connect at a time.
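
For example, for a second agent named Agent2 the relevant wrapper.conf entries might end up looking something like this (the property names come from the Java Service Wrapper that TeamCity bundles; the values are illustrative):

    # launcher/conf/wrapper.conf - illustrative values for a second agent named "Agent2"
    wrapper.ntservice.name=TCBuildAgent2
    wrapper.ntservice.displayname=TeamCity Build Agent 2
    wrapper.ntservice.description=TeamCity Build Agent 2 service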

Now you can click OK on the popup.

And choose the defaults the rest of the way through the installer.

If you open Services, you should see the service you named.

The new agent should connect and be visible from the TeamCity Agents page within a few moments.

Repeat the steps to add additional agents.


I have a love-hate relationship with ad blockers. On the one hand, I despise the obnoxious ads that are forced down our throats at what seems like every turn. On the other hand, I appreciate the need for publishers to earn a living so that I can consume their hard-earned work for free. Somewhere in the middle is a responsible approach. If they can do it in 140 characters and a link, I’m happy. That is all. No images. No video. No script. No HTML tags. No tracking.

 

Which brings me to Pi-hole. I’m going to keep the intro bits as brief as possible but, in a nutshell, Pi-hole is a little DNS server you run on a Raspberry Pi in your local network and then point your router at, such that every device in your home resolves DNS through the service. It then blacklists about 130k domains used for nasty stuff such that when any client on your network (PC, phone, smart TV) requests sleazy-ad-domain.com, the name simply doesn’t resolve. Scott Helme put me onto this originally via his two excellent posts on Securing DNS across all of my devices with Pi-Hole + DNS-over-HTTPS + 1.1.1.1 and Catching and dealing with naughty devices on my home network. Go and read those because I’m deliberately not going to repeat them here. In fact, I hadn’t even planned to write anything until I saw how much difference the service actually made. More on that in a moment; the one other bit I’ll add here is that the Raspberry Pi I purchased for the setup was the Little Bird Raspberry Pi 3 Plus Complete Starter Kit:

Little Bird Raspberry Pi 3 Plus Complete Starter Kit

This just made it a super easy turnkey solution. Plus, Little Bird Electronics down here in Aus delivered it really quickly and followed up with a personal email and a “thank you” for some of the other unrelated stuff I’ve been up to lately. Nice 🙂

I went with an absolute bare-bones setup which essentially involved just following the instructions on the Pi-hole site (Scott gets a bit fancier in his blog posts). I had a bit of a drama due to some dependencies, and after a quick tweet for help this morning followed by a question on Discourse, I was up and running. I set my Ubiquiti network to resolve DNS through the Pi and that’s it – job done! As devices started picking up the new DNS settings, I got to see just how much difference it made. I set my desktop to manually resolve through Cloudflare’s 1.1.1.1 whilst my laptop was using the Pi-hole, which made for some awesome back-to-back testing. Here’s what I found:
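
If you want a quick sanity check that the Pi-hole really is doing the blocking, a tiny script like this (run from a machine whose resolver is the Pi-hole; the blocked domain is just a commonly blacklisted example, not guaranteed to be on your list) makes the difference obvious:

    # Rough sanity check: a blacklisted domain should come back as 0.0.0.0 (or not
    # resolve at all, depending on blocking mode), while normal domains resolve fine.
    import socket

    for host in ("doubleclick.net", "example.com"):
        try:
            print(host, "->", socket.gethostbyname(host))
        except socket.gaierror:
            print(host, "-> did not resolve (blocked)")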

Let’s take a popular local Aussie news site, news.com.au. Here’s what it looks like with no Pi-hole:

news.com.au without pi-hole

In the grand scheme of ads on sites, not too offensive. Let’s look at it from the machine routing through the Pi-hole:

news.com.au with pi-hole

Visually, there’s not a whole lot of difference here. However, check out the network requests at the bottom of the browser before and after Pi-hole:

news.com.au network without pi-hole

news.com.au network with pi-hole

Whoa! That’s an 80% reduction in network requests and an 82% reduction in the number of bytes transferred. I’d talk about the reduction in load time too except it’s really hard to measure because as you can see from the waterfall diagrams, with no Pi-hole it just keeps going and going and, well, it all gets a bit silly.

Let’s level it up because I reckon the smuttier the publication, the bigger the Pi-hole gain. Let’s try these guys:

dailymail.co.uk with no pi-hole

And for comparison, when loaded with the Pi-hole in place:

dailymail.co.uk with pi-hole

And now – (drum roll) – the network requests for each:

dailymail.co.uk network with no pi-hole

dailymail.co.uk network with pi-hole

Holy shit! What – why?! I snapped the one without Pi-hole at 17.4 mins after I got sick of waiting. 2,663 requests (one of which was to Report URI, thank you very much!) and 57.6MB. To read the freakin’ news. (Incidentally, in this image more than the others you can clearly see requests to domains such as fff.dailymail.co.uk failing as the Pi-hole prevents them from resolving.)

After just a few quick tests, I was pretty blown away by the speed difference. I only fired this up at about 8am this morning and I’m just 9 hours into it but already seeing some pretty cool stats:

Pi-hole-dashboard

It’s also flagging a bunch of things I’d like to look at more, for example my wife’s laptop being way chattier than everything else:

Top clients

Top blocked clients

Wife's laptop requests

So in summary, no compromising devices, no putting your trust in the goodwill of an extension developer, no per-device effort, the bad stuff is blocked and the good stuff still works:

Sponsor message still works

Lastly, Pi-hole has a donate page and this is one of those cases where if you find it as awesome as I have already, you should absolutely show them some love. Cash in some of that time you’ve reclaimed by not waiting for rubbish ads to load 😎

Function start detection in stripped binaries is an undecidable problem. This should be no surprise. Many problems in the program analysis domain fall into this category. Add to that the numerous types of CPU architectures, compilers, programming languages, application binary interfaces (ABIs), etc. and you’re left with an interesting, multifaceted, hard problem. Accurate detection of both function starts and the low-level basic blocks is often the first step in program analysis. Performing this task accurately is critical. These foundational analysis artifacts are often the starting point for many automated analysis tools.

Because the problem is undecidable, there is no single algorithm that can identify all function starts across the wide variety of program binaries. Among the approaches enlisted to try to solve this problem are machine learning algorithms (e.g. ByteWeight and neural networks) that use signature-based pattern recognition. At first glance the results of these learners look promising, but they are often biased toward their training set.

Other approaches to function detection (e.g. Nucleus and Function Interface Analysis) are more sensible. Their research is focused on control-flow, data-flow, and function interface analysis rather than signature detection. Yet, these solutions fall short of addressing the entire problem domain across architectures in a generalized way.

For example, the Nucleus approach can be improved upon since it relies on heuristics to handle indirect control flow. What’s wrong with heuristics? They must be updated constantly for the latest compiler constructs, and can frequently produce invalid results, especially in highly optimized binaries. Even today, many tools rely on heuristics. That’s a dilemma. Even though they solve a specific problem really well, they rarely generalize to solve an entire class of problems.

Enter The Ninja

Binary Ninja is an extensible reverse engineering tool that provides a generalized, architecture-agnostic approach to program analysis. Therefore, one of our goals is to develop a function detection algorithm that comprises the best techniques available while being architecture-agnostic. We therefore intentionally overlook signature matching and heuristics as primary methods to provide this capability (though do not discount them entirely).

Carving Binary Spaces With Confidence

Given the multifaceted nature of function detection in stripped binaries, it is common to apply multiple strategies over multiple analysis passes. The order of those strategies is driven by the desire to achieve positive function identification with the highest degree of confidence first, as this should reduce the scope, complexity, and possibility for mistakes in later analysis stages. This is important. Mistakes early in the binary analysis process can degrade the results of higher-level analysis later. The search space for each analysis phase is the current set of unknown regions in the binary that are not tagged as code or data. After each phase, the search space shrinks by tagging regions as code and/or data. Currently, Binary Ninja performs analysis in four distinct phases:

  • Recursive Descent
  • Call Target Analysis
  • Control Flow Graph (CFG) Analysis
  • Tail Call Analysis

After analysis is complete, the desired result is a binary that is annotated in a similar way to a binary with full symbol information, or even better.

Recursive Descent

In earlier versions of Binary Ninja, automatic function detection was limited to our recursive descent algorithm. This algorithm analyzes the control flow from the defined entry point(s) and attempts to follow all code paths. Since the algorithm is based on control flow from a well-defined entry point, it is inherently proficient at differentiating between code and data. A unique feature of Binary Ninja’s recursive descent algorithm is its ability to perform multi-threaded* analysis during this phase.
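
As a rough illustration of the general idea (not Binary Ninja’s actual implementation), recursive descent boils down to a worklist over reachable instructions; decode() below is a hypothetical helper that disassembles one instruction:

    # Hedged sketch of recursive descent. decode(addr) is a hypothetical helper
    # returning (successor_addrs, call_target_or_None) for the instruction at addr.
    def recursive_descent(entry_points, decode):
        functions = set(entry_points)   # discovered function starts
        visited = set()                 # addresses confirmed as code
        worklist = list(entry_points)
        while worklist:
            addr = worklist.pop()
            if addr in visited:
                continue
            visited.add(addr)
            successors, call_target = decode(addr)
            if call_target is not None:
                functions.add(call_target)      # a direct call reveals a new function start
                worklist.append(call_target)
            worklist.extend(successors)         # branch targets and fall-through within the code
        return functions, visited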

Results from recursive descent are highly accurate, but there are several limitations: handling indirect control flow constructs, disconnected functions, and tail call function identification. We present our solutions to these challenges later.

* Note that multi-threaded analysis is only available in Binary Ninja Commercial

Jump Table Inference With Generic Value-Set Analysis

Many tools approach indirect dataflow resolution with heuristics. Even some of the newest binary analysis tools still rely on heuristics to detect instruction sequences and patterns in order to locate jump tables. The problem is that every time a new compiler/optimization/architecture is released, the compilers’ indirect control flow idioms can change, undercutting the effectiveness of the once-sufficient heuristic. This creates a big problem for most types of function-detection algorithms. Broken jump tables negatively impact function detection because each of the indirect jump targets can appear to be a potential function, as they are valid native blocks of instructions. Depending on the architecture, well-connected basic blocks may even appear within the jump table itself. Attempting to filter out all of these types of candidate functions is not only tedious but also difficult.

Table value on hover – Figure 1

Since adding heuristics to solve a problem can be an eternal cycle of misery, Binary Ninja provides a practical implementation of generic value-set analysis. Value-set analysis is in the domain of abstract interpretation. Essentially, it is a method to interpret the semantics of low-level CPU instructions and track the possible value of variables and/or registers at each instruction point. The value represents an approximation that is path sensitive. Given this capability it becomes trivial to query the possible values at an indirect jump-site. One example is shown in Figure 1. If the possible values represent a reasonable set of jump targets, then the recursive descent algorithm can continue. This reduces the resolution of an indirect control flow instruction idiom to a simple query via our possible value set API.
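
To give a feel for what that query looks like from a script, here is a hedged sketch using the Binary Ninja Python API as I understand it (load, mlil, MLIL_JUMP/MLIL_JUMP_TO, and possible_values may differ slightly between versions; the file path is hypothetical):

    import binaryninja
    from binaryninja import MediumLevelILOperation

    # Older releases open files via BinaryViewType.get_view_of_file instead of load.
    bv = binaryninja.load("/path/to/stripped_binary")

    for func in bv.functions:
        for block in func.mlil:
            for instr in block:
                # Indirect jumps (and resolved jump tables) appear as MLIL_JUMP / MLIL_JUMP_TO.
                if instr.operation in (MediumLevelILOperation.MLIL_JUMP,
                                       MediumLevelILOperation.MLIL_JUMP_TO):
                    # Dataflow answers: what values can the destination take here?
                    print(hex(instr.address), instr.dest.possible_values)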

Solving jump tables with multiple levels of indirection – Figure 2

Because our value set analysis occurs within our Binary Ninja Intermediate Language (BNIL) set of abstractions, it is inherently architecture agnostic. This means that when we add a new architecture to Binary Ninja all of the existing value set analysis is automatically applied by Binary Ninja’s analysis core. The result is a lifted, clean binary ready for consumption. The next several sections describe the additional analysis phases which make up our implementation of linear sweep.

Call Target Analysis

This analysis phase sequentially disassembles the unknown regions in the search space while aggregating all potential call targets. For our purposes, a call target is the destination of a call instruction as returned from the architecture. Once complete, the call targets are ordered by the number of cross-references in descending order and handed off to the recursive descent algorithm for further analysis.
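
A condensed sketch of that aggregation (decode() is again a hypothetical helper returning the instruction length and its direct call target, if any):

    from collections import Counter

    def collect_call_targets(unknown_regions, decode):
        refs = Counter()
        for start, end in unknown_regions:       # sequentially disassemble each unknown region
            addr = start
            while addr < end:
                length, call_target = decode(addr)
                if call_target is not None:
                    refs[call_target] += 1       # aggregate potential call targets
                addr += length if length else 1  # step over undecodable bytes
        # Most-referenced targets are handed to recursive descent first.
        return [target for target, _ in refs.most_common()]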

Control Flow Graph Analysis

This analysis pass is based on the Control Flow Graph (CFG) recovery technique discussed in the research paper Compiler-Agnostic Function Detection in Binaries (i.e. Nucleus). A key insight from this paper is that compilers typically use different control-flow constructs for interprocedural versus intraprocedural control flow. The general algorithm for CFG recovery is to perform a linear disassembly of the unknown regions in the search space, creating basic blocks. The basic blocks are then grouped according to intraprocedural control-flow constructs. The groupings become candidate functions which undergo some additional analysis to find the probable entry point. For more details about CFG recovery, please refer to the Nucleus paper. Once we have possible entry points, we hand them off to the recursive descent algorithm to perform the heavy lifting.
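
A condensed sketch of the grouping idea (not the Nucleus code itself): basic blocks joined by intraprocedural edges are merged with a union-find, and each resulting group becomes a candidate function:

    # Hedged sketch: group basic blocks connected by intraprocedural edges
    # (branches/fallthrough, deliberately excluding call edges). Blocks are
    # represented by their start addresses here.
    def group_basic_blocks(blocks, intraprocedural_edges):
        parent = {b: b for b in blocks}

        def find(b):
            while parent[b] != b:
                parent[b] = parent[parent[b]]   # path compression
                b = parent[b]
            return b

        for src, dst in intraprocedural_edges:
            parent[find(src)] = find(dst)

        groups = {}
        for b in blocks:
            groups.setdefault(find(b), []).append(b)
        # Approximate each candidate function's entry as its lowest block address;
        # the real analysis is considerably more careful here.
        return [min(group) for group in groups.values()]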

Tail Call Analysis

Compilers employ tail call optimizations to reduce the overhead of adjusting the stack before and after the call to a function. This technique is typically applied when the last thing a function does before it returns is call another function. This optimization often appears as a jump to the target function. The caller stack frame is available to the target function, which may or may not require a stack frame.

Recursive traversal by itself does not detect tail-called functions, since a jump is simply a direct control-flow transfer that the algorithm follows. Note that the techniques used in the linear sweep passes described thus far attempt to find functions in areas of the binary that are not already marked as part of a basic block or data variable.

Tail call before analysis – Figure 3

For Binary Ninja, we add a tail call analysis pass as the final stage of our function detection. This pass requires iterating over existing functions. Generally, if we detect a basic block that jumps before or after the current function boundary, then it is likely a tail call to another function. This works well for some cases; however, there are other cases where the tail-called function lies within the current function under analysis, as shown in Figure 3. For these situations we use our static dataflow capability to query the stack frame offset at the jump site as well as in the candidate tail call function. This allows Binary Ninja to determine whether the stack offset is zero and whether a particular jump site may in fact be a tail call. Figure 4 illustrates the result of Binary Ninja’s tail call analysis: the correct identification of a tail-called function.
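
A rough sketch of that decision, with hypothetical names standing in for the real function-boundary and dataflow queries:

    # Hedged sketch of the tail-call test described above.
    def looks_like_tail_call(jump_site, jump_target, func_start, func_end, stack_offset_at):
        # A jump that lands before or after the current function's boundary is a
        # strong tail-call candidate.
        if jump_target < func_start or jump_target >= func_end:
            return True
        # For in-range targets, ask the dataflow engine: a stack offset of zero at
        # the jump site means the caller's frame is unwound, so the jump behaves
        # like a call to a separate function.
        return stack_offset_at(jump_site) == 0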

Tail call after analysis – Figure 4

Evaluation

We currently evaluate and refine our function detection capability on three different data sets. These include the ByteWeight data set, a subset of binaries from the Cyber Grand Challenge (CGC) corpus of binaries, as well as our own BusyBox binaries that are cross-compiled for various architectures. For most of our results we compare Binary Ninja with Nucleus only. The reason for this is that Nucleus is readily available, and easy to set up and generate results for comparison.

The results presented below were generated with Binary Ninja 1.1.988-dev, IDA Pro Version 7.0.170914, and Nucleus (c98b48c). ByteWeight results are reposted from the original work.

Ground Truth Data

Unless otherwise noted, in each data set for all of the non-stripped ELF binaries, we generate ground truth data for function starts by parsing output from the readelf binary utility. We also exclude PLT entries or thunks as was done in prior research.
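
Purely as an illustration (not the exact tooling used), ground truth of this kind can be scraped from readelf roughly like this; the column layout assumed below matches readelf -sW symbol-table output:

    import subprocess

    def function_starts(path):
        """Collect FUNC symbol addresses from a non-stripped ELF as ground truth."""
        out = subprocess.run(["readelf", "-sW", path],
                             capture_output=True, text=True, check=True).stdout
        starts = set()
        for line in out.splitlines():
            cols = line.split()
            # Expected columns: Num: Value Size Type Bind Vis Ndx Name
            if len(cols) >= 8 and cols[3] == "FUNC" and cols[6] != "UND":
                addr = int(cols[1], 16)
                if addr:
                    starts.add(addr)
        return starts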

Reference the Sensitivity and Specificity article regarding statistical measures calculated in the tables below. Note the following definitions:

  • True Positives (TP): identified function starts that match ground truth data
  • False Positives (FP): identified function starts that do not match ground truth data
  • False Negatives (FN): function starts in ground truth data that are not detected
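
From these counts, the measures in the tables below follow the standard definitions:

    Precision = TP / (TP + FP)
    Recall    = TP / (TP + FN)
    F1        = 2 * Precision * Recall / (Precision + Recall)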

ByteWeight Data Set

This data set consists of 2200 binaries that target the x86-32 and x86-64 architectures. 2064 of the binaries are compiled with gcc or icc across various optimization levels. The remaining 136 binaries are PEs compiled with msvc. Note that the ground truth data for the pe-x86/pe-x86-64 binaries are obtained from the original ByteWeight data set. It should be noted that Nucleus never presented results for the icc compiler. We hypothesize that its performance degrades on icc binaries due to its use of heuristics for resolving indirect control flow.

Original ByteWeight Data Set (gcc/icc/msvc)

            Binary Ninja   ByteWeight     Nucleus
Precision   0.9734         0.9730         0.8130
Recall      0.9632         0.9744         0.9347
F1          0.9683         0.9737         0.8696

We also include a modified ByteWeight data set that swaps out the icc-compiled binaries for ones compiled with clang/llvm. (Thanks to Dennis for providing the clang binaries.)

Modified ByteWeight Data Set (gcc/clang/msvc)

            Binary Ninja   Nucleus
Precision   0.9702         0.9704
Recall      0.9753         0.9461
F1          0.9727         0.9581

DARPA Challenge Binaries

This data set consists of a subset of the DARPA challenge binaries created for the Cyber Grand Challenge (CGC) and adapted by Trail of Bits. We have plans to expand this data set to multiple compilers, but for now it includes 283 binaries compiled with clang/llvm for the x86-32 architecture.

DARPA Challenge Binaries (clang/llvm)

            Binary Ninja   Nucleus
Precision   0.9996         0.9985
Recall      0.9994         0.9815
F1          0.9995         0.9899

BusyBox MultiArch Binaries

This data set consists of 16 BusyBox binaries where each is cross-compiled for a different architecture. All binaries are compiled with gcc using the default BusyBox settings. One set has optimizations enabled and another has them disabled.

Busybox MultiArch Binaries (gcc)

                         Binary Ninja                | IDA Pro v7.0                | Nucleus
                         Precision / Recall / F1     | Precision / Recall / F1     | Precision / Recall / F1
busybox_aarch64_opt      0.9958      0.9551   0.9750 | 0.8461      0.8355   0.8408 | 0.9857      0.9101   0.9464
busybox_aarch64          0.9980      0.9989   0.9984 | 0.9001      0.9600   0.9291 | 0.8397      0.9861   0.9071
busybox_arm-thumb_opt    0.8791      0.9411   0.9091 | 0.8569      0.8818   0.8692 | 0.0121      0.0453   0.0190
busybox_arm-thumb        0.9258      0.9727   0.9487 | 0.8985      0.9223   0.9103 | 0.0081      0.0314   0.0128
busybox_arm_opt          0.9976      0.9890   0.9933 | 0.8499      0.8580   0.8539 | 0.1973      0.2142   0.2054
busybox_arm              0.9964      0.9961   0.9963 | 0.8993      0.9351   0.9169 | 0.2722      0.2729   0.2725
busybox_i386_opt         1.0000      0.9944   0.9972 | 0.8638      0.9598   0.9093 | 1.0000      0.9639   0.9816
busybox_i386             0.9975      0.9986   0.9980 | 0.9035      0.9822   0.9412 | 0.9961      0.9886   0.9923
busybox_mips32_opt       0.9812      0.9380   0.9591 | 0.7448      0.9104   0.8194 | 0.9451      0.9156   0.9301
busybox_mips32           0.9886      0.9906   0.9896 | 0.8232      0.9583   0.8857 | 0.9861      0.8478   0.9118
busybox_mipsel32_opt     0.9788      0.9382   0.9580 | 0.7440      0.9122   0.8196 |
busybox_mipsel32         0.9886      0.9905   0.9896 | 0.8207      0.9621   0.8858 |
busybox_powerpc_opt      0.8449      0.9416   0.8906 | 0.7610      0.9486   0.8445 | 0.8144      0.7777   0.7957
busybox_powerpc          0.9019      0.9928   0.9452 | 0.8245      0.9833   0.8970 | 0.7635      0.9872   0.8611
busybox_x86-64_opt       1.0000      0.9924   0.9962 | 0.8613      0.9551   0.9058 | 1.0000      0.9555   0.9773
busybox_x86-64           0.9986      0.9994   0.9990 | 0.8982      0.9629   0.9294 | 0.9943      0.9898   0.9920
Overall                  0.9662      0.9795   0.9728 | 0.8454      0.9373   0.8890 | 0.4932      0.6183   0.5487

Mission Accomplished

We have a few architectures where performance lags. However, these are typically due to incomplete lifting and/or other architectural differences (e.g. delay slots, multiple architectures in a binary). We are very pleased with the results, but we are not finished. Stay tuned.

Raw Results

We think it’s important to be transparent and support reproducibility of our work, so we’ve published the binaries used in testing.

Acknowledgements

Thanks to the following whose shoulders we stand on and whose contributions helped this work:

 

Post originally appeared at:

https://binary.ninja/2017/11/06/architecture-agnostic-function-detection-in-binaries.html