Principles
A bag-of-words representation allows texts to be compared by turning each text into a vector of word counts. The distance between each pair of vectors can then be computed to measure their proximity. Finally, close elements can be connected in a graph, which gives a formal relation between the elements and makes their similarity explicit.
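To make this concrete, here is a minimal, self-contained sketch on three toy addresses (the strings are invented for illustration): two nearly identical addresses end up close in cosine distance, while an unrelated one is at the maximum distance.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import pairwise_distances

# Two nearly identical addresses and one unrelated string
docs = ["12 main street paris france",
        "12 main st paris france",
        "99 elm avenue toronto canada"]

# Turn each text into a vector of word counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# Cosine distance: 0 means identical word distributions, 1 means no shared words
d = pairwise_distances(X, metric="cosine")
print(d[0, 1])  # small: the two Paris addresses share most of their words
print(d[0, 2])  # 1.0: no words in common
```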
Loading the dataset
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import euclidean_distances
import numpy as np
# I use a placeholder file here in order not to divulge the dataset I actually used
corpus=pd.read_csv("addresses.csv")
Preprocessing data
Before we convert the addresses into a bag of words, we need to perform some transformations. Our dataset contains addresses from various countries, and the country name can differ depending on the language of the emitting country.
import re
# Read a file where countries are mapped to their English translation
countries=pd.read_csv("countries.csv",index_col=0)
# Select the countries for which a substitution is needed
replacements=countries[countries.originalcountry!=countries.tobename].copy()
# Match the country when preceded by whitespace and followed by whitespace or end of string
# (inside a character class, `$` is a literal, so the earlier `[\s$]` would not match end of string)
replacements.originalcountry="\\s"+replacements.originalcountry+"(\\s|$)"
replacements.tobename=" "+replacements.tobename
replacements=replacements.set_index('originalcountry')
replacements.tobename=replacements.tobename.replace("\s","",regex=True)
# The replacement function needs a dictionary, so we build that representation from the dataframe
replacements=replacements.tobename.to_dict()
# Replace common company-type abbreviations
replacements.update({"s.a.r.l":"sarl"})
replacements.update({"s.a.u":"sau"})
replacements.update({"s.a.s":"sas"})
replacements.update({"s.a":"sa"})
replacements.update({"limited":"ltd"})
# Build a single regex that matches any key of the replacement dictionary
rep = dict((re.escape(k), v) for k, v in replacements.items())
pattern = re.compile("|".join(rep.keys()))
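The compiled pattern still has to be applied to each address. A minimal, self-contained sketch of that step follows; the toy mapping and sample string are invented for illustration (the real `rep` dictionary is built from countries.csv above), and the matched text is re-escaped for the dictionary lookup, mirroring how the keys were built.

```python
import re

# Toy replacement mapping, standing in for the dict built from countries.csv
rep = {re.escape(k): v for k, v in {"france ": "fr ", "germany ": "de "}.items()}
pattern = re.compile("|".join(rep.keys()))

def normalize(text):
    # Look up the matched text (re-escaped) in the replacement dictionary
    return pattern.sub(lambda m: rep[re.escape(m.group(0))], text)

print(normalize("10 rue x paris france germany office"))
```

On real data, this function would be mapped over the address column before vectorizing.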
Converting into bags of words
# The vectorizer converts the addresses into a bag-of-words representation
vectorizer = CountVectorizer(strip_accents="ascii")
# fit_transform expects an iterable of strings, so pass the column that holds the addresses
# (the sparse output can be fed to pairwise_distances directly, which is far more memory-efficient)
features = vectorizer.fit_transform(corpus.iloc[:, 0])
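To see what `fit_transform` actually produces, here is a small illustrative run on two invented addresses: the vectorizer learns a vocabulary and emits one row of word counts per address.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus to illustrate the learned vocabulary and the count matrix
toy = ["10 avenue foch paris", "10 avenue foch lyon"]
v = CountVectorizer(strip_accents="ascii")
X = v.fit_transform(toy)

print(sorted(v.vocabulary_))  # the learned token vocabulary
print(X.toarray())            # one row of word counts per address
```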
Computing the distance
from sklearn.metrics.pairwise import pairwise_distances
distances=pairwise_distances(features,metric='cosine')
np.save("cosinedistances",distances)
print("Distances saved")
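The principles above mention connecting close elements in a graph, which the code stops short of. A minimal sketch of that final step, on invented toy addresses and with a distance threshold that would need tuning on real data, is to keep every pair whose cosine distance falls below the threshold:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import pairwise_distances

# Toy addresses; the threshold is an assumption to tune on real data
docs = ["12 main street paris", "12 main st paris", "99 elm avenue toronto"]
X = CountVectorizer().fit_transform(docs)
distances = pairwise_distances(X, metric="cosine")

threshold = 0.5
# Keep each close pair once, using the upper triangle of the distance matrix
i, j = np.where(np.triu(distances < threshold, k=1))
edges = list(zip(i.tolist(), j.tolist()))
print(edges)  # pairs of address indices considered similar
```

The resulting edge list can be fed to any graph library to group similar addresses into connected components.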